Thursday, July 23, 2015

Java 8 Tiered Compilation, big pro and small con

Recently I moved some of our cloud platform automated performance testing applications from Java 7 (1.7.0_80) to Java 8 (1.8.0_45-b4).  Not knowing what to expect from Java 8, I was surprised when the start time of our Karyon-based microservice dropped from 31 seconds to 20 seconds.  After some research and pointers from our cloud performance team, we looked into a feature called tiered compilation (it existed in Java 7, but was not the default there; in Java 8 it is on by default).  You can read more about it here.

To quote the docs:

"Tiered compilation, introduced in Java SE 7, brings client startup speeds to the server VM. Normally, a server VM uses the interpreter to collect profiling information about methods that is fed into the compiler. In the tiered scheme, in addition to the interpreter, the client compiler is used to generate compiled versions of methods that collect profiling information about themselves. Since the compiled code is substantially faster than the interpreter, the program executes with greater performance during the profiling phase. In many cases, a startup that is even faster than with the client VM can be achieved because the final code produced by the server compiler may be already available during the early stages of application initialization. The tiered scheme can also achieve better peak performance than a regular server VM because the faster profiling phase allows a longer period of profiling, which may yield better optimization."

So basically we get client VM (or faster) startup with server VM long-running performance.  Cool.
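For anyone wanting to experiment, these are the standard HotSpot flags involved; `app.jar` below is just a stand-in for your own service, not anything from this post:

```shell
# Tiered compilation is on by default in Java 8; this makes it explicit.
java -XX:+TieredCompilation -jar app.jar

# Disable it to get the Java 7-style behavior: server (C2) compiler only.
java -XX:-TieredCompilation -jar app.jar

# Log JIT activity; with tiering enabled, the compile level (1-4) shows per method.
java -XX:+PrintCompilation -jar app.jar

# Stop at level 1 (C1 only): client-VM-like startup, lower peak performance.
java -XX:TieredStopAtLevel=1 -jar app.jar
```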

With these great results, I started porting all applications at Netflix using Prana (our Java sidecar that lets non-Java applications use our cloud platform services) to Java 8 as well.  The rollout was uneventful for some of our largest Prana user clusters.  However, I noticed one metric that moved in a negative direction: our non-heap memory usage (specifically, the used value of the NonHeapMemoryUsage attribute on the java.lang:type=Memory MBean).  From our metrics it looked like our non-heap memory went from 80 MB to 110 MB.  For most of our users this 30 MB increase wouldn't be an issue, but there are use cases for Prana (like our EVCache nodes) where any increase in Prana's memory means less memory for the side-managed process, which can affect their scaling factors.
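For reference, that non-heap number comes from the standard platform MXBean API, so you can read the same value in-process.  A minimal sketch (the class name is mine, not from Prana):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class NonHeapCheck {
    public static void main(String[] args) {
        // This is the java.lang:type=Memory MBean, via the platform MXBean API.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        // On Java 8, non-heap covers the code cache, metaspace,
        // and compressed class space.
        System.out.println("non-heap used MB: " + nonHeap.getUsed() / (1024 * 1024));
    }
}
```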

I did a quick experiment to confirm that tiered compilation was the culprit.  Here is what I saw:

Java 8 default startup time for Prana - 29 seconds
Java 8 default non-heap memory usage - 109 MB

Java 8 with tiered compilation disabled (-XX:-TieredCompilation) startup time - 35 seconds
Java 8 with tiered compilation disabled non-heap memory usage - 80 MB

For comparison, here is what I saw in Java 7:

Java 7 default startup time for Prana - 36 seconds
Java 7 default non-heap memory usage - 80 MB
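A rough sketch of how such a comparison can be reproduced (`app.jar` is a placeholder for the real service).  One likely source of the delta: with tiered compilation on, Java 8 reserves a larger code cache by default (240 MB versus 48 MB), and the C1 compiler fills it faster:

```shell
# Java 8 defaults: tiered compilation on, larger reserved code cache.
java -jar app.jar

# Tiered compilation off: slower startup, smaller code cache footprint.
java -XX:-TieredCompilation -jar app.jar

# Print code cache statistics at JVM exit to see actual usage.
java -XX:+PrintCodeCache -XX:-TieredCompilation -jar app.jar
```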

At least now I know a way to tune for users who want maximum memory (at the cost of a slower startup).  With around 120 users of Prana at Netflix, I need to support both ends of this tradeoff.  Hopefully this helps if you start to see a similar issue in your Java 8 apps.

PS.  I am still working through all of the nuances of the Java 8 re-arranged memory segments for non-heap as well as heap.  There is a chance that some of this gain in non-heap was offset by less heap given the removal of PermGen and the fact that Prana uses groovy.  If anyone knows more on this, please do comment.
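On that point: in Java 8, class metadata lives in native Metaspace, which is counted under non-heap and is unbounded by default.  If metaspace growth turns out to be part of the increase, it can be capped; a sketch, with an example cap value (128m is arbitrary, and app.jar is a placeholder):

```shell
# Metaspace is unbounded by default on Java 8; capping it bounds
# that portion of non-heap (the JVM throws OutOfMemoryError: Metaspace if exceeded).
java -XX:MaxMetaspaceSize=128m -jar app.jar
```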

PPS.  Good conversation on Twitter with more info here.


  1. I'm not sure if it helps in your case, but I've found out that using these environment variables reduces the memory allocation overhead on Linux:

    # tune glibc memory allocation, optimize for low fragmentation
    # limit the number of arenas
    export MALLOC_ARENA_MAX=2
    # disable dynamic mmap threshold, see M_MMAP_THRESHOLD in "man mallopt"
    export MALLOC_MMAP_THRESHOLD_=131072
    export MALLOC_TRIM_THRESHOLD_=131072
    export MALLOC_TOP_PAD_=131072
    export MALLOC_MMAP_MAX_=65536

    There is an IBM article about setting MALLOC_ARENA_MAX.

  2. I also tried tweaking these JVM parameters to tune code cache and metaspace allocation on Java 8 to reduce possible native memory fragmentation:
    -XX:InitialCodeCacheSize=16M -XX:CodeCacheExpansionSize=1M -XX:CodeCacheMinimumFreeSpace=1M -XX:ReservedCodeCacheSize=200M

    -XX:MinMetaspaceExpansion=1M -XX:MaxMetaspaceExpansion=8M

    I'm not sure if this does what I expected it to do. There are some Oracle docs on these options.