Wednesday, May 08, 2013

More Zing for Java

From LMAX Exchange Getting Up To 50% Improvement in Latency From Azul's Zing JVM:

The results LMAX Exchange are seeing are remarkable: a 10-20% improvement in the mean latency, increasing to around a 50% improvement at the 99th percentile. Moreover

At the max/99.99th percentile with HotSpot the number would jump all over the place so it is hard to produce a relative comparison, except to say that the Zing values are much more stable. A bad run with HotSpot could easily be an order of magnitude worse.

In terms of throughput

Zing gives us the ability to lift what we call our "red-line" - the throughput value at which the latency starts to drop off a cliff. This effect often manifests as a second order effect of GC pauses. If we get a stall that is sufficiently long, we will start to drop packets. The process of packet redelivery can create back-pressure throughout the rest of the system sending client latencies through the roof. Having a more efficient collector with very short predictable pauses should allow us to increase our "red-line".

Most eye popping for me:

Whilst these figures are impressive, the variations, caused primarily by stop-the-world pauses in the CMS collector that is part of HotSpot, are becoming a significant problem. LMAX Exchange tried upgrading to the CMS version in JDK 7, but encountered around a 20% increase in the length of GC pauses for the same work load. The reasons for this weren't entirely clear, but Barker suggested it was probably down to a need to re-tune the collector. That Zing's collector (C4) typically requires little or no tuning was a major selling point for LMAX Exchange.

I think that we really needed to do retuning of our GC setting and investigating whether JDK 7 specific options like -XX:+UseCondCardMark and -XX:+UseNUMA should be applied. One of the other big reasons to go with Azul is the reduced need to tune the collector. The general recommendation is that you should re-tune from scratch on each new version of the JDK, which sounds fine in theory, but can be impractical. Collector tuning in Oracle JDK is essentially walking through a large search space for a result that meets your needs. Experience, knowledge and guess-work can crop significant chunks off that search space, but even then an extensive tuning exercise can take weeks. For example, our full end-to-end performance test takes 1 hour (10 minutes build & deploy, 10 minutes warm-up, 40 minutes testing), so I could reasonably run 8 runs a day. If you consider the number of different collectors (CMS, Parallel, Serial,...) and all of their associated options (new and old sizes, survivor spaces, survivor ratios,...) how many runs do I need to do to get effective coverage of that search space: 20, 30, more? With Zing the defaults work significantly better than a finely tuned Oracle JDK. We still have some investigation over whether we can get a bit more out of the Zing VM through tuning (e.g. fewer collector threads as our allocation rate is relatively low). However, tuning Zing is just that, i.e. looking to eke out the very best from the system; compared to the Oracle JDK where tuning from the defaults can be the difference between usable and unusable. The effort involving in tuning does come with an opportunity cost. I would much rather have the developers that would typically be involved with GC tuning (they are probably the ones that have the best working knowledge of software performance) be focusing on improving the performance of other areas of the system.

Concluding:

Part of the reason Zing is so attractive to these companies is that it remains the only collector that eliminates stop-the-world pauses from the young generation as well as the old generation. Whilst young generation pauses are shorter, where an application is particularly performance sensitive they still matter. As a result, Tene told us, "All we have to do is point to newgen pauses in other JVMs and say: 'those too will be gone'."

...Furthermore, the fact that it can handle multi-GB-per-sec allocations without worsening latencies or pauses, makes it very appealing for developers who have been trying hard not to allocate things because "it hurts". With Zing, you can use plenty of memory to go fast, instead of trying to avoid using it so that things won't hurt and jitter.

For production use Zing is priced on an annual subscription/server. Unsurprisingly the vendor is reluctant to reveal pricing information, though it is in line with a supported Oracle or IBM JVM.

No comments: