Sunday, May 16, 2010

High Priority Threads can extend Server GC Latencies

Recently, I did a perf investigation that revealed an interesting interaction between high-priority threads and Server GC mode.

In this post, I will explain the problem. In the next couple of posts, I will walk you through how I carried out the investigation, what tools I used, and how you can leverage these tools for similar investigations.

The problem

Before going into the problem details, I want to highlight some key characteristics about Server GC mode.

Server GC mode is designed with scalability in mind.  In this mode, the CLR tries to take advantage of every CPU available to your process.  The CLR creates a dedicated, hard-affinitized GC thread for each available logical CPU.

Hard affinity is achieved through the SetThreadAffinityMask API.  The reason GC threads are hard-affinitized is to avoid thread migration between cores, which can be very expensive on multi-core machines.  Avoiding thread migration enhances the throughput of the Server GC.

In Server GC mode, all GC threads work in parallel during a GC to collect the GC heap. However, as with many parallel algorithms, the threads sometimes need to join at safe points to synchronize state.
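To make the idea of hard affinity concrete, here is a minimal sketch. It does not call SetThreadAffinityMask and is not how the CLR does it; it assumes a Linux machine and uses Python's os.sched_setaffinity as a rough analog. The function name pin_to_one_cpu is hypothetical.

```python
import os


def pin_to_one_cpu():
    """Pin the caller to one allowed CPU; return the pinned set, or None if unsupported."""
    # os.sched_setaffinity exists only on some platforms (e.g. Linux).
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        original = set(os.sched_getaffinity(0))  # 0 means "the caller"
        target = min(original)                   # pick any one allowed CPU
        os.sched_setaffinity(0, {target})        # restrict the caller to that CPU
        pinned = set(os.sched_getaffinity(0))
        os.sched_setaffinity(0, original)        # restore the original mask
        return pinned
    except OSError:
        # The platform refused the request; treat it as unsupported.
        return None


if __name__ == "__main__":
    print(pin_to_one_cpu())
```

Once pinned, the scheduler will not move the caller to another CPU. That is what avoids migration costs, and it is also why a pinned thread has nowhere else to run if something else keeps its CPU busy.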

If one or more threads don’t reach the safe point for some reason, the other GC threads will wait for them.  All the managed threads in the process are suspended waiting for the GC to finish during this time.
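The effect of one late thread on a join can be illustrated with a small simulation. This is only a sketch of the general barrier pattern, not the CLR's implementation: the worker threads stand in for GC threads, the sleeps stand in for GC work, and simulate_join is a hypothetical name.

```python
import threading
import time


def simulate_join(delays):
    """Run one worker per delay; each sleeps, then waits at a shared barrier.

    Returns, per worker, the seconds elapsed from start until it was
    released from the barrier.
    """
    barrier = threading.Barrier(len(delays))
    released = [0.0] * len(delays)
    start = time.monotonic()

    def worker(index, delay):
        time.sleep(delay)   # stand-in for this thread's share of the work
        barrier.wait()      # the "safe point": block until every worker arrives
        released[index] = time.monotonic() - start

    threads = [
        threading.Thread(target=worker, args=(i, d))
        for i, d in enumerate(delays)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return released


if __name__ == "__main__":
    # Two fast workers and one straggler: nobody is released before the straggler arrives.
    for i, elapsed in enumerate(simulate_join([0.01, 0.01, 0.2])):
        print(f"worker {i} released after {elapsed:.3f}s")
```

Every worker's release time is bounded below by the slowest worker's arrival, which is the behavior described above: the fast threads finish their work early and then simply wait.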

Recently, I investigated a customer scenario where one GC thread was holding up the rest of the GC threads at a safe point. That thread was taking much longer (3-5 seconds) than the others to reach the safe point and join them. This reduced overall application throughput and increased per-request latency.  Latency is a VERY important factor for server applications, so it was critical to address this problem.

The challenge was to find out what was preventing this thread from joining the other GC threads.

It turned out that a real-time priority thread - sharing the same logical CPU with the affected GC thread - was running for an extended period of time, preventing the GC thread from getting scheduled.

Luckily, the real-time priority thread was created by one of the customer’s own components, so the customer was able to eliminate the need for it.

Setting the high-priority thread back to normal priority allowed the GC thread to get scheduled promptly, eliminating the long latency per request.

The overall latency introduced by GC dropped from 3-5 seconds to a few hundred milliseconds.

Read more: CLR and Framework Perf Blog