Monday, September 21, 2015

Yoda conditions

This blog post is not about performance but about ranting: ranting about the many developers who blindly apply "rules" without knowing why those practices exist in the first place.
Recently, I read this article from Lukas Eder, whose example number 4 recommends inverting the natural way of writing conditions by putting the constant on the left side of the operator instead of the right. The reason behind this is to avoid an accidental assignment if you miss one equals sign.

This practice (often called Yoda conditions) is quite old now. I first encountered it when writing C++. The language (in fact this comes from C) allows inline assignments inside a condition, because a condition is just an expression evaluation (there was no boolean type in earlier versions of C): if the result is 0 it means false, otherwise it is true. So any non-zero value (1, 2, 3, -1, 0x012345f...) is considered true.
Putting the constant on the left side makes the expression fail to compile when an equals sign is missing (you cannot assign to a constant), so the mistake is caught quickly.
In C/C++ this practice therefore makes sense (whether it is a good or bad practice in C/C++ is out of the scope of this post), because there is a rationale behind it.

In Java or C#, the story is totally different: conditions are strongly typed to boolean. Assigning null to a reference or assigning to an integer variable inside a condition leads to a compile error. Therefore it does not make sense to follow this "rule" in those languages, as the mistake is inherently caught by the compiler thanks to the type system.

Takeaway from this: do not blindly follow "rules" without knowing their origin and what issues they are trying to address. Also know your language and its type system. Rules like Yoda conditions fall away by themselves when you leverage your language's type system correctly.

Monday, September 14, 2015

Why BIOS settings matter (and not size)!

Recently, we received new server machines based on the Xeon E5 v3 (Haswell). I had heard from different people that this CPU generation is very good, and the figures are really impressive.
So I was pretty excited to test those new beasts on our standard benchmark and compare them to the previous E5 v2 (Ivy Bridge) machines we have.

Let's go for the first run:

Honestly, this is pretty baaaaad compared to what we got with the E5 v2. It is so bad that I emailed my Production System Manager to find out what was going wrong. Usually when we receive new machines we apply a procedure to configure them correctly for our requirements. As we work in the low latency space, we need to avoid many pitfalls (power management, NUMA, isolcpus) in order to get the best performance.
When I checked at the OS level, I noticed for example that cpuidle was active, which is not expected: on a well configured/tuned machine, cpuidle is not enabled. My suspicions turned to a misconfigured BIOS. My PSM asked me to check the BIOS from the command line (which is pretty handy, no need to reboot the machine).
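As a side note, one quick way to check this at the OS level from a shell (assuming a Linux kernel exposing the standard cpuidle sysfs entries) is:

```shell
# Print the active cpuidle driver (e.g. intel_idle or acpi_idle).
# On a machine tuned for low latency this should report "none";
# the file may be absent entirely if cpuidle is compiled out.
cat /sys/devices/system/cpu/cpuidle/current_driver 2>/dev/null \
  || echo "cpuidle sysfs entry not present"
```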
The usual features that we change are the following:
  • C states: disabled
  • C1E state: disabled
  • Power management profile: Maximum performance
  • Collaborative CPU performance: disabled

Bingo, the BIOS was not reconfigured following our standards! I applied them and re-ran our benchmark:

Latencies divided by 2 (or more), that's really better! But still slower than the E5 v2. Let's recheck those BIOS settings one more time.

Why are power management features so bad for low latency? The thing is, waking a component up takes time, possibly hundreds of microseconds. For a process that we measure at under 100 microseconds, that is huge.
For the CPU there are C states, which are different sleep modes. On Linux, with C states enabled at the BIOS level, you can see the latency associated with each of them:

cat /sys/devices/system/cpu/cpu0/cpuidle/state3/latency
200

This means that waking a core up from C3 to C0 (running) takes 200 microseconds!
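To get the full picture, a small loop (a sketch assuming the standard cpuidle sysfs layout; adjust cpu0 as needed) prints the wake-up latency of every available C state:

```shell
# List each C state of cpu0 with its wake-up latency in microseconds.
for state in /sys/devices/system/cpu/cpu0/cpuidle/state*; do
    [ -d "$state" ] || continue   # skip silently if cpuidle is disabled
    printf '%s: %s us\n' "$(cat "$state/name")" "$(cat "$state/latency")"
done
```

The deeper the state, the longer the wake-up, which is exactly what hurts a latency-sensitive process.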

With a new server generation come new features and options; maybe one of them makes the difference.
I identified one that sounds pretty "bad" from a low latency point of view: "uncore frequency" = dynamic.
Available options: dynamic/maximum.
Let's set it to maximum and run our benchmark:

Now we are talking! Results are better than on the E5 v2 (roughly 30% better), which is REALLY impressive!
We tested an E5-2697 v2 @ 2.7GHz against an E5-2697 v3 @ 2.6GHz. The v3 runs only 100MHz slower, yet it is still 30% faster on our benchmark.

Finally, there are some fun features we can play with in the BIOS settings: we can enable Turbo Boost and make the turbo frequency static by reducing the number of cores enabled in the CPU.
The E5 v3 has 14 cores; let's cut this down to only 2 cores and pin the frequency permanently at 3.6GHz (yeah, this is overclocking for servers!):

Compared to the default setup, we divided our latency by 4, just by tuning the BIOS properly!
My PSM asked me to email those results to make sure everybody on his team is aware of how important it is to apply the BIOS settings correctly on production servers.