Wednesday, December 30, 2015

PrintAssembly output explained!

If you are a regular reader of my blog, you may have noticed that I am (ab)using the PrintAssembly options of the JVM to examine the code generated by the JIT compiler. It helps me a lot to understand how my code is executed, and how the JIT compiler works and optimizes the code.
Even though I also use JMH from time to time, I am not a big fan of benchmarking, and especially not of micro-benchmarking.

Why? Because micro-benchmarking is an idealization of how production code is executed: tight loops, all data hot in the L1 cache, few branch misses, and the best case for aggressive JIT optimizations (monomorphic calls, etc.).

The thing is, the execution context in production is totally different from micro-benchmarks, so what is the point of exercising code that will not be executed under the same conditions? Are the conclusions I can draw from the micro-benchmark still valid, or still beneficial, for my production cases?
All of this pushes me away from micro-benchmarks as much as possible and toward other ways of evaluating performance, like performance counters inserted directly into the application, or reading the assembly generated by the JIT compiler. Note that this is not perfect either: modern CPUs execute instructions out of order and exploit instruction-level parallelism, so the assembly alone does not tell you how fast the code runs. In some situations, benchmarking is therefore the only way to assess performance.

Printing assembly also helps me back up assertions about how the JIT optimizes code, instead of relying on folklore and urban legends (instruction reordering, memory barriers, ...).

With all of that, PrintAssembly is one of my favorite tools. But I understand that its output may be difficult to read. Unfortunately, not all developers are familiar with assembly nowadays, but with some basic knowledge and the help of the inserted comments, it becomes less cryptic.

For those who have never used PrintAssembly, please refer to my previous posts about it: How to print disassembly from JIT code and How to build hsdis-amd64.dll. Chris Newland, creator of the JITWatch tool, also has some useful tips for Mac OS X, and Nitsan Wakart wrote an article on this.

Is your setup done? Perfect, let's read some assembly, yeah!

Assembly 101

First of all, I am using the Intel syntax, not the AT&T one. I am used to it, and since we are talking about the x86 instruction set made by Intel, let's stick to their convention.
Reminder: To get this syntax with the disassembler plugin, use the JVM option:
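As a sketch of a complete run, assuming the hsdis plugin is already installed where the JVM can find it: the HotSpot flags I know of for this are -XX:+UnlockDiagnosticVMOptions and -XX:+PrintAssembly, plus -XX:PrintAssemblyOptions=intel for the Intel syntax. The class below (PrintAssemblyDemo is a hypothetical name) simply calls ArrayList.add in a hot loop so that the JIT compiles it and its disassembly shows up in the output. The iteration count is an arbitrary choice, not a documented compilation threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class. Run it with (hsdis plugin required):
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \
//        -XX:PrintAssemblyOptions=intel PrintAssemblyDemo
final class PrintAssemblyDemo {

    // Fills a fresh list; called many times so that ArrayList.add becomes hot
    static int fill(int n) {
        List<Object> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(Integer.valueOf(i));
        }
        return list.size();
    }

    public static void main(String[] args) {
        long total = 0;
        // Arbitrary iteration count, only meant to trigger JIT compilation
        for (int round = 0; round < 20_000; round++) {
            total += fill(100);
        }
        System.out.println(total); // use the result so the work is not dead code
    }
}
```

The output is large, since every compiled method gets printed; JITWatch helps to navigate it.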

Instruction lines are structured as follows:

mnemonic parameters
  • mnemonic is the instruction name (mov, call, add, jmp, cmp, ...)
  • parameters can be registers, memory accesses or immediate values

mov rax, 0x2A
mov rdx, QWORD PTR [rbx+0x571c418]

The mov instruction moves data. The first line moves the constant value 0x2A into the register rax.
The second line moves the 8-byte (QWORD) memory content located at the address computed by adding the constant 0x571c418 to the value of register rbx into the register rdx. Note that the operand order is reversed in AT&T syntax (source first, destination last).
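To make the bracket notation concrete, here is a toy model in Java, not tied to any real API: memory is a little-endian ByteBuffer, registers are plain long values, and a memory operand [base+disp] is simply base + disp. QWORD PTR reads 8 bytes and DWORD PTR reads 4. The class name, base address and displacement are hypothetical and kept tiny so they fit in the buffer (the real 0x571c418 would not).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Toy model of x86 data movement; memory and addresses are hypothetical
final class MovModel {
    // 64 bytes of little-endian "memory", like x86
    static final ByteBuffer MEMORY =
            ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);

    // mov reg, QWORD PTR [base+disp]: 8-byte load from address base + disp
    static long movQword(long base, long disp) {
        return MEMORY.getLong((int) (base + disp));
    }

    // mov reg32, DWORD PTR [base+disp]: 4-byte load; on x86-64, writing a
    // 32-bit register zero-extends the value into the full 64-bit register
    static long movDword(long base, long disp) {
        return Integer.toUnsignedLong(MEMORY.getInt((int) (base + disp)));
    }

    public static void main(String[] args) {
        long rax = 0x2A;                 // mov rax, 0x2A (immediate value)
        long rbx = 0x08;                 // hypothetical base address
        MEMORY.putLong(0x18, 0x1122334455667788L);
        long rdx = movQword(rbx, 0x10);  // mov rdx, QWORD PTR [rbx+0x10]
        long r9 = movDword(rbx, 0x10);   // mov r9d, DWORD PTR [rbx+0x10]
        System.out.printf("rax=%#x rdx=%#x r9=%#x%n", rax, rdx, r9);
    }
}
```

Because the memory is little-endian, the DWORD load returns the low half of the QWORD stored at the same address.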

  • push/pop instructions move data to/from the stack
  • add/sub/imul/idiv instructions perform addition/subtraction/multiplication/division on integers
  • inc/dec instructions increment/decrement a value in a register or in memory
  • and/or/xor/not/shl/shr instructions perform bitwise operations and shifts
  • jmp instruction performs an unconditional jump to the specified address
  • jxx instructions perform a conditional jump based on the flags set by the last flag-setting operation (typically cmp or test)
  • cmp instruction compares 2 operands: it subtracts them, discards the result and only keeps the flags
  • call/ret instructions call/return from a subroutine
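The link between jxx and "the last operation" is easier to see with a small model. This is my simplified sketch (hypothetical class CmpJl, only the sign and overflow flags are modeled): cmp a, b computes a - b, throws the result away and keeps the flags; jl then jumps when SF differs from OF, which is exactly a signed a < b, even when the subtraction overflows.

```java
// Simplified model of "cmp a, b" followed by "jl target" on 32-bit operands.
// Only SF and OF are modeled; the real CPU also sets ZF, CF, PF and AF.
final class CmpJl {
    // SF: sign of the (wrapping) difference a - b
    static boolean signFlag(int a, int b) {
        return (a - b) < 0;
    }

    // OF: signed overflow, i.e. a and b have different signs and the
    // result's sign differs from the sign of a
    static boolean overflowFlag(int a, int b) {
        int r = a - b;
        return ((a ^ b) & (a ^ r)) < 0;
    }

    // jl is taken when SF != OF
    static boolean jlTaken(int a, int b) {
        return signFlag(a, b) != overflowFlag(a, b);
    }

    public static void main(String[] args) {
        int[][] pairs = {{1, 2}, {2, 1}, {5, 5},
                         {Integer.MIN_VALUE, 1}, {Integer.MAX_VALUE, -1}};
        for (int[] p : pairs) {
            System.out.println(p[0] + " < " + p[1] + " ? jl=" + jlTaken(p[0], p[1])
                    + " java=" + (p[0] < p[1]));
        }
    }
}
```

The overflow cases show why jl cannot look at the sign of the difference alone.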

For more information, see for example this guide or the official Intel documentation.

Disassembler comments

Fortunately, the disassembler plugin does not just spit out raw instructions: it annotates them with useful information.
Let's take an example with the method ArrayList.add(Object) and analyze it:


In the header we can find the following information:
The first line gives the name of the disassembled method, 'add', with its signature (one parameter of type Object, returning a boolean) and its class, java.util.ArrayList.
But as this is an instance method, there are in fact 2 parameters, as mentioned in the rest of the header:
the receiver this, stored in register rdx, and the Object parameter, stored in register r8.
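In the raw header, the signature is written in the JVM descriptor form rather than as Java source; as far as I know, for this method it reads '(Ljava/lang/Object;)Z': parameter types in parentheses, then the return type, Z standing for boolean. The implicit this is not part of the descriptor. The helper below (hypothetical name Descriptors) rebuilds such strings so you can decode other headers; the JDK's MethodType.toMethodDescriptorString() is used in main as a cross-check.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.ArrayList;

// Builds JVM type and method descriptors such as "(Ljava/lang/Object;)Z"
final class Descriptors {
    static String type(Class<?> c) {
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        if (c == void.class) return "V";
        if (c.isArray()) return "[" + type(c.getComponentType());
        // Reference type: L + internal name (slashes instead of dots) + ;
        return "L" + c.getName().replace('.', '/') + ";";
    }

    // Parameters in parentheses, then the return type; the receiver is never listed
    static String method(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            sb.append(type(p));
        }
        return sb.append(')').append(type(returnType)).toString();
    }

    static String method(Method m) {
        return method(m.getReturnType(), m.getParameterTypes());
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method add = ArrayList.class.getMethod("add", Object.class);
        System.out.println(method(add));
        // Cross-check with the JDK's own descriptor builder
        System.out.println(MethodType.methodType(add.getReturnType(),
                add.getParameterTypes()).toMethodDescriptorString());
    }
}
```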

Verified Entry Point

After the header, the first instructions of the method begin after the [Verified Entry Point] section. Assembly before this mark is mostly padding for alignment; for a virtual method like this one, it also contains the inline cache check of the [Entry Point], which verifies the class of the receiver. Starting from this section, we will look at the comments that follow the semicolon. Comments starting with a star (*) indicate the associated bytecode.

Synchronization Entry

The comment ; - java.util.ArrayList::add@-1 (line 458) gives us the mapping to the Java code: the class name, the method name, the bytecode offset within the method, and finally the line number in the original Java source file. The line above it, ;*synchronization entry, marks the prologue of the method: instructions that are necessary to prepare the execution (stack allocation or stack banging, saving some registers, ...). As no specific bytecode is associated with this prologue, the offset is -1.

Get size field

The next comment corresponds to the bytecode that retrieves the field named size from the current instance (the ArrayList). It is translated into the following assembly line: mov r9d,DWORD PTR [rdx+0x10]
It loads into r9d (the lower 32 bits of r9) the 32-bit value (DWORD) located at address rdx + 0x10: rdx holds this (see the method parameters in the header) and 0x10 is the offset of the size field in the object.

Get elementData field

The following comment is interesting: it is the same kind of bytecode, getfield, but the mapping to the Java code involves 2 methods: java.util.ArrayList::ensureCapacityInternal@1 (line 223) and java.util.ArrayList::add@7 (line 458). This means that the JIT has inlined the first method into add (called at bytecode offset 7), and that this bytecode comes from the inlined method.

Empty array test

{oop(a 'java/lang/Object'[0])} indicates an object instance (oop) of type 'java/lang/Object'[0], i.e. an Object array of length 0. This is in fact the constant empty array instance against which elementData is compared inside the inlined method ensureCapacityInternal.

More inlining

Here we have an additional level of inlining for the ensureExplicitCapacity method.
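To map these comments back to source code, here is a simplified sketch modeled on the JDK 8 ArrayList call chain named in the comments (add calls ensureCapacityInternal, which calls ensureExplicitCapacity). It is a hypothetical MiniList class, not the JDK source: the constant names and the growth policy are my simplifications. The comments point out the steps the JIT inlined into add: the size read, the elementData read, the identity comparison against the shared empty array, and the length read that carries the implicit null check.

```java
import java.util.Arrays;

// Hypothetical, simplified list modeled on the JDK 8 ArrayList.add call chain
final class MiniList {
    private static final int DEFAULT_CAPACITY = 10;
    // Shared empty array: the kind of constant the JIT compares elementData against
    private static final Object[] EMPTY_ELEMENTDATA = {};

    private Object[] elementData = EMPTY_ELEMENTDATA;
    private int size;       // first field read in add
    private int modCount;

    boolean add(Object e) {
        ensureCapacityInternal(size + 1);  // first level of inlining
        elementData[size++] = e;           // reference store into an array
        return true;
    }

    private void ensureCapacityInternal(int minCapacity) {
        // Read of elementData, then an identity (==) test, not equals()
        if (elementData == EMPTY_ELEMENTDATA) {
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        ensureExplicitCapacity(minCapacity); // second level of inlining
    }

    private void ensureExplicitCapacity(int minCapacity) {
        modCount++;
        // elementData.length: dereference that needs a null check
        if (minCapacity - elementData.length > 0) {
            grow(minCapacity);
        }
    }

    // Grows by 50%, or to minCapacity if that is larger
    private void grow(int minCapacity) {
        int newCapacity = elementData.length + (elementData.length >> 1);
        if (newCapacity < minCapacity) {
            newCapacity = minCapacity;
        }
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

    int size() { return size; }
    int capacity() { return elementData.length; }
    Object get(int i) { return elementData[i]; }

    public static void main(String[] args) {
        MiniList list = new MiniList();
        for (int i = 0; i < 11; i++) {
            list.add(i);
        }
        System.out.println(list.size() + " elements, capacity " + list.capacity());
    }
}
```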

Implicit null check

A new kind of comment: here we have an implicit null check, because we are dereferencing the object array elementData to get its length (Java code: elementData.length). If elementData is null, the JVM must throw a NullPointerException. But to avoid generating an explicit test for each dereferenced object, the JIT relies on OS signal handling: the load from a null address faults (segfault), and the signal handler turns this rare case into the exception. See my article on this technique.
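At the Java level, the contract an implicit null check has to preserve is simply this: reading length from a null array throws NullPointerException. The sketch below (hypothetical class NullCheckDemo) shows that contract; the JIT is then free to implement it with a faulting load rather than an explicit comparison, as described above.

```java
// Hypothetical helper showing the semantics an implicit null check preserves
final class NullCheckDemo {
    // A plain dereference: the JIT may emit no explicit null test here and
    // rely on a memory fault being turned into a NullPointerException
    static int lengthOf(Object[] array) {
        return array.length;
    }

    // Returns true when lengthOf(null) throws NullPointerException
    static boolean throwsNpeOnNull() {
        try {
            lengthOf(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOf(new Object[3]));
        System.out.println(throwsNpeOnNull());
    }
}
```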

Type Check

Let's skip some regular comments and stop on this one:
We are verifying that the class (metadata) of the elementData instance is exactly an object array ('java/lang/Object'[]). To do this, the code loads the class pointer from the instance header and compares it to the address of the class loaded by the JVM.
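Why check the class at all? My understanding is that this is the array store check: Java arrays are covariant, so a variable of type Object[] may reference a String[], and storing an Integer into it must throw ArrayStoreException. When the exact class is Object[], any reference can be stored, so a single pointer comparison is enough on the fast path. The sketch below (hypothetical class ArrayStoreDemo) writes that fast-path test in Java and shows the rule the JVM enforces otherwise.

```java
// Hypothetical demo of the array store check behind the type check above
final class ArrayStoreDemo {
    // The fast path in Java terms: exactly Object[], so any store is legal
    static boolean isExactlyObjectArray(Object[] array) {
        return array.getClass() == Object[].class;
    }

    // Tries to store an Integer; returns false if the JVM rejects the store
    static boolean tryStoreInteger(Object[] array) {
        try {
            array[0] = Integer.valueOf(42);
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Object[] plain = new Object[1];
        Object[] strings = new String[1]; // legal: arrays are covariant
        System.out.println(isExactlyObjectArray(plain) + " " + tryStoreInteger(plain));
        System.out.println(isExactlyObjectArray(strings) + " " + tryStoreInteger(strings));
    }
}
```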

Card marking

Sometimes the comments are wrong: here this is not a synchronization entry but a special operation called 'card marking', performed after a reference is written into a field or a reference array (elementData in our case). The assembly generated for card marking is analyzed in this article. Here we have card marking for an array element; for a regular instance field, the generated assembly is different.
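For readers new to card marking, here is a toy Java model of what those few instructions do in HotSpot's card-table scheme, as I understand it: the heap is split into 512-byte cards (a shift of 9), and after a reference store the card containing the written address is marked dirty by storing a zero byte into a byte table, so that a young collection only scans dirty cards to find old-to-young references. The class name, heap base and heap size are hypothetical; the real code uses a biased table base so that the index is just the address shifted right by 9.

```java
import java.util.Arrays;

// Toy card table; the heap base and size are hypothetical
final class CardTableModel {
    static final int CARD_SHIFT = 9;          // 2^9 = 512-byte cards
    static final byte CLEAN = -1, DIRTY = 0;  // HotSpot uses 0 for dirty

    final long heapBase;
    final byte[] cards;

    CardTableModel(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        this.cards = new byte[(int) (heapBytes >>> CARD_SHIFT)];
        Arrays.fill(cards, CLEAN);
    }

    // Post-write barrier: mark the card covering the written address
    void markCard(long address) {
        cards[(int) ((address - heapBase) >>> CARD_SHIFT)] = DIRTY;
    }

    boolean isDirty(long address) {
        return cards[(int) ((address - heapBase) >>> CARD_SHIFT)] == DIRTY;
    }

    public static void main(String[] args) {
        CardTableModel table = new CardTableModel(0x1000_0000L, 1 << 20);
        long slot = 0x1000_0000L + 0x1234;   // address of the written array slot
        table.markCard(slot);
        System.out.println(table.isDirty(slot));        // same card
        System.out.println(table.isDirty(slot + 512));  // next card
    }
}
```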

Safepoint poll

Finally, the comment {poll_return} indicates that the instruction performs a safepoint check. You will typically see it just before the return of a compiled method. For more details about safepoints, please read my article, and a more detailed exploration of safepoints and their impact here.
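As a conceptual model only: as far as I know, in JDK 8 HotSpot the poll is a read from a dedicated polling page, and to request a safepoint the VM makes that page unreadable, so the read faults and the signal handler brings the thread to the safepoint. The toy class below (hypothetical name SafepointPollModel) replaces the page protection with a volatile flag checked at the same place, the method return. It is not how HotSpot is written, and a real thread would block at the poll instead of just counting.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a return poll; HotSpot uses a protected page, not a flag
final class SafepointPollModel {
    private static volatile boolean safepointRequested;
    static final AtomicInteger stops = new AtomicInteger();

    static void requestSafepoint() { safepointRequested = true; }
    static void releaseSafepoint() { safepointRequested = false; }

    // Stand-in for the poll placed before 'ret'
    private static void pollReturn() {
        if (safepointRequested) {
            stops.incrementAndGet(); // a real thread would block here until released
        }
    }

    static int compute(int x) {
        int result = x * 2;
        pollReturn();               // the {poll_return} point
        return result;
    }

    public static void main(String[] args) {
        compute(1);
        requestSafepoint();
        compute(2);
        releaseSafepoint();
        System.out.println(stops.get());
    }
}
```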

Voilà! You now have the basics to understand the disassembly output of the PrintAssembly options. If you want to go further, I strongly recommend, again, the wonderful JITWatch tool.

