Letter the eighth – Bedevilling Benchmarks

01Jul11

My Dear Malware,

Summer is here, and with it come the possibilities of making our patients even more hot and bothered than they would normally be. For example I understand that one of your human charges has been tasked by his “boss” (strange how humans choose to mirror our own admirable hierarchy!) with benchmarking the differences between indexed for-loops and C++ iterators, because “iterators are probably more efficient, and we need to update our coding standards”. Oh, you should be able to have some rare fun here! Benchmarking is among the most difficult (and arguably most pointless) of programming activities. To illustrate just a few of the nice little nuances that can cause pain and misery for the humans, let us consider this simple program, which your patient might well use to benchmark a for-loop:

int main() {
    const int BIG = 1000000000;
    for( int i = 0; i < BIG; i++ ) {
        // empty
    }
}

and compile it without optimisations (of course, when benchmarking the humans would use different compilers, but I will stick with one here, for pedagogical purposes):

g++ bm.cpp -o bm

and time it:

time bm

producing the the result:

real    0m3.986s

or something like it.

But then your patient realises that if he is measuring performance, he should really apply some optimisations, so he tries:

g++ -O2 bm.cpp -o bm

which produces the startling figure:

real    0m0.442s

Impressive. Incidentally, if you can persuade your patient to publish his figures without using optimisation (not difficult, humans easily forget this step), you will have the joy of seeing him being mocked by his more knowledgeable peers, with sins of thought, if not commission, occurring on both sides!

You should now try to misdirect your patient from investigating what the compiler has actually done to achieve this near order of magnitude speed-up. But of course, we need to know, even if the humans must be kept in the dark. To do this you need to fire up one of our greatest creations – gdb (you didn’t think that user interfaces like that came about by accident, I hope!):

gdb bm

At this point I must point out one of the most cunning features that our demonic UI designers built into gdb – despite the fact that you are extremely likely to be debugging on an Intel platform, it does not use Intel mnemonics by default! A master-stroke! At the (gdb) prompt, you need to type:

set disassembly-flavor intel

in order to get something sensible displayed. It is of these little annoyances that our task is comprised!

All that to the side, at the (gdb) prompt, type:

disas main

to disassemble the machine code for the main function. If you do, you will see something like this (I have trimmed some code for “clarity”, that awful thing):

   0x0040134a :    call   0x401980 
   0x0040134f :    xor    eax,eax
   0x00401351 :    leave
   0x00401352 :    ret

The call to __main is to do things like handle the argc and argv parameters of main(), and the remainder of the code zeros the eax register and returns. There is no sign of the for-loop. And why should there be? The loop did nothing, so the compiler has removed it!

And here we are at the nub of the difficulty of comparing loops and iterators (the same issues bedevil – ha! other comparisons) – the compiler may, silently, remove all sorts of calls, constructs and calculations, so long as the observed behaviour of the the program is the same. Unfortunately, in the “real world” humans are not in the habit of writing empty for-loops, so your patient needs to get the compiler to actually generate some code which addresses those “real world” concerns. You should, of course, do your best to prevent him from doing so, and instead get him to promulgate his bogus optimisation as proof that for-loops are incredibly efficient! But if he persists, you might try this:

int main() {
    const int BIG = 1000000000;
    int n = 0;
    for( int i = 0; i < BIG; i++ ) {
        n++;
    }
    return n & 1;
}

Hopefully, he will be satisfied with this – it forces the loop to do something, and has a side-effect which the compiler surely cannot ignore.

Happily (for us) this code produces exactly the same machine code as the empty loop – the compiler is clever enough to analyse the for-loop and realise that the result of 1000000000 (the summed result) & 1 is zero. This illustrates the difficulties the humans will experience in getting the compiler to produce something that can be meaningfully compared – I will touch on how to do this later, but we don’t want this information getting out to the patients.

If you really want the loop to execute (and in this case, for obfuscational purposes, you don’t), you will need something like this:

#include <cstdlib>
int main() {
    const int BIG = 1000000000;
    int n = 0;
    for( int i = 0; i < BIG; i++ ) {
        n += rand();
    }
    return n & 1;
}

which emits this code for the loop itself:

0x0040135c :    call   0x401c10 <rand>
0x00401361 :    add    esi,eax
0x00401363 :    dec    ebx
0x00401364 :    jne    0x40135c

Note that you cannot simply replace the call to rand() with an inline function in your code like this:

int f() {
   return 1;
}

as the compiler is clever enough to look inside the called function in this case.

I hope from this, my dear Malware, that you can see what we must forever hide from the human scum – that meaningful benchmarks are extremely difficult (almost impossible) to produce, as the compiler can simply elide code in benchmarks which cannot be elided in actual applications. Which is why we want the scum to keep on producing them! You should put all your effort into hiding the fact that the only sensible performance figures are those produced by running real code that addresses a specific business problem with the different algorithms/data structures being tested. Anything else is meaningless, and thus grist to our demonic mills!

Your loving uncle,

PUNCHTAPE
Under Consultant
Demonic Department of Obfuscation, Standardisation and Benchmarking

Filed under: programming | 3 Comments

3 Responses to “Letter the eighth – Bedevilling Benchmarks”

Feed for this Entry Trackback Address

1 Matthieu M. on August 17, 2011 said:

Ah sweet! I had missed those letters, and I discover 3 at once.

Let us remember that there are 3 kinds of lies: Lies, damned lies, and benchmarks [1].

[1] http://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics

Reply
2 Raoul Duke on May 21, 2012 said:

having just been trying to use gdb on mac os x 10.7 of late, these words ring oh too true. i normally take the route of doing a swear word filled blog post about it. your approach is the better, and i’ll never have the patience to do it.

Reply
3 erniecordell on June 29, 2012 said:

Very clever! Almost too clever for me: I was looking for a new compiler to produce assembly for me to clearly understand constructs and was led to your compiler installation pages where I questioned what sort of polemics you may write. My curiosity was rewarded, but until some ancient French literature courses awakened a part of my consciousness that had long been idle, I was unable to follow the letter format. Having just stumbled on the actual structure of the ‘for loop,’ I followed the code, but read the text until I understood the relationship between sender and receiver. Then I enjoyed it thoroughly.

Reply