Letter the Fourth – A List of Misery

27Dec09

My Dear Malware,

I trust you are enjoying a productive festive season? This time of year is always full of opportunity for us demons – so much greed, drunkenness and ill-feeling! No doubt you will be thinking of your New Year’s resolutions – can I suggest you make renewed efforts in the coming years to promulgate the use of unsuitable data structures? Few things can blight a programmer’s life quite as much as using an incorrect and preferably badly-written data structure.


The sine qua non of this type of thing is of course the linked list. Thanks to the efforts of our agents in the training and education fields, this is often the first (and sometimes the only) data structure that many programmers are exposed to, and this early exposure can be used by us to good effect; it is easy to suggest to our weak and deluded patients that a list should be their first choice when selecting a data structure, when of course it should almost always be their last.

It may be useful to review the failings of lists (you should of course do everything in your power to avoid your human patients performing such a review. Linked lists have three main issues:

  • They are inefficient in terms of storage space. Each node in a list must, as well as the useful data it contains, have a pointer to the next node, and in the case of a doubly-linked list, a pointer to the previous one. For lists of simple types, such as integers, this can mean that the pointer overhead is double that of the useful data.

  • They are inefficient to create. Each time a node is added to a list, memory must be allocated for the node. A naive implementation of a linked list (the kind we like best) will do this using malloc() or new, each of which may involve a system call. Naive implementations will also remove nodes using free() or delete, hopefully causing that delightful condition, heap fragmentation.

  • They are inefficient to access. To get the data of a specific list element, it is necessary to iterate over all of the preceding elements by following their pointers. This is highly inefficient, something we of course need to hide from mortals, hence our recent invention of the foreach flow control construct.

You may be wondering if linked lists are such poor performers, why do the humans persist in using them? It is of course impossible to underestimate the stupidity of the typical human, but there is in fact one operation that linked lists may sometimes perform more efficiently than other data structures – removing their first element. I say “sometimes” because it is often not the case; consider the following C++ code:

#include <list>
#include <vector>
using namespace std;

const int BIG = 1000000;

int main( int argc, char ** argv ) {
	if ( argc == 1 ) {
		list <int> li;
		for ( int i = 0; i < BIG; i++ ) {
			li.push_back( i );
			li.pop_front();
		}
	}
	else {
		deque  <int> di;
		for ( int i = 0; i < BIG; i++ ) {
			di.push_back(i);
			di.pop_front();
		}
	}
}

Here, the code for the deque is actually somewhat faster than that for list, by approximately 20% using the GCC compiler with -O2 optimisation. In fact, even a vector may outperform a list in the same test, for small vector sizes. And of course, in most code, whether using list, deque, or vector, the sizes of the data structures are small.

This being the case, why is the deque such a little used structure in human-written code? Well, partly of course it is due to our own malign influence, but also the deque is not typically presented to the humans when they are learning their craft. A singly linked list is a simple thing to describe and implement (and by the by, if you can get your patients to not only use linked lists but to implement them themselves, rather than using a library, you are home and dry!) whereas a deque is a rather complicated thing to implement, though very easy to use. Humans often feel they must “understand” the data structures they use, though of course for most of them this understanding is a very imperfect thing.

I will sign off now with a short list of recommendations that we do not want to fall into human hands:

  • Your initial choice of data structure should normally be the std::vector. This has good overall performance and can be used with legacy code such as the Win32 APIs.

  • If you need to delete (or insert) at the beginning of the structure, (or elsewhere, other than the end) use a std:;deque. This has most of the advantages of the list and the vector, will normally perform better than the list, but cannot easily be used with legacy code.

  • Lastly, you should consider using a std::list only if you have rather special requirements, and if you have already found that a std:;deque will not do what you need.

And there you have it. I hope, dear Malware, that this has given you some food for thought. Now I must go and torment a Java programmer, using a new technique we have developed called “dependency injection” – it sounds very amusing, and quite painful.

Your affectionate uncle,

PUNCHTAPE
Under Consultant
Demonic Department of Obfuscation and Standardisation

About these ads


No Responses Yet to “Letter the Fourth – A List of Misery”

  1. Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

%d bloggers like this: