Puzzle 3

So now you know about structs, you know about dynamic memory allocation, and you know about pointers! Think yourself a C expert? We shall see.

Since you’re all geniuses, surely you know the sizeof function. Remember the other day when we discussed GDB? We had a line of code that went something like this: #define TOTAL_ELEMENTS (sizeof(array)/sizeof(array[0])). This puzzle will explain all this and more.

Okay, quick quiz. Assume you’re on a 64-bit architecture. What do each of these return?

sizeof(int);

sizeof(long);

sizeof(short);

sizeof(char);

Did you guess 4, 8, 2, and 1? If so, give yourself a cookie - you were awake during the second week of classes.

Did you get tricked by that “64-bit architecture” stuff? That just determines the size of your pointers. A 64-bit architecture machine is one where every address in memory takes 64 bits, or 8 bytes. Pointers are nothing but addresses, so that means the size of your pointers are also 8 bytes. If you’re on a 32-bit machine, then your pointers are 4 bytes. And if you’re on a 16-bit machine, then you’re probably busy discovering that the world is not in fact flat.

Great, now that you got those correct, we don’t have to review the sizeof stuff anymore… right?

You remember structs - the rough C translation of objects? Okay then, here’s round 2.

struct foo {
    int a;
    int b;
    int c;
};

struct foo bar;

printf("%d\n", sizeof(bar));

What does that print? Did you guess 12? Are you really sure? Well yes, it is. Unlike in Java, C does not track extra things about structs, and so it the size of structs is many times close to the elements inside. Not bad, not bad at all. What about this then?

struct foo {
    int a;
    char b;
    int c;
    short d;
};

struct foo bar;

printf("%d\n", sizeof(bar));

Did you think to yourself 4 + 1 + 4 + 2? Nope! Here’s a brilliant case of Joel’s leaky abstraction. Thought you didn’t have to do the class readings? Wrong!

Turns out, the answer is 16! Oh dear, what’s happened? As it turns out, the x86 and ARM architectures self-align on the widest or largest element in a struct, often called the stride address of a structure. Some of you have taken architecture, so you already know what this means. For those that do, please forgive this brief review.

We’re assuming we’re on a 64-bit architecture; that means the basic word size is 64 bits, or 8 bytes. When we look into addresses, each address is 8 bytes long - that’s why pointers are the same size as integers. There’s a distinct advantage to having your data begin at a multiple of 8. It means that the data can be accessed in a single operation. Imagine your data began at address 5. Well then, the CPU would use one instruction to go to address 0, and then another to increment by 5 (remember, it’s an 8-byte word, so it can’t go directly to address 5). This is why we self-align - to speed up memory lookup into a single operation for the CPU.

Not everything aligns to word boundaries, however. This is where the stride address of a structure comes in. Imagine you had an array of characters - each element here is 1 byte. Would you really want to take up 8 bytes for every one of those elements? Of course not! That’s horribly wasteful. In the same vein, C determines padding by using the stride of a struct rather than the word size of a machine.

Let’s take this example:

struct foo {
    int a;
    short d;
    char a;
    int c;
}

What’s the largest element in foo? An int, of course. We know that’s 4 bytes, so the stride of foo is 4 bytes as well. This means all elements in this struct are self-aligned to 4 byte boundaries. So what’s the total size of this struct? The compiler re-writes it as the following:

    struct foo {
        int a; // 4 bytes
        short d; // 2 bytes
        char pad[2]; // 2 bytes of padding
        char a; // 1 byte
        char pad[3]; // 3 bytes of padding
        int c; // 4 bytes
  };

So the real size of this struct is 16 bytes and not 11. Now you may already be thinking, “Hmm, that’s quite a lot of wasted space!” And you’d be right. How can we reorder things to minimize wasted space? Well, that’s not too difficult.

struct foo {
    int a;
    int c;
    short d;
    char a;
};

Can you guess the size of this newly re-written struct? It’s now just 12! We reduced the amount of padding by 80% and the space by 25%! By ordering it this way, only 1 byte of padding is necessary! Not bad for less than two minutes of thinking.

What we’re discussing here is called struct packing - how to efficiently order your struct elements so that you minimize wasted space.

On the other hand, many of you might be thinking - hey now, it’s just 5 bytes! Who cares? Well, imagine you’re making 200,000 of these. Those bytes start to add up - and fast. All advanced C programmers are well-versed in struct packing, and it’s essential to minimizing your memory footprint.

The first time I heard of this, I immediately thought “If it’s so damned important, then why doesn’t the compiler do it for me?” That’s a fair question.

It’s because of the purpose of C. C is designed to be extremely close to the machine and very low level. The programmer is meant to have total control of the ordering - imagine you had to do it in the inefficient way because you’re adhering to some hardware protocol, etc. Ultimately, the philosophy of C is to leave the power and smarts in the programmer’s hands because unlike Java, we assume you’re intelligent. :)

But we’re not done yet! What about structs within structs? Turns out, those are self-aligned too! But they’re independently self-aligned. Here’s an example:

struct foo {
    char stuff;
    struct bar {
        char *other_stuff;
        char omg[7];
   } inner;
};

Think you can re-order this struct to save space?

    struct foo {
        char stuff;
        struct bar {
            char omg[7];
            char *other_stuff;
     } inner;
   };

Nope. You can’t. The size of both structs is 24. Inner structs are self-aligned independently, which means even though you’d like to put omg first and match with the 1 byte character above, you can’t. No matter what we do, this struct will take up 24 bytes of space! And that’s just awful. If you run into a problem like this, you have a bad design and need to restructure your code.

The other benefit to this is fewer cache-misses, which also speeds up your program.

If you’re interested in other advanced ways to optimize your C programs, do take a look at this fabulous article.