Tips 5

The C PreProcessor

Thought CPP was C Plus Plus? Nope. Not this time, at least. It’s the C Pre-Processor.

Preprocessing directives are most at home in C. They can technically be used in C++ too, but macros in particular should be avoided there, as a more evolved language like C++ offers safer ways to do the same things.

So what’s a pre-processing directive? Let’s take a step back first. What the heck is a pre-processor? The pre-processor modifies the source code before it hands it over to the compiler, which creates the actual executable (with help from the linker and the loader, and some other steps, but forget about those for now).

Define Statements

There are three main things that the pre-processor takes care of - and you’ve already seen two of them. Here is the first example.

#define PI 3.14159

Define statements! Specifically, define statements for constants. The source code is directly modified so that, in this case, every instance of PI is replaced with 3.14159. Or take this one for example.

#define PI_PLUS_ONE 3.14 + 1

Is this good practice? There’s nothing wrong with define statements per se, but the way this statement is written may lead to some mysterious results. For example, look at the following situation:

int x = PI_PLUS_ONE * 5;

This is actually re-written as the following:

int x = 3.14 + 1 * 5;

Oops! The replacement text is pasted in as-is, so the multiplication binds to the 1 alone and x ends up as 8 rather than 20. The fix is to wrap the replacement text in parentheses:

#define PI_PLUS_ONE (3.14 + 1)

Define statements can certainly be useful, and should be used, but the moral of this story is simply to be careful with them so that they don’t create unexpected bugs in your code.
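To see the difference parentheses make, here is a minimal sketch that puts an unparenthesized and a parenthesized definition side by side. The macro and function names are made up for this illustration.

```c
/* Hypothetical names, for illustration only. */
#define PI_PLUS_ONE_BARE 3.14 + 1      /* no parentheses around the text */
#define PI_PLUS_ONE_SAFE (3.14 + 1)    /* parenthesized replacement text */

/* Expands to 3.14 + 1 * 5, which is 8.14 and truncates to 8. */
int bare_times_five(void)
{
    return PI_PLUS_ONE_BARE * 5;
}

/* Expands to (3.14 + 1) * 5, which is about 20.7 and truncates to 20. */
int safe_times_five(void)
{
    return PI_PLUS_ONE_SAFE * 5;
}
```

Calling both functions from main and printing the results is a quick way to test a define statement before trusting it.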

Hmm, how about – oh I don’t know – testing your define statements and macros? Sounds like a good idea…

Macros

Speaking of macros, they are another good use of the pre-processor.

#define MULT(x, y) (x * y)

This creates a little function called MULT which takes 2 parameters, and returns the output of the two multiplied together. Or does it? Macros introduce a new level of complexity – and with complexity, comes new bugs. Take a look.

int x = MULT(3 + 2, 4 + 2);

What’s the output? 30? Nope! This is how it actually expands.

int x = (3 + 2 * 4 + 2);

Uh oh. We get 13 instead. So we have to change our macros.

#define MULT(x, y) ((x) * (y))
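As a quick check, the sketch below keeps both versions around so they can be compared on the same arguments. MULT_NAIVE and the two wrapper functions are names invented for this example.

```c
/* MULT_NAIVE is a hypothetical name for the first, unparenthesized attempt. */
#define MULT_NAIVE(x, y) (x * y)
#define MULT(x, y) ((x) * (y))

/* Expands to (3 + 2 * 4 + 2), which is 13. */
int naive_product(void)
{
    return MULT_NAIVE(3 + 2, 4 + 2);
}

/* Expands to ((3 + 2) * (4 + 2)), which is 30. */
int safe_product(void)
{
    return MULT(3 + 2, 4 + 2);
}
```

The rule of thumb this suggests: parenthesize every parameter in the replacement text, and parenthesize the whole thing as well.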

You can actually pack several statements into one macro for more complex operations. Simply separate the statements with semicolons (;). Remember our old swapping trick with XOR? A macro is a natural place to use it, because a macro doesn’t know the type of its arguments and so can’t easily declare a temporary variable.

#define SWAP(a, b) a ^= b; b ^= a; a ^= b;

Assume both a and b are ints. Can you think of a case where this doesn’t work? How about this.

int x = 4;
int y = 3;
int z = 2;
if (x > 0)
    SWAP(x, y);
else
    SWAP(x, z);

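Because the if statement doesn’t have braces, only the first statement, x ^= y, belongs to the if. The other two are ordinary statements that follow it, so the else is left without a matching if and the code doesn’t even compile. (Drop the else and it compiles, but the last two statements run whether or not the condition holds.) What can we do? Surrounding the expression with parentheses doesn’t work. To really get this macro down, we have to use braces ({ }, not [ ]). If you read K&R, aka the C Bible, you remember that braces can be used to group statements together. Braces alone are not quite enough, though: the semicolon after SWAP(x, y) would still end the if statement and orphan the else. The usual idiom is to wrap the braces in do ... while (0), which turns the whole macro into a single statement that expects that trailing semicolon.

#define SWAP(a, b) do { a ^= b; b ^= a; a ^= b; } while (0)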
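One common way to make a multi-statement macro behave like a single statement is the do { ... } while (0) idiom. The sketch below uses it in an unbraced if/else; swap_demo is a hypothetical helper written only for this example.

```c
/* XOR swap wrapped in do { ... } while (0).
 * Works for integer lvalues only, and zeroes the value if both
 * arguments name the same object (for example SWAP(x, x)). */
#define SWAP(a, b) do { (a) ^= (b); (b) ^= (a); (a) ^= (b); } while (0)

/* Hypothetical helper: swaps x with y when x is positive,
 * otherwise swaps x with z, and returns the resulting x. */
int swap_demo(int x, int y, int z)
{
    if (x > 0)
        SWAP(x, y);   /* one statement, so the else below still pairs up */
    else
        SWAP(x, z);
    return x;
}
```

With this form the caller writes SWAP(x, y); exactly as if it were a function call, semicolon included.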

What’s the general pattern? All pre-processor related activities begin with the pound sign (# - no, not the “hash-tag”…).

But there are some more tricks to macros! By now you’ve realized that the C Pre-processor doesn’t do any evaluation - just rote changes to the source code (copy-paste commands, if you will). Which means macros cannot evaluate their arguments. This can get us into some sticky situations. Take this seemingly innocuous macro:

#define MAX(a, b) ((a) < (b) ? (b) : (a))

Can we break it? Of course we can - I wouldn’t be asking otherwise. But how?

If you guessed side-effects then you should leave this class because you’re a genius. (Just kidding, don’t leave… please.) Yes, side effects! Let’s give an example.

int x = 5, y = 4;
int z = MAX(x++, y++);

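Uh oh… x++ and y++ just get pasted in wherever a and b are found respectively. Both are incremented once by the comparison, and whichever one is selected is then incremented a second time. Here x ends up as 7 and z as 6, where a real function would leave x at 6 and z at 5.

int z = ((x++) < (y++) ? (y++) : (x++));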

That’s not what we want at all. Macros are incredibly useful, but they are not functions. They have their limitations, and it’s very important to be aware of those limitations.
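To make the double increment visible, here is a small sketch that runs the same call through the macro and through an ordinary function. The function max_int and the two wrappers are assumptions made for this example, not part of the original notes.

```c
#define MAX(a, b) ((a) < (b) ? (b) : (a))

/* Hypothetical function alternative: each argument is evaluated exactly once. */
static inline int max_int(int a, int b)
{
    return a < b ? b : a;
}

/* Returns the final value of x after going through the macro. */
int x_after_macro(void)
{
    int x = 5, y = 4;
    int z = MAX(x++, y++);      /* x++ is evaluated twice: z == 6, x == 7 */
    (void)z;
    return x;
}

/* Returns the final value of x after going through the function. */
int x_after_function(void)
{
    int x = 5, y = 4;
    int z = max_int(x++, y++);  /* x++ is evaluated once: z == 5, x == 6 */
    (void)z;
    return x;
}
```

If the arguments might have side effects, a function like this is the safer choice; the macro is fine only when the arguments are plain variables or constants.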

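What about macros that span multiple lines? Of course we can. Here is the familiar swap, wrapped in do ... while (0) so that it behaves as a single statement when followed by a semicolon.

#define SWAP(a, b) do {         \
    a ^= b;                     \
    b ^= a;                     \
    a ^= b;                     \
} while (0)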

Notice that the last line does not have a backslash. The backslash simply tells the preprocessor that the macro continues onto the next line, which is not the case for the final line. This is also why pre-processor commands are not ended with semicolons: they’re not normal lines of code that the compiler deals with. Pre-processor directives are assumed to be single-line statements, unless specified otherwise with a backslash.

Another useful feature of macros is converting tokens into a string. Ever thought the structure of printf statements was annoying? Especially when you had to do it over and over again? Use macros!

Putting a pound sign as the prefix to your macro parameter turns it into a string.

#define PRINT_TOKEN(token) printf(#token " is %d", token)

In your code, that gets converted to the following:

printf("<token> is %d", token);

For debugging purposes, you can actually use the fact that macros do not evaluate parameters as an advantage. For example,

PRINT_TOKEN(x + y);

would expand to

printf("x + y is %d", x + y);

Pretty cool for debugging!
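Here is a minimal sketch of the stringizing operator in action. FORMAT_TOKEN and format_sum are hypothetical additions that write into a buffer with snprintf instead of printing, which makes the result easy to inspect.

```c
#include <stdio.h>

#define PRINT_TOKEN(token) printf(#token " is %d\n", token)

/* Hypothetical variant: same idea, but formats into a caller-supplied buffer. */
#define FORMAT_TOKEN(buf, size, token) snprintf(buf, size, #token " is %d", token)

/* Hypothetical helper: fills buf with the text built from the expression x + y
 * and returns the number of characters snprintf wanted to write. */
int format_sum(char *buf, size_t size)
{
    int x = 2, y = 3;
    PRINT_TOKEN(x + y);                     /* #token becomes the string "x + y" */
    return FORMAT_TOKEN(buf, size, x + y);  /* buf holds "x + y is 5" */
}
```

The adjacent string literals "x + y" and " is %d" are joined by the compiler into one format string, which is why the macro can simply put them next to each other.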

Have a complex structure which is a pain to write out every single time? Use macros here as well. This is a common problem in the crawler, where we have a hash table that contains DNODE structs, which then contain URL structs.

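#define BUILD_FIELD(index, field) hash_table[index]->url_struct->field

Now, if you want to access a field of the URL struct that is contained within a DNODE struct at index index within the hash table (assuming no collisions), you can simply use a handy macro. I’ll leave it to you to generalize this macro for other cases, like the ones that include collisions.

Notice that plain substitution is all this needs: field is already its own token after the ->, and gluing the two together with ## does not work, because ->field is not a single valid token. The double pound sign (##) is for pasting tokens, that is, joining two tokens into one new token, such as an identifier built from a fixed prefix and a macro argument.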
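Token pasting is easiest to see when it builds an identifier. The sketch below is an illustration only; the COUNTER macro and the count_ variables are invented for this example.

```c
/* Hypothetical macro: pastes the prefix count_ onto its argument,
 * so COUNTER(pages) becomes the single identifier count_pages. */
#define COUNTER(name) count_##name

static int COUNTER(pages) = 3;   /* declares count_pages */
static int COUNTER(links) = 7;   /* declares count_links */

/* Both spellings refer to the same variables after preprocessing. */
int total_count(void)
{
    return count_pages + COUNTER(links);
}
```

The result of ## must be one valid token, which is why pasting works for identifiers like these but not for an operator followed by a name.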

Directives

The third use of the pre-processor is what is called pre-processing directives, the subject of today’s tip. Technically speaking, the first two are just subsets of pre-processing directives, but it’s an overloaded term, so usually people mean this third category when they speak of directives (but you should always make sure!).

Directives are commands that tell the pre-processor to either skip parts of a file or include parts of a different file. Here’s an example of something in the third category - you might have already guessed this was coming.

#include <stdio.h>

Yes, include statements are pre-processing directives. An include tells the pre-processor to fetch a header file and paste its contents in place of the directive. Besides the simple case of include statements, there are two main uses for this third category: conditional compilation and include guards.

Conditional compilation is extremely useful if you want to set certain flags at compile time and have your program behave differently. Let’s say for example you want to combine using printf and GDB together for debugging. Conditional compilation is a great way to do that. Look at the file flag.c.

#include <stdio.h>

int main(void) {
    #ifdef DEBUG
    printf("Debugging is totally on.\n");
    #endif
    printf("Normal, non-debugging output.\n");

    return 0;
}

How do we utilize this flag? (Note that these are different from the flags used to mimic boolean values in C - usually ints holding values of either 0 or 1.) The -D option is used to define certain flags, so we do gcc -DDEBUG flag.c. Now, when we run the executable, both statements will be printed instead of just the latter. That’s a very useful way of getting certain parts of your code to run!
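Scattering #ifdef blocks through the code gets noisy, so one common pattern is to hide the check inside a macro. The following is a rough sketch of that idea; DEBUG_PRINT, DEBUG_ENABLED and run_step are hypothetical names, and DEBUG is assumed to be set (or not) with -DDEBUG as described above.

```c
#include <stdio.h>

#ifdef DEBUG
#define DEBUG_ENABLED 1
#define DEBUG_PRINT(msg) fprintf(stderr, "debug: %s\n", msg)
#else
#define DEBUG_ENABLED 0
#define DEBUG_PRINT(msg) ((void)0)   /* compiles away to nothing */
#endif

/* Hypothetical helper: does some work and reports whether
 * debugging output was compiled in. */
int run_step(void)
{
    DEBUG_PRINT("entering run_step");
    printf("Normal, non-debugging output.\n");
    return DEBUG_ENABLED;
}
```

Built without -DDEBUG, the debug call disappears entirely, so it costs nothing in the normal executable.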

The second function, include guards, you’ve already seen in your crawler skeleton code. They look something like this:

/* no leading underscore: names like _FILE_NAME_H_ are reserved for the implementation */
#ifndef FILE_NAME_H
#define FILE_NAME_H

/* lots of header file code */

#endif

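What’s the purpose of this? Imagine you include a large number of header files, and several of them include the same header in turn. Without a guard, that header’s contents would be pasted into your file more than once, and the compiler would see the same structs and typedefs defined twice. Remember, unlike Java, C has no notion of namespaces, and no overloading or polymorphism either, so a duplicate definition simply wouldn’t compile. With the guard in place, the second and later inclusions expand to nothing. (Guards do not help when two different libraries give a function the same name; that conflict you still have to resolve yourself.)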
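A header cannot include itself twice in a single-file example, so the sketch below simulates it by writing out the guarded contents of a hypothetical point.h two times in a row. The type point_t and the function point_sum are made up for this illustration.

```c
/* ---- first inclusion of the hypothetical point.h ---- */
#ifndef POINT_H
#define POINT_H
typedef struct point { int x; int y; } point_t;
#endif

/* ---- second inclusion: POINT_H is already defined, so all of this is skipped ---- */
#ifndef POINT_H
#define POINT_H
typedef struct point { int x; int y; } point_t;  /* would redefine struct point */
#endif

/* Code that uses the header sees exactly one definition of point_t. */
int point_sum(point_t p)
{
    return p.x + p.y;
}
```

Deleting the #ifndef and #endif lines from the second copy should make the compiler complain about a redefinition of struct point, which is the failure the guard exists to prevent.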

Something similar can and should be done for constants as well.

#ifndef NULL
#define NULL ((void *)0)
#endif

C++ No Macros?

Why no macros in C++? You have templates, classes, generalizable code, and anonymous and in-line functions. You’ve already seen how buggy and unpredictable macros can be. While great for C, more evolved languages certainly needn’t use them.

Further Reading

Reading is nice, isn’t it? So do some.

C Programming Guide - much shorter, more readable; where most of the examples for this recitation came from.

C PreProcessor Guide - the real deal; full documentation, complete with every trick in the book.