Notice how "x * dimy * 4 + y * 4" is used over and over. My initial code was full of stuff like that as well. But probably half my optimization gains came from simply calculating values like that into a local variable ahead of time:
Whatever compiler you're using is amazingly horrible if what you said is indeed true. Compilers are supposed to do *exactly* what you describe on their own. You'd have to post the assembly code associated with the C code piece as proof. I'm pretty certain that if you're using Intel, Microsoft (VC), or gcc 3.2.x+ that the code in question would be optimised to do what you describe (very likely shoving the contents of the "base" calculation into a register if possible/available, otherwise into a temporary variable). This is, of course, assuming you're building with -O or -O2. And likely with gcc, there's probably an -f argument which affects the optimisation behaviour in cases like this (can't be bothered to look it up).
The relevant -f option here may well be -fstrict-aliasing. The problem is that if, say, any of x, y and dimy are globals and screen is a pointer, then the compiler cannot be entirely sure that
    screen[x * dimy * 4 + y * 4] = 0;
will not change the values of x, y or dimy (since screen might point to the same area of memory they occupy), which means it will have to refetch them from memory the next time they're used. In a tight loop, this can really kill performance.
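To make the hazard concrete, here's a minimal sketch. The function name store_then_read is made up for illustration, and I'm assuming dimy is a global int and screen an int *, as in the scenario above. Both calls are perfectly legal C, so the compiler has to generate code that handles the second one correctly:

```c
int dimy;   /* a global, as in the scenario above */

/* Hypothetical helper: stores through screen, then reads dimy again. */
int store_then_read(int *screen)
{
    dimy = 10;
    screen[0] = 0;   /* this store may or may not overwrite dimy */
    return dimy;     /* so dimy has to be reloaded from memory here */
}
```

Called with an ordinary buffer, this returns 10; called as store_then_read(&dimy), it returns 0. Since the compiler can't tell which case it will get, it can't keep dimy in a register across the store.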
(Rest of longish post spoilered for brevity, feel free to skip.)
A partial solution, as I noted, is strict aliasing (which gcc already enables by default at -O2 and above), which lets the compiler make more assumptions about how arrays can overlap with each other (and with other variables); specifically, strict aliasing basically says that (quoted from Wikipedia) "it is illegal (with some exceptions) for pointers of different types to reference the same memory location." Unfortunately, strict aliasing has some drawbacks: first, it can easily break existing code not written with such rules in mind (such code is technically undefined behavior, but it often happens to work when the optimization is off); second, it won't help at all if the variables involved in fact do have the same type (for example, if x is an int and screen is an int *); and third, the most important of those exceptions is that character pointers may alias anything, so it won't help either if screen is a char * or an unsigned char *.
Another solution is the C99 restrict keyword, which basically extends strict aliasing by letting you promise to the compiler that the area of memory referenced through a particular pointer won't overlap anything else. In Linknoid's example, as well as mine, if screen had been declared as, say, int * restrict screen, then the problem would go away entirely. Of course, it might be replaced by a bigger problem if one did have another pointer, used in the same function, pointing to the same area of memory: then the promise given by the restrict keyword would be violated, and broken code could be generated.
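As a rough sketch of what that looks like (the function name clear_pixel and the inner loop over four components are my own additions for illustration, and I'm again assuming int globals and an int screen buffer):

```c
int x, y, dimy;   /* globals, as in the scenario above */

/* restrict promises that nothing written through screen is accessed
   any other way inside this function, so the compiler is free to
   compute x * dimy * 4 + y * 4 once and keep it in a register. */
void clear_pixel(int *restrict screen)
{
    for (int c = 0; c < 4; c++)
        screen[x * dimy * 4 + y * 4 + c] = 0;
}
```

Whether a given compiler actually takes advantage of the promise is another matter, and calling this with screen pointing at x, y or dimy would be undefined behavior.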
A third approach is to follow the "load-use-store" paradigm (which is basically what Linknoid did): always load any individual values you might be using more than once into local variables (which the compiler knows won't be aliased, and which it will usually be able to store in registers), preferably before any speed-critical loops. Then do whatever you need to do using those variables and, if necessary, store them back into their original locations at the end. This is essentially a manual way of doing the kind of optimizations the compiler could do if you used strict aliasing and restrict diligently, but the manual approach also works with strict aliasing turned off; and since it's your code and you know what it's supposed to be doing, you can sometimes make such optimizations even in cases where it would be extremely difficult or impossible to coax the compiler into doing the same thing.
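Here's a small sketch of that pattern. The names clear_run and pixels_cleared are invented for the example (the counter is only there to show the "store" step), and the types are assumed as before:

```c
int x, y, dimy;        /* globals, as in the scenario above */
long pixels_cleared;   /* hypothetical global counter */

void clear_run(int *screen, int count)
{
    /* load: copy everything we need into locals before the loop */
    const int base = x * dimy * 4 + y * 4;
    long cleared = pixels_cleared;

    /* use: the loop touches only locals and the screen buffer */
    for (int i = 0; i < count; i++) {
        screen[base + i] = 0;
        cleared++;
    }

    /* store: write the result back once, at the end */
    pixels_cleared = cleared;
}
```

Note that this is not strictly equivalent to the naive version: if screen really did overlap one of the globals, the two would behave differently. That's the whole point, of course; you know they don't overlap, even if the compiler can't.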
Also, IMHO using the "load-use-store" style even when it might not be strictly necessary is often good practice; it rarely if ever hurts performance, since modern compilers are also very, very good at optimizing register usage, and with well chosen variable names it can make your code cleaner and more readable. Also, I've seen many programmers (including myself, once or twice) get bitten by simple aliasing and/or variable reuse bugs (in the simplest cases, just trivial things like
    struct node *pop_first (struct node **list) {
        if (*list) *list = (*list)->next;
        return *list; /* oops, should've returned the _original_ value */
    }
) that could've been easily avoided just by always using copious temporary variables and letting the compiler decide when those variables are no longer needed. It makes your code faster, cleaner and safer; what's not to like?
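For comparison, here's how the pop_first example might look with a temporary. The struct node layout is my assumption, since only the next member appears in the snippet above:

```c
#include <stddef.h>

struct node {
    struct node *next;
    int value;   /* hypothetical payload, not in the original snippet */
};

struct node *pop_first(struct node **list)
{
    struct node *first = *list;   /* load the original head once */
    if (first)
        *list = first->next;      /* unlink it from the list */
    return first;                 /* return the original head, or NULL */
}
```

With the head saved in first, there's no way to accidentally return the updated *list.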