Basically, it comes down to how much work the CPU itself has to do in order to evaluate a statement.
If you tell the CPU to evaluate attribute gain on a dwarf as it is currently structured, it first has to set up the check against the maximum threshold for that attribute, which is unique to each dwarf (each dwarf appears to get placed randomly on the slider according to the distribution percentiles defined for attributes in the raws, which determines its personal cap). Then it has to decide whether the added gain is possible or should be ignored (is the dwarf already at max?), or whether the added amount should be truncated (applying 50 pts, but only 20 pts from max? Etc.). Each and every one of those is a logic operation needed to process an "evaluateSkillGain(DwarfVectorPtr, attrib, pts)" style function, as sketched below.
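To make that concrete, here's a rough C++ sketch of the branch-heavy shape I'm describing. The struct layout, member names, and exact signature are my guesses for illustration, not the actual game code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int NUM_ATTRIBUTES = 6;  // illustrative count, not the real one

struct Dwarf {
    std::int32_t score[NUM_ATTRIBUTES];  // current attribute values
    std::int32_t max[NUM_ATTRIBUTES];    // per-dwarf caps rolled from the raws' percentiles
};

// Every call has to re-check the per-dwarf cap before it can apply anything.
void evaluateSkillGain(std::vector<Dwarf>* dwarves, std::size_t who,
                       int attrib, std::int32_t pts) {
    Dwarf& d = (*dwarves)[who];
    if (d.score[attrib] >= d.max[attrib])        // already at max? ignore the gain
        return;
    if (d.score[attrib] + pts > d.max[attrib])   // would overshoot? truncate the gain
        pts = d.max[attrib] - d.score[attrib];
    d.score[attrib] += pts;                      // apply what's left
}
```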
Now, let's say Toady was more clever: instead of assigning a hard datastore for each dwarf and then burning lots and lots of operations on validating and deciding how to proceed, he gives each dwarf a single-precision float for its current position between minimum and maximum, as a value between 0 and 1. He also assigns an 8-bit flag field, stored as an integer, to record which percentile the dwarf falls in. He can now evaluate the dwarf almost exclusively with pure math operators, and handle the "over threshold" situation with a single logic check that truncates the float using a modulo divide and a subtraction.
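The data layout might look something like this (again just my guess at a sketch, not anything from the actual code):

```cpp
#include <cstdint>

constexpr int NUM_ATTRIBUTES = 6;  // illustrative count

struct DwarfCompact {
    float        ratio[NUM_ATTRIBUTES];  // position between min and max, stored as 0.0 .. 1.0
    std::uint8_t percentile;             // 8-bit flag field encoding the percentile bracket
};
```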
(Essentially, he sets binary flags in the 8-bit field by assigning it an unsigned integer value. He then uses that value to mathematically transform the min-max ratio in the float into the actual attribute score, and to determine how much to add to or subtract from the ratio for a given integer experience-point calculation, say by using it as a factor for multiplication or a divisor for division. If the ratio exceeds the maximum (1), a modulo division by 1 gives the exact remainder, which you subtract from the score. Or you could just set it to 1. Either way.)
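Put together, the gain update could then look roughly like this. The scale table, the names, and the shortcut of using the flag byte as a table index are all illustrative assumptions on my part, not how the game actually maps flags to scores:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical: one scale factor per percentile bracket, picked by the flag byte.
// The factor converts raw experience points into a change in the 0..1 ratio.
static const float kScalePerPercentile[8] = {
    1.0f/500,  1.0f/750,  1.0f/1000, 1.0f/1250,
    1.0f/1500, 1.0f/1750, 1.0f/2000, 1.0f/2500
};

inline float applyGain(float ratio, std::uint8_t percentile, std::int32_t pts) {
    ratio += pts * kScalePerPercentile[percentile & 0x7];  // pure multiply-add

    // Clamp with modulo-and-subtract as described above: for 1 <= ratio < 2,
    // ratio - fmod(ratio, 1.0f) == 1.0f.  (Assumes a single gain never pushes
    // the ratio past 2.0; simply setting it to 1.0f works just as well.)
    if (ratio > 1.0f)
        ratio -= std::fmod(ratio, 1.0f);
    return ratio;
}
```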
It serves the exact same purpose, but does so in a more clever fashion, one that computers are much better at and that could be accelerated with SSE flags on the compiler. It could easily be a 100% speedup or more.
The reason is that math operators pipeline and cache better in the processor, and they are exactly the kind of operation SSE was made for, which frees the general-purpose ALUs on the chip to do the logic ops. On a modern chip, the out-of-order core and branch predictor can overlap the math op and the logic op and execute both at the same time, whereas if they were all logic ops it would have to wait for the general-purpose ALUs to become available.
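For a concrete picture of why that shape matters, here's a branch-free bulk update over an array of ratios. A loop like this auto-vectorizes readily under -O2/-O3 with SSE enabled; the names are mine, not from the game:

```cpp
#include <algorithm>
#include <cstddef>

// Apply precomputed per-dwarf deltas to an array of 0..1 ratios.
void applyGainBulk(float* ratios, const float* deltas, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // std::min compiles down to a minss/minps instruction rather than a branch,
        // so the loop body is straight-line math the SIMD units can chew through.
        ratios[i] = std::min(ratios[i] + deltas[i], 1.0f);
    }
}
```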
If you can express the algorithm as pure math, it is *always* in your best interest to do so when the function gets called frequently and can pose a performance bottleneck.