Seems a little faster to me.
However, on a complex fort, in all modes, the frame rate drops off massively if I maximise the window. (Big monitors, about 12 times the number of tiles.)
On a complex fort with all the trimmings, when paused, the frame rate is maxed out at the default window size; when I maximise the window it drops to a rather poor 10 FPS.
I'm assuming pausing shuts off most of the simulation. It doesn't seem like there should be anything like enough drawing to cause so significant a slowdown. I've not been source diving, but it looks as though calculating what to display is overly expensive. Needs profiling really, but for my money there may be much more to be gained from things like spatial partitioning of the object data than from reworking the drawing.
(If that is correct, running the display in a different thread from the sim could have a serious impact too. Sorry.)
There are a few non-drawing-related things happening that I'd not noticed in 40d. Some jobs don't seem to get done anymore, e.g. carpenters and smiths won't build animal traps.
Regarding the spatial partitioning:
The map is basically a big 3D array of 16x16 blocks of tiles. I can only guess how the other objects are put into the tiles at the rendering stage (I haven't really looked that hard), but considering the fact that creatures, buildings, plantlife and constructions are all packed into vectors, it probably has to walk all those vectors for every game tick. A big question: is the frame put together as part of the game-loop tick, or does it walk the vectors twice? How about using some different container? (no matter how inconvenient that would be for me and my hacking efforts to reverse-engineer).
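To make the "different container" idea concrete, here is a minimal sketch of bucketing objects by map block, so that a query touches only the objects in one 16x16 block instead of walking one big vector. This is not how DF stores anything as far as I know; `Creature`, `BlockIndex` and every member name are made up for illustration, and only the 16x16 block size comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object type; not taken from the DF source.
struct Creature { int id; int x, y, z; };

// Hypothetical index: one bucket of object IDs per 16x16 map block.
class BlockIndex {
public:
    static constexpr int kBlockSize = 16;  // block edge length in tiles

    BlockIndex(int tiles_x, int tiles_y, int levels)
        : bx_((tiles_x + kBlockSize - 1) / kBlockSize),
          by_((tiles_y + kBlockSize - 1) / kBlockSize),
          bz_(levels),
          buckets_(static_cast<std::size_t>(bx_) * by_ * bz_) {}

    // File the creature's ID under the block that contains its tile.
    void insert(const Creature& c) {
        buckets_[slot(c.x, c.y, c.z)].push_back(c.id);
    }

    // IDs of everything in the block containing tile (x, y, z).
    const std::vector<int>& in_block(int x, int y, int z) const {
        return buckets_[slot(x, y, z)];
    }

private:
    // Map a tile coordinate to the flat index of its block's bucket.
    std::size_t slot(int x, int y, int z) const {
        assert(x >= 0 && y >= 0 && z >= 0);
        const int cx = x / kBlockSize;
        const int cy = y / kBlockSize;
        assert(cx < bx_ && cy < by_ && z < bz_);
        return (static_cast<std::size_t>(z) * by_ + cy) * bx_ + cx;
    }

    int bx_, by_, bz_;
    std::vector<std::vector<int>> buckets_;
};
```

With something like this, drawing one screenful only needs the buckets of the blocks that are actually visible, and "closest stone" can search outward block by block. The cost is that every move across a block boundary has to update two buckets, which the sketch leaves out.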
This of course uncovers the reason why having too many items in the item vector slows down the game so much. A dwarf trying to find a path to the closest stone has to find that stone. Does it walk the vector in that case? Probably has to.
Designations work like this: Each of those map blocks points to a different object (haven't named those yet). This object has some flags, including what I call a 'dirty bit'. Only blocks with a set 'dirty bit' are used for designated jobs (digging, felling trees, plant gathering, etc.). Seems pretty good, right? Now a question: is the bit cleared when it's impossible to path to those designated blocks? What happens when I designate 20k tiles in 78 blocks and only 78 tiles are ready to be mined (an extreme case, I know)? 20k jobs to check? I bet this is the case.
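A rough sketch of that dirty-bit scheme, to show where the worry comes from. The layout is my assumption, not the real DF structure: `Block`, `designate`, `complete` and `count_candidates` are hypothetical names, and the per-tile bitset is just a stand-in for whatever the game actually keeps.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-block designation record.
struct Block {
    bool dirty = false;           // the 'dirty bit': block has designations
    std::bitset<256> designated;  // one bit per tile of a 16x16 block

    void designate(int local_x, int local_y) {
        assert(local_x >= 0 && local_x < 16 && local_y >= 0 && local_y < 16);
        designated.set(static_cast<std::size_t>(local_y) * 16 + local_x);
        dirty = true;
    }

    // Clear one finished designation; drop the flag once none are left.
    void complete(int local_x, int local_y) {
        assert(local_x >= 0 && local_x < 16 && local_y >= 0 && local_y < 16);
        designated.reset(static_cast<std::size_t>(local_y) * 16 + local_x);
        if (designated.none()) dirty = false;
    }
};

// Count candidate job tiles, skipping every block whose flag is clear.
std::size_t count_candidates(const std::vector<Block>& blocks,
                             std::size_t* blocks_scanned) {
    std::size_t total = 0;
    std::size_t scanned = 0;
    for (const Block& b : blocks) {
        if (!b.dirty) continue;  // cheap skip for untouched blocks
        ++scanned;
        total += b.designated.count();
    }
    if (blocks_scanned != nullptr) *blocks_scanned = scanned;
    return total;
}
```

Note that the flag only says "this block has designations", not "this block has reachable designations". In this sketch an unreachable tile keeps its block dirty forever, so all of its tiles stay candidates on every scan, which is exactly the 20k-jobs case above.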
A door hooked up to a lever is the bane of FPS. I bet DF has to recalculate the pathing data (part of the map blocks AFAIK) in that case. Give an order to pull the lever continuously (doors are instantly opened/closed) and watch the FPS counter plummet towards 0.
I could continue. When people say 'fix pathfinding', they actually mean THESE problems. The pathfinding algorithm is probably very good all things considered. It just has to do much more work than it should.
This brings us back to the graphics. The biggest part there is putting the 'things' into a square grid and then later blasting that to the screen using whatever method is implemented in libgraphics. Which part is the slower one? I don't really see such big differences in rendering speeds in this thread. But what I do see is that 40d was compiled with an older version of the MS compiler. So, these numbers are all bogus. Any improvements could just as well come from using a better compiler. If you want to measure something, have a control: plain DF 40d compiled with MSVC 2008.
Now, how much time is spent by the CPU waiting for the memory instead of actually processing things? Are STL objects passed by reference? Allocating and freeing many small objects is expensive!
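To illustrate the pass-by-reference point, here is a small sketch that counts element copies. Nothing here is from DF: `Item`, `g_item_copies`, `sum_by_value` and `sum_by_ref` are made-up names, and the copy counter exists only to make the cost visible.

```cpp
#include <vector>

// Global counter, bumped on every Item copy (illustration only).
int g_item_copies = 0;

// Hypothetical item type with an instrumented copy constructor.
struct Item {
    int id;
    explicit Item(int i = 0) : id(i) {}
    Item(const Item& other) : id(other.id) { ++g_item_copies; }
};

// Takes the vector by value: the caller's whole vector is copied first.
long sum_by_value(std::vector<Item> items) {
    long total = 0;
    for (const Item& it : items) total += it.id;
    return total;
}

// Takes the vector by const reference: no elements are copied.
long sum_by_ref(const std::vector<Item>& items) {
    long total = 0;
    for (const Item& it : items) total += it.id;
    return total;
}
```

Calling `sum_by_value` on a 1000-item vector copies all 1000 items (plus one heap allocation for the new buffer) just to read them, while `sum_by_ref` copies nothing. Likewise, calling `reserve` before filling a vector avoids repeated reallocation while it grows.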
Etc.