To anyone who tests - how large is the practical performance improvement?
Somewhere between hardly noticeable and insignificant, unless you have some really slow but recent hardware.
You see, DF currently has two threads - the renderer and the simulation (the latter is the one that's CPU-limited).
Processing goes like this (a rough code sketch follows the list):
simulation thread:
- calculate a game frame ( mainloop() function )
- if FPS-capped, wait some time
- pause if requested to pause
renderer thread:
- process input events
- if GFPS-capped, wait until it's time to render the next graphics frame
- pause the simulation thread
- submit input events to the simulation code
- call the game's render_things() function, which prepares the tiles/interface to be shown
- copy the results
- unpause the simulation thread
- actually draw stuff
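In code, that scheme looks roughly like the sketch below. This is only my illustration with made-up function names (only mainloop() and render_things() are real DF entry points), not the actual source:

    #include <mutex>
    #include <thread>

    // Stub declarations so the sketch is self-contained; the real work lives in DF / the renderer.
    void mainloop();                       // the CPU-limited game frame
    void wait_for_fps_cap();
    void process_input_events();
    void wait_for_gfps_cap();
    void submit_input_events();
    void render_things();                  // prepares tiles/interface
    void copy_results_to_render_buffer();
    void draw_frame();

    std::mutex sim_pause;                  // held by the renderer while it copies data

    void simulation_thread() {
        for (;;) {
            {
                // "pause if requested": blocks here whenever the renderer holds the mutex
                std::lock_guard<std::mutex> hold(sim_pause);
                mainloop();
            }
            wait_for_fps_cap();            // only if FPS is capped
        }
    }

    void renderer_thread() {
        for (;;) {
            process_input_events();
            wait_for_gfps_cap();           // only if GFPS is capped
            {
                std::lock_guard<std::mutex> hold(sim_pause);   // pause the simulation
                submit_input_events();
                render_things();
                copy_results_to_render_buffer();
            }                              // simulation resumes here
            draw_frame();                  // actual drawing overlaps with the next game frame
        }
    }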
The performance bottleneck is in the mainloop() function. Everything else eats much less CPU time.
In fact, if you've got a dual-core or better CPU, the only thing you can do is minimize the time the simulation thread is paused while the renderer fetches stuff to render. And that time is small to begin with.
I did some optimizations in the print_shader: there's no excess copying of data around while the simulation is paused, and only the absolutely required processing is done before handing the data off to the GPU, so the CPU isn't held up much by rendering on single-core machines.
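For illustration, the shape of that optimization looks like this (hypothetical struct and buffer names, not the actual print_shader code): while the simulation is paused we do nothing but a flat copy of the raw tile data; the GL upload and all per-tile work happen after it's unpaused, mostly on the GPU.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <GL/glew.h>                              // assumes a GL loader is already set up

    struct TileCell { uint8_t ch, fg, bg, flags; };   // hypothetical packed screen cell

    static TileCell staging[256 * 256];               // CPU-side staging buffer
    static GLuint   tile_vbo;                         // GPU buffer the shader reads from

    // Called while the simulation is paused: nothing but a flat copy.
    void copy_results(const TileCell* screen, size_t count) {
        std::memcpy(staging, screen, count * sizeof(TileCell));
    }

    // Called after the simulation is unpaused: hand the raw cells to the GPU;
    // the shader does the per-tile work (tileset lookup, colors).
    void upload_tiles(size_t count) {
        glBindBuffer(GL_ARRAY_BUFFER, tile_vbo);
        glBufferSubData(GL_ARRAY_BUFFER, 0, count * sizeof(TileCell), staging);
    }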
Beyond that there isn't much that can be done to cut the time the simulation is paused. Moving the interface out entirely (getting rid of the render_things() call) might help, but this would be offset to some degree by the increased amount of data the new interface would require to be copied out (and that copying can't be done while the simulation thread is running).
I thought about some clever dirty hacks involving snapshotting DF's data so that the renderer can read it while the simulation keeps working, but those are hacks, and Linux-specific at that. And they wouldn't give much speed-up anyway.
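For what it's worth, one such Linux-specific hack could be built on fork(): the child process gets a copy-on-write snapshot of all game memory and can read the map while the parent keeps simulating. This is just a sketch of the idea, nothing of the sort is in the patch:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void render_from_snapshot() {
        pid_t child = fork();
        if (child == 0) {
            // Child: sees a frozen, copy-on-write view of DF's data.
            // It could walk the map and push render data out through shared
            // memory or a pipe, then exit.
            _exit(0);
        }
        // Parent: the simulation keeps running; only pages it modifies get
        // physically copied by the kernel. A real version would reap the
        // child asynchronously instead of blocking here.
        waitpid(child, nullptr, 0);
    }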
If the map data were contiguously allocated AND there wasn't so much C++ glue between the values and their meaning, then I'd gladly just upload the needed parts of it directly to the GPU, spending almost no CPU time (for that last little bit of speed), and it would allow prototyping GPU-based pathfinding too, which as I've heard is the culprit nowadays.
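In that hypothetical world, uploading a z-level would look something like this (assumed plain-old-data layout and a shader storage buffer; none of this matches DF's real structures):

    #include <cstddef>
    #include <cstdint>
    #include <GL/glew.h>

    struct MapCell { uint16_t tiletype; uint16_t flags; };   // assumed flat, GPU-friendly layout

    // Upload one contiguous width*height slab of cells straight into a buffer the shaders read.
    void upload_level(GLuint ssbo, const MapCell* level, size_t width, size_t height) {
        glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
        glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0,
                        width * height * sizeof(MapCell), level);
        // The drawing shader could map tiletype -> tile graphic by itself, and the
        // same buffer would be a natural input for GPU-side pathfinding experiments.
    }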
Does this alleviate the FPS death problem of many forts?
No. It gives you a warm fuzzy feeling that you did as much as possible in this respect.
Will it work with Phoebus tileset?
It was developed while using it.