What is involved in converting DF from 32 bit to 64 bit? Isn't it just a recompile with a 64 bit compiler?
In theory, yes. In practice, a whole lot more than that.
Among other things:
- Hidden assumptions in pointer arithmetic
- Sudden potential over/underflows due to lengths (array and otherwise) being stored as 32-bit values
- Casting pointers / pointer offsets to 32-bit values
- Hidden assumptions about sizes of structs
- External libraries that do any / all of the above
- Other hisenbugs that become more common due to compilation differences
Most of these are (relatively) easy to avoid, if you are trying to avoid them from the start. But it's a whole lot harder to retrofit into an existing codebase, especially as many of the above aren't always reproducible.
With the former I'd worry that the locks become a bottleneck, particularly given that some parts of the main thread are doing nothing but world updates (e.g. the water/magma simulation step). That said, I imagine large amounts of the main thread aren't doing updates, so you may well be able to get away with locks.
You should, of course, tune the number of operations / amount of time between grabbing and releasing the lock. E.g. grab the lock, do all water / magma / cavein updates, then ungrab the lock.
The latter is hilariously dangerous, changing the world while the algorithm is running on another thread could cause all sorts of crashes if the data it reads contains pointers or indices and you get partial writes/reads (e.g. due to unaligned data) or stale reads. Having an atomic counter that tells you the world has changed is no good if the code has already crashed!
Of course it'd crash if you potentially followed invalid pointers, and I am aware of amusements with partial / stale / unexpected reads. I wasn't explicit, whoops. The pathfinding threads never follow (those) pointers. You have an array of bits (one / tile / passibility type) indicating passibility of each tile. The main thread (atomically) increments counter A, does a store fence, does (an) update(s) to the passibility array, then does another store fence, then (atomically) increments B. (If B wraps, you also need to do a store fence here at the end as well.) The pathfinding threads (atomically) load counter A, then issue a load fence, then work with the passibility array, then issue another load fence, then (atomicially) loads counter B and retries if they aren't the same.
Give or take, at least. I'd have to sit down to actually figure out the exact memory barriers required. (All that's required is that the write to A is visible before any writes to the passibility array, and the write to B is visible only after said writes. And ditto for pathfinding but for reads.) And "atomic" here for incrementing just means "no intermediate values can be visible". So more like a read of <value>, then an atomic set to <value + 1> than an actual atomic increment. (As only one thread is setting it anyways.)
I think personally the best approach would be an incremental algorithm, or even just terminating the pathfinding after a fixed amount of exploration to avoid the terminal case of flood-filling the entire map with the pathfinder.
Agreed with incremental. With termination - again, the connected components thing prevents that worst-case. The problem is that bugs happen, and occasionally a bug gets through that makes the connected components not match up with actual pathfinding.
Another possibility that may work well would be to queue all pathfinding requests and then execute them across multiple threads at a single point of the main thread. No concurrent updates of the world data from the main thread, and pathfinding still gets a lovely speed-up. The pathfinding itself wouldn't modify the world data, and read-only sharing is perfectly safe
If pathfinding's really that big of a bottleneck, you could make decent gains from that.
It'd certainly be a whole lot more simpler than the full version. I worry about the overhead, however. 60FPS is ~16.7ms / frame.
Actually, there are a number of things you could do in that same manner. For instance, power calculations and cavein detection (not the actual cavein itself). Maybe even creature decisionmaking. Although I wonder how much of a bottleneck those are.