hi all, long time no see. unfortunately i won't be able to invest a lot of time in this project, nor am i able to contribute code (heavens, C++! halp!). moreover, i'm not very skilled in the arcane arts of algorithmicism, optimifluxcompensating and low-level/close-to-the-metal hacking.
but after skimming this thread, here's my opinion:
dreiche2 already mentioned it, and i wholeheartedly agree: the single most important thing would be to provide a testing and benchmarking framework, so a contributor would have a rough clue whether their improvements actually work. this may be (a lot!) harder than it sounds; benchmarking is an art of its own. a careful selection of maps, paths and populations would be needed, so solutions wouldn't specialize too much: long paths, short paths, unreachable destinations, ... ideally, this framework would also account for changing environments.
the closer it is to the real thing, the more accurate it'd be, but also the less usable. *sigh* benchmarking is hard.
at least the benchmarking framework wouldn't have to be written in C++. use something simpler. java (noooo!), python, javascript, Lua, whatever. it's about the idea, not the implementation. absolute performance doesn't really matter at this point.
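to make the idea concrete, here's a rough sketch of what such a harness could look like, in python since absolute performance doesn't matter here. everything in it is made up for illustration (`make_grid`, `benchmark`, the `wall_chance` parameter) - it's just "same fixed maps for every candidate, measure time and paths found", nothing more.

```python
import random
import time

def make_grid(w, h, wall_chance, seed):
    """generate a random grid map; True = walkable. seeded, so every
    candidate algorithm gets to see exactly the same maps."""
    rng = random.Random(seed)
    return [[rng.random() > wall_chance for _ in range(w)] for _ in range(h)]

def benchmark(pathfinder, cases):
    """run a pathfinder over a fixed set of (grid, start, goal) cases;
    report total wall-clock time plus how many paths were found."""
    found = 0
    t0 = time.perf_counter()
    for grid, start, goal in cases:
        if pathfinder(grid, start, goal) is not None:
            found += 1
    return time.perf_counter() - t0, found

# a mix of seeds gives short paths, long paths, and dense maps where
# the goal may be unreachable - the "careful selection" mentioned above
cases = [(make_grid(30, 30, 0.2, s), (0, 0), (29, 29)) for s in range(10)]
```

a real framework would of course need DF-like maps (z-levels, changing terrain) and population-scale loads, but even something this small lets two implementations be compared on equal footing.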
i don't even know how pathfinding is done currently; afaik there's A* calculated on-demand, stored in the dwarf's memory and followed until it invalidates because the dwarf runs into an obstacle that wasn't there when the path was calculated. if no path can be found, the algorithm terminates after a certain number of iterations to avoid flooding the whole map and uses the most promising solution found until then.
this approach is actually pretty nice. most importantly, it's simple. also, by storing paths and evaluating them on the go it allows for inaccuracy. on the one hand that looks realistic (why should the dwarf know that a wall on the other side of the fortress vanished since he was there a week ago?), on the other hand it greatly reduces complexity.
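the scheme as i understand it (budgeted A* that falls back to the most promising node when the budget runs out) could be sketched roughly like this - to be clear, this is my guess at the behavior, not DF's actual code, and all names are made up:

```python
import heapq

def astar_budgeted(grid, start, goal, budget=2000):
    """A* on a 4-connected boolean grid (True = walkable), coords (x, y).
    if `budget` iterations pass without reaching the goal, give up and
    return the path to the most promising node seen so far."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # manhattan
    open_heap = [(h(start), 0, start)]
    came_from = {start: None}
    g_cost = {start: 0}
    best = start  # most promising node so far (lowest heuristic)
    for _ in range(budget):
        if not open_heap:
            break  # search space exhausted - goal unreachable
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            best = node
            break
        if h(node) < h(best):
            best = node
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]) and grid[ny][nx]:
                if g + 1 < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = g + 1
                    came_from[nxt] = node
                    heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt))
    # reconstruct: to the goal if reached, else to the best partial node
    path, cur = [], best
    while cur is not None:
        path.append(cur)
        cur = came_from[cur]
    return path[::-1]
```

the returned path is what the dwarf would store and follow step by step, re-pathing only when a step turns out to be blocked.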
i already described what i'd do about it: caching. just store the already calculated paths and keep them in memory. of course, this adds a bit of complexity, but not as much as, say, machine learning (which sounds nice, but i'm afraid may be a bit too hardcore) or automatic zoning and waypointing or future prediction magic.
how path caching works:
1. a dwarf needs a path between A and B
2. look in the cache to see if the path is already there
* if not, calculate as usual and store the result
* if it is, take it from there, and increment its use counter
3. follow the path; if it's blocked, mark it for garbage collection
4. every X cycles, garbage collect:
* throw out old and invalidated paths
* if memory's scarce, throw out the least used paths
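the steps above could look roughly like this in python (again: a sketch, all names hypothetical - the real thing would live in C++ next to whatever pathfinder DF already has):

```python
class PathCache:
    """cache computed paths keyed by (start, goal); `compute` is the
    fallback pathfinder (e.g. plain A*) used on a cache miss."""
    def __init__(self, compute, max_entries=1000):
        self.compute = compute
        self.max_entries = max_entries  # the tunable memory budget
        self.paths = {}    # (start, goal) -> path
        self.uses = {}     # (start, goal) -> use counter
        self.invalid = set()  # paths that turned out to be blocked
        self.hits = self.misses = 0

    def get(self, start, goal):
        key = (start, goal)
        if key in self.paths:            # step 2: cache lookup
            self.hits += 1
            self.uses[key] += 1
            return self.paths[key]
        self.misses += 1                 # cache miss: calculate as usual
        path = self.compute(start, goal)
        if path is not None:
            self.paths[key] = path
            self.uses[key] = 1
        return path

    def mark_blocked(self, start, goal):
        self.invalid.add((start, goal))  # step 3: blocked while following

    def collect(self):
        """step 4: every X cycles, drop invalidated paths, then evict
        the least used ones if we're over the memory budget."""
        for key in self.invalid:
            self.paths.pop(key, None)
            self.uses.pop(key, None)
        self.invalid.clear()
        while len(self.paths) > self.max_entries:
            key = min(self.uses, key=self.uses.get)
            del self.paths[key]
            del self.uses[key]
```

note that `max_entries` is exactly the "scalable" knob mentioned below: set it to 0 and the whole thing degrades to the current behavior.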
the pros/cons/limitations of path caching, off the top of my head:
- ~ trades memory for speed
- + scalable: you can control the cache size. the more memory, the more cache.
- + optional (graceful degradation)
* no memory available for caching? no problem - it then behaves exactly as it does now.
* also, this could be a bit of a limitation, because it may not speed up the machines that need it most. though i don't really know; it may very well be that path caching speeds up even those, depending on how it interacts with virtual memory/page swapping/etc.
- + algorithm independent. doesn't care if A* or HHA* or whatever runs in the background. in fact, the current A* implementation should work without much adaption.
- - in the worst case, if no path is used more than once, there would be a slight cache-lookup penalty. this could be reduced by data structures optimized for fast lookup (hashes, trees, ...)
- + possibility for partial paths: if a requested path is a sub-path of a cached one, reuse that one
- + possible to implement as part of some kind of "pathfinding manager" that also handles threading/concurrency and alternative algorithms (one for short paths, a different one for long paths, ...)
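the partial-path idea in particular is simple enough to sketch: if some cached path already passes through both endpoints (in the right order), just slice it out instead of recomputing. hypothetical helper, linear scan kept dumb for clarity - a real version would want an index from nodes to the paths containing them:

```python
def subpath_from_cache(cached_paths, start, goal):
    """return a slice of a cached path that goes start -> goal,
    or None if no cached path contains both in that order."""
    for path in cached_paths:
        if start in path and goal in path:
            i, j = path.index(start), path.index(goal)
            if i <= j:
                return path[i:j + 1]
    return None
```

(reversed reuse - walking a cached path backwards - would also be possible, but whether a path is equally walkable in both directions depends on the map, so i left it out.)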
finally, i already implemented path caching in a very simplified manner to see if it's actually possible.
my findings (if i remember correctly):
- caching is possible without adding too much complexity
- in my very simplified environment it worked very well: between 50% and 90% cache hits
- i have no clue if it would work in an environment as complex as DF (my guess: not as good, but still)
back to work now.