shadow_slicer gave a very good summary of the problem back at the end of November:
If you decide to devote 8 bits to each tile (not to each edge, each tile!) just to determine whether that tile is passable by movement type x, a 256x256x16 map would require 1MB of memory. If we add ceilings, it goes to 2MB. Note that this is a really small map. The 6x6 embark I am currently using is 288x288x32, which is 2.5 or 5MB. Note that this does not include any cost information, and only supports 8 movement types. If you add terrain types or other features, this only gets worse. And, as you said, the number of tiles on the map is only going to increase. If you get up to hundreds of z-levels with larger embark sizes, things get nasty very quickly.
This was talked about earlier, but I don't remember the verdict: is this actually a problem? Dwarf Heaven, mentioned on the last page, was apparently a 6x6 embark with a map size around 288x288x50, going off the same numbers for 6x6 that shadow_slicer mentioned. That's 4-8MB of RAM, which is a pittance. Most computers in use have at least 1GB of RAM, and most current computers come with 3-4GB. 8MB of RAM won't be missed, especially if it helps speed up pathing.
Which I think was the main concern: that it wouldn't actually bring any improvement. Is that right?
Yes, my point was that this (1) was a lower bound/conservative estimate for memory usage under that architecture and (2) would not significantly increase pathfinding speed.
I later demonstrated that a single bit/two bits per tile was too low to implement the current behavior exhibited by DF. To implement the current behavior you need at least 13 bits/tile (or even 26 bits if bidirectionality is not assumed). This pushes the total to 52-208MB, just for the connectivity map. If, instead of simple connectivity, you want to include movement costs (let's use just 3 bits, so we have 8 possible costs), we are at 156-624MB of memory. And this is simply what is necessary to achieve the same behavior DF currently exhibits (while supporting 8 movement types).
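The scaling here is easy to check with back-of-envelope arithmetic. This is a sketch under my own assumptions (a ~288x288x50 map, 8 movement types, and the per-tile bit counts quoted above); the helper name `table_megabytes` is illustrative, not from any actual library:

```python
def table_megabytes(x, y, z, bits_per_tile, movement_types=8):
    """Size in MB of an explicit table storing `bits_per_tile`
    for every tile, for every movement type."""
    total_bits = x * y * z * bits_per_tile * movement_types
    return total_bits / 8 / 2**20

# Simple passability: 1 bit per tile per movement type -- ~4MB.
print(table_megabytes(288, 288, 50, 1))

# Full 13-bit connectivity, then 3-bit costs per connection (13 * 3 bits);
# these land near the 52MB and 156MB lower-bound figures quoted above.
print(table_megabytes(288, 288, 50, 13))
print(table_megabytes(288, 288, 50, 13 * 3))
```

Swapping in 26 bits for the non-bidirectional case, or a larger embark, scales the result linearly, which is where the upper ends of the 208MB and 624MB ranges come from.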
Further, this method is likely to be slow. That 624MB of data is competing with DF's data and code for space in the L1 and L2 caches of the processor. It is likely that whenever the pathfinding functions are called, the necessary data will not be in the on-CPU caches, so we will have to wait at least 100 clock cycles for each read of non-cached data. Then, when pathfinding has finished (or reached a point where it decides to return), DF's own working set is likely to have been completely purged from the cache, so we get another ~100-cycle/read delay until the cache fills again.
So yes, representing the connectivity/costs in an explicit table has serious scalability problems. If, however, the pathing library instead had DF calculate the connectivity/costs on demand, it would access the same data that DF normally uses (which is stored much more compactly than is possible in a general-purpose library), keeping that data in the cache and avoiding both the additional delay and the extra memory consumption. That leaves a whole lot of memory left over for useful things like path caches, connected zones, multiple hierarchy levels, etc., which will have a positive impact on pathfinding speed.
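The "calculate connectivity on demand" idea can be sketched with a generic A* that takes a neighbour/cost callback instead of reading a precomputed table. Everything here is illustrative (a toy 2D grid standing in for the game's map query); none of the names come from DF or Khazad:

```python
import heapq

def astar(start, goal, neighbours, heuristic):
    """Generic A*: `neighbours(node)` yields (next_node, step_cost) pairs,
    so connectivity/costs are computed on demand, never stored in a table."""
    open_set = [(heuristic(start, goal), 0, start)]
    best_cost = {start: 0}
    came_from = {}
    while open_set:
        _, cost, node = heapq.heappop(open_set)
        if node == goal:                     # reconstruct path on success
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        if cost > best_cost.get(node, float("inf")):
            continue                         # stale queue entry
        for nxt, step in neighbours(node):
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = node
                heapq.heappush(
                    open_set,
                    (new_cost + heuristic(nxt, goal), new_cost, nxt))
    return None                              # goal unreachable

# Toy stand-in for the game's own map query: a 5x5 grid with a wall
# column at x == 2 (open only at y == 4), walkability checked on demand.
WALLS = {(2, y) for y in range(4)}

def grid_neighbours(node):
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nxt = (x + dx, y + dy)
        if 0 <= nxt[0] < 5 and 0 <= nxt[1] < 5 and nxt not in WALLS:
            yield nxt, 1                     # uniform step cost

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

path = astar((0, 0), (4, 0), grid_neighbours, manhattan)
```

The point is that `grid_neighbours` could just as well call into the game's existing compact map representation, so the pathfinder touches exactly the data DF already keeps hot in cache, and no separate connectivity table ever exists.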
As a side note, there was a link, to GitHub I think, with the current development. Does anyone have it, and is it actually up-to-date? I'd like to look at the code and pretend I know what I'm doing before the feeling of being in way over my head sinks in. <_<
The current code base is part of Khazad, so you can find it on
Khazad's sourceforge page.