Introduction and disclaimerI'd like to begin this post by mentioning that I have no formal background in any of the related fields nor do I have the proper understanding of those fields, and I have not done a lot of groundwork in preparing this post, but instead I am mostly writing it spontaneously without preparation. In addition, I only discovered Dwarf Fortress recently, two weeks ago or so. Despite that it is somewhat insolent to assume that I could contribute to this challenging topic, my intuition is that there is a fair chance the ideas discussed here might be useful in improving the pathfinding system. I heard that Dwarf Fortress is using a pathfinding system called A-star or A*, and have not looked at how it actually works. Instead, I merely assumed that it works a certain way, given some performance issues that rumouredly are prevalent in Dwarf Fortress, although Dwarf Fortress seems to work well, given the complexity of it. It's a bit absurd that I also included a brief introduction to some related fields despite not having any formal training nor knowledge of them. My intuition regarding them is, that the readers might be able to sense that they could be relevant to pathfinding.
Vagueness of the postI'd like to emphasize that the primary idea of this post is to convey approaches and ways of thinking which I think would be essential to solving the problem. I'm not actually trying to write a concrete solution. I think that if this problem is not already thought of in the terms of graph theory, then merely beginning to think about the problem in those terms might make a relevant difference. I'm kind of hoping someone else (Who is primarily the developer, and of course everyone trying to contribute to this issue) will be inspired and think `
I see a way to solve this problem in these terms.`, if that happens the solution the person comes up with, is probably different from what I'm thinking about. For one, I have no idea how the code of DF actually works. Second, the solution someone else envisions as possible, could perhaps be much better than the ones I have in mind. I'm writing this, because if someone directly attempts to follow the approach I'm suggesting, they might miss out on their own interpretation which could be better.
I. Assumption regarding A* and related performanceThe assumption is that A* is not optimized towards a somewhat persistent environment with a massive amount of pathfinding queries, but is instead geared towards low amount of difficult queries.
II. General motif of improvement based on assumptionRelying on the previous assumption it might be possible to increase the performance of the pathfinding system by
increasing the fixed costs as a tradeoff to lower the
variable costs. I.e. by modifying the system to perform analysis and abstraction of the environment in which the pathfinding queries take place, expecting a lower calculation cost per query. How exactly this might be done, will be discussed in the later sections.
III. Topology, graph theory and computer science; The related fieldsI think this problem could be tackled and solved by a handful of collaborating experts from various fields, such as those mentioned in the heading, but assuming that they are not available for the work (although I also heard that the developer has a doctorate in mathematics, which is kind of surprising, but also cool.), and then the problems need to be solved without, thus making it possible this type of layman post might actually be helpful.
IV. Graph theory, brief introductionGraph theory is about pathfinding. There are three elements that graph theory works with:
Nodes (
Vertices)*, elements whiches interconnectivity is represented as a graph.
Connections (
Edges)*, often visually represented as lines/arcs between vertices
Paths, sequences of connected nodes
Paths, then, are an element of graph theory. A graph is defined as a set of nodes, and connections linking some of the nodes. At the basic level, this could be described by a doublet of nodes and an associated boolean value indicating whether the nodes are connected. E.g. (a,b,true). Which is also invertible to a list of connections, by selecting all true values. Paths can also be directional. Some concepts in graph theory are: Trees, Cycles and Euler Walks. Trees are acyclical graphs, while graphs that are not trees have cycles in them. An Euler Walk is a is a sequence of connected nodes that is only possible in a graph that has cycles, an Eulerian path traverses each connection (edge) of a graph only once. Nodes have a property called `degree` which is the number of connections from the node, or to the node (incase of directional edges).
*: A `node` is formally called a `
vertex`. A connection is formally called an `
edge`. Updated thread for correction.
https://en.wikipedia.org/wiki/Graph_theoryhttps://en.wikipedia.org/wiki/Eulerian_pathhttps://en.wikipedia.org/wiki/Glossary_of_graph_theory_termsV. Topology, brief introductionI've even less knowledge regarding topology than the previous subject, which is bordering on non-existant. Still, from the impression that I have, I can see that it is closely related to the pathfinding issues, as it involves the analysis of forms and shapes, by examining their continuity and the fundamental characteristics of those shapes. These, as far as I understand, are often described by means of continuous transformations, with the motif, that if there exists a continuous transformation between two shapes, then they're topologically similar, at least in some sense. Regarding Dwarf Fortress, however, it has to be noted that as we are working with a grid of cells, and not over the domain of the real numbers, most of the standard concepts are not applicable. It could be said that while topology works with real or complex numbers, the Dwarf Fortress equivalent would be working with integers.
How is this related then to pathfinding? Somekind of analysis of the shapes of spaces and methodical assignment of continuity, should allow distinguishing between, for one, cyclical graphs and acyclical graphs, or trees. And to help with determining whether a space should be considered as a union of subspaces or as a standalone space.
VI. Computer science, brief introductionComputer science is essentially programming, with an organized approach, and as far as I understand anyway, attempts to approach the challenges of programming by utilizing primarily mathematics, and to some extent physics. In computer science a central theme is the assessing of performance of algorithms by using bounding functions, such as, determining how much the cost of an algorithm increases with respect to increasing some related variable, number of tiles to search for paths for an example. These then are using notations like `Big-O`to classify the type of bounding function. The task at hand seems to fit this scheme quite well; As the problem to be solved involves trying to reduce the calculation cost of an algorithm, with the end goal of reducing the CPU load caused by the pathfinding system.
https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann%E2%80%93Landau_notationsVII. Virtual escalator/conveyor simulator, by analogyOne of the ways I'd think might be an interesting way to approach this problem, was to instead of thinking about how to solve pathfinding queries, would be to create a conveyor simulator. Kind of like some factory production 'game', where conveyors have to take items to processes, where they get refined, and then transported by additional conveyors out of the factory. The player in that game would be deciding where to lay those conveyor tracks. Instead of a player, we'd like an algorithm to automatically set up a network of conveyors. With this hypothetical approach, anyway. Once a virtual system of conveyors is set up, it can be queried for paths between objects. This way, you could see how it follows at least the intended optimization motif of section II, it might be more costly to setup the system of conveyors than simply solve the path between two objects, but once it has been setup, you can also kind of guess that maybe the cost per query remains low, and if there was a lot of queries, the tradeoff might be effective. (Or a lot of objects to be moved around on the conveyors). Instead, the complexity of the conveyor system would seem to be the primary source of costs. If the game was entirely persistent, this would probably work out quite well, but unfortunately there are a lot of moving parts and changing environments, the objects that are transported on the conveyors themselves can be endpoints to the conveyors (such as a dwarf carrying an item and dropping it off someplace else, so while the dwarf would be merely an object on the system, their travelling on it, ends up creating a new endpoint, or moving an old one to a new place). This is probably not the way to solve this problem, but I think it's a good thought experiment that is easy to visualize and therefore make thinking about the `virtual system to be queried` easier.
VIII. Creating a graph of connected spacesI think that a system that creates a graph of spaces as connected areas, with the capacity to also be recursive and create subspaces within spaces, might be a way to tackle this efficiency problem. So using the graph theory type jargon, you could say there are at some given time 3 main spaces, such as A,B,C. And each of them contain some variable number of subspaces, such as A_1, A_2, A_3, B_1, B_2.. And so forth. And then you would have a graph of connections for the spaces which are part of the same `set`. That is, a notation that explains whether A,B,C are connected to each other, but not whether A is connected to B_1. Instead the set or space B contains it's own graph, which explains how they're internally connected, like B_1 - B_2 is a connection, B_1 - B_3 is not a connection, etc.
Created an image to try and demonstrate the type of approach I'm trying to envision, and uploaded it to pasteboard:
https://pasteboard.co/J34olQY.pngExample. Graph could encode the connectivity between unions of tiles designated by a vertex(This is merely one possible approach)
The path in such a graph is not the path literal, but rather a cue that can be used for caching a path or to know where to solve pathfinding queries. It needs to be supplemented with additional information.
For an example, there could be a section of a cavern or a tunnel, designated by `vertex A` neighbouring another area designated by `Vertex B`, and it is resolved that it's possible to move from the area under `A` to area under `B`. The vertices can then be marked as connected. Now if you imagine an embark area, for an example, split to for an example 80 such vertices, and their connectivity is recorded. Then it might be possible to first perform a pathfinding task on the graph itself, to find out a sequence of connected areas, within which the actual path traversed by a dwarf will take place, such as A, B, C, G. There may also be multiple possible sequences of connected vertices starting from and ending with the same pair of vertices. This then, could allow splitting the query into subsegments, which might be cached.
Recursion and splitting toplevel areas into subsetsAn approach which tries to record the cues to possible paths between vertices can also be used inside those vertices, when, under this tentative approach, they're areas of joined tiles. If area `A` is some walkable space of size 60x60, it could be split to further smaller areas, labeled such as `A₁` and `A₂`, which could be places like rooms or multiple rooms. Then the same type of approach is possible using the subsets, e.g. kitchen is connected to living room, living room is connected to hallway, hallway is connected to entrance. I.e. A₁ - A₂ - A₃ - A₄.
Minimum size of subset?The recursion could be stopped by minimum size, for an example; If the area under A is less than 25 tiles, then no recursion.
Designating transition points (preferably as unions?)This approach then doesn't actually record the path literals which need to be taken by the dwarves as they travel around. It merely points approximately to which spaces they must go through in order to reach some distant place. To couple the graph with the actual paths, an additional element could be introduced, here called `
transition point` which is a tile that is adjacent/connected/walkable to a neighbouring area. E.g. If a kitchen is area A, and there is a door in the kitchen, and outside the kitchen there is hallway, then the tile of the door is a transition point. It is on A, but from it it's possible to step on to area B. This then is paired by another transition point belonging to the neighbouring region, e.g. the tiles in front of the doorway in the hallway. Now if Area B has 2 transition points, the path between them can be cached. E.g. the path literal that needs to be taken to go through B when traveling from A to B to C.
Designation points could be `vertically` shared between toplevel vertices and their subvertices or subsetsE.g. a transition point would belong to the set/vertex in which it's placed to. But if the vertex is split into subspaces, then some of those subspaces must contain that transition point. E.g. If a transition point from A to B is contained in A, then it must also be in A₁ or A₂, if they're the subdivisions of A. If these are further subdivided, such as A₁₋₁, A₁₋₂, then again, the transition point must be in one of those. Also there is a transition point between the subspaces, too, e.g. from A₁ to A₂. By `vertical` sharing, then, I mean that the transition point is present from the toplevel to the lowest subdivision that contains the point (Though that's a bit of a tautology, but if A₂ is subdivided, and A₁ isn't subdivided, and the point is in A₁, it's not in the furthest subdivision level of A).
Now I'm exhausted again, and I'll stop here. I'll probably come back to update the thread again, later.