Topic: Ideas for improving pathfinding performance (Rewritten) (Read 5541 times)

eurydike · « **on:** April 07, 2020, 03:01:14 pm »

Introduction and disclaimer
I'd like to begin this post by mentioning that I have no formal background in any of the related fields nor do I have the proper understanding of those fields, and I have not done a lot of groundwork in preparing this post, but instead I am mostly writing it spontaneously without preparation. In addition, I only discovered Dwarf Fortress recently, two weeks ago or so. Despite that it is somewhat insolent to assume that I could contribute to this challenging topic, my intuition is that there is a fair chance the ideas discussed here might be useful in improving the pathfinding system. I heard that Dwarf Fortress is using a pathfinding system called A-star or A*, and have not looked at how it actually works. Instead, I merely assumed that it works a certain way, given some performance issues that rumouredly are prevalent in Dwarf Fortress, although Dwarf Fortress seems to work well, given the complexity of it. It's a bit absurd that I also included a brief introduction to some related fields despite not having any formal training nor knowledge of them. My intuition regarding them is, that the readers might be able to sense that they could be relevant to pathfinding.

Vagueness of the post
I'd like to emphasize that the primary idea of this post is to convey approaches and ways of thinking which I think would be essential to solving the problem. I'm not actually trying to write a concrete solution. I think that if this problem is not already thought of in the terms of graph theory, then merely beginning to think about the problem in those terms might make a relevant difference. I'm kind of hoping someone else (Who is primarily the developer, and of course everyone trying to contribute to this issue) will be inspired and think `I see a way to solve this problem in these terms.`, if that happens the solution the person comes up with, is probably different from what I'm thinking about. For one, I have no idea how the code of DF actually works. Second, the solution someone else envisions as possible, could perhaps be much better than the ones I have in mind. I'm writing this, because if someone directly attempts to follow the approach I'm suggesting, they might miss out on their own interpretation which could be better.

I. Assumption regarding A* and related performance
The assumption is that A* is not optimized towards a somewhat persistent environment with a massive amount of pathfinding queries, but is instead geared towards low amount of difficult queries.

II. General motif of improvement based on assumption
Relying on the previous assumption it might be possible to increase the performance of the pathfinding system by increasing the fixed costs as a tradeoff to lower the variable costs. I.e. by modifying the system to perform analysis and abstraction of the environment in which the pathfinding queries take place, expecting a lower calculation cost per query. How exactly this might be done, will be discussed in the later sections.

III. Topology, graph theory and computer science; The related fields
I think this problem could be tackled and solved by a handful of collaborating experts from various fields, such as those mentioned in the heading, but assuming that they are not available for the work (although I also heard that the developer has a doctorate in mathematics, which is kind of surprising, but also cool.), and then the problems need to be solved without, thus making it possible this type of layman post might actually be helpful.

IV. Graph theory, brief introduction
Graph theory is about pathfinding. There are three elements that graph theory works with:
Nodes (Vertices)*, elements whiches interconnectivity is represented as a graph.
Connections (Edges)*, often visually represented as lines/arcs between vertices
Paths, sequences of connected nodes

Paths, then, are an element of graph theory. A graph is defined as a set of nodes, and connections linking some of the nodes. At the basic level, this could be described by a doublet of nodes and an associated boolean value indicating whether the nodes are connected. E.g. (a,b,true). Which is also invertible to a list of connections, by selecting all true values. Paths can also be directional. Some concepts in graph theory are: Trees, Cycles and Euler Walks. Trees are acyclical graphs, while graphs that are not trees have cycles in them. An Euler Walk is a is a sequence of connected nodes that is only possible in a graph that has cycles, an Eulerian path traverses each connection (edge) of a graph only once. Nodes have a property called `degree` which is the number of connections from the node, or to the node (incase of directional edges).

*: A `node` is formally called a `vertex`. A connection is formally called an `edge`. Updated thread for correction.

https://en.wikipedia.org/wiki/Graph_theory
https://en.wikipedia.org/wiki/Eulerian_path
https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms

V. Topology, brief introduction
I've even less knowledge regarding topology than the previous subject, which is bordering on non-existant. Still, from the impression that I have, I can see that it is closely related to the pathfinding issues, as it involves the analysis of forms and shapes, by examining their continuity and the fundamental characteristics of those shapes. These, as far as I understand, are often described by means of continuous transformations, with the motif, that if there exists a continuous transformation between two shapes, then they're topologically similar, at least in some sense. Regarding Dwarf Fortress, however, it has to be noted that as we are working with a grid of cells, and not over the domain of the real numbers, most of the standard concepts are not applicable. It could be said that while topology works with real or complex numbers, the Dwarf Fortress equivalent would be working with integers.

How is this related then to pathfinding? Somekind of analysis of the shapes of spaces and methodical assignment of continuity, should allow distinguishing between, for one, cyclical graphs and acyclical graphs, or trees. And to help with determining whether a space should be considered as a union of subspaces or as a standalone space.

VI. Computer science, brief introduction
Computer science is essentially programming, with an organized approach, and as far as I understand anyway, attempts to approach the challenges of programming by utilizing primarily mathematics, and to some extent physics. In computer science a central theme is the assessing of performance of algorithms by using bounding functions, such as, determining how much the cost of an algorithm increases with respect to increasing some related variable, number of tiles to search for paths for an example. These then are using notations like `Big-O`to classify the type of bounding function. The task at hand seems to fit this scheme quite well; As the problem to be solved involves trying to reduce the calculation cost of an algorithm, with the end goal of reducing the CPU load caused by the pathfinding system.

https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann%E2%80%93Landau_notations

VII. Virtual escalator/conveyor simulator, by analogy
One of the ways I'd think might be an interesting way to approach this problem, was to instead of thinking about how to solve pathfinding queries, would be to create a conveyor simulator. Kind of like some factory production 'game', where conveyors have to take items to processes, where they get refined, and then transported by additional conveyors out of the factory. The player in that game would be deciding where to lay those conveyor tracks. Instead of a player, we'd like an algorithm to automatically set up a network of conveyors. With this hypothetical approach, anyway. Once a virtual system of conveyors is set up, it can be queried for paths between objects. This way, you could see how it follows at least the intended optimization motif of section II, it might be more costly to setup the system of conveyors than simply solve the path between two objects, but once it has been setup, you can also kind of guess that maybe the cost per query remains low, and if there was a lot of queries, the tradeoff might be effective. (Or a lot of objects to be moved around on the conveyors). Instead, the complexity of the conveyor system would seem to be the primary source of costs. If the game was entirely persistent, this would probably work out quite well, but unfortunately there are a lot of moving parts and changing environments, the objects that are transported on the conveyors themselves can be endpoints to the conveyors (such as a dwarf carrying an item and dropping it off someplace else, so while the dwarf would be merely an object on the system, their travelling on it, ends up creating a new endpoint, or moving an old one to a new place). This is probably not the way to solve this problem, but I think it's a good thought experiment that is easy to visualize and therefore make thinking about the `virtual system to be queried` easier.

VIII. Creating a graph of connected spaces
I think that a system that creates a graph of spaces as connected areas, with the capacity to also be recursive and create subspaces within spaces, might be a way to tackle this efficiency problem. So using the graph theory type jargon, you could say there are at some given time 3 main spaces, such as A,B,C. And each of them contain some variable number of subspaces, such as A_1, A_2, A_3, B_1, B_2.. And so forth. And then you would have a graph of connections for the spaces which are part of the same `set`. That is, a notation that explains whether A,B,C are connected to each other, but not whether A is connected to B_1. Instead the set or space B contains it's own graph, which explains how they're internally connected, like B_1 - B_2 is a connection, B_1 - B_3 is not a connection, etc.

Created an image to try and demonstrate the type of approach I'm trying to envision, and uploaded it to pasteboard:

https://pasteboard.co/J34olQY.png

Example. Graph could encode the connectivity between unions of tiles designated by a vertex
(This is merely one possible approach)
The path in such a graph is not the path literal, but rather a cue that can be used for caching a path or to know where to solve pathfinding queries. It needs to be supplemented with additional information.

For an example, there could be a section of a cavern or a tunnel, designated by `vertex A` neighbouring another area designated by `Vertex B`, and it is resolved that it's possible to move from the area under `A` to area under `B`. The vertices can then be marked as connected. Now if you imagine an embark area, for an example, split to for an example 80 such vertices, and their connectivity is recorded. Then it might be possible to first perform a pathfinding task on the graph itself, to find out a sequence of connected areas, within which the actual path traversed by a dwarf will take place, such as A, B, C, G. There may also be multiple possible sequences of connected vertices starting from and ending with the same pair of vertices. This then, could allow splitting the query into subsegments, which might be cached.

Recursion and splitting toplevel areas into subsets
An approach which tries to record the cues to possible paths between vertices can also be used inside those vertices, when, under this tentative approach, they're areas of joined tiles. If area `A` is some walkable space of size 60x60, it could be split to further smaller areas, labeled such as `A₁` and `A₂`, which could be places like rooms or multiple rooms. Then the same type of approach is possible using the subsets, e.g. kitchen is connected to living room, living room is connected to hallway, hallway is connected to entrance. I.e. A₁ - A₂ - A₃ - A₄.

Minimum size of subset?
The recursion could be stopped by minimum size, for an example; If the area under A is less than 25 tiles, then no recursion.

Designating transition points (preferably as unions?)
This approach then doesn't actually record the path literals which need to be taken by the dwarves as they travel around. It merely points approximately to which spaces they must go through in order to reach some distant place. To couple the graph with the actual paths, an additional element could be introduced, here called `transition point` which is a tile that is adjacent/connected/walkable to a neighbouring area. E.g. If a kitchen is area A, and there is a door in the kitchen, and outside the kitchen there is hallway, then the tile of the door is a transition point. It is on A, but from it it's possible to step on to area B. This then is paired by another transition point belonging to the neighbouring region, e.g. the tiles in front of the doorway in the hallway. Now if Area B has 2 transition points, the path between them can be cached. E.g. the path literal that needs to be taken to go through B when traveling from A to B to C.

Designation points could be `vertically` shared between toplevel vertices and their subvertices or subsets
E.g. a transition point would belong to the set/vertex in which it's placed to. But if the vertex is split into subspaces, then some of those subspaces must contain that transition point. E.g. If a transition point from A to B is contained in A, then it must also be in A₁ or A₂, if they're the subdivisions of A. If these are further subdivided, such as A₁₋₁, A₁₋₂, then again, the transition point must be in one of those. Also there is a transition point between the subspaces, too, e.g. from A₁ to A₂. By `vertical` sharing, then, I mean that the transition point is present from the toplevel to the lowest subdivision that contains the point (Though that's a bit of a tautology, but if A₂ is subdivided, and A₁ isn't subdivided, and the point is in A₁, it's not in the furthest subdivision level of A).

Now I'm exhausted again, and I'll stop here. I'll probably come back to update the thread again, later.

Sarmatian123 · « **Reply #1 on:** April 09, 2020, 05:16:59 pm »

Floyd-Warshall algorithm (all pairs shortest paths matrix), expanded with next-step matrix, on GPU runs at O(n) time.
To find path you check next-step matrix in O(n) time.

Thing is Toady is not using GPU or parallel programming. DF is old mill single thread program. You get 2 threads on 1 CPU these days. So Toady is using some improved or optimized version of Dijkstra algorithm running around O(n^3). At least so much I have been told on forums for last few years.

https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm#Implementations

I could myself provide OpenCL implementation for Floyd-Warshall algorithm, if Toady wished me to.

eurydike · « **Reply #2 on:** April 15, 2020, 06:24:05 am »

Quote from: Sarmatian123 on April 09, 2020, 05:16:59 pm

Floyd-Warshall algorithm (all pairs shortest paths matrix), expanded with next-step matrix, on GPU runs at O(n) time.
To find path you check next-step matrix in O(n) time.

Thing is Toady is not using GPU or parallel programming. DF is old mill single thread program. You get 2 threads on 1 CPU these days. So Toady is using some improved or optimized version of Dijkstra algorithm running around O(n^3). At least so much I have been told on forums for last few years.

https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm#Implementations

I could myself provide OpenCL implementation for Floyd-Warshall algorithm, if Toady wished me to.

Looking at the algorithm wikipedia page, it appears this algorithm is intended to run on graphs which can be weighted. How efficient it is for that purpose? I might have misunderstood, but I think to utilize this algorithm, it is first necessary to generate a graph on which the algorithm can work. What do you think, is it perhaps useful as it is?

Made a couple of images crudely with gimp, they are not particularly significant, but might be of some inspiration to someone, possibly maybe.

Image 1:

Spoiler (click to show/hide)

Image 2:

Spoiler (click to show/hide)

Moeteru · « **Reply #3 on:** April 15, 2020, 09:43:01 am »

Quote from: Sarmatian123 on April 09, 2020, 05:16:59 pm

https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm#Implementations

That Floyd-Warshall algorithm is interesting, but look at the "comparison with other shortest path algorithms" section of the article you posted. It only performs better than Dijkstra's algorithm for dense graphs in which the number of edges, |E|, is comparable to the number of vertices, |V|, squared. Any large grid-based map such as the one used in DF is very sparse, so |E| << |V|². In simpler terms, most tiles are more than one step away from most others.

I think a solution based on hierarchical path-finding is the way forward. Break the map up into smaller chunks, calculate and cache paths between them, then use standard A* for navigating within any single chunk. The hard part is deciding where to draw the chunk boundaries, but a lot of that logic would have side-benefits for features like digging invaders. It could even be used to implement a default "Get Inside!" alert by essentially auto-detecting the extent of the fortress. The player could be given tools similar to the current burrow system to give hints to the computer about where boundaries should be drawn in case the computer gets it wrong, but it's a fairly safe bet to place boundaries at any doors, hatches, and lever-linked bridges.

Sarmatian123 · « **Reply #4 on:** April 15, 2020, 02:40:25 pm »

GPU does vector calculations. Not matrix calculations. So all matrices have to be translated into vectors.

Floyd-Warshall algorithm parallelization reduces its cubic Θ(n^3) run-time to linear O((n^3)/p)+O(n).
Interestingly practical implementation is surprising with some minor side-effects from hardware, which gives tests slight variations in results.

Theory snippet
A shortest path from vertex u to vertex v is then defined as any path p with weight w(p)=d(u,v).

There are following variants of this problem:
– Single-source shortest-path problem: given graph G=(V,E), we seek after shortest path from a given vertex s ∈ V to every vertex v ∈ V .
– Single-destination shortest-path problem: Find a shortest path to a given destination vertex t from every vertex v. By reversing the direction of each edge in the graph, it can be reduced to single-source shortest-path problem.
– Single-pair shortest-path problem: Find a shortest path from u to v for given vertices u and v. Solving single-source shortest-path problem solves this problem.
– All-pairs shortest-path problem: Find a shortest path from u to v for every pair of vertices u and v. This problem is solved faster by other algorithms, then by running single-source algorithm from each vertex.

The most important algorithms solving Shortest Paths problem are:
– Dijkstra’s algorithm, for single-source shortest path problem.
– Bellman–Ford algorithm, for single source problem.
– A* search algorithm, for single-pair shortest path problem.
– Johnson’s algorithm, for all pairs-shortest path problem.
– Viterbi algorithm, for all pairs-shortest path problem.
– Floyd-Warshall Algorithm, for all pairs-shortest path problem.

Data dependencies in Floyd-Warshall Algorithm.
Floyd-Warshall Algorithm has an obvious data dependency, which can be used in its parallelization. This is matrix multiplication. A value of a cell in a given matrix is calculated using only values from cells, which lie on the same row and column as the calculated cell. Therefore matrices can be turned into vectors, row by row, which are the data format used by GPU for quick concurrent mass-calculations.

Parallel All-Pairs Shortest Path Algorithm.
Parallel All-Pairs Shortest Path Algorithm is simply a parallelization of Floyd-Warshall Algorithm. It turns matrix multiplications into parallel computations on data vectors. Because of an excessive number of processors contained in a single GPU, the speed up is significant, unless matrices are so huge, that their vertices exceed number of available processors by some large factor.

1. for k:=1 to n do (complexity O(n))
2. - Each process p ij that has a segment of the k row of D k−1 , broadcasts it to the p ∗j processes; (complexity O(1))
3. - Each process p ij that has a segment of the k column of D k−1 , broadcasts it to the p i∗ processes; (complexity O(1))
4. - Each process waits to receive the needed segments; (complexity from O(1) to ?)
5. - Each process computes its part of the D (n) matrix; (complexity O(1))

Performance depends on p amount of available processors on given GPU/CPU.
Worst case performance is O((n^3)/p), when there is far more edges then available processors.
Best case performance is O(n), when processors > edges.
Overall pipeline version run-time is O((n^3)/p)+O(n).

Moeteru · « **Reply #5 on:** April 15, 2020, 03:46:10 pm »

Quote from: Sarmatian123 on April 15, 2020, 02:40:25 pm

Worst case performance is O((n^3)/p), when there is far more edges then available processors.
Best case performance is O(n), when processors > edges.
Overall pipeline version run-time is O((n^3)/p)+O(n).

The edges in a typical DF pathfinding graph will massively outnumber the cores in any commercially available GPU.
One map tile is 48x48 voxels in size, ignoring the Z dimension. If we consider a unit just walking around on a flat surface without any digging, that gives us a graph with 6721 edges. Multiply by 16 for a default 4x4 embark, then multiply by 4 to account for the three cavern layers, and you're looking at around 400,000 edges. If any fliers enter the map then you can increase that number by at least another factor of 10.
For comparison, a state-of-the-art and very expensive Titan V GPU has 5120 CUDA cores. My GPU has a little over 600. It should be obvious that we're not even close to the O(n) case.

That said, the idea of using an all-pairs shortest-path algorithm and caching the result isn't a bad one. The faster lookup times might justify the initial cost of generating the lookup table. The problem then becomes how you deal with the frequent map changes produced by mining out tunnels, changes in water level, changes in temperature, cave-ins from forest fires, etc. It's not uncommon to be in a situation where the pathfinding graph is changing every few ticks. How would you deal with a map with a waterfall where the pathfinding graph is never stable for the whole life of the fortress?

Sarmatian123 · « **Reply #6 on:** April 16, 2020, 05:04:08 am »

I did run on 2060gtx dense matrix with 1000 interconnected vertices d(v,u)=d(u,v), which had no negative weights and d(v,v)=0.
I did not use CUDA, but OpenCL with CUDA-kit.
I modified for my purpose a public OpenCL kernel (kernel is a GPU program executed by GPU) from GIT.
1000x1000 = 1000000 edges.
The measured GPU run time was O(n^1.2) so almost O(n). Overage time from 10+ attempts is 0.035 seconds on my hardware.
In comparison, the sequential version of Floyd-Warshall algoritm took entire 20 seconds to accomplish the same. Measured O(n^2.9), which is probably a result from some CPU optimizations for matrix multiplications.

The thing is, there are far less warps (those "CUDA cores" executing kernels) on GPU then actual processors.
Modern GPUs have processors literally in millions and edges are executed not on warps/"CUDA cores", but on processors.

The matrices (distance and next-neighbor) need to be re-calculated only upon changes in terrain. Else they are exactly the same.

To find shortest path between any pair of points on such map takes linear O(n) time. It searches through already calculated next-neighbor matrix. In practice way less then O(n) and this is optimal shortest path to target. If there is no path, then it is even O(1) search. The only issue here is that Toady is not using any second process to run GPU programs. I think there is some issue how Toady is storing the maps, so he can not exactly send them to process in other programs. Though DFHack can access DF maps and modify them from memory, into memory.

Moeteru · « **Reply #7 on:** April 16, 2020, 05:55:33 am »

Wow, okay, those results are pretty impressive. I didn't realise there was a difference between the number of logical "CUDA cores" and the number of physical processors.
If it still scales that well on older and less expensive GPUs then it's definitely a huge improvement over other algorithms.

Sarmatian123 · « **Reply #8 on:** April 16, 2020, 06:23:47 am »

Warps are not exactly CUDA cores. They are in levels above them. However it doesn't matter, as you feed kernels (GPU programs) into warps for execution. And yes, I believe, there are 2^n processors per CUDA core. I heard on lectures about Nvidia confusing naming schedule though.

Due GPU architecture all data that can be reformated into vectors, will be executed very efficiently. There are of course additional data optimizations for every special GPU.

I used OpenCL with default options for GPU. OpenCL works on all modern GPUs and all modern CPUs. I wrote for example the program for 2060gtx and installed CUDA-kit for OpenCL, but in my laptop in low energy mode is for example used integrated Intel GPU and my OpenCL program still did run on it quite well. I was pretty shocked by this discovery.

eurydike · « **Reply #9 on:** April 18, 2020, 03:33:14 am »

Interesting Sarmatian and Moeteru. This is educational.

Game map and graph correspondence
The game map is not quite a graph, but there exists a corresponding graph which has a node for every tile, or a node for every non-impassable tile, with edges representing the walkability - or potentially accessibility to fliers or climbers - to the adjacent tiles. Finding any solution that is more efficient at processing pathfinding in the game is then equivalent to finding a solution that solves a graph of this type more effectively than the existing algorithms. Depending on how the maps are stored, they're not quite graphs yet necessarily.

Differences to arbitrary graphs
The map for the most part represents an euclidian space with integers and 3 orthogonal directions, and in many cases working on the two dimensional plane is sufficient.

This means that there is potentially a shorthand to a metric in the very way the game map is stored, though I don't know how the maps are stored.
For an obvious example, if a tile has the coordinate 1,1,1 and another tile has a coordinate 1,1,2 then the distance between the tiles corresponds to some minimum unit of distance.

Single destination shortest path problem is also related to this, as the tiles which have the shortest distance to a given destination tile, are (almost) always the adjacent tiles. To find the tiles which have the shortest path to a given destination doesn't yet require generating a graph, as they can be directly accessed by the map datatype format (Again I don't know how they are stored). That is, they're all the tiles with only changed coordinate, and when the difference is only 1 unit. E.g. given a tile at 3,3,3 to find the tile with the shorest path to that tile, you would simply check if (2,3,3) (4,3,3) (3,2,3) (3,4,3) (3,3,2) (3,3,4) are walkable, ignoring flight related issues temporarily. Note: I'm not sure if the game considers diagonal movement to as covering a longer distance (i.e.√2)

Attempting to generate a graph in a way that pre-solves some of the pathfinding queries
As you would presumably have to traverse the map with somekind of an algorithm in order to convert it to a graph or to a solvable format, it might be possible to simplify the graph in the process, e.g. by attempting to link continuous spaces of adjacent tiles into single nodes, perhaps in the tentative way suggested in the initial post.

Usefulness of the resultant graph or virtual abstracted map of the game map
Another potential benefit is related to the utility of the virtual map. If instead of feeding the entire corresponding graph to a pathfinding algorithm, you attempt to do some simplifying via unions, cropping etc, and end up creating groups for nodes which correspond to somekind of sensible ingame spaces, these groups could be useful. For an example if the graph you're doing the top level pathfinding on contains fairly large spaces, then it is also an index for contained items and creatures. E.g. You can decide which item to retrieve using the toplevel graph alone, instead of looking at each individual item. (The game seems track all the distances to individual items currently?)

The GPU techniques mentioned in the comments are a bit too advanced for me to follow properly, but it's interesting to read regardless. Hopefully this thread, thanks to your contribution, ends up helping the DF project a bit.

Zero665 · « **Reply #10 on:** April 24, 2020, 09:28:08 am »

Hi all, I honestly don't understand half of the technical stuff your talking about, you guys are too smart

Sorry my bad english

I imagine the final game with a different system, I will try to explain, for me anyone that enters in the map without previous information (or a path that we builder, or even the incorporation of sings) or the wild animals / beast shouldn't know where the entrance or base is ... They should star walking or searching till they find something and check it out, or smell, see something that triggers, and the same for dwarff, if you send them to a new location they should discover the maze first, and then if you build a path they will remember the way, just a raw idea, but for me would make the game much more interesting, I don't like that a enemy in the other side of the map knows that I have close a door 10 lvls underground, also would be nice that enemies try to enter the place, with the incorporation of Rams / miners, but that is another subject

eurydike · « **Reply #11 on:** June 15, 2020, 07:39:18 am »

After discussing this matter further on Dwarf Fortress Discord with a user named `Kelly` I found out that it probably is the case that:

- "Temperature calculations" is the largest single subcomponent in terms of performance cost
(I was a bit surprised to find out that there is no diffusion of heat between tiles, and according to Kelly, the temperature calculations are entirely based on bodytemperature?)
- Pathfinding has already been revised to contain elements comparable to those suggested in the thread, so that pathing works using larger grids like 16x16, and items are also indexed to them.
- Item vector might be relevant to performance costs

Here are some additional points regarding what I think might improve performance:

Temperature calculations, lookup table generation and simplification

-For creatures of particular bodysizes and shapes it might be possible to determine a value like `surface area` to use as a simple coefficient for temperature calculations.
-For clothing and insulation, it might be possible to generate another simple value such as insulation.
-Temperature calculations for temperature difference between environment and dwarf, could be based on a pregenerated lookup table with some granularity,
so there would effectively be minimal amount of calculation, and instead insulation and surface area would form one table,
and the resulting value would be paired with another table containing temperature differences.
The table lookup algorithm could potentially be optimized depending on number of entries, but I'm kind of expecting this format to have so few rows that it's unnecessary.

Temperature calculations in the real world are overall very complicated, in so far as I understand. Trying to simulate temperature perfectly, wouldn't make sense due to the complexity.
This part of the discussion doesn't really relate to DF, but I'm trying to explain my impressions about how temperature calculations work, which might serve as some additional
inspiration for how to do this efficiently, or not.

There are 3 primary modes of heat transfer, conduction, convection and radiation. Radiation is based on the 4th exponent of temperature using somekind of blackbody radiation formula,
while conduction and convection in so far as I remember, are modelled to somewhat linear, or directly relative to temperature difference.
Convection refers to transfer of heat at the boundary surface of two different substances or the motion of particles which transport the heat.
Conduction in so far as I can remember and just attempted to verify, refers to heat transfer within a particular substance based on collision of molecules, which
tends to average the speed of the moving molecules and thus reduces to averaging of temperature.

I'd like to add that in a gas or liquid - a fluid - there's also diffusive mixing of molecules, which under the above definition would be based mostly on convection.
But that might be a bit misleading, as convection is also used as a term to refer to an air current or motion of particles in the fluid, such as river flowing or air flowing.

Temperature changes are a differential system and that also complicates matters. To illustrate why, I'll try and use the following minimal example:
Imagine there are four particles A, B, C, D which are remain in the same place, but exchange heat with their neighbours.

A - B - C - D

If we look at only two particles, B and C, noting their difference of temperature, it might be simple to calculate their temperature change over time as a function
of the initial temperature difference and time. The problem in my opinion mostly arises from that the neighbours' temperature changes over time are based on their other neighbours'
temperature changes over time: in other words, the temperature exchange between B and C can not be modeled precisely over time without knowing also about A and D.
As the rate of change (derivative) of a B is dependent on the temperature difference between B and C, then it's dependent on the current values of both B and C.
So the value and the derivative are related.

In my experience this typically causes problems in systems that use simple and straightforward linear approximation. For an example consider the following grid:

D
|
A - B - C

If one were to model the temperature changes of this system simply by checking the temperature difference between each neighbour, and choosing an approximated linear change,
and then applying that, it would result to instability in several cases. For an example if A, D and C are all hot, and B is cold, and one would approximate the temperature exchange over time,
without taking into account that each other neighbour of B is also heating it up, this could (depending on the step size of the approximation), result to B ending up with a temperature,
higher than A, D or C initially. And then in the next iteration, the same problem would occur in reverse, B would cool excessively and A, D and C would heat excessively.

The same type of problem can occur with other similar systems, such as waterflow or water levels. If we imagine B is a dry tile, and A,D,C have lots of water, B can end up with excess water.

I think this might be possible to model with a linear system that would take into account the temperatures of neighbours and using some pre-selected coefficients or weights:
For an example in the case of the chain A,B,C,D for A you could say it's temperature is primarily based on B, and secondarily based on C, and tertiarily based on D.
Each separation level in practical terms is somewhat comparable to a derivative. E.g. The rate of change of the temperature of B is dependent over time on the temperature of C,
And As temperature is dependent on B, then C with respect to A is somewhat comparable to the derivative of B. Using an approach loosely comparable to a Taylor Series,
for each cell, it might be possible to take into account the temperatures of distant neighbours. The problem is then, that the number of cells to to consider for the calculations,
would increase rapidly. E.g. If each cell is approximated by looking at 2nd order neighbours and 3rd order neighbours, then even to generate a tablelook up coefficient,
you'd have to look at 36 cells for each cell.

This problem has probably been solved craftily in some context by someone already, and hopefully there's even no need to think about a new way to solve it.
Is someone perhaps aware of an efficient way to calculate such matters, in some context?

Item vector? Mergeable and countable items?

If the game could internally handle some items as countable stacks of the same item, it could decrease the number of items that need to be tracked of.
This could be done by creating container classes such as:
- A pile
- A stack
- A disorderly mess typical of dwarven behavior

If the "item vector" had an item such as "My Item [50]" similar to drinks and such, it could shorten the "item vector".
A general prefix container class which requires no storage item, could help with this. So if dwarves put several items on the same tile, it could be called a pile.
A pile could have a specific storage volume, and maybe incur some rummage costs for getting stuff out, if there are items of several types in the pile.
Tracking items inside storage areas as groups might also work that way, potentially. E.g. "This stockpile contains 50 logs." would be easier to track than the precise
location of each log individually. Problem then is that there needs to be info about which tile they're on and so forth... Which would need individual tracking.

Maybe there is a method something along these lines that could ease the item burden?

PatrikLundell · « **Reply #12 on:** June 15, 2020, 12:23:03 pm »

There are several ways of stacking items into "virtual container" items. One method is to merge all "sufficiently similar" items on the same tile into a stack object (essentially the same as the current stacks of arrows, booze, fruit, etc. that can currently be split only), destroying the original objects and replacing it with a stack object that can then accept new members (destroying them, increasing the stack size) or release existing ones (creating new objects with the properties of the parent stack object). The problems here stem from two issues:
- DF isn't that geared towards piling multiple items in the same tile, although quantum stockpiles and regular containers exist.
- What makes items "sufficiently similar". If you decide it means all relevant fields are exactly the same then two otherwise identical bolts made by two different smiths at the embark won't stack, or even two items made by the same one if time of creation is stored: only bolts from the same construction batch would restack. If you decide that some data isn't really important, then you'd have to decide, on an item per item type per item type basis, which ones that are relevant, and which information you'd be willing to sacrifice (and you'd probably have to document the criteria decided upon or people would either detect it and issue bug reports, or ask about why things work differently in different cases).

You'll also have other stack issues that aren't pressing currently, such as the handling of stacks that are too large to fit in the target container (which ought to be possible to solve with stack splitting).

Note that DF already stores different items in different vectors (in addition to the general one), and I would expect e.g. clothing wear to iterate only over the ones containing clothing, for instance.

Whether volume item vector arrays would be worth the CPU to administrate them depends on how much CPU it saves on searching, and the relevant part here is where DF has to search for something to perform a job, as any searches that end up in the player UI happens at a human time scale anyway. If stockpiles are the norm, most items are already essentially gathered in a few locations, although the vectors don't know that, but stockpiles could be seen as as sparsely placed volume containers (and I don't know to what extent DF uses stockpiles for item location purposes). If there are only a few items in the category searched, having to look at many volume containers in a shell by shell search (i.e. further and further away in 3D) until a match is found will cost more than a straight vector search of every item in that category, and for some items (rocks) it seems a last in-first out approach is used (at least when stockpiles aren't used. Note that's a casual observation, not a study result).

hanni79 · « **Reply #13 on:** July 01, 2020, 07:14:37 am »

Wow, this is really interesting to read !

ayy1337 · « **Reply #14 on:** September 09, 2022, 07:42:10 pm »

I have an idea as well, to use staircases as nodes in a transport map, sort of like how google maps uses intersections on a road. To figure out how to get somewhere on the road you just have to check which intersections you can go to, which those connect to, and so on, until you get to your destination. Basically graph search.
The motivating idea is that basically to get from point A to point B in a fort, a dwarf will almost always be going through various staircases. So if you know what you can get to and how from your nearest staircase, you don't have to waste time pathing to it.

Basically the core of the approach is extremely simple; whenever a staircase is constructed, cache a list of paths to other staircases and interesting landmarks on the same floor like workshops, stockpiles, etc.

Possible issue: Imagine a floor where you have two non-connected regions, how do you avoid trying to generate a path to the landmarks in the other one?
Possible solution: Maybe, the first landmark you find when searching around will have cached paths and that will tell you which landmarks you can path to

Possible issue: How do you know which staircase a dwarf should go to to 'enter' this transport network? There might be unreachable staircases on the same floor as he is.
Possible solution: Have each dwarf remember which landmark he last visited and how to get back there

Possible issue: What do you do when a wall is made that blocks a cached path?
Possible solution: If the obstacle is permanent (wall) then invalidate the path and recalculate it. If it's toggleable (bridge, door, etc) then keep the cached path but add a second one to use when that obstacle is present.

Possible issue: How do you know when a new path between two nodes has been created, like digging out a wall between two previously unconnected areas?
Possible solution: This one is kinda tricky. Might need to track connected regions if you want to notice straight away. Another option is re-calculating the cached paths occasionally, or have a designation to manually tell the game to recalc the paths for waypoints in some area.

Possible issue: There are too many waypoints on a floor to keep calculating paths between them all
Possible solution: Even if there are many waypoints, calculating the path once for commonly used paths will save tens of thousands of pathfinding cycles.

Another piece of this is you can generate the cached paths in parallel using a thread pool.

News:

Author Topic: Ideas for improving pathfinding performance (Rewritten) (Read 5541 times)

eurydike

Ideas for improving pathfinding performance (Rewritten)

Sarmatian123

Re: Ideas for improving pathfinding performance

eurydike

Re: Ideas for improving pathfinding performance

Moeteru

Re: Ideas for improving pathfinding performance (Rewritten)

Sarmatian123

Re: Ideas for improving pathfinding performance (Rewritten)

Moeteru

Re: Ideas for improving pathfinding performance (Rewritten)

Sarmatian123

Re: Ideas for improving pathfinding performance (Rewritten)

Moeteru

Re: Ideas for improving pathfinding performance (Rewritten)

Sarmatian123

Re: Ideas for improving pathfinding performance (Rewritten)

eurydike

Re: Ideas for improving pathfinding performance (Rewritten)

Zero665

Re: Ideas for improving pathfinding performance (Rewritten)

eurydike

Re: Ideas for improving pathfinding performance (Rewritten)

PatrikLundell

Re: Ideas for improving pathfinding performance (Rewritten)

hanni79

Re: Ideas for improving pathfinding performance (Rewritten)

ayy1337

Re: Ideas for improving pathfinding performance (Rewritten)