I sort of see before me a road system where all traffic between Pasadena and Seattle is routed through St. Louis...
I don't quite follow how the "duplicate-branches" algorithm works. What precisely are you storing for each node in addition to the simpler version, which, if I understand it correctly, stores a list of all nodes above it and a pointer to its immediate successors?
(Your use of "up" and "down" confused me before; I usually put parent nodes above the child nodes, but at least I get you now.)
This is why we then do this part:
In the first method (let's use Wikipedia's page on trees for an illustration), let's assume that we found a path in that tree from the "5" node on the bottom left to the "9" node on the right. This would return a path of "5->6->7->2->5->9", but at the same time, 7 and 5 are directly connected, so you can cut out the stretch with 2 inside it.
When making that original chain of nodes, you are basically only going up and down the tree, not laterally: you go "down" the tree ("down" being towards the roots) until you find a route directly up the tree that you can follow. To do this, you have a list of all the points "up the tree" from a given node. Once you have that chain, however, you can look for lateral shortcuts by simply comparing all the connections (up, down, or lateral on the tree) that allow a given node to bypass one or more steps in the chain.
This method has an additional complexity of O(n^2/2), which is just O(n^2), where n is the length of the chain you returned previously. It winds up with some extra memory accesses, but should still be much faster than A*-ing the entire fort.
This will basically return you the previous path, but with a few shortcuts taken where they are found.
This means you would first find a route that goes through St. Louis, and then search for more direct paths by cutting out whatever "middlemen" you can, resulting in a much more direct route.
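To make that concrete, here is a minimal sketch of the two steps in Python. It assumes each node stores its root-ward line of parents and a set of direct connections; the class and helper names are made up for illustration, and the lateral 7-5 link is added by hand as in the example above.

[code]
# Sketch of "build the chain, then cut shortcuts". Hypothetical names.

class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # The stored list of points between this node and the root.
        self.ancestors = [] if parent is None else parent.ancestors + [parent]
        self.neighbours = set()        # direct connections, any direction
        if parent is not None:
            self.neighbours.add(parent)
            parent.neighbours.add(self)

def chain(a, b):
    """Walk root-ward from a until we land on b's root-ward line,
    then climb back out to b. No lateral moves yet."""
    b_line = [b] + b.ancestors[::-1]           # b, b's parent, ..., root
    up, node = [], a
    while node not in b_line:                  # head toward the roots
        up.append(node)
        node = node.parent
    down = b_line[:b_line.index(node) + 1]     # meet point back out to b
    return up + down[::-1]

def shortcut(path):
    """The O(n^2/2)-style pass: from each node, jump to the farthest
    later point in the chain it connects to directly."""
    i, out = 0, []
    while i < len(path):
        out.append(path[i])
        j = len(path) - 1
        while j > i + 1 and path[j] not in path[i].neighbours:
            j -= 1
        i = j if j > i + 1 else i + 1
    return out

# Roughly the Wikipedia example tree (extra leaves omitted).
root = Node("2")
n7, n5r = Node("7", root), Node("5", root)
n6 = Node("6", n7)
n5l = Node("5", n6)
n9 = Node("9", n5r)
n7.neighbours.add(n5r); n5r.neighbours.add(n7)   # the lateral 7-5 link

print([n.name for n in chain(n5l, n9)])            # 5 6 7 2 5 9
print([n.name for n in shortcut(chain(n5l, n9))])  # 5 6 7 5 9, with 2 cut out
[/code]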
I don't know enough about the complexity of hashes, but won't the worst case bite you occasionally? Anyway, you may be right on this one. Again, you could do without any hash table if the contents of a node were implicit (as with a quadtree, which is pretty much what Draco brought up before).
Hashes are very simple. If you aren't familiar with them, the idea is this: instead of building a linked list that you have to traverse one item at a time, asking each item where the next one is, you build a hash table, which acts as a table of contents for the data. The hash table is an array with specific addresses (think of these as drawers in a filing cabinet), and objects get filed into it based upon the number that comes up when you hash their reference data, the data you are searching for.
Hashing just means converting the data into numbers, then adding and multiplying those numbers by set amounts so that you get a relatively unique integer that can be compared to any other integer, no matter what sort of data was originally thrown into the hash.
When you have a piece of data you want to put into the hash table, you hash the reference data (the data you would search for) and file it in the resulting "drawer" of the table. When you want to find that data again, you hash the data you are searching for, and that gives you the number of the "drawer" to look in.
Hash tables are a way of looking up data in a list very quickly. They make a search take theoretically O(n/x) time, where x is the size of the hash table and n is the number of items stored. Technically, x should always be larger than n (how much larger depends on whether you hang linked lists off the "filing cabinet" drawers for overflow), so it really just comes down to O(1), which is the ideal complexity level: no matter how big the table is, it takes the same amount of time to find what you are looking for. (Compare a linked list's O(n): if the data you want is at the end of the list, you have to step through every single item in order before you find it.)
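If it helps, here is a toy version of that filing-cabinet picture in Python, purely illustrative; in real code you would just use your language's built-in hash map.

[code]
# Toy hash table with "drawers" (buckets) and overflow lists.

class HashTable:
    def __init__(self, drawers=64):            # "x", the table size
        self.buckets = [[] for _ in range(drawers)]

    def _drawer(self, key):
        # Hash the reference data, fold it into a drawer number.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._drawer(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                       # replace an existing entry
                bucket[i] = (key, value)
                return
        bucket.append((key, value))            # overflow list off the drawer

    def get(self, key):
        # Average cost ~ O(n/x); with x kept above n, effectively O(1).
        for k, v in self.buckets[self._drawer(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = HashTable()
table.put((10, 4, 2), "node A")                # e.g. a map coordinate as key
print(table.get((10, 4, 2)))                   # -> node A
[/code]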
These "shortest of all cases" is literally something like "Urist, it's three feet to your right. You can see it. Just walk straight there." Something like that is so trivial, even with the loading of checks to see what zip code the target is in, you can do those all day long without a noticeable drop in FPS.
This hierarchical clustering is really independent of your tree-based search. Any search method can be used with it.
One has to connect the different levels, though: if I'm in St. Louis on the large-scale map, where exactly am I on the city map? How do I combine the costs from both levels? But this is for another subthread, really.
You can store tree-level paths "through" the tree. This means you don't even have to re-path through the lower-level trees you are passing through; you just store the route and always take that route.
If you're passing through St. Louis by taking the Interstate, you don't need to look at the individual city routes at all; you just need to stay on the Interstate heading in whatever direction you are heading.
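In code, that stored through-route could be as simple as a cached lookup; the key shape and the stand-in pathfind function here are hypothetical.

[code]
# Cache one canonical route "through" a region, keyed by where you
# enter and where you leave, so the detailed search is paid only once.

through_routes = {}          # (region, entry, exit) -> stored path

def route_through(region, entry, exit_, pathfind):
    key = (region, entry, exit_)
    if key not in through_routes:
        # pathfind is whatever low-level search you already have.
        through_routes[key] = pathfind(entry, exit_)
    return through_routes[key]

# If the map changes inside a region, just evict that region's keys.
[/code]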
You can have fairly simple nodes that are conglomerated together on the next level of abstraction: take an area consisting of no more than some arbitrary number of nodes, or some arbitrary amount of space, within which you can ensure relatively easy connectivity between that area's nodes, and make that area of nodes the "city" which is the smallest unit in the tree data structure. Hence, you conglomerate, say, 100 nodes into a "node area" that occupies one large-scale tree node.
You would then check whether the starting point and destination are in the same node area; if they are, you don't have to use the large-scale tree at all.
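As a sketch, assuming every node carries the id of the area it was conglomerated into (area_id and the two search functions are placeholders, not anything settled above):

[code]
# Same-area early-out: only touch the large-scale tree when the
# endpoints live in different node areas.

def find_path(start, goal, local_search, hierarchical_search):
    if start.area_id == goal.area_id:
        # Both endpoints inside one ~100-node area: search it directly.
        return local_search(start, goal)
    # Otherwise route area-to-area first, then refine inside each area.
    return hierarchical_search(start, goal)
[/code]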
You could work the small-scale area into a tree configuration as well. You simply make a list of known connections to the other areas and pare them down to the ideal candidates. In a collection of smaller-scale nodes, where everything is conglomerated by proximity to one another, your "A" concern actually becomes a little less prominent too, since you can build the closest thing to a fractal "star" you can get out of the nodes to make your small-scale tree.
You can also throw in as many levels of abstraction as you ever deem necessary at that point: simply set some arbitrary cut-off of x nodes in a tree that forces a jump to a new level of abstraction, allowing for arbitrarily many levels. This does make the address of each node more complex, though, and requires a vector to store it rather than fixed-size data types.
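For instance, the address could just be a vector with one integer per level; a hypothetical helper like this tells you how far down the hierarchy two addresses agree, and therefore which level the search has to start at.

[code]
# Variable-depth addresses: e.g. [state, city, block, node].

def common_level(addr_a, addr_b):
    """First (coarsest) level at which two addresses differ; the
    search only needs the hierarchy from that level down."""
    for level, (a, b) in enumerate(zip(addr_a, addr_b)):
        if a != b:
            return level
    return min(len(addr_a), len(addr_b))   # one address contains the other

print(common_level([3, 7, 2, 9], [3, 7, 5, 1]))   # -> 2: same state and city
[/code]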
Sounds OK in principle, but how would it all be done more concretely?
Well, you just start with whatever convex cubes you wind up having from the node-creating function (that's beyond this discussion). Then you have to start stitching these pieces together.
You have two major paths to follow from there.
The first option is that you can start from geometric isolation:
This means starting by splitting the map into cubes of geographic area, then checking that all the nodes in each cube connect with one another, and splitting any cube whose nodes do not all have connectivity. You then stitch trees together out of the nodes in each geographic area that do have connectivity, and build the next level of hierarchy as the way to get from one geographic area to the others.
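The connectivity check itself can be a plain flood fill; a sketch, assuming each node carries a neighbours set already restricted to nodes inside the same cube:

[code]
# Partition one cube's nodes into mutually-connected groups; if more
# than one group comes back, the cube gets split into separate areas.

from collections import deque

def connected_components(cube_nodes):
    unvisited = set(cube_nodes)
    components = []
    while unvisited:
        seed = next(iter(unvisited))           # start a new flood fill
        group, frontier = set(), deque([seed])
        while frontier:
            node = frontier.popleft()
            if node in unvisited:
                unvisited.discard(node)
                group.add(node)
                frontier.extend(node.neighbours & unvisited)
        components.append(group)
    return components
[/code]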
The other option is that you can start from an arbitrary point somewhere in the "middle" of the map:
Upon further thought, this works best with the innate nature of a tree data structure, since it relies upon some of its basic principles and assumes some dynamic ability to adapt as nodes change.
Pick a node, any random node on the map.
Now, you have to start connecting all its adjacent nodes to it as branches, and start building a tree.
During this process, the tree needs to "balance" itself, which is a basic function of many tree structures. This means the "root node" shifts to whichever node has the most roughly-equidistant leaf nodes hanging off of it. (In other words, you don't want a lopsided tree: if a branch becomes larger than the rest of the tree, the start of that branch becomes the new root, and the old root becomes a branch of the new root.)
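A brute-force reading of that balancing rule, taking "most roughly-equidistant leaves" to mean the node whose farthest leaf is nearest (the tree's centre); the adjacency-dict representation is just an assumption for the sketch, and a real implementation would shift the root incrementally instead of rescanning everything.

[code]
# Re-root the tree at its centre: the node minimizing the distance to
# its farthest node, so no branch dwarfs the rest.

from collections import deque

def eccentricity(adj, start):
    """Longest hop count from start to any other node (plain BFS)."""
    depth, frontier = {start: 0}, deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in adj[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                frontier.append(nxt)
    return max(depth.values())

def rebalance_root(adj):
    # The old root simply becomes a branch of whichever node wins here.
    return min(adj, key=lambda node: eccentricity(adj, node))
[/code]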
Once you hit a certain point of complexity, say ten tiers in the tree structure, it is time to build the next level of hierarchy. The root node then becomes not only the root of the level-1 tree we were building, but also the root of a level-2 tree, which acts as the geographical equivalent of moving from city-level to state-level travel. Each of the other level-1 trees gets its own root node, chosen by the fastest connection to the root of the level-2 tree, and each of these new level-1 trees becomes a branch of the level-2 tree. You then keep going, populating the level-1 trees until they have to split off and form a new level-1 tree that they are connected to.
Eventually, you wind up building the whole tree by flood-filling the map of nodes until you have connected every node that has a connection.
If the level-2 tree becomes complex enough, it, too, would split into a level-3 tree. You can make this sort of method have theoretically infinite levels.
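Reduced to pure bookkeeping, the infinite-levels idea is just repeated grouping. A deliberately dumbed-down sketch, where the grouping rule is plain chunking rather than real spatial clustering:

[code]
# Group nodes into areas of at most GROUP members, areas into
# super-areas, and so on until one level fits under the cut-off.

GROUP = 10

def build_levels(nodes):
    levels, current = [list(nodes)], list(nodes)
    while len(current) > GROUP:
        current = [current[i:i + GROUP]        # each chunk = one node above
                   for i in range(0, len(current), GROUP)]
        levels.append(current)
    return levels

levels = build_levels(range(1, 1001))
print(len(levels))   # 1000 nodes -> 100 areas -> 10 regions: 3 levels
[/code]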