I sort of see before me a road system where all traffic between Pasadena and Seattle is routed through St. Louis...
I don't quite follow how the "duplicate-branches" algorithm works. What precisely are you storing for each node in addition to the simpler version, which, if I understand it correctly, stores a list of all nodes above it and a pointer to its immediate successors?
(Your use of "up" and "down" confused me before; I usually put parent nodes above the child nodes, but at least I get you now.)
This is why we then do this part:
In the first method (let's use Wikipedia's page on trees for an illustration), let's assume that we found a path in that tree from the "5" node on the bottom left to the "9" node on the right. This would return a path of "5->6->7->2->5->9", but at the same time, 7 and 5 are directly connected, so you can cut out the stretch with 2 inside it.
When making that original chain of nodes, you are basically only going up and down the tree, not laterally: you go "down" the tree ("down" being towards the roots) until you find a route directly up the tree that you can follow. To do this, you have a list of all the points "up the tree" from a given node. Once you have that chain, however, you can look for lateral shortcuts by simply comparing all the connections (up, down, or lateral on the tree) that allow a given node to bypass one or more steps in the chain.
This method has an additional complexity of O(n^2/2), which is just O(n^2), where n is the length of the chain you returned previously. It winds up with some extra memory accesses, but should still be much faster than A*-ing the entire fort.
This will basically return you the previous path, but with a few shortcuts taken where they are found.
This means you would first find a route that goes through St. Louis, and then search for more direct paths by cutting out whatever "middlemen" you can, resulting in a much more direct route.
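To make that concrete, here is a minimal sketch of the two steps in Python. It assumes each node stores its root-ward line of parents and a set of direct connections; the class and helper names are made up for illustration, and the lateral 7-5 link is added by hand as in the example above.

[code]
# Sketch of "build the chain, then cut shortcuts". Hypothetical names.

class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # The stored list of points between this node and the root.
        self.ancestors = [] if parent is None else parent.ancestors + [parent]
        self.neighbours = set()        # direct connections, any direction
        if parent is not None:
            self.neighbours.add(parent)
            parent.neighbours.add(self)

def chain(a, b):
    """Walk root-ward from a until we land on b's root-ward line,
    then climb back out to b. No lateral moves yet."""
    b_line = [b] + b.ancestors[::-1]           # b, b's parent, ..., root
    up, node = [], a
    while node not in b_line:                  # head toward the roots
        up.append(node)
        node = node.parent
    down = b_line[:b_line.index(node) + 1]     # meet point back out to b
    return up + down[::-1]

def shortcut(path):
    """The O(n^2/2)-style pass: from each node, jump to the farthest
    later point in the chain it connects to directly."""
    i, out = 0, []
    while i < len(path):
        out.append(path[i])
        j = len(path) - 1
        while j > i + 1 and path[j] not in path[i].neighbours:
            j -= 1
        i = j if j > i + 1 else i + 1
    return out

# Roughly the Wikipedia example tree (extra leaves omitted).
root = Node("2")
n7, n5r = Node("7", root), Node("5", root)
n6 = Node("6", n7)
n5l = Node("5", n6)
n9 = Node("9", n5r)
n7.neighbours.add(n5r); n5r.neighbours.add(n7)   # the lateral 7-5 link

print([n.name for n in chain(n5l, n9)])            # 5 6 7 2 5 9
print([n.name for n in shortcut(chain(n5l, n9))])  # 5 6 7 5 9, with 2 cut out
[/code]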
I don't know enough about the complexity of hashes, but won't the worst case bite you occasionally? Anyway, you may be right on this one. Again, you could do without any hash table if the contents of a node were implicit (as with a quadtree, which is pretty much what Draco brought up before).
Hashes are very simple. If you aren't familiar with them, the idea is this: instead of building a linked list that you have to traverse one item at a time, asking each item where the next one is, you build a hash table, which acts as a table of contents for the data. The hash table is an array with specific addresses (think of these as drawers in a filing cabinet), and objects get filed into it based upon the number that comes up when you hash their reference data, the data you are searching for.
Hashing just means converting the data into numbers, then adding and multiplying those numbers by set amounts so that you get a relatively unique integer that can be compared to any other integer, no matter what sort of data was originally thrown into the hash.
When you have a piece of data you want to put into the hash table, you hash the reference data (the data you would search for) and file it in the resulting "drawer" of the table. When you want to find that data again, you hash the data you are searching for, and that gives you the number of the "drawer" to look in.
Hash tables are a way of looking up data in a list very quickly. They make a search take theoretically O(n/x) time, where x is the size of the hash table and n is the number of items stored. Technically, x should always be larger than n (how much larger depends on whether you hang linked lists off the "filing cabinet" drawers for overflow), so it really just comes down to O(1), which is the ideal complexity level: no matter how big the table is, it takes the same amount of time to find what you are looking for. (Compare a linked list's O(n): if the data you want is at the end of the list, you have to step through every single item in order before you find it.)
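If it helps, here is a toy version of that filing-cabinet picture in Python, purely illustrative; in real code you would just use your language's built-in hash map.

[code]
# Toy hash table with "drawers" (buckets) and overflow lists.

class HashTable:
    def __init__(self, drawers=64):            # "x", the table size
        self.buckets = [[] for _ in range(drawers)]

    def _drawer(self, key):
        # Hash the reference data, fold it into a drawer number.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._drawer(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                       # replace an existing entry
                bucket[i] = (key, value)
                return
        bucket.append((key, value))            # overflow list off the drawer

    def get(self, key):
        # Average cost ~ O(n/x); with x kept above n, effectively O(1).
        for k, v in self.buckets[self._drawer(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = HashTable()
table.put((10, 4, 2), "node A")                # e.g. a map coordinate as key
print(table.get((10, 4, 2)))                   # -> node A
[/code]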
These "shortest of all cases" is literally something like "Urist, it's three feet to your right. You can see it. Just walk straight there." Something like that is so trivial, even with the loading of checks to see what zip code the target is in, you can do those all day long without a noticeable drop in FPS.
This hierarchical clustering is really independent of your tree-based search. Any search method can be used with it.
One has to connect the different levels, though: if I'm in St. Louis on the large-scale map, where exactly am I on the city map? How do I combine the costs from both levels? But this is for another subthread, really.
You can store tree-level paths "through" the tree. This means you don't even have to re-path through the lower-level trees you are passing through; you just store the route and always take that route.
If you're passing through St. Louis by taking the Interstate, you don't need to look at the individual city routes at all; you just need to stay on the Interstate heading in whatever direction you are heading.
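In code, that stored through-route could be as simple as a cached lookup; the key shape and the stand-in pathfind function here are hypothetical.

[code]
# Cache one canonical route "through" a region, keyed by where you
# enter and where you leave, so the detailed search is paid only once.

through_routes = {}          # (region, entry, exit) -> stored path

def route_through(region, entry, exit_, pathfind):
    key = (region, entry, exit_)
    if key not in through_routes:
        # pathfind is whatever low-level search you already have.
        through_routes[key] = pathfind(entry, exit_)
    return through_routes[key]

# If the map changes inside a region, just evict that region's keys.
[/code]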
You can have fairly simple nodes that are conglomerated together on the next level of abstraction: take an area consisting of no more than some arbitrary number of nodes, or some arbitrary amount of space, within which you can ensure relatively easy connectivity between that area's nodes, and make that area of nodes the "city" which is the smallest unit in the tree data structure. Hence, you conglomerate, say, 100 nodes into a "node area" that occupies one large-scale tree node.
You would then check whether the starting point and destination are in the same node area; if they are, you don't have to use the large-scale tree at all.
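As a sketch, assuming every node carries the id of the area it was conglomerated into (area_id and the two search functions are placeholders, not anything settled above):

[code]
# Same-area early-out: only touch the large-scale tree when the
# endpoints live in different node areas.

def find_path(start, goal, local_search, hierarchical_search):
    if start.area_id == goal.area_id:
        # Both endpoints inside one ~100-node area: search it directly.
        return local_search(start, goal)
    # Otherwise route area-to-area first, then refine inside each area.
    return hierarchical_search(start, goal)
[/code]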
You could work the small-scale area into a tree configuration as well. You simply make a list of known connections to the other areas and pare them down to the ideal candidates. In a collection of smaller-scale nodes, where everything is conglomerated by proximity to one another, your "A" concern actually becomes a little less prominent too, since you can build the closest thing to a fractal "star" you can get out of the nodes to make your small-scale tree.
You can also throw in as many levels of abstraction as you ever deem necessary at that point: simply set some arbitrary cut-off of x nodes in a tree that forces a jump to a new level of abstraction, allowing for arbitrarily many levels. This does make the address of each node more complex, though, and requires a vector to store it rather than fixed-size data types.
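For instance, the address could just be a vector with one integer per level; a hypothetical helper like this tells you how far down the hierarchy two addresses agree, and therefore which level the search has to start at.

[code]
# Variable-depth addresses: e.g. [state, city, block, node].

def common_level(addr_a, addr_b):
    """First (coarsest) level at which two addresses differ; the
    search only needs the hierarchy from that level down."""
    for level, (a, b) in enumerate(zip(addr_a, addr_b)):
        if a != b:
            return level
    return min(len(addr_a), len(addr_b))   # one address contains the other

print(common_level([3, 7, 2, 9], [3, 7, 5, 1]))   # -> 2: same state and city
[/code]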
Sounds OK in principle, but how would it all be done more concretely?
Well, you just start with whatever convex cubes you wind up having from the node-creating function (that's beyond this discussion). Then you have to start stitching these pieces together.
You have two major paths to follow from there.
The first option is that you can start from geometric isolation:
This means starting by splitting the map into cubes of geographic area, then checking that all the nodes in each cube connect with one another, and splitting any cube whose nodes do not all have connectivity. You then stitch trees together out of the nodes in each geographic area that do have connectivity, and build the next level of hierarchy as the way to get from one geographic area to the others.
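The connectivity check itself can be a plain flood fill; a sketch, assuming each node carries a neighbours set already restricted to nodes inside the same cube:

[code]
# Partition one cube's nodes into mutually-connected groups; if more
# than one group comes back, the cube gets split into separate areas.

from collections import deque

def connected_components(cube_nodes):
    unvisited = set(cube_nodes)
    components = []
    while unvisited:
        seed = next(iter(unvisited))           # start a new flood fill
        group, frontier = set(), deque([seed])
        while frontier:
            node = frontier.popleft()
            if node in unvisited:
                unvisited.discard(node)
                group.add(node)
                frontier.extend(node.neighbours & unvisited)
        components.append(group)
    return components
[/code]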
The other option is that you can start from an arbitrary point somewhere in the "middle" of the map:
Upon further thought, this works best with the innate nature of a tree data structure, since it relies upon some of its basic principles and assumes some dynamic ability to adapt as nodes change.
Pick a node, any random node on the map.
Now, you have to start connecting all its adjacent nodes to it as branches, and start building a tree.
During this process, the tree needs to "balance" itself, which is a basic function of many tree structures. This means the "root node" shifts to whichever node has the most roughly-equidistant leaf nodes hanging off of it. (In other words, you don't want a lopsided tree: if a branch becomes larger than the rest of the tree, the start of that branch becomes the new root, and the old root becomes a branch of the new root.)
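A brute-force reading of that balancing rule, taking "most roughly-equidistant leaves" to mean the node whose farthest leaf is nearest (the tree's centre); the adjacency-dict representation is just an assumption for the sketch, and a real implementation would shift the root incrementally instead of rescanning everything.

[code]
# Re-root the tree at its centre: the node minimizing the distance to
# its farthest node, so no branch dwarfs the rest.

from collections import deque

def eccentricity(adj, start):
    """Longest hop count from start to any other node (plain BFS)."""
    depth, frontier = {start: 0}, deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in adj[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                frontier.append(nxt)
    return max(depth.values())

def rebalance_root(adj):
    # The old root simply becomes a branch of whichever node wins here.
    return min(adj, key=lambda node: eccentricity(adj, node))
[/code]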
Once you hit a certain point of complexity, say ten tiers in the tree structure, it is time to build the next level of hierarchy. The root node then becomes not only the root of the level-1 tree we were building, but also the root of a level-2 tree, which acts as the geographical equivalent of moving from city-level to state-level travel. Each of the other level-1 trees gets its own root node, chosen by the fastest connection to the root of the level-2 tree, and each of these new level-1 trees becomes a branch of the level-2 tree. You then keep going, populating the level-1 trees until they have to split off and form a new level-1 tree that they are connected to.
Eventually, you wind up building the whole tree by flood-filling the map of nodes until you have connected every node that has a connection.
If the level-2 tree becomes complex enough, it, too, would split into a level-3 tree. You can make this sort of method have theoretically infinite levels.
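Reduced to pure bookkeeping, the infinite-levels idea is just repeated grouping. A deliberately dumbed-down sketch, where the grouping rule is plain chunking rather than real spatial clustering:

[code]
# Group nodes into areas of at most GROUP members, areas into
# super-areas, and so on until one level fits under the cut-off.

GROUP = 10

def build_levels(nodes):
    levels, current = [list(nodes)], list(nodes)
    while len(current) > GROUP:
        current = [current[i:i + GROUP]        # each chunk = one node above
                   for i in range(0, len(current), GROUP)]
        levels.append(current)
    return levels

levels = build_levels(range(1, 1001))
print(len(levels))   # 1000 nodes -> 100 areas -> 10 regions: 3 levels
[/code]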