Bay 12 Games Forum

Please login or register.

Login with username, password and session length
Advanced search  

Author Topic: GPU threading in DF  (Read 1696 times)

Foxite

  • Bay Watcher
    • View Profile
GPU threading in DF
« on: March 29, 2016, 09:04:17 am »

This is just a food of thought, not a serious suggestion to make DF faster.

So I know overdoing threading can actually reduce performance to a greater degree than that it will improve when done right. But DF only makes small calculations for each creature's movement and actions. I was thinking, what if DF was programmed/modded to use each GPU core to calculate each creature's movement and actions? I know that most GPUs have core counts that run into the hundreds, sometimes even thousands, so even low-end GPUs could probably have enough cores to have each one calculate the actions for every single creature on the map. Of course, the creatures in a world can run into the millions, but since they are not done in tile-level detail, they can be calculated in a single thread on the CPU like how it's done now.

Would this improve performance?
Logged
The best way to demonstrate it to him is take a save of 40 year old fortress with 150 dwarves in it on a good sized embark with a volcano that just breached the circus and install it on his gaming rig and watch it bring his rig to its knees.

therahedwig

  • Bay Watcher
    • View Profile
    • wolthera.info
Re: GPU threading in DF
« Reply #1 on: March 29, 2016, 10:04:28 am »

GPU threading is regular multi-threading's younger, whinier and tantrum prone brother. So it's all the regular issues+decent two way communication with the gpu(which is an artform in it's own right)+GPU specific limitations.

I am saying this as a Krita(open source painting program) developer/community manager, and our filters are multi-treaded, but doing the two way communication is extremely difficult, which is something I end up having to tell to everyone who hopes to see gpu-based filters.

Maybe any of the other programming buffs here have different opinions on it.
Logged
Stonesense Grim Dark 0.2 Alternate detailed and darker tiles for stonesense. Now with all ores!

Putnam

  • Bay Watcher
  • DAT WIZARD
    • View Profile
Re: GPU threading in DF
« Reply #2 on: March 29, 2016, 10:36:58 pm »

GPU calculation is the greatest hope for the future of DF's performance by far. Graphics card performance is still improving exponentially, doubling every year or so.

That's not to say that it's practical or even that it's doable. Just that if it's the former that it's going to be big.

I don't think "small calculations" is a good characterization of each creature's actions (movement is an action), though.

Foxite

  • Bay Watcher
    • View Profile
Re: GPU threading in DF
« Reply #3 on: March 30, 2016, 02:44:34 am »

I don't think "small calculations" is a good characterization of each creature's actions (movement is an action), though.
As I understand it, pathing in DF works like this:
  • Create a straight vector from start to finish.
  • If it contains any impassable tiles, draw a straight vector from the last passable tile in the vector to nearest point around the blockage.
  • Repeat for all routes around the blockage.
  • Calculate which is the cheapest by traffic designations. Use the vector points for that route as route waypoints for the final route.
  • Repeat from step 1 until there is an unblocked path to the target.
When targeting a creature this does need to be redone every time it moves, but it doesn't look very expensive in resources.

Of course I might be horribly wrong. But this is my best guess from what I usually see it do. It doesn't care about ice, it doesn't care about dangerous creatures in the way, it only cares about the shortest (aka least traffic cost) path and nothing else.
Logged
The best way to demonstrate it to him is take a save of 40 year old fortress with 150 dwarves in it on a good sized embark with a volcano that just breached the circus and install it on his gaming rig and watch it bring his rig to its knees.

Willfor

  • Bay Watcher
  • The great magmaman adventurer. I do it for hugs.
    • View Profile
Re: GPU threading in DF
« Reply #4 on: March 30, 2016, 03:01:17 am »

Actually, DF's pathfinding is a Dijkstra flood-fill-from-position from the last time I heard. I mean, it probably has a few optimizations, but there's only so much you can do to cut it down when you're using a floodfill pathfinding method.
Logged
In the wells of livestock vans with shells and garden sands /
Iron mixed with oxygen as per the laws of chemistry and chance /
A shape was roughly human, it was only roughly human /
Apparition eyes / Apparition eyes / Knock, apparition, knock / Eyes, apparition eyes /

Thief^

  • Bay Watcher
  • Official crazy person
    • View Profile
Re: GPU threading in DF
« Reply #5 on: March 30, 2016, 04:56:59 am »

It's A*.

It's like Dijkstra but it floods tiles towards the destination first, and then floods outwards around obstacles. The worst case (if the target is completely inaccessible) is a full Dijkstra flood fill, but DF has an optimisation to remove totally unconnected areas so that should never happen.

But back to the original proposal - it has the same problem as most attempts to "theoretically" multithread DF: all the entity updates currently expect to be looking at the actual current world-state, but when you multithread (especially super-wide like on a GPU) generally the best you get is looking at the previous frame's state. This would require rewriting most entity update code.
« Last Edit: March 30, 2016, 05:00:07 am by Thief^ »
Logged
Dwarven blood types are not A, B, AB, O but Ale, Wine, Beer, Rum, Whisky and so forth.
It's not an embark so much as seven dwarves having a simultaneous strange mood and going off to build an artifact fortress that menaces with spikes of awesome and hanging rings of death.

Miuramir

  • Bay Watcher
    • View Profile
Re: GPU threading in DF
« Reply #6 on: March 30, 2016, 02:57:07 pm »

It's A*.

* We know from previous posts that DF uses a modified version of A* (usually pronounced "A star"). 

* We know that DF has (per run) static weighting enabled, because that is what the traffic designations actually do, they provide weights to the A* routing.  Understanding how this actually work is a big step in path-optimizing a fortress. 

* We know that DF keeps a cache of sorts that "colors in" the map based on what parts can be accessed from others; the primary purpose of this is to only call A* when it is highly likely that it will succeed, as A* can waste an enormous amount of time trying to find a route if there isn't one. 

* We know that at one point, DF used "street distance" (presumably a reference to Manhattan distance) for some things including local dwarf pathing, and a SQRT(2) approximation of 362/256 for some adventure mode travel situations. 

* We know that in at least some situations, the first "step" creatures take is basically free, and then the game figures out when a figure's next step can be taken based on various factors. 

* We know that minecarts move in a more complicated fashion, which is arguably more precise (sub-square positioning with a far amount of precision). 

* We suspect that it actually searches via A* from the destination back toward the target, or did at one time.  In other words, when a dwarf ends up with a job to pull a lever, we think it's actually starting from the lever to find a path back toward the dwarf.  In a stereotypical fortress where the dwarves tend to hang out more or less in the middle, and the materials and jobs tend to be out on the edges, this makes a lot of sense from an optimization standpoint. 

Search direction for A* and friends matters for Star or Tree topology; you want to search from the points / spokes / leaves back toward the center.  Line, Bus, Ring, and Fully Connected / Mesh topologies don't really care, and Hybrid / Mesh is unpredictable.  So in DF it makes for slightly more efficiency if you have your dwarves mostly hang out in the middle, with stockpiles and workshops around them, rather than the other way around. 

Amit's pages on pathfinding are a great introduction; if you allow them to run scripting, they have "live" examples you can play with. 

Visual Pathfinding is also a great resource, although it is much simpler than DF. 

There is some research on A* on GPU (mostly CUDA) out there; I've not had a chance to read up on it. 

In one sense, what should have been the biggest optimization to pathfinding was the introduction of minecarts; much like mass transit (trains on tracks) takes people off the roads and therefore makes commutes better even for those not using them.  However, comparatively few forts seem to be using minecarts to truly "industrialize" their production. 
Logged

Kumquat

  • Bay Watcher
    • View Profile
Re: GPU threading in DF
« Reply #7 on: April 01, 2016, 01:23:06 pm »

A* and pathfinding in general are typically sequential processes that do not translate quite that well onto GPU. By sequential process I mean things where each step is dependent on the previous step. Certainly if you need to do hundreds or thousands of pathfinding tasks simultaneously then GPU could be something worth considering. If you do a heavy preprocessing step such as all-pairs shortest path for the whole shebang then GPU would be a more viable thing to consider.

However things where you have a relatively simple step with lots of parallel data that needs to be processed is where GPU processing shines. It would make more sense to use GPU for weather and temperature simulation rather than pathfinding. Those are the sort of massively parallel single-step simulations that GPUs shine at.
Logged

Kirkegaard

  • Bay Watcher
    • View Profile
Re: GPU threading in DF
« Reply #8 on: April 01, 2016, 05:11:01 pm »

Would GPU be a really good choice (in an ideal world, with unlimited programmer time) to do stuff updating temperatures on all items?
That is a lot of really simple calculations, I would guess?
Logged

Putnam

  • Bay Watcher
  • DAT WIZARD
    • View Profile
Re: GPU threading in DF
« Reply #9 on: April 01, 2016, 05:13:28 pm »

At the very least, it probably won't be altogether that complex. The calculation is probably a simple calculation regarding ambient temperature of the tile that the item is on, the volume of the item and the item's specific heat capacity (surface area isn't taken into account mostly because surface area literally isn't taken into account by items in the game at all)

NW_Kohaku

  • Bay Watcher
  • [ETHIC:SCIENCE_FOR_FUN: REQUIRED]
    • View Profile
Re: GPU threading in DF
« Reply #10 on: April 02, 2016, 09:31:19 am »

Actually, the overwhelming majority of temperature checks are just "Has anything changed this tick? Nope? OK, then, carry on... NEXT!"  Keep in mind that most items are underground where temperature is a constant anytime magma isn't flowing over them, anyway.

A large portion of the game's calculations are the responsibility of various "checks" on all the sundry "items" in the game.  Several of us did SCIENTIFIC testing on the subject a few years ago, see: Operation FPS Bomb. When you have a sufficiently large number of items in the game (possibly enough to start going outside memory cache?) then the game starts slowing down significantly, and because of this, "garbage disposal" and maintaining sane stockpile levels are critical for maintaining FPS.

Other problems are that the modified A* (although it should be noted that it is modified mainly in that standard A* only handles 2d pathfinding) just doesn't handle large open spaces too well, even if they "don't go anywhere". This is mainly because the modified A* has extremely little clue how to find its way through 3d mazes, which are less "obvious" than paths through 2d mazes.  Methods like a Vector Pathfinding, or even simple addition of Jump Point Search to A* would be of assistance, by allowing the simplification of large open areas with minimal additional overhead, but Toady has been loathe to dig into alternate pathfinding methods until he comes across problems current pathfinding just can't handle. (Like, say, multi-tile creatures...)

This brings us to the third in the big three FPS killers, connectivity map redraws.  The connectivity map is derived from the real map, doing things like declaring any tiles with more than 4/7 water or any magma to be "walls" (actually somewhat clever, because it only has to check two bits) for connectivity purposes, and is used for the modified A* pathfinding.  (This is why fliers and swimmers have entirely different, floodfill-based pathfinding, since they cannot rely upon the same connectivity map, and Toady considers those movement types rare enough not to be worth the added lag of making more connectivity maps.) The problem is, these connectivity maps are only effective time-saving measures in a static map, and DF does not have a static map.  Fluid motion easily "breaks" the map through dropping water levels below 4/7 then rising back up again, or having magma flow, evaporate, then flow again, etc.  The connectivity map can also be broken any time a drawbridge is activated, or other pieces of fortress move, blocking traffic. Once broken, the connectivity map has to be redrawn, and considering the sizable, apparent-to-the-player amount of lag simple drawbridge raisings and lowerings cause, this requires a full-map redraw that produces significant lag if you have continuously moving fluids.

Now then, I'm not educated on GPU threading enough to say whether any of those bottlenecks are significantly faster with GPU threading. I can, however, say that there are some general optimizations that could be made just through rethinking DF's item checks to use schedulers and/or checks to items only when environmental changes are taking place, as well as changing over to the previously mentioned vector-style pathfinding that would obviate the need for connectivity maps in the first place and simplify calculations of large open fields or rooms.

If you want to have a faster computer for DF, incidentally, get a computer with as large a memory cache as possible, and as fast a memory access speed as possible.  CPU power doesn't really matter much, because most of what the game winds up slowing down upon tends to be memory cache misses, and waiting upon access to memory.  What lags DF are really, really simple calculations that are made randomly upon data tables too large to be cached effectively.
Logged
Personally, I like [DF] because after climbing the damned learning cliff, I'm too elitist to consider not liking it.
"And no Frankenstein-esque body part stitching?"
"Not yet"

Improved Farming
Class Warfare

NW_Kohaku

  • Bay Watcher
  • [ETHIC:SCIENCE_FOR_FUN: REQUIRED]
    • View Profile
Re: GPU threading in DF
« Reply #11 on: April 02, 2016, 10:00:43 am »

Also...

* We know that minecarts move in a more complicated fashion, which is arguably more precise (sub-square positioning with a far amount of precision).

Actually, no, minecart pathing is MUCH simpler. Minecart movement is perhaps unintuitive, and has recorded velocity checked against a "completion percentage" of a move through a tile, but it's much, much simpler from a calculation standpoint.

Guided minecarts (that is, pulled by a dwarf) use completely different rules from unguided minecarts.  Guided minecarts are pulled by dwarves across rails following their rails on the shortest route to their destination. 

I have no direct evidence of this, but an educated guess would say that minecarts use a similar form of A* to standard movement, but where connectivity is determined by the location of physical rails.  (I say this because Toady One is a programmer who has only learned as he has needed, and prefers to avoid complex subjects outside his areas of expertise like pathfinding unless absolutely necessary.  This is pretty much why he sticks to A* - it's what he understands and has experience with, and isn't too keen on dipping into new things he has no experience with.) Since rails are restricted to much fewer possible connections per tile, and there will always be less tiles with rails than open, potentially pathable floor/stairs/furniture/etc, then rail pathfinding will always be simpler than standard creature pathfinding.

Unguided minecarts have the simplest pathfinding imaginable in that it does no pathfinding at all; They have a velocity, and it carries them in a given direction until they collide with something or they have their velocity altered.  They are not making "intelligent pathfinding", they just follow what the terrain forces upon them once they actually get there, and as such, it's much simpler.
« Last Edit: April 02, 2016, 10:03:33 am by NW_Kohaku »
Logged
Personally, I like [DF] because after climbing the damned learning cliff, I'm too elitist to consider not liking it.
"And no Frankenstein-esque body part stitching?"
"Not yet"

Improved Farming
Class Warfare

Kumquat

  • Bay Watcher
    • View Profile
Re: GPU threading in DF
« Reply #12 on: April 03, 2016, 08:49:40 am »


In any case moving some large but trivial calculations to the GPU would be pointless since transferring the data over PCIe bus takes much longer than actually doing the calculations. Only if you can leave the data there and then do a fairly small query about it, like, 'is it going to rain over the fort today' then it could actually help a little.

People who are actually interested in software optimization on the CPU should look up Mike Acton's presentations on the subject. Most suggestions on these forums are rather silly and/or just beating the same horses over and over.
Logged

xpi0t0s

  • Bay Watcher
    • View Profile
Re: GPU threading in DF
« Reply #13 on: April 03, 2016, 06:00:34 pm »

Not sure if dwarves do recalculate on every move, cos I'm sure I've seen, after building a wall that blocks a certain route, dwarves go up to the wall while on their way somewhere, briefly flash up a question mark then reroute via a non-blocked path.  This suggests to me that they calculate the route once at the start, then follow it until they can't follow it any more, and only recalculate if it becomes necessary.
Logged