Multithreading is great for running independent tasks concurrently, such as graphics-only particle effects, audio streams, or other tasks that don't require synchronization. Basically, the overhead of splitting a fine-grained task across multiple threads can easily overwhelm the gain of running on multiple cores; but anything coarse-grained enough to split off from DF's main game loop probably won't have a noticeable effect on FPS, aside from rendering and audio.
I'm working on a little Tower Defense game and was stress-testing the projectile system: the max number of towers at the max rate of fire with the slowest projectiles, leading to thousands of bullets on screen. Bullets were drawn with direct memory writes to a raw bitmap that was then blitted over the battlefield, and the projectile data was read from an array of structs.
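For reference, here's roughly the kind of setup I mean. This is a from-memory sketch, not the actual game code; the struct fields and names are just placeholders:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical projectile record; the real struct has more fields.
struct Projectile {
    float x, y;            // position on the battlefield
    float dx, dy;          // velocity per tick
    int ttl;               // ticks until the projectile expires
    std::uint32_t sprite;  // pixel value written into the bitmap
};

// Raw 32-bit bitmap that gets blitted over the battlefield each frame.
struct Bitmap {
    int width = 0, height = 0;
    std::vector<std::uint32_t> pixels;  // width * height entries
    void plot(int px, int py, std::uint32_t c) {
        if (px >= 0 && px < width && py >= 0 && py < height)
            pixels[py * width + px] = c;
    }
};

// Single-threaded per-frame pass: expire, move, (collision omitted), draw.
void update_and_draw(std::vector<Projectile>& bullets, Bitmap& bmp) {
    for (Projectile& p : bullets) {
        if (--p.ttl <= 0) continue;   // expiry check
        p.x += p.dx;  p.y += p.dy;    // position update
        bmp.plot(static_cast<int>(p.x), static_cast<int>(p.y), p.sprite);
    }
}
```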
I split the projectile processing into 4 threads (as I have a quad core... the thread count was configurable); each thread processed the projectiles whose index, taken modulo the thread count, matched its ID, so thread 0 would do 0, 4, 8, 12, 16..., thread 1 would do 1, 5, 9, 13, 17..., and so on. Performance on a single core appeared fairly linear, and there was no direct interaction between projectiles; each one only checked if it was due to expire, updated its position, checked for collision against a pre-culled selection of targets, and wrote its sprite to a calculated position in memory.
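In spirit, the split looked something like this (again a sketch with made-up names, reusing the hypothetical types above, not the real code):

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Interleaved split: thread t handles indices t, t + N, t + 2N, ...
// where N is the thread count (4 on my quad core).
void update_and_draw_interleaved(std::vector<Projectile>& bullets, Bitmap& bmp,
                                 unsigned num_threads) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&bullets, &bmp, num_threads, t] {
            // Only indices where i % num_threads == t, so no projectile is
            // ever touched by two threads.
            for (std::size_t i = t; i < bullets.size(); i += num_threads) {
                Projectile& p = bullets[i];
                if (--p.ttl <= 0) continue;
                p.x += p.dx;  p.y += p.dy;
                bmp.plot(static_cast<int>(p.x), static_cast<int>(p.y), p.sprite);
            }
        });
    }
    for (std::thread& w : workers) w.join();
}
```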
Even with no 'lock'ing, and no danger of the threads corrupting each other's state, the multithreaded version was universally slower; I believe because each write invalidated its neighbors' data in shared cache lines, as well as the cache lines of the bitmap. It may still be workable if I figure out how to align the data used by each thread to CPU cache line boundaries; but cache line sizes may vary depending on CPU model...
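If I do revisit it, one thing to try is giving each thread a contiguous block of the projectile array instead of interleaving, so each thread's writes mostly land in cache lines no other thread touches. A sketch, using the same hypothetical types as above:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Contiguous blocks instead of interleaving: each thread gets its own slice
// of the array, so only the slice boundaries can still share a cache line.
void update_and_draw_blocked(std::vector<Projectile>& bullets, Bitmap& bmp,
                             unsigned num_threads) {
    const std::size_t chunk =
        (bullets.size() + num_threads - 1) / num_threads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&bullets, &bmp, chunk, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(bullets.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i) {
                Projectile& p = bullets[i];
                if (--p.ttl <= 0) continue;
                p.x += p.dx;  p.y += p.dy;
                bmp.plot(static_cast<int>(p.x), static_cast<int>(p.y), p.sprite);
            }
        });
    }
    for (std::thread& w : workers) w.join();
}
```

Padding each thread's slice out to a cache line boundary would remove even the boundary overlap; C++17 has std::hardware_destructive_interference_size as a portable hint for that size, though compiler support for it is spotty. The bitmap is a separate problem, since projectiles handled by different threads can still write to nearby pixels.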
(while describing this, I came up with an idea for minimizing memory writes that I'll have to try later...)