Topic: Securing DF in case of unfortunate event? (Read 22723 times)

catpaw · « **Reply #45 on:** May 07, 2014, 02:24:47 pm »

Honestly as coder, I personally hate classical multi-threaded code. Simply because its a nightmare to debug when you have a race condition, a lock up, or randomly destroyed memory due to a failed mutex. Around the millennium when it was the cool thing to do, I also did some multi-threaded stuff, nowadays I avoid it like hell.

Its correct that many things that take some process load, like path finding, can be computer in parallel with classical multithreading from one tick to the next. However, you'll need one master thread to collect all computations and execute all the changes on the main data tree which then will become a bottle neck. Otherwise if you allow all threads to apply changes on the main data tree, welcome to race condition and mutex nightmare. With mutexing you have the classic problem, make a few, and you'll gain little benefit from multithreading since one thread will hold large pieces of the tree while others wait for it, or go fine grained, and lose a lot of the speed benefits on the load created by mutexes.

I agree that it is impossible to add something like this on an existing code base. If it would have been to be planned from start. Anyway, I agree with putnam. Maybe you get +100% or so speed. But then you'll hit a limit with classical multi-threading, no matter how many cores you go.

I didn't follow CPU-developments of late, but I got the idea that also here, the multi-coreing hits a limit, since unless you're creating something like a VirtualMachine host for lots of VMs, many have learned, you can only take useful advantage with a few cores, while more hardly get anything new. I think we have to accept we're hitting a limit not only on linear speed, but also on parallelism with core in Van-Neumann design.

One thing that might be big is being able to utilize the GPU for DF calculation. I googled a bit, there are people that use GPU for path finding. BTW: Even on supercomputers, most of the top hundret nowadays go vector-based, GPU. There is less and less classical calculating in the upper area.

If I'd had the time to design DF from base, I'd go multiprocess with a database design. Have one main process (the database) that holds the world but does not compute anything, and a few processes for stuff that communicate with the main process via a socket what they'd like to change from tick to tick and getting the changes from the world as answer. Creatures, stuff and areas can be assigned to processes.

GavJ · « **Reply #46 on:** May 07, 2014, 02:31:28 pm »

Quote

However, you'll need one master thread to collect all computations and execute all the changes on the main data tree which then will become a bottle neck.

Yes, it has to wait for whatever the slowest thread is. But if made up data:
Pathing = 50% of calculations
Temperature = 40%
Combat = 5%
trajectory and vectors = 5%

Then add on 5% as long to collate all of them in a main thread at the end and apply final movements, even a dual core processor can now still finish the tick in 55% of the total time it did before. Not much more, ever, though.

Although hm... pathfinding doesn't really have to be on one thread, does it? Nor does all of temperature, really, does it? Couldn't you assign 1/4 of all creatures to one thread (or even each creature to its own), and some number of tiles to threads for temperature? If you only have a dual core, that wouldn't add any extra speed in the above example. But if you had quad core, it would now only take 30% as long as the vanilla tick (pathfinding a temp. both sliced and diced optimally, all cores running for 25% of the normal time + 5% for collating at the end). If you had 8 cores, it would take 17.5% (12.5% as long, all 8 cores maxxed out, + 5% collating) as long as a vanilla tick... Ignoring operating system needs for now.

How much of a benefit you can actually see depends on how fine-grained you can make the thread cutoffs. And why couldn't it be potentially as fine-grained as individual creatures for pathfinding, for instance?

In a future with 16 core processors and greater than 16 creatures in a fortress, you might very potentially see 1,000% improvements in FPS!

Putnam · « **Reply #47 on:** May 07, 2014, 02:34:50 pm »

All of the pathfinding is going to be accessing the same map, so you can't actually thread it in that manner or you'd probably get a lockup of some sort... unless you copy the entire map for each thread, which might cause other issues.

GavJ · « **Reply #48 on:** May 07, 2014, 02:37:05 pm »

It doesn't have to make a new copy of the map every tick. Only when the map changes (mining, building/tearing walls, bridges toggling), and even then it only has to update changes to each copy. So it might affect standing RAM usage, but shouldn't affect processor speed in pathfinding to have many copies available for different threads to read.

And a 4x4 embark 50 tiles in z height = about 2 million tiles. Allow for a few relevant states for pathfinding (ramp/not, door/not, open/not, whatever) probably can represent each with 1 byte = pathfinding relevant map copies only take up 2 megabytes each. Times the number of cores on your system, quad core only needs 8 megs of RAM to avoid any read bottlenecks from threads sharing.

Putnam · « **Reply #49 on:** May 07, 2014, 02:38:15 pm »

I think you're gravely underestimating there how slow RAM is.

catpaw · « **Reply #50 on:** May 07, 2014, 02:41:03 pm »

Every creature can have its own core. But you have to gather all the creature decisions into the one main thread to apply them.

The percentages you give, they are already a sum of the basic computation and the costs of applying the data to the tree. You can parallelize the former well but not the later. Especially with temperature, I don't see how one can make much useful computation for temperature without being able to apply it on the data tree right away.

As I wrote, going too fine with threads, basically speaking, threads have to go through semaphores to come together again, like for a tick. Semaphores are not too cheap. Go too fine-grained, and you'll have the semaphores eat up all the benefits you'd get from parallelism before.

(Classical) Multi-threading isn't the golden grail we were told ten years ago. Vector-based multi-threading on the otherhand is awesome in its gains, but very difficult to adept your algorithms for it. Its not well suited for the human brain.

GavJ · « **Reply #51 on:** May 07, 2014, 02:42:39 pm »

Quote from: Putnam on May 07, 2014, 02:38:15 pm

I think you're gravely underestimating there how slow RAM is.

Yes, I am at my limits of knowledge here. Is there only one memory bus for all cores? If so, multithreading pathfinding will not help much at all.
And if that is how computers are built today, should we expect that in the near future there MIGHT be a memory bus for each core with redundant connections? Or is there some good physical reason that's infeasible?

I.e. in general, would two threads accessing DIFFERENT memory addresses take twice as long to wait on memory as one thread? Or as long? Or 1.2x or something as long?

catpaw · « **Reply #52 on:** May 07, 2014, 03:15:58 pm »

Honestly, how exactly its working on base level I'm no expert in, I've doing most of my coding on levels where these things are abstraced away, so take it with a grain of salt.

On a classical computer (desktop) there is only one system bus, so when another core wants to use it, it has to wait to get free. Cores have their own caches tough, so if the memory is in the cache, it doesn't have to use the bus. There is also prefetching going on, if the bus is idle, which is a guess-work by the CPU what memory might be used next. etc. There has been a lot of research into that. One core can block the whole bus if it does nothing but truely true random memory access. But in practice, a) access is not random but focused on pages that get cached, b) the core wants to do some calculations before it needs new memory again, which gives another core air to use the bus.

You can't just multiply bus access. A RAM chip only has so much wires.

reality.auditor · « **Reply #53 on:** May 08, 2014, 06:00:10 am »

Quote from: Putnam on May 07, 2014, 12:55:05 pm

I can't even describe how much you're overstating the importance of multithreading. Better branch prediction is at least twice as important.

I do not overstate importance, I focused on multithreading because it is most obvious and jarring problem. I know branch predictions can do miracles in some cases. Out of curiosity, where in DF you see potential to improve branch prediction? If I was to guess, probably inventory (these zilion boulders and other things lying around) handling.

With multithreading I was imagining things more streamlined and without nutty ideas. For reasons unknown to me GavJ wants to paraelize everything in main loop - and THAT is bound to create horrible, horrible problems. Nope, forget about that, GavJ. This is nonsense that will randomly crash your game.

So, how to do it?

In main loop for each tick we still do major things separately (DUH), each one in block. For example, as first block would be things by nature impossible to sensibly paraelize, like I dunno, finishing constructions, movement, maintenance of tick and things like that. Next blocks would be things like temp calculations etc, and as last - pathfinding. Multithreading would be used only in one block (of course only if in given block it would make sense at all) at once and main loop would wait in each block for every thread to be finished before going further.

This should be easier and leave state of memory consistent after every block.

Example: If you have say 4 threads assigned to pathfinding and say 100 creatures requesting pathfinding, each thread pathfinds for 25 of them. It will work without any lockups or copy of map for each thread, because whole shebang is in its own block. Pathfinding does not modify map and modifies only certain data for exactly one creature (path to follow) exactly one time in whole block. Map is not modified during pathfinding, as map modifications was already done in previous blocks of current tick.

GavJ · « **Reply #54 on:** May 08, 2014, 11:05:30 am »

Quote

With multithreading I was imagining things more streamlined and without nutty ideas. For reasons unknown to me GavJ wants to paraelize everything in main loop - and THAT is bound to create horrible, horrible problems. Nope, forget about that, GavJ. This is nonsense that will randomly crash your game.

That's a very compelling argument you have there. "This is INSANE, because... because... unstated reasons!"

Can you please describe in more specific detail what sort of "horrible problems" or "random crashes" would occur from calculating, say for example, pathfinding, liquid flow, and temperature completely at the same time if you wanted to? A dwarf might accidentally have a path through 5/7 water for one tick? So what? That happens all the time anyway! When movement is checked later and separately, it will stop anything impossible from resulting.

Of course you can't do *everything* in parallel, and I explicitly said you wouldn't. You always need some number of "blocks" but each block doesn't have to have just one thing. It can have anything and everything in it that doesn't use the same memory addresses or rely on necessary contingencies with each other. Then the next block can have the next set of things that aren't contingently reliant.

And if you get yourself out of the mindset that EVERYTHING has to be "squared off" and fully up to date by the end of the tick, a lot more parallelism becomes possible. For example, who on earth cares whether on one tick, you do liquid flowing for a tile, and then temperature based on its new position, versus temperature based on its old position and then temperature-less flowing afterward? It will catch up in the very next tick, and I don't see how that will meaningfully affect anybody's gameplay.

catpaw · « **Reply #55 on:** May 08, 2014, 01:15:13 pm »

If one threads writes memory while the other reads it, the second reads random garbadge. a lot of care has to be taken that such things cannot happen and are a real pain to debug since you cant reproduce that bug at will, it can quickly become a bug hell where crashes just happen every so often and nobody understands why... then you can ugly stuff like thread A requests exclusive access to block 1, gets it and block 2, which is taken by another, so it waits. thread B requests exclusive access to block 2, gets it, and block 1, and is taken by another, so it waits. Bang. Hang!

Really trust people who have written some multi-threaded stuff, and trust the coders who say, you'll have to structure it very tightly to work at all.

There is some value in doing pathing in parallel. But you just cant do anything. And as we discussed already, if you overload the system bus, you don't get anything worthwhile from more cores.

GavJ · « **Reply #56 on:** May 08, 2014, 02:06:46 pm »

*eyeroll* This is a very straw man-y thread.

Nobody is suggesting you just slap random things into threads without thinking about it at all and with no care or consideration. Obviously that will break your game.

What I'm saying is that most of the intensive calculations in the game don't actually LOGICALLY require contingencies on one another, so there almost certainly is indeed a way to run them in parallel, after of course having thought it through carefully. I'm just saying it could be done with some effort, because there's nothing fundamentally in conflict betwee, say, pathing and temperature such that they logically would need to be run in sequence.

The only things to worry about between those two in terms of logical conflicts are paths that end up moving through newly formed ice or obsidian, or rely on ice that is going to melt, etc. But those are only actually conflicts with MOVEMENT, not pathfinding. A path can go ahead and be defined through a wall no problem, as long as the movement itself is in a protected part of the calculation and checks the tile it's moving into each time it happens to make sure it's still open. Which of course it already does.

Yes, you'd still have to do stuff like make sure the pathing has a cache to work from so it doesn't read the same memory address as temperature is writing an ice wall to, etc. You have to think it through, but I'm saying since there aren't any inherent reasons why they really need to be using precisely the same memory address, a solution is there. Probably a pretty simple one (a map of an entire default embark site would fit into a core's cache easily, even without any compression. With compression [such as "next 15 blocks are impassable" instead of "impassable impassable impassable impassable impassable impassable impassable impassable impassable impassable impassable impassable impassable impassable impassable"], it would take up a tiny space in memory)

Putnam · « **Reply #57 on:** May 08, 2014, 02:08:54 pm »

Quote from: GavJ on May 08, 2014, 02:06:46 pm

What I'm saying is that most of the intensive calculations in the game don't actually LOGICALLY require contingencies on one another, so there almost certainly is indeed a way to run them in parallel, after of course having thought it through carefully. I'm just saying it could be done with some effort, because there's nothing fundamentally in conflict betwee, say, pathing and temperature such that they logically would need to be run in sequence.

Well, there's also fire, which is a mite more risky should it fail, especially since I'm not sure if there are any special rules for moving in and out of fires (adventure mode won't let you do it unless you careful-move, but I'm not sure if the AI is doing it in pathfinding or movement).

GavJ · « **Reply #58 on:** May 08, 2014, 02:11:07 pm »

That's just a matter of adding another movement check during the movement phase, not a conflict with pathfinding.

It seems like Toady might not want to do this for dwarf personality reasons, based on how long that's been a "bug" (feature?), but assuming dwarves aren't suppsoed to walk into fire, you just have to check whether there's fire in the next spot or next couple spots in your path, when you are checking movement in its own very short little period of protected time when it checks that near the end of the tick. Same time you check to see if there's a wall there now.

catpaw · « **Reply #59 on:** May 08, 2014, 05:14:46 pm »

GavJ, go ahead and write a larger project using multi-threading. It seems to me you got some theoretical ideas about threading, but never had the "joy" to work through a complexer project utilizing it, and seeing how tons of semaphores a) eat up your coding time, b) eat up your debugging time and c) even eat up cpu time gained first place.

Nobody says there isn't some value gained, but you're definietly overestimating it. Unless you'd go some completely different like SIMD.

Like I told you from bening, which has been rephrased a few times by others. At the end you'll have to have one main thread that collects all the computed data and applies it. It be the big semaphore all stuff will have to go through. This will be a bottle neck. It will be faster than currently, but with 8 cores, definitely not 8 times faster. If you'd leave that main thread away, you'll have no predictable race conditions and chaos. If you have it, you'll have the pathing diamond shape. Which is the worst kind of shape.

I imagine with classical cores, you'll get twice as fast as currently or something like that. Not much more, you'll hit a limit.

This whole thread discussing threading is pointless, if you want to, you're free to write a DF-clone using multithreading. Thats all that will ever get out of it.

The DF code suffers much bigger performance from much more trivial things anyway, like fat dwarves, or the container-temperature flickering and the such. Sometimes I got the idea, it does a lot of binary searches, where hashtables would be applicatable.

News:

Author Topic: Securing DF in case of unfortunate event? (Read 22723 times)

catpaw

Re: Securing DF in case of unfortunate event?

GavJ

Re: Securing DF in case of unfortunate event?

Putnam

Re: Securing DF in case of unfortunate event?

GavJ

Re: Securing DF in case of unfortunate event?

Putnam

Re: Securing DF in case of unfortunate event?

catpaw

Re: Securing DF in case of unfortunate event?

GavJ

Re: Securing DF in case of unfortunate event?

catpaw

Re: Securing DF in case of unfortunate event?

reality.auditor

Re: Securing DF in case of unfortunate event?

GavJ

Re: Securing DF in case of unfortunate event?

catpaw

Re: Securing DF in case of unfortunate event?

GavJ

Re: Securing DF in case of unfortunate event?

Putnam

Re: Securing DF in case of unfortunate event?

GavJ

Re: Securing DF in case of unfortunate event?

catpaw

Re: Securing DF in case of unfortunate event?