Topic: Hertz, RAM, and Framerate (Read 6801 times)

schlake · « **on:** September 12, 2014, 10:34:56 am »

I've seen a lot of posts about framerate, but none that touches on my questions.

Is DF multithreaded? It doesn't appear to be?

So I'm assuming DF is bound by raw CPU speed, not cores.

It also appears to be very memory intensive, so more RAM is better. But how much RAM?

Maw · « **Reply #1 on:** September 12, 2014, 10:40:34 am »

Multithread - in the main, no (I remember something about graphics/video separated sometime ago)
You assume correctly
How much you got? Nevermind answering, if you're asking, you need more.

schlake · « **Reply #2 on:** September 12, 2014, 10:59:59 am »

I'm asking because I think I should change which machine I play on. Or buy a new machine. Currently I use my laptop, because it's portable. But I'm only a few weeks in. While I was adventurous enough to burrow to Hell in the first week, I'm still trying to coerce my dwarves into actually defending themselves against a goblin attack. I'm getting close!

My laptop was the cheapest one available two years ago. It's not very powerful. My gaming PC would probably be a huge step up. It's older than the laptop, but was built with 12GB of RAM and still has a faster CPU. But it isn't convenient to play DF on because I use my television as the monitor for it, which means I couldn't watch movies while playing DF. Or I could buy or build a new computer with a stupidly fast CPU and skip extra cores.

zimluura · « **Reply #3 on:** September 12, 2014, 11:34:49 am »

At the moment I think DF can only use 2GB of RAM before it crashes, because of the 32-bit address space limitation. It looks like this could be extended to 4GB on 64-bit Windows with a simple tweak to the large address aware flag. I think this can be done without Toady's help.
http://www.bay12forums.com/smf/index.php?topic=101046.0
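
For reference, the large address aware flag is a single bit (`IMAGE_FILE_LARGE_ADDRESS_AWARE`, 0x0020) in the Characteristics field of the executable's PE/COFF header. Below is a minimal Python sketch of checking it. The function names and the synthetic header used to exercise it are made up for illustration, and the code only reads bytes, it does not patch anything:

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image: bytes) -> bool:
    """Return True if a PE image has the large-address-aware bit set."""
    if image[:2] != b"MZ":
        raise ValueError("not a DOS/PE executable")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The COFF header follows the 4-byte signature; Characteristics is its
    # last field, 18 bytes in.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def fake_pe(characteristics: int) -> bytes:
    """Build a tiny synthetic header (hypothetical test data, not a real exe)."""
    image = bytearray(0x40 + 24)
    image[:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x40)   # PE header right after DOS header
    image[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<H", image, 0x40 + 22, characteristics)
    return bytes(image)
```

To check a real file, read it in binary mode and pass the bytes to `is_large_address_aware`. Setting the bit is a separate step that is normally done with a patching tool like the one in the linked thread.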

Getting DF to use more than 4GB would require going to a 64-bit executable, which is something only Toady can do, and might be similar in magnitude to the SDL port of DF2010 (best case lots and lots of find and replace, worst case...who knows). Anyway, if you have more than 4GB of RAM you're probably OK there for the time being.

Most of the game is single threaded I think. SDL is a multi-threaded DLL so that might be why it can use more than 1 thread, but i don't think there is enough graphics workload in DF to warrant a separate thread for drawing. Separate threads for pathfinding and maybe flows will hopefully come eventually. In my rough, over-general, never-actually-seen-the-source, estimation; those should be possible without a complete engine redesign.

BoredVirulence · « **Reply #4 on:** September 12, 2014, 06:02:56 pm »

The core logic of DF is single-threaded. The graphics that were reworked long ago run in a separate thread.
You need at least 3GB, but 4GB should also be fine. Keep in mind that while DF can use 2GB, your OS will be using a portion too.

DF will crash if it tries to go above 2GB, making it LAA will help, but you should generally consider roundToOneGB((maximum DF can use) + (maximum your OS will use)).
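
The rule of thumb above can be written out as a tiny helper. This is only a sketch: rounding up (rather than to the nearest GB) is an assumption on the cautious side, and the OS figures you feed it are your own estimates, not numbers from this thread.

```python
import math


def recommended_ram_gb(df_max_gb: float, os_max_gb: float) -> int:
    """Round (DF peak + OS peak) up to a whole number of GB."""
    return math.ceil(df_max_gb + os_max_gb)


# Without LAA: DF tops out at 2GB; assume the OS wants about 1GB.
no_laa = recommended_ram_gb(2.0, 1.0)      # 3
# With LAA on 64-bit Windows: DF can reach 4GB; assume 1.5GB for the OS.
with_laa = recommended_ram_gb(4.0, 1.5)    # 6
```

Anything else you keep open alongside DF (browser, utilities) would be added to the OS figure.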

Faster RAM is always better. DF is very much memory bound, and no caches are large enough to sate this beast. More cores do help, by allowing other stuff to be offloaded to other cores, but the difference between 2 cores and 4 cores is negligible.

Making DF 64-bit shouldn't be difficult for Toady. If he isn't heavily reliant on the data type size in his calculations, there should be little to no work. If he does rely on the size of the data type he's using, there could be a lot of work. Even if his calculations did depend on it, it shouldn't be too much effort to abstract the data type, but that doesn't matter, Toady wants to make new features not optimize. And who can blame him, optimization is boring, and he'll probably have to do rewrites of his optimization later making it more work...

Saiko Kila · « **Reply #5 on:** September 12, 2014, 06:43:12 pm »

Quote from: schlake on September 12, 2014, 10:34:56 am

It also appears to be very memory intensive, so more RAM is better. But how much RAM?

If you use helper tools like DwarfTheRapist or SoundSense (which runs on Java), plus have some other programs open, like a web browser (for the wiki or forum) and a screenshot capture program, then 8 GiB can be low after a longer session, and Windows has to use swap more, which decreases general performance. This is also caused by leaks, so killing programs and reloading will help (java.exe especially is a hog and can reserve way more than 2 GiB). Still, I wouldn't try below that amount.

When I play DF, my commit charge (practically a physical memory usage, though it counts page files if used too) is about 6 GiB.

Also, all Windows 32-bit programs can be allowed to use more than 2 GiB of memory per process, but not all of them can make good use of it (and some may even crash when the LAA flag is enabled, because they use one bit of a pointer for something other than pointing). I'm not sure how it is with DF, because it usually uses less memory than that.
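
To illustrate the pointer-bit problem: a program that hides a flag in the top bit of a 32-bit pointer is fine while every address sits below 2 GiB, because that bit is always zero there. Once LAA lets addresses go above 0x80000000, clearing the "flag" destroys part of the address. A minimal sketch in Python, with plain integers standing in for pointers and all names hypothetical:

```python
TAG_BIT = 0x80000000          # top bit of a 32-bit pointer
ADDRESS_MASK = 0x7FFFFFFF     # the low 31 bits


def tag(pointer: int) -> int:
    """Stash a boolean flag in the top bit of a 32-bit pointer."""
    return (pointer | TAG_BIT) & 0xFFFFFFFF


def untag(tagged: int) -> int:
    """Recover the address by clearing the top bit."""
    return tagged & ADDRESS_MASK


def survives_tagging(pointer: int) -> bool:
    """True if the address comes back unchanged after a tag/untag round trip."""
    return untag(tag(pointer)) == pointer
```

Here `survives_tagging(0x7FFF0000)` holds, while an address such as 0x80001000 comes back as 0x00001000, which is the kind of corruption that crashes a program.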

Olith McHuman · « **Reply #6 on:** September 13, 2014, 03:16:14 pm »

Regarding the ram speed question, I did some testing a while back. I was experimenting with my ram's frequency and dropped it from 1600 to 1333. This resulted in df slowing down by 8%. According to a post on tom's hardware (I think), most other cpu intensive programs slow down by 1% under those conditions.
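
As a rough back-of-the-envelope reading of those numbers: assume run time splits into a part that scales with RAM frequency and a part that doesn't, and that "slowing down by 8%" means run time grew by 8%. Both are simplifying assumptions, not something measured here. Under that model the memory-dependent share can be estimated like this:

```python
def memory_bound_fraction(slowdown: float, old_mhz: float, new_mhz: float) -> float:
    """Share of run time that scales with RAM speed, under the assumed linear model."""
    memory_time_growth = old_mhz / new_mhz - 1.0   # 1600/1333 - 1 is about 0.20
    return slowdown / memory_time_growth


df_fraction = memory_bound_fraction(0.08, 1600, 1333)       # roughly 0.40
typical_fraction = memory_bound_fraction(0.01, 1600, 1333)  # roughly 0.05
```

So under this crude model about 40% of DF's time tracks memory speed, against about 5% for the typical CPU-heavy program mentioned above.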

If your PC allows overclocking (somewhat unlikely unless you built it from parts), you might try lowering the RAM command rate. In my case it was at 2T and I had good results with dropping it to 1T (overclocking may crash your computer/make it unstable, not responsible if it eats your dog, yada yada). None of the other timings seemed to have any measurable impact on speed.

Kryxx · « **Reply #7 on:** September 13, 2014, 04:25:32 pm »

Coding for multi-threading+multi core is not easy.

You run into memory management issues in passing off the information, it's not something you do easily. It means re-building your whole engine to do it.

I've run this for over 8 hours. Again, I'm not very far into it with only 50 dwarves, but I haven't seen the game use more than 75% of one of my i5 cores, and that's mostly when it's loading/saving, and not more than 5GB of 8GB total system RAM used. This is with Dwarf Therapist and about 1.5GB from other apps running.

wierd · « **Reply #8 on:** September 14, 2014, 01:37:04 am »

DF is single threaded, and heavy on ram consumption.

So much so in both respects, that it is one of the few applications where ram SPEED really has a noticeable effect.

Basically, because DF is doing so many operations in memory, constantly, and with large chunks of memory at a time, there is a pronounced performance value that is captured by better memory access. A fast CPU with slow memory will run DF like poop.

For the performance-crazed tinkerers out there, I would advocate structured tests using well-established "FPS Death" saved games from DFFD as a benchmarking tool, then testing the differences in performance that can be gained from things like using single or dual channel memory configurations, DDR2 vs DDR3 with matched clockrate CPUs, etc.
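
A structured test along those lines mostly comes down to loading the same save on each configuration, noting the FPS several times, and comparing the averages. A small sketch of the bookkeeping in Python follows. The helper names are made up, and the FPS readings have to come from your own runs:

```python
from statistics import mean, stdev


def summarize(fps_samples):
    """Mean and sample standard deviation of one configuration's FPS readings."""
    return mean(fps_samples), stdev(fps_samples)


def percent_change(baseline, candidate):
    """Relative change in mean FPS from baseline to candidate, in percent."""
    base = mean(baseline)
    return (mean(candidate) - base) / base * 100.0
```

If the change between two configurations is smaller than the spread reported by `summarize` within one configuration, it is probably noise rather than a real gain.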

Just from the nature of the program though, you want a workhorse CPU with as much cache on die as you can afford, with as high a clockrate as you can afford, with the fastest memory you can afford, in a dual channel configuration.

DF has similar requirements to a high utilization database server.

The difference is that DF is a 32bit process, so a system with more than 6gb of memory is not going to benefit much, and most modern DB servers are 64bit native these days. (As a 32bit process, the most memory address space it can see is 4gb, some nontrivial portion of which is going to be reserved for libraries and system API use. This is usually around 3gb of data and application memory, and 1gb for system. When it exhausts that, the program simply crashes.) The benefit of having more memory than 6gb comes from increased disk cache use by the OS behind the game, keeping DF's incessant raw checking from actually hitting actual disk from cache misses. Since DF is a 32bit process, there is actually a point where more RAM does nothing.

I would love to see a 64bit build of DF, after having all the datatypes sanitized for 64bit register use. (This is not a trivial thing. A LOT CAN AND PROBABLY WOULD BREAK while trying to do this.) since it would allow DF to make use of even larger memory pools, and larger register sizes (meaning better cache use), which would almost certainly result in a profound performance improvement of the game. However, as this would require a significant overhaul of the entire codebase, it's not something I think toady wants to handle until he simply has no other alternative. (Which he is actually running into right now. I've watched memory allocation during complex, large-world worldgens. At least some portion of the crashes are from it exhausting the 32bit address space. As he adds more features and complexity to the game, the 32bit limit will chafe more and more.)

Multithreading is not really something that DF is architecturally going to be good at. I suppose that with the new "active world" paradigm, he could have 2 worker threads going, at least for fortress mode anyway: 1 thread handling all the stuff going on in fortress mode, and another handling the "active world" process, with controlled interfaces between both for things like invading armies, diplomats entering and leaving the map, et al. But that would also be a non-trivial code change.
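
The "two threads with a controlled interface" idea can be sketched generically. The Python below is purely illustrative and has nothing to do with DF's actual code: a hypothetical world thread hands events to a hypothetical fortress thread through one queue, which is the only thing the two share.

```python
import queue
import threading


def run_two_thread_sketch(world_events):
    """Run a 'world' thread and a 'fortress' thread joined by one queue."""
    inbox = queue.Queue()      # the only shared structure between the threads
    handled = []               # touched by the fortress thread only
    done = object()            # sentinel marking the end of the stream

    def world_thread():
        # Hypothetical "active world" side: emits events for the fortress.
        for event in world_events:
            inbox.put(event)
        inbox.put(done)

    def fortress_thread():
        # Hypothetical fortress side: drains the inbox as part of its own loop.
        while True:
            event = inbox.get()
            if event is done:
                break
            handled.append(event)

    threads = [threading.Thread(target=world_thread),
               threading.Thread(target=fortress_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

Keeping every exchange inside the queue is what makes the interface "controlled": neither thread ever reads the other's data structures directly.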

zimluura · « **Reply #9 on:** September 15, 2014, 07:51:32 am »

Quote from: BoredVirulence on September 12, 2014, 06:02:56 pm

The core logic of DF is single-threaded. The graphics that were reworked long ago run in a separate thread.

Quote from: Kryxx on September 13, 2014, 04:25:32 pm

Coding for multi-threading+multi core is not easy.

You run into memory management issues in passing off the information, it's not something you do easily. It means re-building your whole engine to do it.

I'm involved in this process at the moment, so I'm very interested in talking about this with others. I've just redesigned my 3d engine to use a triple-buffered world-state, so that I can always use the latest completed state for rendering, while always being able to generate a new world-state; and yes, _that_ was a complete engine redesign.
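
For anyone who hasn't met the pattern, here is a generic triple-buffer sketch in Python. It is not the engine described above, just an assumed minimal shape of the idea: three slots, where the writer always has a slot to fill, the reader always gets the newest finished state, and only the index swap is locked.

```python
import threading


class TripleBuffer:
    """Three state slots: one being written, one being read, one spare."""

    def __init__(self, initial_state):
        self._slots = [initial_state, initial_state, initial_state]
        self._write, self._ready, self._read = 0, 1, 2
        self._fresh = False            # True when _ready holds an unread state
        self._lock = threading.Lock()  # guards the index swaps only

    def publish(self, state):
        """Simulation side: store a finished state and make it the newest."""
        self._slots[self._write] = state
        with self._lock:
            self._write, self._ready = self._ready, self._write
            self._fresh = True

    def latest(self):
        """Render side: return the newest completed state."""
        with self._lock:
            if self._fresh:
                self._read, self._ready = self._ready, self._read
                self._fresh = False
        return self._slots[self._read]
```

The cost is visible in the constructor: three copies of the world state live in memory at once.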

I don't think Dwarf Fortress would benefit from that approach to multi-threading though. The ram usage for world-states doubles(or triples, or more), and what you get out of it is the ability to render a scene while doing AI, physics, animation, et cetera calculations. Dwarf Fortress looks like* it has a pretty light work-load when it comes to rendering to the screen, so getting that to run in parallel sounds like a waste of effort (complete redesign) for a net loss (making it use much much more ram), and a possible, linear, 50ms speed per frame improvement, which would also get negated (and then some) by the overhead involved in the ram copying or message-processing needed to keep things concurrent. Modern 3d graphics engines benefit from this approach since rendering for them is very complex (typically the most time consuming thing in the loop), while the ram load from the world-state is very light. Dwarf Fortress looks to be the polar opposite of that.

There is, however another approach, which, from my (albeit limited) research, doesn't typically require a complete redesign. One offloads computationally intensive tasks to other cores (from my understanding path finding is the big one for DF), and the cores report when they've completed the work.
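
A rough sketch of that offloading approach in Python: path requests go to a worker pool, and the main loop collects answers when they are done. The breadth-first search and the grid format are stand-ins chosen for brevity, not DF's pathfinder. Also note that CPython threads don't run Python code in parallel, so this shows the structure only; a real speedup would need processes or a native language.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def path_length(grid, start, goal):
    """Breadth-first search on a grid of '.' (open) and '#' (wall) cells.

    Returns the number of steps from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        (r, c), dist = frontier.popleft()
        if (r, c) == goal:
            return dist
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] == "." and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(((nr, nc), dist + 1))
    return None


def solve_requests(grid, requests, workers=2):
    """Hand each (start, goal) request to a worker; collect results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(path_length, grid, s, g) for s, g in requests]
        # The main loop could keep simulating here and poll future.done().
        return [f.result() for f in futures]
```

The catch is that the map must not change under a worker while it searches, so the game either works from a snapshot or re-validates the path when the result comes back.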

* Again, I haven't seen the source, but for the world pass you have discrete, integer, xyz coordinates to draw from, for the building pass I can't say how he does it (added into the map=fast, run through linear array=slower), and for the creature pass you typically have less than 3000 creatures which are each easily culled with those discrete xyz boundaries of the viewport.

BoogieMan · « **Reply #10 on:** September 16, 2014, 07:30:11 pm »

I would hazard a guess that the best way to get Dwarf Fortress running as well as possible hardware-wise would be a CPU with powerful individual cores, and probably a lot of cache, followed by fast RAM. I wonder if it's more about memory timing or just the raw power of the memory? I'm not very knowledgeable about the differences in RAM beyond what the marketed speed ratings say. A high quality motherboard would probably make *some* difference as well.

I wish I knew exactly what strengths are best for DF. I would seriously consider building a Dwarf Fortress PC.

wierd · « **Reply #11 on:** September 17, 2014, 01:33:51 am »

Ideally, you would want your RAM to run at the same clock rate as the CPU's instruction cycle, but this hasn't been realistically possible for many many years now.

This is why you see two different figures for a CPU's speed: its rated clock frequency, and the speed of the Front Side Bus (FSB).

The FSB speed is the clockrate in MHz at which data can be fetched from memory, and stored in memory. The CPU's clockspeed is the FSB speed, multiplied by some clock multiplier value. (Obligatory Wikipedia on FSB.)

The faster the FSB clockrate, the faster the CPU can get at the data in memory.

In addition to the raw clockrate, there is also the raw width of the access itself. (How many bits at a time, or the word size.) A 64bit CPU has a larger word size than does a 32bit processor. This is where "Dual channel memory architecture" and pals come in. (More obligatory Wikipedia.) By using multiple, identical channels to identical memory modules, multiple words can be read/written by the memory controller at a time, allowing larger chunks of memory to be accessed directly per fetch/store cycle.
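
To put numbers on the channel idea: theoretical peak bandwidth is transfers per second times bytes per transfer times number of channels. A DDR3 channel is 64 bits (8 bytes) wide, so the arithmetic can be sketched as below. The function name is made up, and real-world throughput is lower than these peaks.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000.0


single = peak_bandwidth_gb_s(1600, channels=1)   # DDR3-1600, one channel: 12.8
dual = peak_bandwidth_gb_s(1600, channels=2)     # DDR3-1600, two channels: 25.6
```

Going from one channel to two doubles the peak figure without touching the clock, which is why channel configuration is worth testing separately from RAM speed.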

Because DF uses such large amounts of memory, its major constraint is NOT how quickly it processes its information, but rather how quickly it can get its data in and out of memory. As noted, the FSB speed is some non-trivial fraction of the CPU's clock speed, and for modern CPUs it is always slower than that clock speed. This means that applications like DF spend some significant amount of time just sitting around twiddling thumbs waiting for data to get read or written before they can move on to the next thing they need to do. DF's data structures are very large. (MANY words large, Many many many words large). This means that DF's process spends a very large amount of time getting data into and out of the CPU's registers. When this happens, and it is the primary limiter on performance, the process is known as being "memory bound".

The ideal computer for running DF on (which does not currently exist!) is a very fast CPU, with matched clockrates (1:1 relationship between FSB speed and clockrate), with a very very large word size.

Processor cache serves 2 major functions.

Firstly, it stores program instructions so that they don't have to be fetched from memory after an operation completes. Keeping useful instruction information in the cache is one of the purposes of a thing called "branch prediction", which the CPU does in hardware and which modern compilers help along through the way they lay out code. Toady gets this kind of optimization for free just by turning on optimization flags with his compiler. (The compilers these days often do a better job of optimizing this than does manual hand-assembler optimization, in case you were wondering.) By keeping program instructions in cache memory, which DOES run at 1:1 speed with the CPU's clock, (and in conjunction with branch prediction) very complex operations that have conditional checks can work with data in the CPU's registers, and never skip a beat. The CPU just loads instructions from the instruction cache, and does its thing, completely without any need to touch system memory at all!

The other function of processor cache is to cache the contents of frequently used memory addresses. In the case of DF, one of the most frequently accessed memory structures is the item registry vector, and another is the entity registry vector. (The first is a structure that contains the data for all the non-living objects currently being evaluated, and the second is a structure that keeps track of all the living creatures, including dwarves.)

A very large cache would allow larger structures to be accessed immediately by the CPU, rather than having to force the CPU to simply wait around. CPU cache is only important for system architectures that do not have a 1:1 speed pairing between memory and the internal clock speed. (This is why the ideal system does not need it! Main memory *IS* cache! This is also why the ideal system does not exist!)

So, when going shopping for a workhorse that can play DF like a beast, you want a system with multichannel memory architecture, with as much cache as possible, and the highest clock rate you can afford, with the fastest FSB speed you can find.

That said, there are some things to be aware of!

1) Multicore chips have a unified/shared cache pool! That is to say, the 1mb or so of processor cache is shared between all of the cores on that cpu's die!

2) Server grade CPUs tend to have MUCH larger cpu cache sizes built into the chip.

This means the best boards you will find for running DF on are not going to be the fancy gaming rigs you find all over newegg. Instead, you are going to find much better love looking for server boards. Like this:

http://www.tyan.com/Motherboards_S7070_S7070WGM2NR

This has 2 discrete CPU sockets, intended for high horsepower XEON processors.

Modern XEON processors can have up to 24mb of L2 (On processor) cache built into them!
http://en.wikipedia.org/wiki/Xeon

Compare this with the largest you can get with consumer i7 chips, which tops out at around 10mb for high end chips.

(More than you ever could possibly want to know about Intel's CPU offerings and how they stack up.)

The advantage of server boards, like that referenced Tyan offering, is that they use discrete CPUs, each with its very own CPU cache. This means that stuffing DF on one of the CPUs, and other system processes on the other CPU, will allow DF to run without any interference at all from the system processes. That CPU's cache will not have to be shared with other processes' instructions. Couple this with the very large cache size and 4-channel memory architecture, and that Tyan board would run DF like a beast.

It just so happens that it would also cost you a small fortune.
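
"Stuffing DF on one of the CPUs" is done through CPU affinity. On Linux, Python exposes this as `os.sched_setaffinity`; the sketch below wraps it in a hypothetical helper that simply reports failure on platforms where the call doesn't exist. Which logical CPU numbers belong to which physical socket depends on the machine, so the CPU set you pass in is something you would have to look up yourself.

```python
import os


def pin_to_cpus(pid: int, cpus: set) -> bool:
    """Restrict a process to the given logical CPUs; False if unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False            # e.g. Windows or macOS builds of Python
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid) == set(cpus)
```

A pid of 0 means the calling process. On Windows the same thing is usually done by hand through Task Manager's affinity setting for the process.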

BoogieMan · « **Reply #12 on:** September 17, 2014, 08:25:02 am »

Thanks for the well written and informative post, wierd.

Even if the most optimal system would be out of the price range of most people, maybe with that knowledge one could assemble the next best thing.

wuphonsreach · « **Reply #13 on:** September 17, 2014, 03:33:44 pm »

4x4GB of DDR3/2133 is going to run around $160-$225. Which is not that bad and seems to be a standard clock frequency. DDR3/2400 is a bit more expensive (5-15%).

Except that a lot of CPUs/motherboards only support DDR3/1600, unless you jump through some extra hoops. Anyone have a list of MB/CPU combos that support DDR3/2400 speeds?

DeadlyDodo · « **Reply #14 on:** September 25, 2014, 08:40:52 am »

Quote from: wuphonsreach on September 17, 2014, 03:33:44 pm

Except that a lot of CPUs/motherboards only support DDR3/1600, unless you jump through some extra hoops. Anyone have a list of MB/CPU combos that support DDR3/2400 speeds?

I'm using the Rampage Extreme IV with 2400 Corsairs, works rather well, (occasional BSOD).
