Bay 12 Games Forum

Author Topic: DF and Quad-Channel Ram  (Read 2896 times)

BauxiteProcessor

  • Bay Watcher
DF and Quad-Channel Ram
« on: December 03, 2016, 02:47:26 pm »

Does Dwarf Fortress benefit from quad-channel ram?

wierd

  • Bay Watcher
  • I like to eat small children.
Re: DF and Quad-Channel Ram
« Reply #1 on: December 03, 2016, 09:38:34 pm »

Possibly. In my experience it needs a 3 GHz+ CPU for the bottleneck to move to memory. Slower than 2 GHz and it gets CPU-bound.

Miuramir

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #2 on: December 05, 2016, 09:12:27 pm »

Does Dwarf Fortress benefit from quad-channel ram?

Unhelpful but correct answer (see below for more interesting info): Only if your computer (motherboard, CPU, RAM) does, and to the same degree. DF doesn't know enough about the underlying hardware to understand or care about any of that; it just "sees" an overall speed.

Somewhat more useful info:

We *think* that on a system that is otherwise fast enough in every respect, the ultimate bottleneck is getting large quantities of data from main memory to the CPU and back.  You can increase the overall transfer rate by increasing the real clock speed of the bus (and all other components that need to match), or by increasing the amount of data transferred in one go.  (If you think of a highway cargo analogy, increasing bus speed is putting in a higher speed limit and faster trucks, and increasing the channels is putting additional trailers on all the trucks so they carry more per load.  All DF cares about is tons of cargo per hour, though, not how it gets there.) 

In general terms, silicon speeds aren't increasing that fast any more, so most of the improvements come from higher numbers of channels. But those numbers are generally already accounted for. If four-channel RAM allows for faster ultimate rates than two- or three-channel, which it eventually should given equally developed hardware, it should help, possibly significantly, on a memory-transfer-constrained system.
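To put rough numbers on the "tons of cargo per hour" idea, here is a minimal sketch of the usual theoretical-peak calculation: channels × bytes per transfer × transfers per second. It assumes standard DDR3/DDR4 channels with a 64-bit (8-byte) data bus; the DDR4-2400 speed is only an example value, and real-world throughput will be lower than these peaks.

```python
# Theoretical peak bandwidth of a DDR memory setup.
# Assumption: each DDR3/DDR4 channel has a 64-bit (8-byte) data bus.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(channels: int, megatransfers_per_s: float) -> float:
    """Return peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_second = channels * BYTES_PER_TRANSFER * megatransfers_per_s * 1e6
    return bytes_per_second / 1e9


if __name__ == "__main__":
    # DDR4-2400 is an example speed, not a recommendation.
    for channels in (2, 4):
        gbs = peak_bandwidth_gbs(channels, 2400)
        print(f"{channels} channels of DDR4-2400: {gbs:.1f} GB/s peak")
```

With these example values, dual-channel works out to 38.4 GB/s and quad-channel to 76.8 GB/s. Whether DF can actually use the extra headroom is a separate question.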

If you've got access to a system where you can A/B test this, it would be interesting science.  (And if you additionally had the ability to underclock it to see at what point it no longer mattered, it would be extremely interesting.) 

BauxiteProcessor

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #3 on: December 06, 2016, 01:50:37 am »

My concern was that DF might not read enough memory between dependent calculations (where it has to compute something before it knows what to read next) to benefit from increased memory bandwidth. If we think it's wrangling large amounts of memory in single chunks, that's good. The next thing to think about is whether it's better to go for slightly higher clocks or for more channels and a slightly larger cache (i.e. i7-6700K vs i7-6850K among current CPU choices).

I'll look into doing science when I put together my new computer. We really could use more info on what makes for a good DF computer besides the broad strokes.

Max™

  • Bay Watcher
  • [CULL:SQUARE]
Re: DF and Quad-Channel Ram
« Reply #4 on: December 06, 2016, 09:03:41 pm »

Ideal for DF would be massive L2/L3/L4 on-die CPU cache. A cache miss goes from damn near instant work on data to pulling it out of RAM.

More bandwidth and higher speed will help with RAM, and 64-bit helps some in letting the game load bigger chunks into RAM to be worked on simultaneously. A page swap is soul-crushing compared to cache misses.
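The cost of misses can be sketched with the standard average-access-time formula: hit time plus miss rate times miss penalty. The latencies below are made-up placeholders picked only to show the shape of the effect; they are not measurements of DF or of any particular CPU.

```python
def average_access_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average time per access: hit time plus the expected cost of a miss."""
    return hit_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    # All numbers here are hypothetical, for illustration only.
    cache_hit_ns = 1.0      # assumed cache hit time
    ram_penalty_ns = 80.0   # assumed extra cost of going out to RAM
    for miss_rate in (0.01, 0.10, 0.50):
        avg = average_access_ns(cache_hit_ns, miss_rate, ram_penalty_ns)
        print(f"miss rate {miss_rate:.0%}: {avg:.1f} ns per access on average")
```

The same formula applies one level down: swap in a page-fault rate and a disk-sized penalty (orders of magnitude larger than a RAM access) and it shows why swapping hurts so much more than cache misses.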

Jairl

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #5 on: December 14, 2016, 08:32:00 pm »

My concern was that the amount of memory DF can check before it needs to do another calculation to be able to check memory might not have been large enough to benefit from increased memory bandwidth. If we think it's wrangling large amounts of memory in single chunks that's good.

Uhh... I don't keep up with Toady's programming style, but last I read he didn't even know what vectorized math (SIMD) was and just assumed the compiler would automagically optimize his code to use it. SSE is far too impractical when you're extremely OOP with massive pointer chains to follow and data spread out all over the place which makes me doubt it being used at all (yes you can use it, but it'd result in negative performance gain... err, again... yes and no... but really it works much better with linearized memory. Just load, operate, and write.).

Having more channels would really matter more to the fetching mechanisms of the CPU than to DF's code (given Toady is not explicitly taking advantage of this stuff), which, again, is hit and miss.
« Last Edit: December 14, 2016, 08:38:15 pm by Jairl »

Kumquat

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #6 on: December 15, 2016, 04:32:14 pm »

Well ... compilers do use SSE automatically ... they just use the single-float instructions there, not SIMD. Some compilers might do a little bit of vectorization, but that is pretty rare and, indeed, the gains from that in the general case are pretty marginal. In the context of DF, they'd probably have some application in weather and temperature calculations but little else.

I'm sure a person with some tools and more enthusiasm than I have could do some in-depth profiling about it, but I seem to recall that with DF processor speed matters much less than your memory latency and cache size. With quad-channel you can theoretically have four cache misses going at the same time, but since DF is single-threaded that is fairly unlikely to give any benefit.

Also, in my experience, the only way to optimize massive pointer chains is to get rid of them.
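A back-of-the-envelope model of why pointer chains and bandwidth don't mix: when each load's address depends on the previous load, the accesses are serialized and the total time is set by latency, while a linear scan over the same bytes is limited by bandwidth. This sketch assumes the worst case where every hop in the chain misses the cache, and it assumes extra channels do not change latency. The node count, node size, latency, and bandwidth figures are all hypothetical.

```python
def pointer_chase_seconds(nodes: int, miss_latency_ns: float) -> float:
    """Latency-bound walk: each hop waits for the previous load to finish."""
    return nodes * miss_latency_ns * 1e-9


def streaming_seconds(total_bytes: int, bandwidth_gbs: float) -> float:
    """Bandwidth-bound scan over contiguous memory."""
    return total_bytes / (bandwidth_gbs * 1e9)


if __name__ == "__main__":
    # Hypothetical workload: one million 64-byte nodes, 80 ns per miss.
    nodes, node_bytes, latency_ns = 1_000_000, 64, 80.0
    for bandwidth_gbs in (38.4, 76.8):  # example dual- vs quad-channel peaks
        chase = pointer_chase_seconds(nodes, latency_ns)
        stream = streaming_seconds(nodes * node_bytes, bandwidth_gbs)
        print(f"{bandwidth_gbs} GB/s: chase {chase * 1e3:.1f} ms, stream {stream * 1e3:.2f} ms")
```

In this model, doubling the bandwidth halves the streaming time but leaves the pointer-chasing time untouched, which is the situation described above.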

Grimlocke

  • Bay Watcher
  • *kobold noises*
Re: DF and Quad-Channel Ram
« Reply #7 on: December 16, 2016, 07:32:08 pm »

I had at some point intended to make a set of standardized DF benchmarks for people to be able to put some actual practice to all this theory.

But, while the fluids benchmark was easy enough (arena level with tons of water moving around, note down framerate once stable), benchmarking fortress and adventurer mode proved to be rather difficult. The idea was to let a big, cluttered fortress run idle from one season to another with the seasonal autosave enabled and the frame limiter disabled, and measure the time taken. But Armok does play dice, and randomization really mucks up the results. Same for adventurer mode.
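One common way to cope with run-to-run randomness is to repeat the measurement several times and report the median together with the relative spread, so a difference between two machines can be judged against the noise. The sketch below is generic Python timing code, not a DF tool; `workload` is a hypothetical stand-in for whatever is being timed, and stopwatch times taken by hand can be passed straight to `summarize`.

```python
import statistics
import time


def time_runs(workload, repeats: int = 5) -> list:
    """Run the workload several times and return each wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list) -> tuple:
    """Return (median, relative spread), where spread = (max - min) / median."""
    median = statistics.median(durations)
    spread = (max(durations) - min(durations)) / median
    return median, spread


if __name__ == "__main__":
    # Hypothetical hand-timed season lengths in seconds.
    median, spread = summarize([610.0, 655.0, 598.0, 702.0, 640.0])
    print(f"median {median:.0f} s, spread {spread:.0%} of median")
```

If the spread between repeated runs on one machine is larger than the gap between two machines, the benchmark can't tell them apart yet.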
I make Grimlocke's History & Realism Mods. Its got poleaxes, sturdy joints and bloomeries. Now compatible with DF Revised!

BauxiteProcessor

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #8 on: December 16, 2016, 08:11:25 pm »

We really do need benchmarks. Even if we could just test fluid behaviour for varying levels of overclocking for a given CPU, different levels of RAM latency and frequency, presence of eDRAM, cache size and dual vs quad memory channels that would be really useful.

It also ought to be possible to design a deterministic pathfinding test in the arena and we could also test worldgen completion time with a given seed.
« Last Edit: December 16, 2016, 08:16:59 pm by BauxiteProcessor »

Grimlocke

  • Bay Watcher
  • *kobold noises*
Re: DF and Quad-Channel Ram
« Reply #9 on: December 16, 2016, 08:48:35 pm »

Good point, worldgen is actually mostly deterministic if you use the same seed every time.
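The reason a fixed seed makes a run repeatable can be shown with a toy example. This is not DF's generator; `toy_worldgen` is a hypothetical stand-in that just draws numbers from a seeded pseudo-random generator, to illustrate that the same seed gives the same sequence of "random" decisions and therefore the same amount of work to time.

```python
import random


def toy_worldgen(seed: int, size: int = 8) -> list:
    """Hypothetical stand-in for a generator: same seed, same output."""
    rng = random.Random(seed)  # private generator, unaffected by other code
    return [rng.randint(0, 99) for _ in range(size)]


if __name__ == "__main__":
    first = toy_worldgen(12345)
    second = toy_worldgen(12345)
    print("identical runs:", first == second)
```

Anything that feeds on something other than the seed (timing, input, thread scheduling) breaks this guarantee, which may be where the "mostly" comes from.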

Though dwarf mode is probably still the most pertinent one, and might have a very different load profile than fluids and worldgen. Maybe setting BenchMarked (the Benching Mark-Bench of Marking) in a dead world would at least save the random influences of world progression? Modding all the races to live only a couple years and not reproduce would do the trick, though actually starting a fortress might be tricky at that point.
Setting the dorfs to all do only a single task, like lugging rocks from one end to another, might also reduce the randomness. The arena is a bit limited in usefulness there, since it's hard to set creatures to run around the map for a longer time in some sort of predetermined way, especially ones that involve lots of items being called on.

For fluids I can so far say that the performance scales pretty much linearly with single-core performance, and that for CPUs with a lot of cores you need to make sure turbo mode actually does its job, which for me it did not do with the default settings (it refused to clock up to turbo speed, probably because it counted 17% load as a low workload). Adjusting DF's process priority fixed it, and so did disabling 4 of the 6 cores, which also had the nice bonus of letting me reach slightly higher overclocks.

BauxiteProcessor

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #10 on: December 16, 2016, 11:12:05 pm »

A dead outside world without civilizations or megabeasts behaving unpredictably. A map with no animals capable of spawning, on a terrain without random saplings appearing, caverns removed, spoilers not breached and with weather turned off. There we can do tests like "there are lots of items lying around and an unconscious dwarf to keep the fortress from being abandoned", "there are dwarves who will not be distracted by things like thirst pathing deterministically" and "there is a lot of machinery doing stuff plus unconscious dwarf". Possibly also a fort with lots of items, deterministic pathing, machinery and fluids going on at the same time to see what it's like in combination.

Goatmaan

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #11 on: December 17, 2016, 11:04:23 am »

 I have a save that might be good to do benchmark tests on.
It's a 40.19 Starter Pack r2 save, so DFHack and Therapist both work fine for managing things.
 I'm kinda worried I didn't zip the save right though; its icon looks like a page, not like a folder with a zipper on the side. The (hopefully correctly) zipped region1 size is 118 MB.
 Only problem is it crashes after 7 game days (nemesis unit load fail), BUT on my system a game day took 7 minutes 10 seconds.
 I didn't make notes on world gen: medium world, weres off, minerals everywhere. I think that's it.
6*6 forested embark... that's mostly floored, with ten 10*10 binned block stockpiles ready to almost finish the flooring.
90k+ drinks, 20k+ food,
more than 1 totally mined out z that's open to pathing.
75 miners, over 40 +5
 Too much to list, but ask and I'll see if I remember.
 Oh and it's got 830+ adult dwarfs, and roughly 100 children.

 Guaranteed to melt your shiny new cpu, or at least give it a good burnin'  :P
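For anyone wanting to compare that day length against FPS readings, here is a small conversion sketch. It relies on a fortress-mode day being 1200 ticks (frames); the function name is just for illustration.

```python
TICKS_PER_DAY = 1200  # fortress mode: 1200 ticks per in-game day


def average_fps(game_days: float, wall_seconds: float) -> float:
    """Average frames per second over a timed stretch of game days."""
    return game_days * TICKS_PER_DAY / wall_seconds


if __name__ == "__main__":
    seconds_per_day = 7 * 60 + 10  # 7 minutes 10 seconds, as reported above
    print(f"{average_fps(1, seconds_per_day):.2f} FPS")
    print(f"7 game days take about {7 * seconds_per_day / 60:.0f} minutes")
```

One game day in 430 seconds comes out at roughly 2.79 FPS, so getting through the seven days before the crash would take around 50 minutes per run.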

   Goatmaan
« Last Edit: December 17, 2016, 11:50:26 am by Goatmaan »
My !!XXcpuXX!! *HATES* me.

BauxiteProcessor

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #12 on: December 17, 2016, 12:52:34 pm »

You should post it for entertainment purposes, though for benchmarking it's important to note that consistent results are more important than difficulty (indeed, if a computer can't run a benchmark at all, that just gives you binary information instead of really testing it).

redivider

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #13 on: December 17, 2016, 07:53:57 pm »

Link that save, I can give you some FPS readings on my old i7 3960x.
« Last Edit: December 17, 2016, 07:56:14 pm by redivider »

Goatmaan

  • Bay Watcher
Re: DF and Quad-Channel Ram
« Reply #14 on: December 17, 2016, 10:29:24 pm »

I only have my phone for net access, so I'll have to upload it to DFFD at the library in the next couple of days. I was getting 3 FPS, but it became unstable (RAM limit), so Large Address Aware is probably required, just to get you the seven game days before the nemesis bug kills it.

Toady said he'd take a look, if he had time... he didn't say... Oh I can fix that easy! :(

New 43.05 (64-bit) fort is already at 475 total dwarfs, 300 of those are adults. 10 FPS.
Really need Therapist and DFHack's cleanowned to continue to seriously play it.

    Goatmaan