Bay 12 Games Forum


Author Topic: Offerings to the FPS god  (Read 3640 times)

SAFry

  • Bay Watcher
  • Call me Seb
Offerings to the FPS god
« on: April 18, 2012, 12:35:52 pm »

Firstly I'd like to thank DF for giving me something no modern game such as Skyrim, a Civ 4 mod or even Shogun Total War 2 (with its poor CPU optimisation) has been able to offer me: an excuse to upgrade my PC.

It seems one of the biggest problems with fortress mode is FPS death. I've been reading a lot about hardware and DF and it seems to have raised more questions than it's answered.

A) am I right in thinking there is no 64 bit version of DF and if there was it wouldn't help much?

B) the large address extender only helps with large worldgen and embarking?

C) DF doesn't support multi cores, but when I launch it on my quad core I get only about 25% load spread out across all 4 CPUs, if I set the core affinity to 1 then I get 25% load but it's all on one core that peaks out at 100%. What's that all about?

D) some people say the memory speed and latency is the most important hardware factor. Dat true? I read something about memory aperture, isn't that to do with the motherboard/CPU interaction?

My current fort has 207 dwarves, lots of Z levels, too many stock piles (probably around 1500 tiles) and with the ini tweaks I'm getting around 40-60 fps on a 4x2.4ghz PC with 4 gig 800mhz RAM on Windows 7 64.

What I was looking at was upgrading my PC to a 4.2ghz quad core with 8 gig of 1866mhz RAM with all the mod cons on the motherboard including a decent system bus speed and level 3 CPU cache support.

That's got to help right? Any advice or experience with building a supercomputer to run DF?

Girlinhat

  • Bay Watcher
  • [PREFSTRING:large ears]
Re: Offerings to the FPS god
« Reply #1 on: April 18, 2012, 12:45:01 pm »

I'm not sure of the specifics, but...  DF's biggest bottleneck is memory control and processing power.  Namely, when you're running a magma pump stack you'll hit the processing wall as it updates all the temperatures all the time and runs your CPU into the ground.  When you have a large map with a lot of items, you hit the memory wall, where it takes longer and longer to update each tick because DF has to ask RAM about every item.

For the most part, I actually think that faster memory would serve better for old forts.  When you've got piles of stuff and a lot of items, the faster your computer can access the memory and relay that information, the faster it will process the ticks.

LAA generally only helps with worldgen, yes.  It's not entirely useful in regular play as you're not likely to hit the 2GB limit with just items and dwarves.
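
For reference, the 2GB figure is the address space a 32-bit Windows process gets, not the amount of RAM installed. A minimal sketch of the arithmetic, assuming the usual Windows defaults (2 GiB of user address space for a 32-bit process, rising to 4 GiB when the executable is flagged large-address-aware and runs on 64-bit Windows); the function name is made up for illustration:

```python
GIB = 2**30  # bytes in one gibibyte

def user_address_space_gib(large_address_aware: bool) -> float:
    """Assumed Windows defaults for a 32-bit process on a 64-bit OS."""
    # Without the flag only 31 address bits are usable by the program;
    # with it, all 32 bits are.
    bits = 32 if large_address_aware else 31
    return 2**bits / GIB

print(user_address_space_gib(False))  # 2.0
print(user_address_space_gib(True))   # 4.0
```

Either way the ceiling is far below 8 gigs, so extra RAM beyond that point would not be something a 32-bit DF process could address by itself.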

i2amroy

  • Bay Watcher
  • Cats, ruling the world one dwarf at a time
Re: Offerings to the FPS god
« Reply #2 on: April 18, 2012, 12:55:29 pm »

A) Correct, there is no 64 bit version of DF.
B) The large address aware thing helps mainly with worldgen, but it also provides a tiny benefit on forts that have huge amounts of junk lying around.
C) DF actually has a tiny amount of multi-threading present in the graphics code that Baughn wrote up (so STANDARD, PARTIAL:X and the other modes without 2D at the front). So that might be what is causing it to do that. Personally I get the best FPS when I set DF to have 1 core all to itself and then also be able to have at least 1 other core help to run it.
D) Memory speed and latency are among the major things that can help forts, especially older ones. If your fortress is dying due to having tons of junk, it will help a lot. If it's dying to massive amounts of calculations (such as a badly planned magma pumpstack; there are much better designs now that remove that lag), then memory speed will help much less than the number of commands/sec (note that I'm talking about commands here, not cycles) that your processor can handle.
Quote from: PTTG
It would be brutally difficult and probably won't work. In other words, it's absolutely dwarven!
Cataclysm: Dark Days Ahead - A fun zombie survival roguelike that I'm dev-ing for.

Tripphippy

  • Bay Watcher
Re: Offerings to the FPS god
« Reply #3 on: April 18, 2012, 02:05:33 pm »

The best hardware to take advantage of DF is as follows  :D
Fastest MHz chip speed. The number of cores doesn't matter, as DF can only utilize one of them. This is the primary bottleneck in DF on modern hardware. Basically it would run faster in DOS on a single-core Pentium 4 than on modern hardware, because multi-core processors are usually clocked slower. I have a quad-core clocked at 2.3 GHz and I think the fastest P4 was 4.2 GHz, IIRC.
Fastest memory speed. This will help with memory bottlenecks, which in my experience are not the issue.
64-bit will not help DF, nor will the LargeAddress flag (except in rare worldgen crashes).

Your proposed upgrade would probably come close to doubling the speed of DF, as you would be trading the 2.4ghz clock speed for 4.2ghz.

NW_Kohaku

  • Bay Watcher
  • [ETHIC:SCIENCE_FOR_FUN: REQUIRED]
Re: Offerings to the FPS god
« Reply #4 on: April 18, 2012, 02:19:42 pm »

You have that backwards, Tripphippy - Memory is far more likely to be the main issue in DF.  The most likely bottlenecks if you aren't doing something weird are either pathfinding or things like temperature updates against the item vector. 

DF is constantly accessing essentially random portions of memory for things like pathfinding or the items vector, which can be thousands or even over a million items long. The guesses the CPU makes when prefetching this memory basically just get discarded, causing a lot of wasted CPU cycle time.  Getting the data out of memory as fast as possible is the best way to combat the problem from the hardware perspective.
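
The access-pattern effect can be tried at home with a rough sketch like the one below. It is not DF's code: it just sums the same list twice, once walking the indices in order and once in a shuffled order. The names `walk` and `compare` and the list size are made up for illustration, and Python's interpreter overhead hides much of the gap that a compiled program would show, so treat the timings as a hint rather than a benchmark.

```python
import random
import time

def walk(data, order):
    """Sum data by visiting its indices in the given order."""
    total = 0
    for i in order:
        total += data[i]
    return total

def compare(n=1_000_000, seed=0):
    """Time an in-order walk against a shuffled walk over the same data."""
    data = list(range(n))
    sequential = list(range(n))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)  # same indices, random visiting order
    timings = {}
    for name, order in (("sequential", sequential), ("shuffled", shuffled)):
        start = time.perf_counter()
        walk(data, order)
        timings[name] = time.perf_counter() - start
    return timings

if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.3f} s")
```

Both walks do exactly the same arithmetic, so any difference in time comes from how the memory is reached rather than from the amount of work done.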

Modern CPUs are basically just 2 Ghz cpus glued together into multicores in ever-more-complex bundles.  However, you only get to use one, and the wait on memory is measured in megahertz, not gigahertz, which means that unless an action takes thousands of computations on the same item from memory, it's all on the memory's latency, not the CPU's speed.  So long as you don't completely cheap out on the CPU, you can basically compare a top-of-the-line to a mid-range and find fairly similar results.  There will be improvements, but not really worth the money.
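
The reasoning can be put into a back-of-envelope model: time per tick is the time spent computing plus the time spent stalled on memory. The sketch below is a toy, and every number in it is made up for illustration (nobody in this thread has measured DF's cycle or cache-miss counts); `tick_seconds` is a hypothetical helper, not anything from DF.

```python
def tick_seconds(compute_cycles, cache_misses, clock_ghz, miss_latency_ns):
    """Toy model: pure compute time plus time stalled waiting on main memory."""
    compute_s = compute_cycles / (clock_ghz * 1e9)
    stall_s = cache_misses * miss_latency_ns * 1e-9
    return compute_s + stall_s

# Hypothetical workload: 10 million cycles of arithmetic per tick and
# 100,000 cache misses (made-up numbers, not measured from DF).
baseline = tick_seconds(10e6, 100_000, clock_ghz=2.4, miss_latency_ns=60)
fast_cpu = tick_seconds(10e6, 100_000, clock_ghz=4.2, miss_latency_ns=60)
fast_ram = tick_seconds(10e6, 100_000, clock_ghz=2.4, miss_latency_ns=40)

print(f"baseline: {baseline * 1000:.2f} ms per tick")
print(f"faster CPU speedup: {baseline / fast_cpu:.2f}x")
print(f"faster RAM speedup: {baseline / fast_ram:.2f}x")
```

With no cache misses at all, the model gives the full 4.2/2.4 = 1.75x from the clock bump; the more misses per tick, the more the result leans on memory latency instead. Which regime a given fort is in is exactly what has to be measured.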

When purchasing memory, you might want to look at something like this:
http://en.wikipedia.org/wiki/CAS_latency#Memory_timing_examples

Basically, take the CAS Latency clocks and divide by the stated speed of the RAM to give yourself a number to compare to other sticks of RAM.  Lowest number wins.
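
To turn that comparison into actual nanoseconds, here is a small sketch. It assumes DDR-type memory, where the advertised speed is the data rate and the real clock is half of it; dividing by the advertised figure instead ranks sticks in the same order but gives half the true delay. The function name and the example timings are illustrative only, so check the CL figure on the actual product page.

```python
def cas_latency_ns(data_rate_mt_s, cas_clocks):
    """CAS delay in nanoseconds for DDR memory.

    DDR transfers twice per clock, so the clock in MHz is half the
    advertised data rate (e.g. "1866" RAM runs a roughly 933 MHz clock).
    """
    clock_mhz = data_rate_mt_s / 2
    return cas_clocks / clock_mhz * 1000

# Illustrative timings only, not taken from any specific product.
for label, rate, cl in [("DDR2-800 CL5", 800, 5),
                        ("DDR3-1600 CL9", 1600, 9),
                        ("DDR3-1866 CL9", 1866, 9)]:
    print(f"{label}: {cas_latency_ns(rate, cl):.2f} ns")
```

Note how a faster stick with a higher CL can end up barely ahead of a slower stick with a lower one, which is why the headline speed alone is not enough to compare RAM.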

Of course, nowadays, RAM's getting kinda pricey, too.  Just looking this stuff up yesterday, I saw a set of RAM selling for $650, which means it might be actually costing more than the CPU... and its CAS Latency was pretty high, so it wasn't actually any faster than my current 6-year-old RAM. 
Personally, I like [DF] because after climbing the damned learning cliff, I'm too elitist to consider not liking it.
"And no Frankenstein-esque body part stitching?"
"Not yet"

Improved Farming
Class Warfare

Girlinhat

  • Bay Watcher
  • [PREFSTRING:large ears]
Re: Offerings to the FPS god
« Reply #5 on: April 18, 2012, 02:40:17 pm »

Although I wonder, do you really need a large expensive quick RAM, or will a small quick RAM work?  Is 1GB RAM as good as 2GB, if it's the same speed?  I suppose it depends on how much memory you expect to use, but I've not looked up how much actual RAM DF consumes during normal play.

i2amroy

  • Bay Watcher
  • Cats, ruling the world one dwarf at a time
Re: Offerings to the FPS god
« Reply #6 on: April 18, 2012, 02:50:31 pm »

Quote from: Girlinhat
Although I wonder, do you really need a large expensive quick RAM, or will a small quick RAM work?  Is 1GB RAM as good as 2GB, if it's the same speed?  I suppose it depends on how much memory you expect to use, but I've not looked up how much actual RAM DF consumes during normal play.
Assuming you run small enough forts to not be hampered by the lack of memory, then yes.

Also, as I mentioned earlier, as long as you aren't running a 2D graphics mode you can actually get a small benefit from a second core. I personally tend to average about 100% of 1 core and 12% of another on my Intel i7 with PARTIAL:0. It's just that 3+ cores don't give you any benefit over 2 cores.
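
Two bits of arithmetic sit behind the core numbers in this thread, sketched below. First, Task Manager's overall figure is an average over all cores, so one saturated core on a quad core reads as 25% whether the scheduler bounces the thread between cores or pins it to one. Second, core affinity is set with a bitmask in which bit N selects core N. Both function names are made up for illustration.

```python
def overall_load_percent(per_core_loads):
    """Average per-core load, which is what an overall CPU meter reports."""
    return sum(per_core_loads) / len(per_core_loads)

def affinity_mask(cores):
    """Hex bitmask selecting the given zero-based core indices."""
    mask = 0
    for core in cores:
        mask |= 1 << core  # bit N stands for core N
    return format(mask, "X")

print(overall_load_percent([100, 0, 0, 0]))   # 25.0
print(overall_load_percent([100, 12, 0, 0]))  # 28.0
print(affinity_mask([0]))                     # 1
print(affinity_mask([0, 1]))                  # 3
```

On Windows the mask can be passed to the `start` command, for example `start /affinity 3 "" "Dwarf Fortress.exe"` to allow the first two cores (the executable name here is an assumption; use whatever yours is called).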

NW_Kohaku

  • Bay Watcher
  • [ETHIC:SCIENCE_FOR_FUN: REQUIRED]
Re: Offerings to the FPS god
« Reply #7 on: April 18, 2012, 03:25:36 pm »

Just looking at Newegg, you're generally better off trying to go for smaller amounts of memory... of course, it depends on how much you actually use.

After all, memory is like hard-disk space, it's not really a matter of "how large is it" it's a matter of "do I have enough".  It doesn't matter if you have 4 gigs of RAM or 64 gigs of RAM if you're only using 2. 

Frankly, you'd have to be trying really hard to use up memory in most cases to go beyond 8 gigs with current programs, most of which aren't even capable of going beyond 2 gigs... I'm still on 4 gigs, and haven't had much trouble.  Of course, if you're doing something that really chugs on that RAM for some reason, like a really complex simulation, or you wanted to leave open 800 tabs of YouTube on your web browser, then you might need more. 

For example, the products on this page are all only 8 gigs of RAM, which isn't all that much nowadays, but they'd have a response time of around 5 nanoseconds on a memory fetch.  Plus they're all under $100. 

Meanwhile, this monster is going to respond at the same rate at 32 gigs of RAM, but has a price around $650 or $700.

And here we have 4 gigs of RAM at a theoretical fetch time of 3 nanoseconds for $140.

Tripphippy

  • Bay Watcher
Re: Offerings to the FPS god
« Reply #8 on: April 18, 2012, 03:37:19 pm »

I'm going to have to disagree with Kohaku on the memory vs. CPU speed issue. I just hooked DF up to my profiling tool and it is CPU bound on 1 core. I may have exaggerated when I said almost double performance from DF when you go from 2.4 to 4.2 GHz, but I think it would provide a greater boost than memory. I don't see a ton of L2 cache misses when I run DF; a lot of them would indicate that the CPU is idling while it waits for memory to be placed on the bus. I really can't say either way, though, since L2 cache misses cannot be sampled per application.

arphen

  • Bay Watcher
Re: Offerings to the FPS god
« Reply #9 on: April 18, 2012, 03:52:23 pm »

I'm with Tripphippy on this one.
i7-3960X /thread

khearn

  • Bay Watcher
Re: Offerings to the FPS god
« Reply #10 on: April 18, 2012, 03:52:56 pm »

Another important consideration is CPU cache size. That old Pentium 4 may have a fast CPU clock, but it's got a small cache compared to current CPUs. Fetching data from an on-die cache is far faster than fetching it from RAM that's somewhere out on the bus. Depending on how the program uses memory, a bigger cache can speed things up a lot. On the other hand, if the program goes sequentially through more data than can fit in cache, you may have very few cache hits, and thus not gain much. I have no idea how DF does in this area.

I've heard a few people claim that an old P4 is the fastest CPU for DF, but I've never heard anyone who actually uses one and gets great results. I suspect a 4.2GHz P4 is actually slower than a 3.2GHz i7 because of things like bus speed and on-die cache. There's a whole lot more to running a program quickly than just the CPU clock speed.

Have them killed. Nothing solves a problem quite as effectively as simply having it killed.

arphen

  • Bay Watcher
Re: Offerings to the FPS god
« Reply #11 on: April 18, 2012, 03:59:06 pm »

I don't know why people keep bringing up pentium 4s. P4 was a box menacing with spikes of fail.(MOAR PIPELIENS)
The i5-2500K can be OCed to 4.6ghz; I dunno about the i7-2700K.
Cache size is going to be the most important factor for DF.
The processing speed of an i5-2500K is INCREDIBUL.

NW_Kohaku

  • Bay Watcher
  • [ETHIC:SCIENCE_FOR_FUN: REQUIRED]
Re: Offerings to the FPS god
« Reply #12 on: April 18, 2012, 03:59:43 pm »

I don't really think so - depending on where you start, you can quarter your memory's latency without really bleeding for it in price, but CPU is going to be marginal, and A* Pathfinding (one of the big problem-causers of lag) results in plenty of misses.  If you aren't already using very fast RAM, it's much cheaper to upgrade that (provided you aren't going for very large quantities of RAM) than it is to upgrade your CPU, and likely to give you more performance gains in return.

Putting objects into the object vector likewise resulted in geometric lag - 10,000 objects results in the game running around 1/10th as fast as initially, and 100,000 objects results in 1/1000th the speed. 

These objects weren't doing anything, so it's most likely not hurting the CPU any, it's just iterating through the vector every frame that causes geometric memory lag.  That's caused by pointer tracing down the length of a vector. 

Tripphippy

  • Bay Watcher
Re: Offerings to the FPS god
« Reply #13 on: April 18, 2012, 04:16:58 pm »

The only way to be sure is to hook up a performance counter for L2 cache misses. Unfortunately my place of employment does not offer us such high level tools so I am in the process of cobbling together some intel libraries and a .net service to add performance data for the low level cpu functions. More information to follow.

The comment about P4 running DF the best is mostly because that was the highest ever clock speed attained on a single core, not because it was an awesome kick ass chip. Really a modern chip at 4Ghz+ is going to do a better job in other areas.

I do agree that the size of your cache is going to really affect processing speed, however I believe that Toady is pretty good at keeping his algorithms from cache thrashing.

SAFry

  • Bay Watcher
  • Call me Seb
Re: Offerings to the FPS god
« Reply #14 on: April 18, 2012, 05:33:46 pm »

Wow, thanks for your responses, been annoying my mates following the forum on my phone whilst we were supposed to be playing a board game.

NW_Kohaku, damn, that's really going to annoy me now about the CAS, I must admit I don't really understand how you are calculating it but I'd like to try to take it into consideration. Could you show me how to calculate it for this please http://www.overclockers.co.uk/showproduct.php?prodid=MY-138-CR

I haven't got a massive budget so I have to stick to AMD not Intel, the fastest CPU is 4.2ghz and max RAM speed supported is 1866mhz http://www.amd.com/us/products/desktop/processors/amdfx/Pages/amdfx-model-number-comparison.aspx

Shame because I would be tempted to order that 4gb of uber RAM from the states and hope customs don't stop it and fine me the extra 20% UK sales tax plus admin fee!

Tripphippy, good thing about that AMD FX is that it has a 4mb L2 cache and an 8mb L3, plus the motherboard I was looking at specifically supports all that. Obviously I won't JUST be running DF and this PC has to last me 5 years until the next upgrade, so I've got to get something realistic, but I've steered away from hex or 8 cores and I don't really need 8 gig of RAM. 4 would be fine, it's just that it's actually becoming hard to find smaller amounts these days.

I have a PC I could roll back to windows 98 but not as far as DOS I'm afraid so won't be testing that one out.

OK, so getting a fast CPU can't hurt, fast RAM, 4 gig is more than enough, just this question of CAS latency. I'd certainly like to try to get it as low as possible as long as it doesn't cost a fortune and the PC will still function normally on other tasks.