Topic: Use SSE (Read 1348 times)

copx · « **on:** June 25, 2008, 05:54:54 pm »

There might be an easy way to improve frame rate which does not require any code changes: make your compiler emit SSE instructions!

It seems that the current binaries don't use SSE - you don't mention any CPU requirements.

If your code uses floating point calculations in speed-critical parts, using SSE could result in greatly increased performance.

The only downside is that binaries which contain SSE instructions won't run on CPUs without SSE support. But DF is such a CPU hog that it's not really playable on older CPUs anyway.

SSE1 only requires a Pentium III / Athlon XP or better.
However, if you use double precision reals (type "double" in C/C++) you want SSE2. That needs a Pentium 4 or Athlon 64/Opteron or better.

Seriously, try it (should only require setting the appropriate compiler flag and rebuilding). You could keep support for older CPUs with an alternative binary if you want.

Mikademus · « **Reply #1 on:** June 25, 2008, 06:30:27 pm »

What you say about speed increases from using Intel's SSE/SSE2 (Streaming SIMD Extensions) or AMD's 3DNow! was very true for Pentium platforms about 10 years ago. Today, all reasonably modern CPUs have these, have 1-cycle float operations, and all compilers automatically optimise for it.

copx · « **Reply #2 on:** June 25, 2008, 08:18:27 pm »

Quote from: Mikademus on June 25, 2008, 06:30:27 pm

What you say about speed increases from using Intel's SSE/SSE2 (Streaming SIMD Extensions) or AMD's 3DNow! was very true for Pentium platforms about 10 years ago. Today, all reasonably modern CPUs have these, have 1-cycle float operations, and all compilers automatically optimise for it.

You are simply wrong. SSE2 did not even exist 10 years ago, for starters.
I don't know which compiler Toady uses, but GCC (and most other x86 compilers) by default emits only instructions which can be executed on a 386, because that's the only way to ensure that the resulting code will actually run on all CPUs. If you don't enable SSE explicitly, or implicitly by targeting a CPU which supports SSE, the optimizer won't use any of the advanced floating point instructions (and the related registers) available on modern CPUs.

All current, high end professional apps which are floating point heavy use SSE and Intel certainly didn't create those extensions for no reason.

I repeat my suggestion to Toady to try it (if DF actually uses floating point math).

bartavelle · « **Reply #3 on:** June 26, 2008, 04:39:38 am »

SSE is good for vector (SIMD) operations, not for floating point operations in general. The problem is that, most of the time, compilers aren't smart enough to decide whether code can be vectorized efficiently, so it has to be done by hand.

Moreover, most "professional applications" using these instructions are trivial to vectorize (image and sound processing, for example), whereas the biggest time sinks in DF are likely to benefit a lot more from an algorithmic optimization than from a local optimization.

Zruty · « **Reply #4 on:** June 26, 2008, 11:01:49 am »

BTW, at one point I was thinking about pathfinding optimization.

I assumed that pathfinding and fluid dynamics consume most of the processing power, and tried to think of a better PF algorithm than a brute-force 'wave' Dijkstra.

I don't know whether Toady has already done something about it, though, so my thoughts about the current implementation were mere guesses.

Suppose we somehow set a number of 'Flags' (remember Settlers II?) at different popular locations, then pre-compute the paths between them. Then any desired path from A to B can be split into
A -> nearest flag F1
B -> nearest flag F2
precomputed F1 -> F2.

Well, then setting the flags themselves could be done manually or by some cunning algorithm...
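
A minimal Python sketch of the flag idea above, under assumptions that are not from this thread: the map is a small 2D grid of strings where '#' is a wall, movement is 4-directional with uniform step cost (so a plain breadth-first search stands in for Dijkstra), and the names `bfs_path`, `precompute_flag_paths`, `nearest_flag` and `route_via_flags` are hypothetical. It says nothing about how DF actually does pathfinding.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Shortest 4-connected path from start to goal as a list of cells, or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#" and (nr, nc) not in prev:
                prev[(nr, nc)] = cur
                queue.append((nr, nc))
    return None


def precompute_flag_paths(grid, flags):
    """Done once (or when the map changes): a path for every ordered flag pair."""
    return {(a, b): bfs_path(grid, a, b) for a in flags for b in flags if a != b}


def nearest_flag(grid, pos, flags):
    """Path from pos to the flag with the shortest walking distance, or None."""
    best = None
    for flag in flags:
        path = bfs_path(grid, pos, flag)
        if path is not None and (best is None or len(path) < len(best)):
            best = path
    return best


def route_via_flags(grid, flag_paths, flags, a, b):
    """A -> nearest flag F1, precomputed F1 -> F2, F2 -> B."""
    to_f1 = nearest_flag(grid, a, flags)
    to_f2 = nearest_flag(grid, b, flags)  # runs from B to F2, reversed below
    if to_f1 is None or to_f2 is None:
        return None
    f1, f2 = to_f1[-1], to_f2[-1]
    middle = [f1] if f1 == f2 else flag_paths[(f1, f2)]
    if middle is None:
        return None
    return to_f1[:-1] + middle + to_f2[::-1][1:]


# Made-up example map: a wall in the middle row, one flag in each far corner.
grid = [".....",
        ".###.",
        "....."]
flags = [(0, 0), (2, 4)]
flag_paths = precompute_flag_paths(grid, flags)
route = route_via_flags(grid, flag_paths, flags, (0, 1), (2, 3))
direct = bfs_path(grid, (0, 1), (2, 3))
print(len(route) - 1, len(direct) - 1)  # steps via flags vs. direct: prints "8 6"
```

In this sketch the two short searches to the nearest flags are still done per request; only the flag-to-flag part is looked up. On the example map the flagged route takes 8 steps against 6 for the direct one, so the saving in search work is paid for with a longer walk.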

winner · « **Reply #5 on:** June 26, 2008, 12:08:59 pm »

does anyone know what method the "desktop tower defense" guy uses?

Dame de la Licorne · « **Reply #6 on:** June 26, 2008, 12:13:27 pm »

Quote from: Zruty on June 26, 2008, 11:01:49 am

Suppose we somehow set a number of 'Flags' (remember Settlers II?) at different popular locations, then pre-compute the paths between them. Then any desired path from A to B can be split into
A -> nearest flag F1
B -> nearest flag F2
precomputed F1 -> F2.

Well, then setting the flags themselves could be done manually or by some cunning algorithm...

I think this solution would be the best for the pathfinding issues, as long as the player could set the flags. Each of our fortresses is laid out differently (or at least, no two of mine are the same), so we should be able to set the flags for the "most frequently traveled" routes. It should cut down on the pathfinding FPS lag, at the very least.

irmo · « **Reply #7 on:** June 27, 2008, 03:44:27 am »

Quote from: Dame de la Licorne on June 26, 2008, 12:13:27 pm

I think this solution would be the best for the pathfinding issues, as long as the player could set the flags. Each of our fortresses is laid out differently (or at least, no two of mine are the same), so we should be able to set the flags for the "most frequently traveled" routes. It should cut down on the pathfinding FPS lag, at the very least.

On the other hand, it (1) increases congestion (flag-to-flag routes have to carry all of the traffic) and (2) increases travel distance (since, by necessity, nobody is taking the shortest route to anywhere).

On the other other hand, if the player has to set these flags manually, those of us who don't want them can just not set any flags. So it's not a bad idea, just less than optimal in some situations.

Zruty · « **Reply #8 on:** June 27, 2008, 07:01:20 am »

1) I think that at the moment dwarves have some kind of micro-PF algorithm so that they avoid each other when travelling through, say, a 3-tile wide tunnel in opposite directions. This will remain so and decrease the number of collisions.

Also, the system may track the 'load' on each flagged route (e.g. by the number of dwarf collisions per step) and try not to overload the routes.

2a) If we are searching for a flag F from point A and suddenly find the point B (this can happen via Dijkstra), we can abort the search and take a direct route.

2b) I was thinking about large number of flags that are set (semi) automatically - not several flags set manually.
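
Point 2a can be sketched roughly as follows in Python. The assumptions here are mine, not from the game: a grid of strings with '#' as walls, 4-directional movement with uniform step cost (so breadth-first search plays the role of Dijkstra), and a hypothetical function name `search_flag_or_goal`. The search expands outward from A and stops at whichever it reaches first, the destination B or any flag.

```python
from collections import deque


def search_flag_or_goal(grid, start, goal, flags):
    """Expand outward from start; stop at the first cell that is the goal or a flag.

    Returns ("direct", path) if the goal was reached first,
    ("flag", path) if a flag was reached first, or (None, None) if neither
    is reachable.
    """
    flag_set = set(flags)
    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal or cur in flag_set:
            # The goal takes priority if a cell is both the goal and a flag.
            kind = "direct" if cur == goal else "flag"
            path = []
            node = cur
            while node is not None:
                path.append(node)
                node = prev[node]
            return kind, path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#" and (nr, nc) not in prev:
                prev[(nr, nc)] = cur
                queue.append((nr, nc))
    return None, None
```

With uniform step costs the first hit is also the closest one, so a "direct" result means B is no farther from A than any flag, and the precomputed flag-to-flag paths can be skipped for that trip.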
