Bay 12 Games Forum


Author Topic: one executable per cpu arch (i.e. core2, corei7) to take advantage of SSE etc  (Read 1856 times)

anallyst

  • Bay Watcher
    • View Profile
    • github repos

the standard binary is most likely compiled for i486 compatibility and could in theory even run on a 486.
that's nice, but modern processors have many features that the compiler can use automatically when it is told which cpu to target.
to take advantage of this, one only has to recompile with a switch like this: "gcc -O3 -march=core2".
that allows the compiler to use MMX, SSE, SSE2, SSE3 and SSSE3 (plus SSE4 with -march=corei7) for things like memcmp and memcpy (which could e.g. move a 128-bit chunk of memory with a single instruction) and in many other cases.

DF could be distributed like this
Code: [Select]
downloads
  windows
    df_31_13_win_main.zip (includes everything as usual)
    df_31_13_win_corei7-bin.zip (includes only Dwarf Fortress.exe with corei7 specific optimization)
    df_31_13_win_core2-bin.zip (includes only Dwarf Fortress.exe with core2 specific optimization)
    df_31_13_win_pentium4-bin.zip (includes only Dwarf Fortress.exe with p4 specific optimization)
...etc.
that would cost nearly no effort, and speed up things for most of us (i'm pretty sure nobody uses a cpu below pentium4 for DF anyway, so MMX and SSE could speed up things already)
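
A minimal sketch of how a launcher or installer could pick between the builds listed above, assuming a GCC or Clang recent enough to provide the `__builtin_cpu_supports` builtin (it is newer than GCC 4.6). The function name `pick_df_build` and the suffixes it returns are hypothetical and only mirror the zip names in the listing; nothing like this ships with DF.

```c
#include <stddef.h>

/* Hypothetical helper: returns the suffix of the most specific build
 * this cpu can run ("corei7", "core2", "pentium4"), or "main" for the
 * generic binary. The feature sets follow what gcc's -march values
 * imply: pentium4 needs SSE2, core2 needs SSSE3, corei7 needs SSE4.2
 * and POPCNT. */
const char *pick_df_build(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* safe to call more than once */

    if (__builtin_cpu_supports("sse4.2") && __builtin_cpu_supports("popcnt"))
        return "corei7";
    if (__builtin_cpu_supports("ssse3"))
        return "core2";
    if (__builtin_cpu_supports("sse2"))
        return "pentium4";
#endif
    /* Unknown compiler, non-x86 cpu, or nothing newer than plain x86. */
    return "main";
}
```

Calling `pick_df_build()` once at startup and printing the result would tell a player which zip to download; the generic "main" build stays the fallback for everything else.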
Logged
how to be first one to get mayday tileset after toady released a new version: https://github.com/rofl0r/df-mayday

jei

  • Bay Watcher
    • View Profile

This would be very nice to have, as would multicore and 64-bit CPU support. Multicore use would also bring instant relief from the severe performance problems that many players run into and, IMHO, would be relatively easy to implement.
Logged
Engraved on the monitor is an exceptionally designed image of FPS in Dwarf Fortress and its multicore support by Toady. Toady is raising the multicore. The artwork relates to the masterful multicore support by Toady for the Dwarf Fortress in midwinter of 2010. Toady is surrounded by dwarves. The dwarves are rejoicing.

Fishbreath

  • Bay Watcher
    • View Profile
    • Many Words

IMHO, would be relatively easy to implement.

Multicore use is never easy to implement, because shared memory is a royal pain if you haven't planned on using it.

Delta

  • Escaped Lunatic
    • View Profile

The biggest speedups usually come from code written specifically for special processor features. The game should be compiled once as usual and once for a specific processor, and both versions should be compared with a profiler. I doubt there will be a big difference.
If there is, there is still the problem of compiler bugs when compiling for a specific architecture. These could result in bugs that only show up on that platform.
I think such a risk is only justified by a large speedup.
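
A rough sketch of that comparison, assuming a stand-in workload rather than anything from DF itself: the same source file is built twice (for example with `gcc -std=c99 -O3 -march=i486` and `gcc -std=c99 -O3 -march=core2`) and the CPU time of each build is compared. The names `checksum` and `time_checksum` are made up for illustration, and a real profiler run on the actual game would be far more telling than this.

```c
#include <stddef.h>
#include <time.h>

/* Stand-in workload: sums every byte after xor-ing it with a seed.
 * The seed changes on each repetition so the compiler cannot compute
 * the result once and reuse it. */
unsigned long checksum(const unsigned char *buf, size_t n, unsigned char seed)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned long)(buf[i] ^ seed);
    return sum;
}

/* Runs the workload `reps` times and returns the CPU seconds used.
 * The accumulated result is written to *out so the work is not
 * optimized away. */
double time_checksum(const unsigned char *buf, size_t n, int reps,
                     unsigned long *out)
{
    unsigned long acc = 0;
    clock_t start = clock();

    for (int r = 0; r < reps; r++)
        acc += checksum(buf, n, (unsigned char)r);

    clock_t end = clock();
    *out = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running `time_checksum` over a buffer of a few megabytes with a few hundred repetitions in each build gives two numbers to compare; if they are within noise of each other, the extra builds are probably not worth the risk.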
Logged

Nahno

  • Bay Watcher
    • View Profile

To my understanding the slowdown caused by the algorithms, such as path finding, is by far the most significant. As such, optimizing for (current) hardware will not help much. It may be enough to push the (subjective) fatal limit of frame rate for a fort a bit further into the future; I wouldn't know.
Logged

anallyst

  • Bay Watcher
    • View Profile
    • github repos

To my understanding the slowdown caused by the algorithms, such as path finding, is by far the most significant. As such, optimizing for (current) hardware will not help much. It may be enough to push the (subjective) fatal limit of frame rate for a fort a bit further into the future; I wouldn't know.
gcc does a pretty good job of optimizing, and it even supports auto-vectorization, which uses SIMD instructions wherever it can. it cannot beat hand-tuning of the algorithms by an expert, but it should give a nice speedup.
i think it's worth a try; if it gets you 15 instead of 13 fps, that can be the small difference that makes the framerate acceptably smooth.
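
as an illustration, here is the kind of loop the auto-vectorizer handles well. this is only a sketch: `scale_add` is a made-up function, not DF code, and whether a given loop really gets vectorized depends on the gcc version and flags. building with something like `gcc -std=c99 -O3 -march=core2 -c scale_add.c` lets gcc use SSE for it, and newer gcc releases can report what was vectorized if `-fopt-info-vec` is added.

```c
#include <stddef.h>

/* Adds k * src[i] to dst[i] for every element.
 * `restrict` promises the compiler that dst and src do not overlap,
 * which removes one common obstacle to vectorizing the loop. With SSE
 * enabled, the compiler can process four floats per instruction
 * instead of one. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

the source stays plain C either way; only the compiler switches decide whether the loop ends up as scalar or SIMD code, which is why per-architecture builds need no code changes.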

btw, i'd really suggest that ToadyOne or Baughn (whoever makes the linux build) use the newest GCC, 4.6, which has some really nice new optimizations.
additionally, whole-program optimization could be taken advantage of: http://gcc.gnu.org/wiki/LinkTimeOptimization
also, statically linking the required libraries would allow certain optimizations to take place, the biggest one being that library functions are called directly instead of through the position-independent code needed by a shared object (.so), and it would allow execution on 64-bit linux without installing 32-bit libraries.
Logged
how to be first one to get mayday tileset after toady released a new version: https://github.com/rofl0r/df-mayday

Dwarf

  • Bay Watcher
  • The Light shall take us
    • View Profile

IMHO, would be relatively easy to implement.

Multicore use is never easy to implement, because shared memory is a royal pain if you haven't planned on using it.

As far as I can see, this is not about using multiple cores, but using one core more efficiently.
Logged
Quote from: Akura
Now, if we could only mod Giant War Eagles to carry crossbows, we could do strafing runs on the elves who sold the eagles to us in the first place.

Fishbreath

  • Bay Watcher
    • View Profile
    • Many Words

The thread as a whole isn't; the place I quoted is. I'm all for wringing whatever few extra cycles out of the game that we can.