Bay 12 Games Forum

Poll

Having tested both 2D and STANDARD, how is 40d19 compared to 40d?

Faster, no (unknown) problems
Faster, problematic
Same speed, no (unknown) problems
Same speed, problematic
Slower, no other (unknown) problems
Slower, problematic
Doesn't work at all

Pages: 1 ... 21 22 [23] 24 25 ... 34

Author Topic: FotF: Dwarf Fortress 40d19  (Read 163252 times)

Veroule

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #330 on: March 19, 2010, 06:23:18 pm »

For those who are interested, MSVC was optimizing the x*dimy*4 etc. in the 'big if' of graphicst::display.

Here is the pertinent assembly:
Spoiler (click to show/hide)
for this c code:
Spoiler (click to show/hide)

For things like that particular if I am a big fan of a union.
typedef union {
 uint8_t array[4];   // the four bytes stored per tile
 uint32_t Value;     // the same four bytes viewed as one 32-bit word
} screen_union;

After adjusting all the references (each union element now covers one tile's four bytes, so the *4 drops out of the index), we can make the if
if (screen[x2*dimy + y2].Value == screen_old[x2*dimy + y2].Value
 && the other tests)

The same can be done with a uint32_t* cast.  The pointer cast makes for a much more surgical change but is slightly less readable than the union method.  I would definitely suggest a pointer cast in this case, as it is a good speed gain.  The gain comes both from reducing those 4 CMP instructions to one and from being able to calculate the address outside of the loops and just increment it on each iteration.  The assembly above shows that address calculation, which currently sits inside the 2 for loops.
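
A minimal sketch of the four-bytes-as-one-word comparison described above, assuming each tile occupies four consecutive bytes in a flat buffer. The names tile_word and tile_changed are hypothetical and not taken from the DF source. The sketch swaps in std::memcpy for the union or pointer cast, because that is the form the C++ standard defines for reinterpreting bytes; compilers typically turn it into a single 32-bit load.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read the four bytes of tile number `tile` as one
// 32-bit word. Assumes 4 bytes per tile, stored back to back.
inline std::uint32_t tile_word(const unsigned char* buf, std::size_t tile) {
    std::uint32_t w;
    std::memcpy(&w, buf + tile * 4, sizeof w);
    return w;
}

// True if any of the tile's four bytes differ between the two buffers:
// one 32-bit comparison instead of four byte comparisons.
inline bool tile_changed(const unsigned char* cur, const unsigned char* old,
                         std::size_t tile) {
    return tile_word(cur, tile) != tile_word(old, tile);
}
```

Calling tile_changed(screen, screen_old, n) would then stand in for the four byte-wise tests on tile n, if screen and screen_old are plain byte arrays.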

There is a reason we wanted bigger processors.  I already want a 256 bit processor with 128 registers.
« Last Edit: March 19, 2010, 06:30:17 pm by Veroule »
Logged
"Please, spare us additional torture; and just euthanise yourselves."
Delivered by Tim Curry of Clue as a parody of the lead ass from American Idol in the show Psych.

bombcar

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #331 on: March 19, 2010, 07:29:50 pm »

I'd rather have a 64 bit processor with 512 registers, even if some are 128 bits wide. ;)
Logged

Veroule

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #332 on: March 19, 2010, 07:46:41 pm »

You could also eliminate nearly all the multiplications in that section of the code by changing your array strategy from 2 dimensional to contiguous.  In order to make this change replace the calculations in addchar.
Spoiler (click to show/hide)
with
Spoiler (click to show/hide)
and move that function back into graphics.cpp.  There is also the other function that handles setting screentexpos and related variables; same treatment there.

Then display looks something like this
uint32_t* s=(uint32_t*)screen;
uint32_t* so=(uint32_t*)screen_old;
// one 32-bit word per tile, grid_x*grid_y words in total; testing pos-- also visits index 0
for (uint32_t pos=init.display.grid_x*init.display.grid_y;pos--;) {
 if (s[pos]!=so[pos] && etc) etc
}
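
As a sketch of the flat traversal above, assuming a contiguous buffer with four bytes per tile and grid_x * grid_y tiles in total: the function below walks both buffers with running pointers instead of recomputing x*dimy*4 + y*4 for every tile. count_changed_tiles is a made-up name for illustration, and counting stands in for whatever the real loop does with a changed tile.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: count tiles whose four bytes differ between two
// contiguous buffers. Assumes 4 bytes per tile and grid_x * grid_y tiles.
std::size_t count_changed_tiles(const unsigned char* cur, const unsigned char* old,
                                std::size_t grid_x, std::size_t grid_y) {
    const std::size_t tiles = grid_x * grid_y;  // the only multiplication, done once
    std::size_t changed = 0;
    for (std::size_t i = 0; i < tiles; ++i, cur += 4, old += 4) {
        std::uint32_t a, b;
        std::memcpy(&a, cur, sizeof a);         // load one tile as a 32-bit word
        std::memcpy(&b, old, sizeof b);
        if (a != b) ++changed;                  // one compare instead of four
    }
    return changed;
}
```

Inside the loop the only address arithmetic left is the two pointer increments.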
Logged
"Please, spare us additional torture; and just euthanise yourselves."
Delivered by Tim Curry of Clue as a parody of the lead ass from American Idol in the show Psych.

Andir

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #333 on: March 19, 2010, 07:52:59 pm »

Sorry, I know I should probably study that code a bit more, but from glancing at what's going on it looks like someone is trying to make a singular dimensional vector to apply to a multidimensional problem...

What's wrong with a multidimensional vector?  The object overhead?  How big is this object we are talking about here?  Screen resolution?
Logged
"Having faith" that the bridge will not fall, implies that the bridge itself isn't that trustworthy. It's not that different from "I pray that the bridge will hold my weight."

PencilinHand

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #334 on: March 19, 2010, 11:33:57 pm »

Sorry, I know I should probably study that code a bit more, but from glancing at what's going on it looks like someone is trying to make a singular dimensional vector to apply to a multidimensional problem...

What's wrong with a multidimensional vector?  The object overhead?  How big is this object we are talking about here?  Screen resolution?

You could also eliminate nearly all the multiplications in that section of the code by changing your array strategy from 2 dimensional to contiguous.
Spoiler (click to show/hide)

If I recall correctly, most CPUs can perform 3 add operations in the same time as one multiplication operation.
Logged

Aachen

  • Bay Watcher
  • Wenzo Pilgrim cancels job: unstuck in time.
Re: FotF: Dwarf Fortress 40d19
« Reply #335 on: March 19, 2010, 11:40:02 pm »

Please pardon the interruption.

When exporting after world-generation, rather than a world-map image, I get repeated screenshots of the world-gen final screen strung together. I'm not sure if it's worth the time to upload an image file for review (and hopefully this hasn't already been discussed to death). I've had the same result with the three worlds I've created in the current version.

Logged
Quote from: Rithol Camus
There is but one truly serious philosophical problem and that is magma.

Quote from: Chinua Achebe
.... For Cliché is pauperized Ecstasy.

Andir

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #336 on: March 20, 2010, 12:13:33 am »

You could also eliminate nearly all the multiplications in that section of the code by changing your array strategy from 2 dimensional to contiguous.
I see no 2 dimensional arrays anywhere (but I haven't looked at the git code...) All I see in these quotes is two contiguous arrays with two different implementations...

screen[x2*dimy*4 + y2*4 + 0]
turned to:
screen[screenx*dimy*4 + screeny*4 + 0]=c;

...or am I missing a conjoined set of brackets "[ x ][ y ]" or something in these quotes somewhere?
« Last Edit: March 20, 2010, 12:15:23 am by Andir »
Logged
"Having faith" that the bridge will not fall, implies that the bridge itself isn't that trustworthy. It's not that different from "I pray that the bridge will hold my weight."

PencilinHand

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #337 on: March 20, 2010, 12:23:10 am »

You could also eliminate nearly all the multiplications in that section of the code by changing your array strategy from 2 dimensional to contiguous.
I see no 2 dimensional arrays anywhere (but I haven't looked at the git code...) All I see in these quotes is two contiguous arrays with two different implementations...

screen[x2*dimy*4 + y2*4 + 0]
turned to:
screen[screenx*dimy*4 + screeny*4 + 0]=c;

...or am I missing a conjoined set of brackets "[ x ][ y ]" or something in these quotes somewhere?

I believe that is the point.  Veroule was suggesting a code change to avoid using 2 dimensional arrays which eliminates the need for multiplication in a performance critical section of code.

------

Please pardon the interruption.

When exporting after world-generation, rather than a world-map image, I get repeated screenshots of the world-gen final screen strung together. I'm not sure if it's worth the time to upload an image file for review (and hopefully this hasn't already been discussed to death). I've had the same result with the three worlds I've created in the current version.

It is a known bug which Toady will have to handle.  Baughn, the guy doing the graphics code, can't touch the code responsible for image export.
"- Image export is generally broken; some don't work, some actually freeze the game. Don't try it."
« Last Edit: March 20, 2010, 12:26:30 am by PencilinHand »
Logged

Andir

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #338 on: March 20, 2010, 12:25:31 am »

That doesn't make sense... there's no need for multiplication with an array of arrays because it's simply a memory pointer.  The current contiguous method requires multiplication to find the proper address space to fiddle with.
Logged
"Having faith" that the bridge will not fall, implies that the bridge itself isn't that trustworthy. It's not that different from "I pray that the bridge will hold my weight."

PencilinHand

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #339 on: March 20, 2010, 12:28:14 am »

That doesn't make sense... there's no need for multiplication with an array of arrays because it's simply a memory pointer.  The current contiguous method requires multiplication to find the proper address space to fiddle with.

I think if we both looked at the .git code then Veroule's suggestion would make more sense.  Otherwise, you will have to wait for someone more competent....
Logged

Andir

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #340 on: March 20, 2010, 12:31:03 am »

Excellent point... stupid lazy me sticking my nose where it doesn't belong.  I should be in bed anyway.
Logged
"Having faith" that the bridge will not fall, implies that the bridge itself isn't that trustworthy. It's not that different from "I pray that the bridge will hold my weight."

Baughn

  • Noble Phantasm
  • The Haruhiist
  • Hiss
Re: FotF: Dwarf Fortress 40d19
« Reply #341 on: March 20, 2010, 03:35:56 am »

Incrementing the pointers instead of recalculating the offset for each tile makes perfect sense; it's just that the arrays didn't use to be contiguous.  Until quite recently they were always 256x256, never mind the actual grid size.

Well, there we go then. Fixed.
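
To illustrate why a fixed 256x256 layout rules out a single running pointer, here is a small sketch. It assumes the same indexing as the x*dimy*4 + y*4 expression quoted earlier (4 bytes per tile, y varying fastest); the function names are invented for the example.

```cpp
#include <cstddef>

// Hypothetical helper: byte offset of tile (x, y) when each column holds
// `stride` tiles and each tile takes 4 bytes.
constexpr std::size_t tile_offset(std::size_t x, std::size_t y, std::size_t stride) {
    return (x * stride + y) * 4;
}

// Distance in bytes from the last used tile of column 0 to the first tile
// of column 1, for a grid that uses `dimy` tiles per column.
// A gap of 4 means the used tiles are packed back to back.
constexpr std::size_t column_gap(std::size_t dimy, std::size_t stride) {
    return tile_offset(1, 0, stride) - tile_offset(0, dimy - 1, stride);
}
```

With a stride equal to dimy the gap is always 4 bytes, so one pointer can simply be incremented across the whole grid. With a stride of 256 and, say, 25 tiles per column, the gap is 928 bytes, so the offset has to be re-derived at every column boundary.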
Logged
C++ makes baby Cthulhu weep. Why settle for the lesser horror?

vyznev

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #342 on: March 20, 2010, 09:28:19 am »

That doesn't make sense... there's no need for multiplication with an array of arrays because it's simply a memory pointer.  The current contiguous method requires multiplication to find the proper address space to fiddle with.
While I think it would probably be best to let this subthread die out (or continue it elsewhere), I'd just like to quickly note that it's usually faster to do a multiplication to index a contiguous 2D array than to fetch a pointer from memory for a ragged array.

Of course, this assumes that the multiplier is either constant or already stored in a register, or at least in on-chip cache, and that the pointer isn't.  Since there's only one multiplier, but one pointer for each row, this is likely at least for random access patterns.  Still, YMMV, especially if you're traversing the array row by row.  Also, for the sake of completeness, I should note that things can be very different if you're writing code for a very old or very low-end CPU that lacks a fast multiplier -- but if you're coding for such platforms, there are specific optimization tricks you'll be using anyway, and in any case there's no hope of DF ever running on such systems.
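
For readers following the array-of-arrays versus contiguous discussion, a sketch of the two layouts being compared. FlatGrid and RaggedGrid are illustrative names only, and nothing here is benchmarked; which one is faster depends on the access pattern and cache behaviour, as noted above.

```cpp
#include <cstddef>
#include <vector>

// Contiguous layout: one allocation, element address computed with a
// multiplication by the (single) row length.
struct FlatGrid {
    std::size_t dimx, dimy;
    std::vector<int> cells;  // dimx * dimy elements
    FlatGrid(std::size_t x, std::size_t y) : dimx(x), dimy(y), cells(x * y, 0) {}
    int& at(std::size_t x, std::size_t y) { return cells[x * dimy + y]; }
};

// "Array of arrays" layout: every row is its own allocation, so each access
// first loads that row's pointer from memory and then indexes into the row.
struct RaggedGrid {
    std::vector<std::vector<int>> rows;
    RaggedGrid(std::size_t x, std::size_t y) : rows(x, std::vector<int>(y, 0)) {}
    int& at(std::size_t x, std::size_t y) { return rows[x][y]; }
};
```

Both expose the same at(x, y) interface. The difference is that FlatGrid needs one multiplier that can sit in a register, while RaggedGrid needs one pointer per row, each of which has to come from memory or cache.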
Logged
Climbing is a strength-based skill. Elephants are very strong. Why are you surprised?

CobaltKobold

  • Bay Watcher
  • ☼HOOD☼ ☼ROBE☼ ☼DAGGER☼ [TAIL]
Re: FotF: Dwarf Fortress 40d19
« Reply #343 on: March 20, 2010, 04:03:05 pm »

and in any case there's no hope of DF ever running on such systems.
I...have hope...

How old would "such systems" need to be, anyway?
Logged
Neither whole, nor broken. Interpreting this post is left as an exercise for the reader.
OCEANCLIFF seeding, high z-var(40d)
Tilesets

vyznev

  • Bay Watcher
Re: FotF: Dwarf Fortress 40d19
« Reply #344 on: March 20, 2010, 06:07:21 pm »

Older than the mid-90s or so, I'd say.  The Amiga 500 and 1200 computers that I learned to code on had slow multiplication (around 40 cycles for the MC68000 CPU in the A500, ~28 for the newer A1200 with its 68020 processor) and small clock multipliers (effectively 1x for fast RAM, 2x for chip RAM), making memory access comparatively fast.  Atari ST and old Apple Macintosh computers used the same Motorola 680x0 series processors.  (The 68060, introduced in 1994, had very fast multiplication, but did not see much desktop use.)  On the PC side, Intel x86 CPUs up to the 486 were quite similar.

During the middle 90s, several things happened at roughly the same time: multipliers got faster, superscalar architecture was introduced to mainstream desktop CPUs, and RAM speeds started lagging behind CPU speeds, necessitating higher clock multipliers.  Put together, these mean that a modern desktop CPU may (under ideal circumstances) be able to do one multiplication every cycle (the latency will be higher, but still only a few cycles) while needing several cycles to fetch data from memory.
Logged
Climbing is a strength-based skill. Elephants are very strong. Why are you surprised?