Topic: Dwarf Fortress 0.28.181.40c Released (Read 56269 times)

mattmoss · « **Reply #105 on:** September 11, 2008, 01:07:08 pm »

The main slowness has been because the game is using immediate-mode OpenGL. This means the CPU has to wait on every command sent to the GPU. It is one of the easiest ways to get something up and running, but also the least efficient.

I believe Toady is working with someone (can't recall name, offhand, sorry) to incorporate a much more efficient rendering technique (VBOs) into the game, which means the CPU will stop waiting on the GPU for the most part.

At that point, graphic speed should remain a non-issue, and CPU should be much better as all that excessive CPU-GPU communication will be gone.

Baughn · « **Reply #106 on:** September 11, 2008, 01:18:21 pm »

Me, possibly among others.

Mmh. Immediate mode doesn't do what you said, though - it doesn't force synchronization on every call. That *might* be how it works today, but I rather doubt it.

Instead, in immediate mode the calls are supposed to build up some sort of scene description inside the opengl driver, that is then transferred to the GPU when you flush. Or when you hit the buffer size limit. In any case, while that can be *fairly* fast if it's well optimized, the unfortunate fact is that it isn't well optimized in modern drivers - and the code is bit-rotting more every month, since nobody uses it.
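To make the batching idea concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how any real driver is written: the `CommandBuffer` class, its `capacity` limit, and the way calls are recorded are all made up for illustration.

```python
class CommandBuffer:
    """Toy model of a driver-side queue for immediate-mode calls."""

    def __init__(self, capacity):
        self.capacity = capacity      # hypothetical buffer size limit
        self.pending = []             # calls recorded but not yet sent
        self.submissions = []         # batches handed over to the "GPU"

    def call(self, name, *args):
        # Recording a call is cheap: nothing is sent to the GPU yet.
        self.pending.append((name, args))
        if len(self.pending) >= self.capacity:
            self.flush()              # buffer full: forced submission

    def flush(self):
        # One transfer carries every call queued since the last flush.
        if self.pending:
            self.submissions.append(self.pending)
            self.pending = []


buf = CommandBuffer(capacity=4)
for x in range(6):
    buf.call("glVertex2f", float(x), 0.0)
buf.flush()                           # explicit flush at end of frame
batch_sizes = [len(batch) for batch in buf.submissions]
print(batch_sizes)                    # [4, 2]
```

Six calls end up as two transfers: one forced by the size limit and one by the explicit flush. No call waits on the GPU individually.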

It's even deprecated in opengl 3. Yep, time to replace it.

Jorgon · « **Reply #107 on:** September 11, 2008, 01:57:21 pm »

Supporting Windows software rendering is worse than I initially thought. Well, the code was fairly easy, but there are not any optimizations I can do that can match Toady's PARTIAL_PRINT. I can get 250% performance increases, but that still only gets the framerate up to 80 fps on my virtual machine (from 25 with normal, no partial print) with full processor usage during the hottest parts of battle. In contrast, Toady's partial print code is consistently 100, which is the limit in the init file.

I mean it is a remarkable improvement, and I really cannot complain, but I am my own worst critic, and am disappointed I couldn't get it up to 100.

I can repost the new code if Toady wants it, but there is still one bug that I know of that needs to be tied up (game screen resolution greater than maximum texture size) and the code needs to be cleaned up a bit before it is production worthy. It is also a lot more code than just the previous release.

The point seems rather moot because under the software renderer, Toady's partial print code works correctly, is fast, and does not have corruption problems.

Toady:
It would be good to get your feedback so far on what you are thinking about the proposed solutions.

Jorgon · « **Reply #108 on:** September 11, 2008, 02:15:32 pm »

Figures that 2 more responses posted while I was minding my P's and Q's writing up that last response, double checking everything and even writing more code.

Am I wrong thinking it is dumb to make a text game require a geforce 6+?

The fact is that Toady's partial print works great for the speedup, is simple, and quick, even under pure software rendering. The only problem it has is (at least) the nvidia drivers invalidate the back buffer if the window is minimized (and a few other cases). Why gut it?

Baughn · « **Reply #109 on:** September 11, 2008, 03:53:53 pm »

Hey, I wasn't going to require a geforce 6. The pixel-shader approach is good enough, and those were actually introduced with the geforce *3*.

For cards older than that, I question the usefulness of supporting them at all. A software fallback, maybe - 2d output, then - but not opengl; there'd be no point. I'd also question the sanity of anyone running DF on such a slow machine, but that's their issue; the nanofortress would work, I guess, so supporting it is fine. With 2d.

mattmoss · « **Reply #110 on:** September 11, 2008, 04:28:03 pm »

Quote from: Baughn on September 11, 2008, 01:18:21 pm

Mmh. Immediate mode doesn't do what you said, though - it doesn't force synchronization on every call. That *might* be how it works today, but I rather doubt it.

True... mattmoss has used terminology incorrectly. mattmoss is about to die.

I'm not sure what I was thinking there... Still, even if the CPU isn't trying to synch with the GPU, there is serious overhead in those calls, probably due to bitrot as you say. Even if optimized, immediate mode would not be the fastest way to get things done.

Hmm.... now I have a hankering for my dwarf fortress to be a Gauntlet level (old-school Gauntlet, not the crappy 3d Gauntlet). Cat generator, anyone?

Toady One · « **Reply #111 on:** September 11, 2008, 07:10:36 pm »

As far as copying textures versus VBOs, I think I vaguely understand what's going on, though my lack of keeping up with this stuff is what put us in this position in the first place. Gutting any of the BC code isn't that big a deal to me compared to compatibility, both forward and backward, which is where there seems to be some disagreement, and I don't really have answers for these things. Immediate mode is now deprecated, whatever that means for the driver writers -- are texture copies based on rendering a single large immediate mode quad guaranteed to perform even in the near future on newer cards? It's generally modern cards/drivers that have been having problems with my current code, as far as I can tell from the reports, as mine has no trouble eating up the immediate mode calls. On the other hand, with VBOs, when you (Baughn) say handling older cards "with 2d", what does this mean? Something other than OpenGL? I remember some scary things from OpenGL 2d functions, like it turning all the pixels into quads or something. It was very slow, but that was many years ago and I have no clear memories.

As far as GLUT vs SDL vs SFML, yeah, I dunno. Again. The SDL people seem confused about their own license, and they certainly aren't clear about it. SFML has mentions of GPL and LGPL external libraries that I'm not sure about -- do those cause the same issues (I'm not sure how it's all bundled -- I don't plan on using OpenAL for example, but maybe it and the LGPL come along for the ride anyway)? I don't remember anything about GLUT licensing and portability, but somebody probably told me something at some point.

Igor Savin · « **Reply #112 on:** September 12, 2008, 07:43:11 am »

Quote from: Toady One on September 11, 2008, 07:10:36 pm

As far as GLUT vs SDL vs SFML, yeah, I dunno.

SDL:

LGPL, which requires

1) Link with the library as a shared object (e.g. SDL.dll or libSDL.so)
OR
2) Provide the object or source code

Option 1 seems rather reasonable.
But SDL is painfully slow...

SFML:

"SFML is completely free for any use, commercial or not, open-source or not. That is, you can use SFML API in your project without any restriction"

GLUT (btw, they recommend other alternatives: http://www.opengl.org/resources/libraries/windowtoolkits/)

is old and hairy, it'd be better to use FreeGLUT:
http://freeglut.sourceforge.net/
it uses X-Consortium license:
http://opensource.org/licenses/mit-license.html

Jorgon · « **Reply #113 on:** September 12, 2008, 09:39:56 am »

Warning: Below is a long, technical rambling by a man currently questioning his own sanity while looking at message preview.

Going to sdl (or related) probably wouldn't be that difficult, but it makes it harder and slower to use opengl later on for visualizations. There would also be little to no acceleration. SFML sounds interesting, I had never heard of it before. I will have to check it out for my own projects.

I am really neutral in all of this. I don't care if my or anybody else's code gets used. I want what is best for DF, and only the Super Toad Bros. can decide that. I just want them to work on it until the logic optimizations are complete, which is the real bottleneck.

dreiche2 · « **Reply #114 on:** September 12, 2008, 11:50:15 am »

Quote from: Igor Savin on September 12, 2008, 07:43:11 am

Quote from: Toady One on September 11, 2008, 07:10:36 pm
As far as GLUT vs SDL vs SFML, yeah, I dunno.
SDL:

LGPL, which requires

1) Link with the library as a shared object (e.g. SDL.dll or libSDL.so)
OR
2) Provide the object or source code

Option 1 seems rather reasonable.
But SDL is painfully slow...

As discussed a page or two ago, the license actually says that you also have to give "prominent" notification and mention the SDL copyright plus a reference to the license file in your program if you display any copyrights at all (which Toady does). The FAQ on the SDL website does not mention this.

Btw, I asked at the SDL mailing list about how the licensing works, and that's the only reply so far:

Quote

If you distribute a binary unmodified sdl.dll/so - you need to include an
sdl-license.txt that is the LGPL file as well as information within said txt
file about where to fetch the original sauce (libsdl.org etc).

Prominent notice can undoubtedly be something as simple as the RAD game
tools thing that's displayed during the Blizzard splash screens, or in help,
etc. Alot of things make use of LGPL libraries but don't blatantly
advertise them because they're very common libraries.

I would suggest a 'uses sdl, see readme or sdl.txt or something for more
info' line somewhere.

Many games have more than one splash screen, such a line would be easy to
stuff in one of them.

-Will

Don't know if that comes from an authoritative source, though.

Baughn · « **Reply #115 on:** September 12, 2008, 04:52:56 pm »

Quote from: Jorgon on September 12, 2008, 09:39:56 am

According to my research, VBOs were finalized and approved in 2003 (http://oss.sgi.com/projects/ogl-sample/registry/ARB/vertex_buffer_object.txt), and the next chip redesign for nvidia was the Geforce 6 (http://en.wikipedia.org/wiki/Geforce).

In general, such performance enhancements are first supported via ARB extensions, quite a while before they're added to the base opengl. The geforce 6 supports *everything* in the opengl standard it claims to support - it kinda has to - but they can pick and choose which ARBs to support.

Using the ARB api, VBOs were introduced quite a while before that.

Quote

ARB fragment shaders were approved of about the same time in 2003 (http://oss.sgi.com/projects/ogl-sample/registry/ARB/fragment_shader.txt), so again, Geforce 6.

The earlier cards did have pixel shaders, however, and the code was backported to run on at least some of these. I have no idea which ones; I don't generally save such ancient hardware. Probably should..

Quote

Both options were available starting with OpenGL 2.0.

See my above statement on ARBs.

Quote

To be fair, framebuffer_object was approved in 2005, which is after VBOs and GLSL. These cards are old enough that normal PARTIAL_PRINT should not have an issue though.

Geforce 3 only has GL 1.2 compatibility http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units, so while it supported pixel shaders in DirectX, I cannot find any indication it was supported in OpenGL.

Etc. Again, this is more theory than fact; the macintosh geforce 3 profile simulator certainly runs my code, but that's the lowest profile it's *got*, which means I don't know if it'd *ever* fail.

Quote

Geforce 5 was the first listed with 2.0 support, but it is marked with "**" in the table, and the only note I could find said that its support was incomplete, with no details. Geforce 6 was the first card with full OpenGL 2.0 support.

I welcome corrections on any of this.

Immediate Mode is deprecated in OpenGL 3.0, but OpenGL 2.0 and 3.0 are not mutually exclusive on a video card. Immediate Mode will always be supported (at least for the next decade or so) for 1.1 - 2.0 profiles. DirectX is (theoretically) backward compatible all the way back to v1. It may be slow as a dead snail when you push more than a few thousand polys through it, but with PARTIAL_PRINT, most of the time you are not drawing more than 20 a frame.

True, deprecated doesn't mean removed.
In practice, it means it'll work well only on workstation-class boards - Quadros and such, which are expected to run very old software, and whose cost is chiefly in driver development to support that. Gamer boards.. aren't.

Quote

From my understanding, Immediate mode commands are queued, organized, and optimized by the driver, and it actually sends it to the video card asynchronously when it is deemed best, which will be forced by glFlush, glFinish, or SwapBuffers.

Yep, that's how it's supposed to work. Simple-simple.

Quote

All of this goes to show just why game developers prefer directx.

Huh? Immediate mode is a *simple*, *easy to use* API. Why would its existence scare anyone to using the far more complex directx api?

Quote

A VBO (Vertex Buffer Object) solution is fairly easy to understand: all of your vertices and texcoords are kept inside an array, which is transferred to the video card and drawn with a single function call. The downside is you can only have a few (1-4) textures active at any point in time, so you would have to sort the vertices by texture, create a VBO for each texture (or 4, but the more textures you use at a time, the more incompatible it is), and render them one at a time. The solution to this is to put all textures in a single texture on the video card. Because the VBOs would be changing every frame, all of them would have to be recalculated and reuploaded to the video card every frame, which will be slower than Immediate mode with PARTIAL_PRINT.

It might not be slower, actually. You'd only be uploading the new texture coordinates each frame; they don't have to be in the same VBO as the vertex coordinates. The chief problem is that adding colors (foreground/background) gets complex, but this is probably a good solution. It just isn't as simple as the pixel-shader approach.
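A rough sketch of that split, in plain Python with lists standing in for the two buffers. The 16 x 16 atlas layout, the tile size, and the function names are assumptions for illustration; a real version would hand these arrays to OpenGL instead of keeping them as lists, and colors are left out entirely.

```python
ATLAS_COLS = 16   # assumed: glyphs packed 16 x 16 into one texture
ATLAS_ROWS = 16


def quad_vertices(cols, rows, tile_w, tile_h):
    """Static vertex positions, four corners per tile. Built and uploaded once."""
    verts = []
    for y in range(rows):
        for x in range(cols):
            x0, y0 = x * tile_w, y * tile_h
            x1, y1 = x0 + tile_w, y0 + tile_h
            verts += [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return verts


def quad_texcoords(glyphs):
    """Per-frame texture coordinates, four corners per tile, in the same order."""
    coords = []
    for g in glyphs:
        u0 = (g % ATLAS_COLS) / ATLAS_COLS
        v0 = (g // ATLAS_COLS) / ATLAS_ROWS
        u1 = u0 + 1 / ATLAS_COLS
        v1 = v0 + 1 / ATLAS_ROWS
        coords += [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
    return coords


verts = quad_vertices(cols=2, rows=1, tile_w=8, tile_h=12)  # never changes
frame1 = quad_texcoords([0, 17])   # this array is what gets re-sent each frame
frame2 = quad_texcoords([1, 17])   # first tile now shows a different glyph
```

Only the texcoord list differs between `frame1` and `frame2`; the vertex list is identical for every frame of a given grid size.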

Quote

The pixel shader solution is probably the hardest to code/understand. All of your rendering would be done inside a pixel (aka Fragment) shader, so you would draw a single fullscreen quad with the fragment shader active. The fragment shader would have a texture active that holds a combined image of all the individual tiles (like in VBO), and a texture holding metadata about what char/colors a tile should be. The fragment shader would run for every pixel, and have to calculate which tile it is on, lookup the texture/color information in the second texture, calculate which texture fragment it needs to pull from the first texture, and output that. All of this is done inside of a block of text with little to no debugging. The rest of the application has to generate the meta texture every frame, and upload it before rendering the quad.

Yep. That's the way my DF Accelerator works, and the way I'm aiming at if I can ever completely unravel BC's code.

It's really not that complex. The pixel shader is about ten lines of code, all told. Oh, and you can upload the texture via a PBO - same principle as the VBO.

It does put a considerably higher load on the GPU than a fixed-functionality path using VBOs would, though.
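The per-pixel work can be emulated on the CPU to show how little there is to it. This is a Python stand-in, not actual shader code and not the DF Accelerator source: the tiny 2 x 2 tiles, the two-glyph atlas, the color names, and the `shade_pixel` function are all invented for the example.

```python
TILE_W, TILE_H = 2, 2      # assumed tiny tile size, to keep the data readable
ATLAS_COLS = 2             # assumed: two glyphs side by side in the atlas

# "Atlas texture": 1 where the glyph has ink. Glyph 0 is blank, glyph 1 is a diagonal.
ATLAS = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# "Meta texture": one (glyph, foreground, background) entry per screen tile.
META = [[(1, "white", "black"), (0, "red", "blue")]]


def shade_pixel(px, py):
    """Emulates what the fragment shader would do for one screen pixel."""
    tx, ty = px // TILE_W, py // TILE_H                 # which tile holds this pixel
    glyph, fg, bg = META[ty][tx]                        # look up char and colors
    ax = (glyph % ATLAS_COLS) * TILE_W + px % TILE_W    # matching atlas texel
    ay = (glyph // ATLAS_COLS) * TILE_H + py % TILE_H
    return fg if ATLAS[ay][ax] else bg                  # ink gets fg, the rest gets bg


frame = [[shade_pixel(x, y) for x in range(4)] for y in range(2)]
for row in frame:
    print(row)
```

On the GPU the same arithmetic runs once per pixel in parallel, and the application only has to regenerate and upload `META` each frame.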

Quote

Render to texture uses the existing code, in partial print mode. A texture the same size as the game window is created, and set as the render target. The rest of the render code is called as normal. Afterwards, the render target is set back to default, which is the backbuffer, and a fullscreen quad is drawn using the rendered texture. All code is reused.

This method would indeed be easy to implement, but seems a bit silly. It's really just a workaround for buggy drivers.. granted, it succeeds admirably at that, but who's to say other drivers won't bug out on drawing /to the texture/?

Quote

Feel free to correct me in which way you are implementing it (I actually am interested in what techniques other developers use), I am assuming based on my own knowledge of the subject.

For all three methods, if a card does not support the required features, it needs a second graphics path, which would just be the normal PARTIAL_PRINT. Any card old enough to not support any of those features would be old enough to not have a problem with normal PARTIAL_PRINT. Partial print as is, in a virtual machine, with software rendering, is perfectly fine for playing at less than 5% processor usage.

Yeah, partial print is good for 2D mode; it works well when little changes. It's not faster than opengl should be, but it can at least not be *slower*, most of the time.
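The fallback idea amounts to a capability check at startup. Here is a hedged sketch: the extension names are the real ARB/EXT names for the features discussed in this thread, but the `pick_render_path` function, the path labels, and especially the priority order are assumptions, since nobody here has settled which path should win.

```python
def pick_render_path(extensions):
    """Choose a render path from a set of extension names the card reports."""
    if "GL_ARB_fragment_shader" in extensions:
        return "pixel_shader"            # assumed first choice
    if "GL_ARB_vertex_buffer_object" in extensions:
        return "vbo"
    if "GL_EXT_framebuffer_object" in extensions:
        return "render_to_texture"
    return "partial_print"               # second graphics path for old cards


old_card = {"GL_ARB_multitexture"}
newer_card = {"GL_ARB_multitexture", "GL_ARB_vertex_buffer_object"}
print(pick_render_path(old_card))        # partial_print
print(pick_render_path(newer_card))      # vbo
```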

Quote

All three methods will work, but the question boils down to your priorities. Do you want it easy to integrate/update/maintain, or marginally faster?

But.. they're all simple. Given that, I'll take "faster". ^_^

Quote

Going to sdl (or related) probably wouldn't be that difficult, but it makes it harder and slower to use opengl later on for visualizations. There would also be little to no acceleration. SFML sounds interesting, I had never heard of it before. I will have to check it out for my own projects.

Yes, some notion of capabilities, and preferably a semantic API of some sort, would be preferable when having two such very different output methods.

Sadly, that seems at least two levels above Toady's design skills - frankly, it'd be a lot of work for anyone, even with a clean codebase. :/

Given that, just standardizing on opengl might be the best thing.

Quote

I am really neutral in all of this. I don't care if my or anybody else's code gets used. I want what is best for DF, and only the Super Toad Bros. can decide that. I just want them to work on it until the logic optimizations are complete, which is the real bottleneck.

I'd hope this goes for everyone. When you see several *thousand* percent speed improvements for even quick hacks like an opengl shim, though, it's hard not to start hacking things apart.

Jorgon · « **Reply #116 on:** September 13, 2008, 09:47:29 am »

Most of the reply is fine, I will only reply to a few specific points. Most people are probably not interested in this conversation at all, so I will (try to) keep it brief(er).

I know things are supported in card-specific extensions first. What I am unsure of is this: if they are supported as card-specific extensions, do you have to code for all of the card-specific extensions, or can you just code for the ARB one?

Quote

True, deprecated doesn't mean removed.
In practice, it means it'll work well only on workstation-class boards - Quadros and such, which are expected to run very old software, and whose cost is chiefly in driver development to support that. Gamer boards.. aren't.

I don't agree, I think 1.1 and 2.0 profiles will always exist for backward compatibility. Only time would settle this question.

In my experience with Quadros, the driver itself is the only thing making it a Quadro (along with a hardware lock on a gamer board keeping it from using Quadro drivers).

Quote

Huh? Immediate mode is a *simple*, *easy to use* API. Why would its existence scare anyone to using the far more complex directx api?

This was more a reference to how complicated writing for extensions is. DirectX starts off by saying what needs to be supported for a specific version compatibility. By creating a 9.0c context, you get all of 9.0c's features. OpenGL starts off with a basic base, and lets rendering implementers add support for specific extensions. I do agree that OpenGL is simpler, but now that I have had experience with VBOs, PBOs, render targets and all, DirectX is going to be much simpler to understand. It is like how you have to try writing something yourself to understand why they did it that way.
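For anyone wondering what "writing for extensions" looks like at its most basic: in OpenGL of this era the driver reports its extensions as one space-separated string (obtained with `glGetString(GL_EXTENSIONS)`), and the program has to look for each name it needs. A small Python sketch of that lookup follows; the `has_extension` helper and the sample string are made up for illustration, and no real driver is queried.

```python
def has_extension(extension_string, name):
    """True only if 'name' is a whole token in the space-separated list."""
    return name in extension_string.split()


# Made-up sample string; a real one is much longer.
sample = "GL_ARB_multitexture GL_EXT_texture3D GL_ARB_vertex_buffer_object"

print(has_extension(sample, "GL_ARB_vertex_buffer_object"))  # True
print(has_extension(sample, "GL_ARB_fragment_shader"))       # False
print("GL_EXT_texture" in sample)               # True: substring false positive
print(has_extension(sample, "GL_EXT_texture"))  # False: token match is exact
```

The last two lines show why a plain substring search is not enough: one extension name can be a prefix of another.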

Quote

It might not be slower, actually. You'd only be uploading the new texture coordinates each frame; they don't have to be in the same VBO as the vertex coordinates. The chief problem is that adding colors (foreground/background) gets complex, but this is probably a good solution. It just isn't as simple as the pixel-shader approach.

I was trying to keep the explanations as simple as possible, there are still ways of optimizing them further in almost all of the solutions.

Quote

Yep. That's the way my DF Accelerator works, and the way I'm aiming at if I can ever completely unravel BC's code.

It's really not that complex. The pixel shader is about ten lines of code, all told. Oh, and you can upload the texture via a PBO - same principle as the VBO.

It does put a considerably higher load on the GPU than a fixed-functionality path using VBOs would, though.

How does your code work under GRAPHICS mode? What happens when it is time to replace the menus to add scroll bars? There are a ton of interface tweaks that will be upcoming that don't fit into this solution. I would rather have Toady fighting against immediate mode than against fragment shaders.

Quote

This method would indeed be easy to implement, but seems a bit silly.

I think it is silly to REQUIRE fragment shaders for a text game. There are two types of players: ones that have semi-recent hardware, and those who have no hardware. Why tell the ones that don't have hardware that they cannot play when you don't need to? Fragment shaders should be left for what they are good at.

Quote

It's really just a workaround for buggy drivers.. granted, it succeeds admirably at that, but who's to say other drivers won't bug out on drawing /to the texture/?

If it is going to bug out on an RTT (render to texture), not a single game would work. My bet is almost every game uses imposters (speedtree makes heavy use of them if I remember correctly). I remember Vampire: The Masquerade: Redemption used them for live views of the character heads in the GUI. Claiming the drivers may bug out could be applied to any of the 3 solutions.

Quote

Yeah, partial print is good for 2D mode; it works well when little changes. It's not faster than opengl should be, but it can at least not be *slower*, most of the time.

But 99.9% of the time only a dozen or so tiles change in any one frame. That is why partial print works in software mode as well as it does.
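The idea behind partial print can be shown with a small sketch in Python. This is not Toady's code: the `changed_tiles` and `partial_print` functions, the 80 x 25 grid, and the list standing in for the back buffer are assumptions made for the example.

```python
def changed_tiles(previous, current):
    """Return (x, y, value) for every tile that differs between two frames."""
    return [
        (x, y, cur)
        for y, (old_row, new_row) in enumerate(zip(previous, current))
        for x, (old, cur) in enumerate(zip(old_row, new_row))
        if old != cur
    ]


def partial_print(screen, previous, current):
    """Redraw only the changed tiles; 'screen' stands in for the back buffer."""
    dirty = changed_tiles(previous, current)
    for x, y, value in dirty:
        screen[y][x] = value
    return len(dirty)


prev = [["."] * 80 for _ in range(25)]      # last frame: 2000 tiles
cur = [row[:] for row in prev]
cur[3][10], cur[3][11], cur[12][40] = "@", "d", "c"   # three tiles changed
screen = [row[:] for row in prev]           # back buffer still holds last frame
drawn = partial_print(screen, prev, cur)
print(drawn, "of", 80 * 25, "tiles redrawn")
```

The sketch only works because `screen` keeps its contents between frames. That is exactly the assumption that breaks when a driver invalidates the back buffer, as mentioned earlier in the thread.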

Quote

But.. they're all simple. Given that, I'll take "faster". ^_^

They are all simple to YOU, but the question is, which is simpler for Toady. He is the one that has to update it for all upcoming goodness in future releases. VBOs and Pixelshaders are all well and good, but RTT would use the code Toady already has, written by himself, and that he understands. He can also render whatever he wants, and it will just work.

Why optimize a further 1%, just to make it harder to code with later?

So much for keeping it brief... /sigh

Electronic Phantom · « **Reply #117 on:** September 24, 2008, 11:23:32 pm »

I dunno if it's just me, Jorgon, but when I compile your code I just get a blank white window.

In any case, it's nice to poke through code I don't understand trying to glean tidbits of wisdom.

-(e)EP

Jorgon · « **Reply #118 on:** September 26, 2008, 03:09:51 pm »

White screen means that you either have no video acceleration, or an extremely limited one. Does PARTIAL_PRINT work in the original Battle Champs? (Without my changes)

Electronic Phantom · « **Reply #119 on:** September 27, 2008, 02:03:53 pm »

The BC without your code changes works fine with pretty_print... partial_print...

In any case... if your code depends on video acceleration, I'm not surprised it's having issues. I have an oldish stock graphics card (GF-II I believe).

I wouldn't be able to tell non-VA dependent code from VA dependent code in any case.

-(e)EP
