According to my research, VBOs were finalized and approved in 2003 (http://oss.sgi.com/projects/ogl-sample/registry/ARB/vertex_buffer_object.txt), and the next chip redesign for NVIDIA was the GeForce 6 (http://en.wikipedia.org/wiki/Geforce).
ARB fragment shaders were approved at about the same time in 2003 (http://oss.sgi.com/projects/ogl-sample/registry/ARB/fragment_shader.txt), so again, GeForce 6.
Both are core features as of OpenGL 2.0 (VBOs were already promoted to core in 1.5).
To be fair, framebuffer_object was approved in 2005, which is after VBOs and GLSL. Cards too old to support it are old enough that normal PARTIAL_PRINT should not be an issue, though.
GeForce 3 only has GL 1.2 compatibility (http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units), so while it supported pixel shaders in DirectX, I cannot find any indication that they were supported in OpenGL.
GeForce 5 was the first listed with 2.0 support, but it is marked with "**" in the table, and the only note I could find said its support was incomplete, with no details. GeForce 6 was the first card with full OpenGL 2.0 support.
I welcome corrections on any of this.
Immediate Mode is deprecated in OpenGL 3.0, but OpenGL 2.0 and 3.0 are not mutually exclusive on a video card. Immediate Mode will always be supported (at least for the next decade or so) for 1.1 - 2.0 profiles. DirectX is (theoretically) backward compatible all the way back to v1. Immediate Mode may be slow as a dead snail when you push more than a few thousand polys through it, but with PARTIAL_PRINT, most of the time you are not drawing more than 20 a frame.
From my understanding, Immediate Mode commands are queued, organized, and optimized by the driver, which sends them to the video card asynchronously when it deems best. A call to glFlush, glFinish, or SwapBuffers forces the send.
All of this goes to show just why game developers prefer DirectX.
A VBO (Vertex Buffer Object) solution is fairly easy to understand: all of your vertices and texcoords are kept in an array, which is transferred to the video card and drawn with a single function call. The downside is that you can only have a few (1-4) textures active at any point in time, so you would have to sort the vertices by texture, create a VBO for each texture (or each group of 4, but the more textures you use at a time, the less compatible it is), and render them one at a time. The solution to this is to put all textures in a single texture on the video card. Because the VBOs would be changing every frame, all of them would have to be recalculated and reuploaded to the video card every frame, which will be slower than Immediate Mode with PARTIAL_PRINT.
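To make the "single texture plus one vertex array" idea concrete, here is a minimal sketch in plain Python with no GL calls. Everything in it is my own assumption for illustration: the function names, the grid-shaped atlas, and the x, y, u, v vertex layout are hypothetical and not taken from any existing code.

```python
# Hypothetical sketch: names and data layout are assumptions for illustration.

def atlas_uv(tile_index, atlas_cols, atlas_rows):
    """Return (u0, v0, u1, v1) for one tile in a grid-shaped atlas texture."""
    col = tile_index % atlas_cols
    row = tile_index // atlas_cols
    u0 = col / atlas_cols
    v0 = row / atlas_rows
    return (u0, v0, u0 + 1 / atlas_cols, v0 + 1 / atlas_rows)


def build_vertex_array(grid, tile_w, tile_h, atlas_cols, atlas_rows):
    """Flatten a 2D grid of tile indices into x, y, u, v floats (4 vertices per tile)."""
    data = []
    for gy, row in enumerate(grid):
        for gx, tile_index in enumerate(row):
            u0, v0, u1, v1 = atlas_uv(tile_index, atlas_cols, atlas_rows)
            x0, y0 = gx * tile_w, gy * tile_h
            x1, y1 = x0 + tile_w, y0 + tile_h
            # One quad: four corners, each with a position and a texcoord.
            data += [x0, y0, u0, v0,
                     x1, y0, u1, v0,
                     x1, y1, u1, v1,
                     x0, y1, u0, v1]
    return data
```

In a real renderer this flat list is roughly what would be uploaded with glBufferData and drawn with one draw call. Note that the whole list has to be rebuilt whenever the tiles change, which is the per-frame cost mentioned above.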
The pixel shader solution is probably the hardest to code/understand. All of your rendering would be done inside a pixel (aka fragment) shader, so you would draw a single fullscreen quad with the fragment shader active. The fragment shader would have one texture active that holds a combined image of all the individual tiles (like in the VBO approach), and a second texture holding metadata about which char/colors each tile should have. The fragment shader would run for every pixel, and would have to calculate which tile it is on, look up the texture/color information in the second texture, calculate which texture fragment it needs to pull from the first texture, and output that. All of this is done inside a block of text with little to no debugging support. The rest of the application has to generate the meta texture every frame, and upload it before rendering the quad.
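The per-pixel work described above can be sketched on the CPU to show the arithmetic. This is not shader code, just a plain Python reference of the lookup steps; the function name, the (tile_index, color) metadata layout, and the pixel-based atlas coordinates are all assumptions of mine.

```python
# Hypothetical CPU reference of the per-pixel lookup a fragment shader would do.

def shade_pixel(px, py, tile_w, tile_h, meta, atlas_cols):
    """Return (tile_index, atlas_x, atlas_y, color) for screen pixel (px, py).

    meta[row][col] is a (tile_index, color) pair and stands in for the
    metadata texture; the atlas is assumed to be a grid of equal-sized tiles.
    """
    # Which screen tile is this pixel on, and where inside that tile?
    col, in_x = divmod(px, tile_w)
    row, in_y = divmod(py, tile_h)
    # Look up what should be drawn on that tile.
    tile_index, color = meta[row][col]
    # Find the matching pixel inside the combined tile texture.
    atlas_x = (tile_index % atlas_cols) * tile_w + in_x
    atlas_y = (tile_index // atlas_cols) * tile_h + in_y
    return tile_index, atlas_x, atlas_y, color
```

A real shader would do the same steps with normalized texture coordinates and two texture fetches, but the order of operations would be the same.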
Render to texture uses the existing code, in partial print mode. A texture the same size as the game window is created, and set as the render target. The rest of the render code is called as normal. Afterwards, the render target is set back to default, which is the backbuffer, and a fullscreen quad is drawn using the rendered texture. All code is reused.
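As a rough illustration of that flow, here is a simulation using plain Python lists instead of GL objects. The names are hypothetical, and partial_print here is only a stand-in for the existing render code, which I have not seen. The point it tries to show is that the offscreen target keeps its contents between frames, so only changed tiles need redrawing.

```python
# Hypothetical simulation: lists stand in for the texture and the backbuffer.

def partial_print(target, dirty_tiles):
    """Stand-in for the existing render code: redraw only tiles that changed."""
    for (x, y), glyph in dirty_tiles.items():
        target[y][x] = glyph


def render_frame(offscreen, dirty_tiles):
    """Draw dirty tiles into the persistent offscreen target, then present it."""
    # Step 1: render target is the offscreen texture; run the normal render code.
    partial_print(offscreen, dirty_tiles)
    # Step 2: render target is back to the backbuffer; the fullscreen quad
    # copies the whole texture, modeled here as a copy of the grid.
    return [row[:] for row in offscreen]
```

With real OpenGL and EXT_framebuffer_object, step 1 and step 2 would be bracketed by framebuffer bind calls, which I have left out here since this sketch never touches the GL.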
Feel free to correct me on which way you are implementing it (I am actually interested in what techniques other developers use); I am assuming based on my own knowledge of the subject.
For all three methods, if a card does not support the required features, it needs a second graphics path, which would just be the normal PARTIAL_PRINT. Any card old enough to not support any of those features would be old enough to not have a problem with normal PARTIAL_PRINT. Partial print as is, in a virtual machine, with software rendering, is perfectly playable at less than 5% processor usage.
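The fallback decision could look something like the sketch below. It assumes the extension list arrives as the space-separated string that glGetString(GL_EXTENSIONS) returns. The path names and the minimal extension sets are my own assumptions; a real shader path would likely need more extensions than the one listed.

```python
# Hypothetical sketch: path names and required-extension sets are assumptions.

REQUIRED_EXTENSIONS = {
    "vbo": {"GL_ARB_vertex_buffer_object"},
    "shader": {"GL_ARB_fragment_shader"},
    "render_to_texture": {"GL_EXT_framebuffer_object"},
}


def choose_render_path(preferred, extension_string):
    """Return the preferred path if the card supports it, else plain partial print."""
    available = set(extension_string.split())
    if REQUIRED_EXTENSIONS[preferred] <= available:
        return preferred
    return "partial_print"
```

For example, asking for "render_to_texture" on a card that only reports GL_ARB_multitexture would fall back to "partial_print".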
All three methods will work, but the question boils down to your priorities. Do you want it easy to integrate/update/maintain, or marginally faster?