I ran it through a profiler and found that each frame was making in excess of 200,000 (!!) OpenGL draw calls! More importantly, I noticed that every tile on the screen is enclosed in its own glBegin() ... glEnd() block. Remember, each glBegin ... glEnd pair carries its own fixed overhead in the driver, and the driver can't batch tiles together when each one is submitted separately! This is unnecessary as long as you're only passing vertices, texcoords and colors between them. Put the entire background color pass into a single glBegin ... glEnd.
The bigger problem is that during the foreground pass, you're calling glBindTexture for every single tile. This is VERY VERY bad, since you're forcing the GPU to do, say, 128*96 = 12,288 texture state changes per frame! These are expensive, so it's very important to avoid as many of them as possible.
One thing to do would be to sort the tiles to be drawn by the texture they use, thus preventing you from needing to call glBindTexture more than once per unique texture.
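A rough sketch of the sorting idea, in plain C with no OpenGL dependency so it can be tried on its own. The Tile struct and both function names are made up for illustration and are not from the game's code. The counter marks the spot where glBindTexture would be called in a real draw loop.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical tile record: the texture it uses and its grid position. */
typedef struct {
    unsigned texture_id;
    int x, y;
} Tile;

/* qsort comparator: orders tiles by texture id. */
static int compare_by_texture(const void *a, const void *b) {
    const Tile *ta = a;
    const Tile *tb = b;
    return (ta->texture_id > tb->texture_id) - (ta->texture_id < tb->texture_id);
}

/* Sorts the draw list so tiles sharing a texture end up adjacent. */
void sort_tiles_by_texture(Tile *tiles, size_t count) {
    qsort(tiles, count, sizeof *tiles, compare_by_texture);
}

/* Counts how many binds are needed when drawing the tiles in order. */
size_t count_texture_binds(const Tile *tiles, size_t count) {
    size_t binds = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || tiles[i].texture_id != tiles[i - 1].texture_id) {
            ++binds; /* a real renderer would call glBindTexture here */
        }
    }
    return binds;
}
```

After sorting, the bind count equals the number of distinct textures on screen, no matter how many tiles there are. Note that glBindTexture can't be called between glBegin and glEnd, so with this approach you'd still end up with one glBegin ... glEnd block per texture.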
However, the best thing to do is probably to simply pack all textures into a single large texture, a texture atlas, just like graphics sets are stored in BMPs. This way, instead of calling glBindTexture for each tile, you just give each vertex the texCoords of the appropriate tile's corner within the atlas. Now you only need to call glBindTexture once per frame.
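Here's a minimal sketch of how the atlas texcoords could be computed. It assumes the atlas is a regular grid of equally sized tiles numbered row by row; TexRect and atlas_rect are hypothetical names, and the real layout of the graphics set may differ.

```c
/* Texture-space rectangle of one tile inside the atlas (hypothetical type). */
typedef struct {
    float u0, v0; /* first corner */
    float u1, v1; /* opposite corner */
} TexRect;

/*
 * Assumes a grid atlas that is tiles_per_row wide and tiles_per_col tall,
 * with tiles numbered row-major starting at 0.
 */
TexRect atlas_rect(int tile_index, int tiles_per_row, int tiles_per_col) {
    int col = tile_index % tiles_per_row;
    int row = tile_index / tiles_per_row;
    float du = 1.0f / (float)tiles_per_row; /* width of one tile in texture space */
    float dv = 1.0f / (float)tiles_per_col; /* height of one tile in texture space */
    TexRect r;
    r.u0 = (float)col * du;
    r.v0 = (float)row * dv;
    r.u1 = r.u0 + du;
    r.v1 = r.v0 + dv;
    return r;
}
```

The four corners (u0,v0), (u1,v0), (u1,v1), (u0,v1) are what you'd pass to glTexCoord2f for the four vertices of the tile's quad. Whether row 0 ends up at the top or the bottom of the image depends on how the atlas was uploaded, so the v values may need flipping.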
These changes should increase performance dramatically. In the profiler, the framerate more than tripled when I toggled the "eliminate draw calls" option! In other words, the frame time dropped to less than a third of what it was, which means more than 2/3 of the frame time was being spent in OpenGL calls!
Pseudocode:
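onFrameDraw
    glBindTexture(texture atlas)    // once per frame, outside glBegin/glEnd
    glBegin(GL_QUADS)               // assuming each tile is drawn as a quad
    for each tile
        // glColor and glTexCoord must come BEFORE the glVertex they apply to
        glColor(background color)
        glTexCoord(coord of corner of blank tile in texture atlas)
        glVertex(...)
        glColor(foreground color)
        glTexCoord(coord of corner of tile to render within the atlas)
        glVertex(...)
        ....
    end for
    glEnd()
frameEnd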