Buildings can be transparent (the floor tile shows through), so that means an additional layer of data, not a replacement.
Certainly. The question is how to format this data so that as much work as possible is offloaded to the GPU, but without bogging it down.
I absolutely want to avoid walking around structures on the CPU determining how a door is drawn in open vs closed state.
One way to do it is by adding 3 additional fields to the map data: building (which building), building coordinate (i.e. which of the nine tiles of a workshop to use), and building material.
That's 10 bits for material, 5 bits for tile number (trade depot and siege workshop are 5x5), and 7 bits for building type. Note that this leaves out state such as open/closed, spinning/stopped, build stage, orientation for pumps/waterwheels, etc. All of which adds up to 44 bits by my estimation. Now that I've got those 44 bits onto the GPU, what do I do with them? Somehow they must ultimately be mapped to graphics, and this is my main problem.
(not to mention bridges, stockpiles, civzones and rooms; their arbitrary dimensions are an entirely separate problem I haven't even thought about)
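For illustration, the three fields above could be packed into one integer per tile roughly like this. The field widths (10/5/7) are from the post; the bit order and the names `pack_building` / `unpack_building` are assumptions made up for this sketch, and the state bits are left out.

```python
# Hypothetical layout: widths come from the post, the ordering is an assumption.
MATERIAL_BITS = 10   # building material
TILE_BITS = 5        # tile number within the building (5x5 = 25 tiles fits)
TYPE_BITS = 7        # building type

TILE_SHIFT = MATERIAL_BITS
TYPE_SHIFT = MATERIAL_BITS + TILE_BITS


def pack_building(material, tile, btype):
    """Pack the three fields into one 22-bit integer."""
    for value, bits in ((material, MATERIAL_BITS), (tile, TILE_BITS), (btype, TYPE_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field does not fit in %d bits" % bits)
    return material | (tile << TILE_SHIFT) | (btype << TYPE_SHIFT)


def unpack_building(word):
    """Inverse of pack_building; a shader would do the same shifts and masks."""
    material = word & ((1 << MATERIAL_BITS) - 1)
    tile = (word >> TILE_SHIFT) & ((1 << TILE_BITS) - 1)
    btype = (word >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    return material, tile, btype
```

These three fields take 22 bits, so they fit in a single 32-bit value with 10 bits to spare for some of the state.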
Another way to do it would be to have a list of buildings with coordinates. This is a sparse array. The GPU can merge it with the dense array of map data or render it as a second pass. The 3 fields are the same.
That would result in the GPU having to walk this list for each tile drawn. Not a very big deal in itself, but I expect to draw 4-6 z-levels, fake floors under trees and such require up to 8 texture lookups, and a yet-undetermined algorithm to apply stencils to tile borders will eat who knows how many GPU ticks. So brute-force methods won't cut it.
The building list would have to be sorted or indexed on the CPU, each game engine frame. But what sort of index? It has to be suitable for creatures and items too.
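One possible index, sketched here only as an assumption and not as anything settled in this thread: bucket the list by tile on the CPU with a counting sort, so each tile gets a start offset into a payload array sorted by tile. The function names and the `(x, y, payload)` entry shape are made up for the example. The same structure would work for creatures and items, since the payload is opaque.

```python
def build_tile_index(entries, width, height):
    """entries: list of (x, y, payload). Returns (starts, payloads).

    payloads is sorted by tile; the entries of tile k = y * width + x
    are payloads[starts[k]:starts[k + 1]].
    """
    starts = [0] * (width * height + 1)
    for x, y, _ in entries:
        starts[y * width + x + 1] += 1          # count entries per tile
    for k in range(1, len(starts)):
        starts[k] += starts[k - 1]              # prefix sum -> start offsets
    cursor = starts[:-1]                        # slice copy: next free slot per tile
    payloads = [None] * len(entries)
    for x, y, payload in entries:
        k = y * width + x
        payloads[cursor[k]] = payload
        cursor[k] += 1
    return starts, payloads


def lookup(starts, payloads, width, x, y):
    """What a per-tile fetch would do: two offset reads, then a short range."""
    k = y * width + x
    return payloads[starts[k]:starts[k + 1]]
```

The cost is one dense offset array the size of the visible map, rebuilt each frame, in exchange for not walking the whole list per tile.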
Also remember that you can use two textures if you need more than 32 bits of data per location. I don't know if that's your problem, but you mentioned 44 bits of building data being an issue.
44 bits are an issue if you try to use them as an index into some table. Otherwise, GL 3.0 textures can hold 32 bits per channel, which is quite enough.
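As a small sketch of the storage side only: a 44-bit value splits into two 32-bit parts, which could go into two channels or two textures. The names `split44` and `join44` are made up here. This does nothing for the table-index problem.

```python
MASK32 = 0xFFFFFFFF


def split44(value):
    """Split a 44-bit value into (low 32 bits, high 12 bits)."""
    if not 0 <= value < (1 << 44):
        raise ValueError("value does not fit in 44 bits")
    return value & MASK32, value >> 32


def join44(low, high):
    """Recombine the two parts, as a shader would after two fetches."""
    return (high << 32) | low
```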