0
0 x---x---x
  |  /|  /|
  | / | / |
  |/  |/  |
  x---x---x
  |  /|  /|
  | / | / |
  |/  |/  |
  x---x---x 2
            2
OK, this is the most basic way a terrain mesh can be represented. Each x represents a data point, and the lines show where the triangles would be drawn. So we can see that to get a 2x2 grid of squares, we need a 3x3 grid of integers. Also, each square is made of two triangles.
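To make the diagram concrete, here is a minimal sketch of how the triangles could be listed from such a grid. The function name `grid_triangles` and the row-major point numbering are my own assumptions for illustration, not anything DF actually does.

```python
def grid_triangles(squares_x, squares_y):
    """Return index triples for a (squares_x + 1) x (squares_y + 1) point grid.

    Points are numbered row by row, left to right (an assumed layout).
    """
    cols = squares_x + 1  # data points per row
    triangles = []
    for y in range(squares_y):
        for x in range(squares_x):
            top_left = y * cols + x
            top_right = top_left + 1
            bottom_left = top_left + cols
            bottom_right = bottom_left + 1
            # The "/" diagonal joins bottom-left and top-right,
            # splitting each square into two triangles.
            triangles.append((top_left, top_right, bottom_left))
            triangles.append((top_right, bottom_right, bottom_left))
    return triangles


heights = [0] * 9            # the 3x3 grid of data points, all at height 0
tris = grid_triangles(2, 2)  # the 2x2 grid of squares from the diagram
print(len(heights), len(tris))  # 9 8
```

The height values themselves never appear in the triangle list. The triangles only say which data points are connected, so the same list works for any terrain of the same size.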
Each embark square is 48x48 tiles, so we need a 49x49 grid, or 2,401 ints. To draw it at full resolution we need 48 x 48 x 2 = 4,608 triangles. Now assume a default embark of 4x4, or 16 embark squares: that gives 2,401 x 16 = 38,416 ints per z-level and 4,608 x 16 = 73,728 triangles. Now let's assume 200 total z-levels, so we have 38,416 x 200 = 7,683,200 data points. The triangles are a non-issue because, as I've said several times, you only draw the ones that are visible. Unless the camera is pulled out really far, you'll never even see the full 73k triangles for a z-level.
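If you want to check the arithmetic or try other embark sizes, a small sketch like this derives the counts from the grid dimensions instead of hard-coding them. It assumes, like the numbers above, that every embark square keeps its own full grid of points (so points on shared edges are counted twice), and that the triangle count is two per tile. The name `mesh_counts` is made up for this example.

```python
def mesh_counts(tiles_per_side=48, embark_squares=16, z_levels=200):
    """Count data points and triangles for a brute-force terrain mesh."""
    # An N x N grid of tiles needs (N + 1) x (N + 1) data points.
    points_per_square = (tiles_per_side + 1) ** 2
    # Every tile is split into two triangles.
    triangles_per_square = 2 * tiles_per_side ** 2
    points_per_level = points_per_square * embark_squares
    triangles_per_level = triangles_per_square * embark_squares
    return {
        "points_per_square": points_per_square,
        "triangles_per_square": triangles_per_square,
        "points_per_level": points_per_level,
        "triangles_per_level": triangles_per_level,
        "total_points": points_per_level * z_levels,
    }


for name, value in mesh_counts().items():
    print(f"{name}: {value:,}")
```

Calling `mesh_counts(embark_squares=36)` would give the figures for a 6x6 embark, for example.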
Now, we don't need a 32-bit (4-byte) int. A signed 32-bit int ranges from -2,147,483,648 to +2,147,483,647, and we don't need 2 billion z-levels. A 16-bit int is just fine, with a range of -32,768 to +32,767; unsigned, that becomes 0 to 65,535. An 8-bit (1-byte) int is too small: unsigned, it only covers 0 to 255, so 256 z-levels at most. Given that I think it is possible to have more than that right now, we need to stick with 16 bits. I really think 65k z-levels is enough, though. So, at 2 bytes each, one z-level is 38,416 x 2 = 76,832 bytes, or about 75 KB, and all 200 z-levels come to 7,683,200 x 2 = 15,366,400 bytes, or roughly 15 MB of memory.
And that's the brute-force worst case. We're not done. Is it possible to see all 200 levels at once? No. Even if you had a shaft that reached all the way down, and you had the camera all the way at the top, you wouldn't see all the z-levels. All you should reasonably expect to see is a black hole in the ground. Maybe some small detail a few z-levels into the pit, but not more. Plus, 100 of those z-levels are potentially in the sky. What exactly is in the sky? Nothing. So why define nothing? In short, it would be very easy to only keep 10 z-levels in memory, which would only be a footprint of about 750 KB. You could take the optimization further and implement the mesh such that you only define things of interest. For the vast majority of the underground, is there anything to render besides solid rock? Nope. So why define it? For the hotkey locations, you could keep those levels compressed in memory. The data would compress very, very well, and with multi-threading the decompression wouldn't be much of a CPU load. Plus you could keep a history of commonly accessed z-levels in compressed memory. The rest of the levels would stay on disk, swapped in and out of memory as needed.
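Here is a rough sketch of the "keep a few levels unpacked, the rest compressed" idea, using only the Python standard library. Everything in it is an assumption for illustration: the class name `LevelStore`, the choice of zlib, and the uniform test levels. For simplicity it keeps the compressed copies in memory rather than swapping them to disk.

```python
import zlib
from array import array
from collections import OrderedDict

POINTS_PER_LEVEL = 49 * 49 * 16  # 4x4 embark, one 49x49 grid per embark square


def make_level(height):
    """A uniform z-level of unsigned 16-bit values (hypothetical test data)."""
    return array("H", [height]) * POINTS_PER_LEVEL


class LevelStore:
    """Keeps every level compressed and only the most recently used unpacked."""

    def __init__(self, max_unpacked=10):
        self.max_unpacked = max_unpacked
        self.packed = {}               # z -> zlib-compressed bytes
        self.unpacked = OrderedDict()  # z -> array, least recently used first

    def put(self, z, level):
        self.packed[z] = zlib.compress(level.tobytes())
        self.unpacked.pop(z, None)  # drop any stale unpacked copy

    def get(self, z):
        if z in self.unpacked:
            self.unpacked.move_to_end(z)  # mark as most recently used
            return self.unpacked[z]
        level = array("H")
        level.frombytes(zlib.decompress(self.packed[z]))
        self.unpacked[z] = level
        if len(self.unpacked) > self.max_unpacked:
            self.unpacked.popitem(last=False)  # evict least recently used
        return level


store = LevelStore(max_unpacked=10)
for z in range(200):
    store.put(z, make_level(z))

raw_size = len(make_level(0).tobytes())
packed_size = len(store.packed[0])
print(f"one level: {raw_size} bytes raw, {packed_size} bytes compressed")
```

A level of nothing but solid rock is the best case for compression, since every value is the same. Real levels with caves and constructions would compress less well, so treat the printed ratio as an upper bound.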
And finally, you could always just give the middle finger to 32-bit OS users and make DF 64-bit only.
Yes, there are technical issues, but they aren't unsolvable. I hope this helps with understanding this. Please keep in mind that a terrain-mesh-like system is only one way of doing this; there are other approaches as well. This is just the 'brute force' method, and the easiest to explain.