1. Which of the identified memory structures is the stone/mineral data coming from?
2. What causes the potential inaccuracy?
The root cause of the inaccuracy is that the world is generated using a multi-level-of-detail approach. Short of cloning the generation algorithm with 100% accuracy, it is only possible to see what has already been generated and is currently in memory. The levels of detail are:
- A map of biomes laid out as a grid of embark regions, which is what you see in the middle map of the embark screen. Every biome has a few parameters, such as a single elevation value plus rainfall, savagery, etc. It also has a list of mineral layers, and of the veins that can be found in them, with associated probabilities.
- When you select a region, the game generates the map you see on the left. At this point it decides how the biomes associated with the big regions are actually laid out within it, producing those irregularly shaped blobs. Basically, for each of the 16x16 embark rectangles in the region, it decides which of the 3x3 possible surrounding biomes that rectangle contains. It also computes elevation maps for the embark rectangles and plots the paths of the rivers, again only at embark-rectangle level.
- Once you actually embark, it finally generates the detailed map, consisting of all the millions of actual tiles.
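The three levels of detail can be pictured as the data model below. This is a minimal Python sketch, not DFHack's actual structures: the class and field names (`Biome`, `Region`, `World`, `biome_index`, `rect_biome`) are hypothetical, and so is encoding the 3x3 neighbourhood as a row-major index 0..8 with 4 meaning the region's own biome. The point it illustrates is that level 2 only exists once the game has generated it, so any lookup below level 1 can fail.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Biome:
    """Level 1: one cell of the world-wide biome grid (hypothetical fields)."""
    elevation: int
    rainfall: int
    savagery: int
    layers: List[str]        # mineral layers, top to bottom
    veins: Dict[str, float]  # vein mineral -> probability of occurring

@dataclass
class Region:
    """Level 2: exists only after the game has generated this region."""
    # One entry per embark rectangle (16x16): index 0..8 into the 3x3 block
    # of biomes around the region, row-major, 4 = the region's own biome.
    biome_index: List[List[int]]
    elevation: List[List[int]]  # one value per embark rectangle

@dataclass
class World:
    biomes: List[List[Biome]]  # indexed [y][x]
    regions: Dict[Tuple[int, int], Region] = field(default_factory=dict)

    def region(self, x: int, y: int) -> Optional[Region]:
        # Lazy level 2: None until the region has been generated.
        return self.regions.get((x, y))

    def rect_biome(self, x: int, y: int, rx: int, ry: int) -> Biome:
        """Biome of embark rectangle (rx, ry) inside region (x, y)."""
        reg = self.region(x, y)
        if reg is None:
            raise LookupError(f"region {(x, y)} not generated yet")
        i = reg.biome_index[ry][rx]
        dx, dy = i % 3 - 1, i // 3 - 1
        # Clamp at the world edge (an assumption of this sketch).
        ny = min(max(y + dy, 0), len(self.biomes) - 1)
        nx = min(max(x + dx, 0), len(self.biomes[0]) - 1)
        return self.biomes[ny][nx]
```

Level 3 (the individual tiles) is deliberately absent from the model: before embarking there is simply nothing to look up at that level.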
Since the actual tiles of veins and layers are only generated in phase 3 (the embark stage), it is impossible to tell exactly how many tiles of a given type there will be; only a rough average expected amount can be computed. Veins in particular are chosen randomly based on their probabilities, and then their individual tiles are randomized again. They can also be cut away by caverns, which are random too; before phase 3, the only thing you can tell is the depth range where the caverns will be located.

Finally, the way the game decides how many soil layers will be present is something of a mystery, so prospect can get it wrong, shifting the whole stack in the z dimension and possibly reporting soil types that won't be there at all. Currently it uses a heuristic based on elevation, since that seems to be what the embark screen does when deciding whether to display soil on the right: the higher the elevation, the less soil there is, culminating in mountains. It is also possible for adjacent biomes to intrude on the margins of the embark area, as happens on a bigger scale with region biomes; this probably matters for layer materials rather than for veins.
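To see why only an average is available before embarking, here is a small Monte Carlo sketch. It assumes a made-up vein model: each of `slots` candidate placements gets a vein with probability `p`, a placed vein has a uniformly random size around `mean`, and a cavern crossing it (probability `cavern_cut`) removes half its tiles. None of these parameters or function names come from the game or from prospect; `expected_tiles` is the kind of number a pre-embark estimate can give, while `sample_tiles` stands in for one actual phase 3 roll.

```python
import random
from typing import Dict, Tuple

# Hypothetical vein table: mineral -> (probability per slot, mean tiles per vein).
Veins = Dict[str, Tuple[float, int]]

def expected_tiles(veins: Veins, slots: int, cavern_cut: float = 0.0) -> Dict[str, float]:
    """The average a pre-embark estimate can report (only approximate when
    cavern_cut > 0, because sample_tiles rounds halved veins down)."""
    return {m: slots * p * mean * (1 - cavern_cut / 2)
            for m, (p, mean) in veins.items()}

def sample_tiles(veins: Veins, slots: int, rng: random.Random,
                 cavern_cut: float = 0.0) -> Dict[str, int]:
    """Stand-in for the actual level-3 roll: one possible outcome."""
    out = {}
    for m, (p, mean) in veins.items():
        total = 0
        for _ in range(slots):
            if rng.random() >= p:
                continue  # no vein in this slot
            size = rng.randint(mean // 2, mean * 3 // 2)  # tiles are randomized too
            if rng.random() < cavern_cut:
                size //= 2  # a cavern carved away part of the vein
            total += size
        out[m] = total
    return out
```

Calling `sample_tiles` with different seeds gives different totals scattered around the same expectation; that spread is exactly what a pre-embark report cannot resolve.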
3. How feasible is it to make a script that searches the entire world and provides coordinates for mineral deposits that meet preselected specifications (the most common search being "iron ore + flux")?
4. How feasible is it to make a better site finder, with some kind of UI?
An issue here is that the phase-2 level of detail is also lazily generated, so you would have to feed key presses to the regular embark screen to move its cursor to the region you want; just setting variables won't do it. This is entirely feasible, and not that hard either, except that it may cause confusing flicker in the UI.
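Because phase 2 is only generated in response to the cursor moving, a site finder would replay cursor movement rather than write the target coordinates directly. A minimal sketch of that step, assuming one key press moves the embark cursor by one region and that y grows downward. The `CURSOR_*` key names are assumptions modelled on DF's cursor keys, and `feed_key` is a hypothetical callback standing in for whatever mechanism actually delivers a key to the embark screen.

```python
from typing import Callable, List, Tuple

def cursor_keys(cur: Tuple[int, int], target: Tuple[int, int]) -> List[str]:
    """Key presses that walk the embark cursor from `cur` to `target`,
    one press per region step, horizontal moves first (an arbitrary choice)."""
    (cx, cy), (tx, ty) = cur, target
    keys = ["CURSOR_RIGHT" if tx > cx else "CURSOR_LEFT"] * abs(tx - cx)
    keys += ["CURSOR_DOWN" if ty > cy else "CURSOR_UP"] * abs(ty - cy)
    return keys

def visit(cur: Tuple[int, int], target: Tuple[int, int],
          feed_key: Callable[[str], None]) -> None:
    # Each press is delivered separately so the screen reacts to every step,
    # which is what makes the game generate the region under the cursor.
    for key in cursor_keys(cur, target):
        feed_key(key)
```

Sending one key at a time is also the source of the flicker: every intermediate region is briefly drawn as the cursor passes over it.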