I've done some DFHack structure exploration, and I think I've finally gotten a handle on how the mid level tile "edges" work, i.e. how the biomes of tiles flow into adjacent ones. In practical terms, that means it may be possible to determine whether a mid level tile will get small pieces of different biomes, but there are significant caveats when it comes to actually using that info.
How (I think) it works [Technical, can be skipped]:
- For each mid level tile side, there's information on whether the main biome of the tile should be used or the neighboring tile's one, i.e. whether biomes should flow in or out. In most cases it doesn't matter, as the biome is the same on both sides, but the code has to handle the situation.
- Similarly, for each corner there's info on which of the 4 tiles that meet there should be used to provide the corner information for the corresponding corner of all 4 tiles.
- There's info on how each side of a tile is split into one side section and 2 corner sections (the corners need their other side's info to be completely defined).
- The split of sides into sections forms a starting point for DF's (RNG seed driven) algorithm for fighting it out between the "intruder" and "defending" biome, as well as between different "intruders". This can result in the boundary between sections ending up in a different place from the one defined, and it can also have the "intruder" pushed back completely along part of the line. This means the "foreign" biomes can vary a lot in size, with no external info on how large they'll be.
- I GUESS an "intruder" biome can only be pushed back to the border, but can't be pushed past it so the flow reverses (i.e. the intrusion direction will always hold, so you can reliably predict that no foreign biome will appear along a section whose info says the tile's own biome is used).
- I furthermore GUESS that it should be possible, although probably extremely rare, for an "intruder" biome to be completely expelled, resulting in no foreign tile where the initial info stated there should be one. If that happened, there would be a completely straight border line for that section, assuming the previous guess holds.
- The term "biome" isn't completely accurate, as apart from the biome and geo biome, the information also covers Elevation, which DF defines per mid level tile, and so can vary even if the actual biome is the same.
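A rough sketch of the model described above, in Python rather than anything DFHack actually exposes. All names here (`side_uses_neighbor`, `corner_pick`, `possible_incursions`) are made up for illustration and don't correspond to real DF structure fields, and the corner info is keyed per tile for simplicity even though DF shares it between the 4 tiles. It only lists which foreign "biomes" could show up in a tile, not how large they end up, since that depends on DF's RNG driven algorithm.

```python
from typing import Dict, List, Set, Tuple

Info = Tuple[str, int]  # (biome name, elevation): hypothetical stand-in for the real data

# Offsets (dx, dy) to the neighbor across each side of a tile.
SIDES = {"N": (0, -1), "S": (0, 1), "W": (-1, 0), "E": (1, 0)}

# For each corner, the offsets of the 4 tiles that meet there.
CORNERS = {
    "NW": [(-1, -1), (0, -1), (-1, 0), (0, 0)],
    "NE": [(0, -1), (1, -1), (0, 0), (1, 0)],
    "SW": [(-1, 0), (0, 0), (-1, 1), (0, 1)],
    "SE": [(0, 0), (1, 0), (0, 1), (1, 1)],
}


def possible_incursions(
    grid: List[List[Info]],
    side_uses_neighbor: Dict[Tuple[int, int, str], bool],
    corner_pick: Dict[Tuple[int, int, str], int],
    x: int,
    y: int,
) -> Set[Info]:
    """Return the foreign infos that MAY appear in interior tile (x, y).

    This is an upper bound: per the guesses above, an intruder may be
    pushed back completely, but the flow direction never reverses.
    """
    own = grid[y][x]
    foreign: Set[Info] = set()
    for side, (dx, dy) in SIDES.items():
        # True means the neighbor's info flows into this tile along that side.
        if side_uses_neighbor.get((x, y, side), False):
            candidate = grid[y + dy][x + dx]
            if candidate != own:
                foreign.add(candidate)
    for corner, offsets in CORNERS.items():
        pick = corner_pick.get((x, y, corner))
        if pick is None:
            continue
        dx, dy = offsets[pick]
        candidate = grid[y + dy][x + dx]
        if candidate != own:
            foreign.add(candidate)
    return foreign


# 3x3 example around the center tile (1, 1).
grid = [
    [("forest", 100), ("forest", 100), ("swamp", 98)],
    [("forest", 100), ("forest", 100), ("forest", 101)],
    [("forest", 100), ("forest", 100), ("forest", 100)],
]
side_uses_neighbor = {(1, 1, "N"): True, (1, 1, "E"): True}
corner_pick = {(1, 1, "NE"): 1}  # index 1 of the NE list is the diagonal tile
incursions = possible_incursions(grid, side_uses_neighbor, corner_pick, 1, 1)
```

In the example the north neighbor is identical to the center tile, so it contributes nothing, while the east neighbor counts as foreign purely because its elevation differs.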
Complications:
- The mid level tiles at the border of each world tile potentially need to import information from mid level tiles belonging to another world tile. While this isn't a problem after embark, it is pre embark, because at that time this information exists only for a single world tile (3*3 world tiles post embark). This is a rather significant issue, because regardless of which order you scan the world in, world tiles bordering unscanned tiles rely on information that DF hasn't generated yet for those tiles. Taking all information into consideration would therefore require either very significant caching of information, bloating the memory footprint, or scanning the world twice on the first search (the relevant border information can probably be cached during the first scan, as it shouldn't be too much data). A compromise of sorts is to not consider the border tiles for embarks during the first scan.
- When do we want to take "foreign" biome incursions into mid level tiles into consideration? That's actually not a straightforward issue: For getting a flat embark we definitely want to take them into account, but if you're looking for iron, you won't be very happy to find that the "iron" consists of a dozen in-game tiles of the geo biome, which may not even have any vein in them...
- Savagery/Evilness parameters: Most of the time you probably want incursions to be taken into account.
- Aquifer: You probably want to take it into account all the time.
- Flat: All the time.
- Clay/Sand: You only need a single tile to access the resource. Incursions tend to be (dangerously) close to the border, but taking them into account all the time would probably be good enough.
- Coal/Flux: Incursions should probably be ignored, as you want decent volumes.
- Soil: Don't know...
- Evil/Syndrome rain/reanimation: Probably situation dependent...
- Min/Max Biome count, region type, biome: You'd probably want to take incursions into account, but will be unhappy if you were aiming for something that isn't present in that small section...
- Metal/Economic/Mineral: Same as Coal/Flux: Incursions are typically too small to be of interest.
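The border tile compromise from the first complication could look something like the sketch below. It assumes 16*16 mid level tiles per world tile, and `evaluate` is a hypothetical callback standing in for whatever the real matching code does. Border tiles are just put aside during the first scan, to be handled later once the neighboring world tiles' info is known.

```python
from typing import Callable, List, Tuple

MID_TILES_PER_SIDE = 16  # assumption: 16*16 mid level tiles per world tile

MidTile = Tuple[int, int, int, int]  # (world x, world y, mid x, mid y)


def is_border(i: int, k: int, n: int = MID_TILES_PER_SIDE) -> bool:
    """True if mid level tile (i, k) touches the edge of its world tile."""
    return i in (0, n - 1) or k in (0, n - 1)


def first_scan(
    world_w: int,
    world_h: int,
    evaluate: Callable[[int, int, int, int], bool],
    n: int = MID_TILES_PER_SIDE,
) -> Tuple[List[MidTile], List[MidTile]]:
    """Scan every world tile once; evaluate interior tiles, defer border ones."""
    matches: List[MidTile] = []
    deferred: List[MidTile] = []
    for wy in range(world_h):
        for wx in range(world_w):
            for k in range(n):
                for i in range(n):
                    if is_border(i, k, n):
                        # Needs info from a world tile DF may not have generated yet.
                        deferred.append((wx, wy, i, k))
                    elif evaluate(wx, wy, i, k):
                        matches.append((wx, wy, i, k))
    return matches, deferred


# Tiny 2*1 world where everything matches.
matches, deferred = first_scan(2, 1, lambda wx, wy, i, k: True)
```

With 16*16 tiles, 60 of the 256 mid level tiles in each world tile are border tiles, so the compromise skips a bit under a quarter of the candidates on the first scan.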
Note that adding switches for whether to take incursions into consideration on a per selection item basis would bloat the selection list and make the UI more unwieldy than necessary.
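One way to avoid per item switches would be a fixed table with the choices from the list above baked in. The sketch below is only an illustration: the criterion names and the `infos_to_check` helper are invented here, and the "undecided" entries default to ignoring incursions purely as a placeholder.

```python
from enum import Enum
from typing import List, Sequence


class Incursions(Enum):
    CONSIDER = "consider"
    IGNORE = "ignore"
    UNDECIDED = "undecided"


# Hypothetical criterion names, mirroring the list above.
POLICY = {
    "savagery": Incursions.CONSIDER,
    "evilness": Incursions.CONSIDER,
    "aquifer": Incursions.CONSIDER,
    "flat": Incursions.CONSIDER,
    "clay": Incursions.CONSIDER,
    "sand": Incursions.CONSIDER,
    "coal": Incursions.IGNORE,
    "flux": Incursions.IGNORE,
    "soil": Incursions.UNDECIDED,
    "evil_weather": Incursions.UNDECIDED,
    "biome_count": Incursions.CONSIDER,
    "region_type": Incursions.CONSIDER,
    "biome": Incursions.CONSIDER,
    "metal": Incursions.IGNORE,
    "economic": Incursions.IGNORE,
    "mineral": Incursions.IGNORE,
}


def infos_to_check(
    criterion: str,
    main: str,
    incursions: Sequence[str],
    undecided_default: Incursions = Incursions.IGNORE,
) -> List[str]:
    """Return the biome infos a criterion should be evaluated against."""
    policy = POLICY[criterion]
    if policy is Incursions.UNDECIDED:
        policy = undecided_default
    if policy is Incursions.CONSIDER:
        return [main, *incursions]
    return [main]


flat_infos = infos_to_check("flat", "main", ["incursion"])
coal_infos = infos_to_check("coal", "main", ["incursion"])
```

How the returned infos are combined is left to each criterion: "flat" would need all of them to agree, while "clay" only needs one of them to have the resource.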