I've been trying to find out how plant selection for an embark works, and I believe I've found parts of it, but some pieces are still missing.
What I think I know:
1. World tiles refer to biome regions that list plants (and animals) available in that region.
2. World tiles control areas of the embark in terms of biome, etc. An embark can contain multiple biomes, each controlled either by the embark's own world region or by one of the 8 surrounding ones.
3. At embark, the df.global.world.populations list is populated from a 7x7 grid of world regions centered on the embark's world tile. This list contains entries for the surface as well as for each underground level, organized per world tile. I don't know how complete this list is, i.e. whether it contains everything in the biome regions, only the things legal for the world tile, or an RNG-based selection of entries.
4. For the surface, only plants/animals legal for the actual biome show up, i.e. temperate/tropical/good/evil/savage restrictions are upheld for the embark even when the region biome contains things that are not legal there (a region biome can span multiple biomes of the same broad type, and inappropriate things can also be added through hacking).
5. The underground biomes do not seem to perform a legality check: hacking water-dependent plants into Underground Chasm regions prior to embark causes them to show up on embark, so it seems DF relies on the population of the underground regions to perform the check. Hacking surface plants into underground biomes prior to embark has resulted in surface plants being present in the caverns at embark. I don't know whether new surface plants would sprout there after embark, but I suspect the light check would prevent it.
6. For some reason, grasses seem to have been given special treatment. The df.global.world.unk_59dc4 structure seems to contain "grasses" organized into world tiles and levels, with each level containing a list of "grasses". Adding a grass here causes the grass to start appearing on the embark, even if it is not legal (eyeball on a good embark, for instance). Hacking the df.global.world.populations structure has had no effect at all on grasses: it seems unk_59dc4 has taken over completely.
7. Underground Chasm regions do not follow the normal pattern of providing "grasses" to unk_59dc4 from their population list (which does not contain any "grasses" [or other plants, except possibly blood thorn] at all). Despite that, unk_59dc4 gets both cavern moss and floor fungi for these regions, and removing the mud and pre-existing ground cover with magma shows that the grasses do regrow. Removing "grasses" from Underground Water regions (the normal caverns, i.e. not mud covered) prior to embark causes those caverns to be covered by generic grass, although I haven't verified that the corresponding unk_59dc4 list is empty.
8. Hacking df.global.world.populations so that an entry's plant reference points to a different plant of the same type (bush/tree) causes the embark to start providing the new plant instead of the old one, even though the entry's reference to the biome region's population still points to an entry for the original plant. I speculate that this back link is only used for animals, for which there is a need to keep track of their numbers.
9. Hacking df.global.world.populations to insert new non-grass plant entries seems to have no effect, even when the new entries are properly linked to new (hacked) entries in the region biome's population, and even when a new entry is inserted between existing ones.
10. River creature populations are available in the feature_map's river feature entries at the embark tile level. Where that population is drawn from is currently unknown (educated guesses can be made, but I haven't investigated).
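For illustration only, a minimal Python sketch of points 3 to 5 as described above, not DF's actual code: it lists the 7x7 window of world tiles around the embark tile and filters a region's plant entries against the actual biome of a surface area. PlantEntry, region_window and legal_surface_plants are hypothetical names, the flags are simplified stand-ins for the real raw restrictions, and dropping tiles that fall outside the world near the map edge is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlantEntry:
    """Hypothetical stand-in for a plant entry in a region's population list."""
    name: str
    allowed_biomes: frozenset   # simplified biome tags, e.g. {"temperate"}
    needs_good: bool = False
    needs_evil: bool = False
    needs_savage: bool = False


def region_window(cx, cy, world_w, world_h, radius=3):
    """World tile coordinates in the (2*radius+1) x (2*radius+1) window
    centred on the embark tile (cx, cy); radius=3 gives the 7x7 grid.

    Tiles outside the world are simply dropped (an assumption).
    """
    return [
        (x, y)
        for y in range(cy - radius, cy + radius + 1)
        for x in range(cx - radius, cx + radius + 1)
        if 0 <= x < world_w and 0 <= y < world_h
    ]


def legal_surface_plants(entries, biome, good=False, evil=False, savage=False):
    """Keep only entries whose restrictions the actual biome satisfies.

    Models point 4 (surface restrictions are upheld); per point 5,
    underground layers would appear to skip a filter like this.
    """
    return [
        e for e in entries
        if biome in e.allowed_biomes
        and (good or not e.needs_good)
        and (evil or not e.needs_evil)
        and (savage or not e.needs_savage)
    ]
```

With radius=3 an interior tile yields 49 coordinates, while a corner tile yields only 16 under the edge assumption.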
Things I know I don't know:
1. How does DF select which bush/tree to place when it's time to generate one? Presumably an RNG number is used to select from some kind of list, but where is that list located, and can it be manipulated? Points 8 and 9 above lead me to guess that somewhere there is a list of pointers to the actual entries in the df.global.world.populations list (rather than a list of indices of the entries).
2. Is the bush/tree "list" DF uses complete, i.e. does it contain every legal entry from the region biome's population list, or is there a further RNG-based selection (on top of the one that led to the generation of the region biome population list contents)? Put differently: does presence in the region biome's population list guarantee that you have a chance to get that plant on your embark (assuming it fulfills the local legality criteria regarding actual biome, etc.)? Experience indicates you may miss out on some, but it may just be bad luck.
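To illustrate the reasoning in question 1, here is a toy Python model (hypothetical names throughout, not DF's code) of a bush/tree selection list that is built once and then sampled with an RNG. If the list holds references to populations entries, editing an existing entry's plant in place changes what gets picked (matching point 8), while inserting a new entry afterwards, even between existing ones, changes nothing (matching point 9). An index-based list would instead start returning the inserted entry, which is why the observations point towards references.

```python
import random
from dataclasses import dataclass


@dataclass
class PopulationEntry:
    """Hypothetical stand-in for an entry in df.global.world.populations."""
    plant: str


def pick_by_reference(selection, rng):
    # selection holds the entry objects themselves (pointer-like).
    return rng.choice(selection).plant


def pick_by_index(indices, populations, rng):
    # indices holds positions into the populations list instead.
    return populations[rng.choice(indices)].plant


populations = [PopulationEntry("plant_a"), PopulationEntry("plant_b")]
by_ref = list(populations)                # built once, e.g. at embark
by_idx = list(range(len(populations)))    # alternative: built as indices

populations[0].plant = "plant_c"                   # like point 8: edit in place
populations.insert(1, PopulationEntry("plant_d"))  # like point 9: insert between

rng = random.Random(0)
ref_seen = {pick_by_reference(by_ref, rng) for _ in range(200)}
idx_seen = {pick_by_index(by_idx, populations, rng) for _ in range(200)}
print(sorted(ref_seen))  # ['plant_b', 'plant_c']: edit visible, insertion not
print(sorted(idx_seen))  # ['plant_c', 'plant_d']: insertion shifts positions
```

This only shows that the observations are consistent with a reference-based list; it says nothing about where such a list lives or whether it is complete (question 2).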
So, do others have additional insights, or knowledge proving or indicating that any of the above is incorrect?
Edit: Added point 10 above.