It's largely a matter of which situation you want to optimize for:
a) Minimize the time it takes to search a full set of potentially matching world tiles; or
b) Minimize the time it takes to search a sparse set of potentially matching world tiles.
That said, if optimizing for b) costs a) only a little, it may be worth aiming for b).
My original prototype (written in Lua, although I was fairly certain the script penalty would be too high to be practical) tried caching everything, but the estimated memory requirement for that was excessive: I don't remember the exact number, but it was something like 30 GB, which is roughly what I had suspected. Therefore I switched to the philosophy Toady uses, i.e. regenerating information rather than caching all of it. The plugin stores all the world tile level information it uses plus some aggregate information about the mid level tiles, but not the mid level tile information itself. The mid level aggregate information is generated during the first search, and is thus available for subsequent searches.
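A minimal Python sketch of that split, assuming a hypothetical `regenerate` callback that stands in for having DF rebuild a world tile's mid level data (the class and field names are illustrative, not the plugin's). Only a small per world tile aggregate is kept; the regenerated mid level tiles are handed back for matching and then dropped.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Coord = Tuple[int, int]  # world tile coordinate (x, y)

@dataclass(frozen=True)
class MidTile:
    """Hypothetical stand-in for one mid level tile's generated data."""
    biome: int
    has_sand: bool

@dataclass(frozen=True)
class Aggregate:
    """What is kept per world tile once its mid level tiles have been examined."""
    no_sand: bool
    biomes: FrozenSet[int]

class WorldTileStore:
    def __init__(self, regenerate: Callable[[Coord], List[MidTile]]):
        # Hypothetical expensive call that makes the game rebuild the mid level data.
        self._regenerate = regenerate
        self._aggregates: Dict[Coord, Aggregate] = {}

    def aggregate(self, coord: Coord) -> Optional[Aggregate]:
        """None until a search has examined this world tile."""
        return self._aggregates.get(coord)

    def examine(self, coord: Coord) -> List[MidTile]:
        """Regenerate the mid level tiles, record the aggregate, and return the
        tiles for matching. The tiles themselves are not stored."""
        mids = self._regenerate(coord)
        self._aggregates[coord] = Aggregate(
            no_sand=not any(m.has_sand for m in mids),
            biomes=frozenset(m.biome for m in mids),
        )
        return mids
```

Memory use then scales with the size of the aggregate rather than with the full mid level data, at the price of regenerating that data whenever a search needs it.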
The aggregate information is of the type "no sand on any tile", which means a search for sand in an embark will skip this world tile when that field is true. Before the first pass this field is false, because the mid level tiles haven't been scanned yet. Similarly, if a world tile has no river, it can't have a waterfall either, so a waterfall search can skip it. In the same vein, this info contains the biomes of the current world tile as well as of the surrounding 8 tiles, and if an embark requires a particular biome not on that list, the world tile can be skipped (the list is pruned further once examination of the mid level tiles shows that some of those potential biomes are absent). In fact, a yellow X indicates that the tile has to be examined more closely precisely because the top level info collected does not rule out a match.
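The pruning rules above can be sketched as a single check, again in Python with hypothetical field names. It assumes the summary already holds the world tile level river flag and the biome list of the tile and its 8 neighbours; `no_sand` stays False until a scan proves otherwise, so unscanned tiles are never skipped on that ground.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class WorldTileSummary:
    """Hypothetical per world tile aggregate kept between searches."""
    has_river: bool                   # world tile level info, known before any scan
    candidate_biomes: FrozenSet[int]  # own + 8 neighbours, pruned after a scan
    no_sand: bool = False             # only becomes True once a scan finds no sand

@dataclass(frozen=True)
class Query:
    """Hypothetical subset of embark search criteria."""
    want_sand: bool = False
    want_waterfall: bool = False
    required_biomes: FrozenSet[int] = frozenset()

def can_skip(tile: WorldTileSummary, q: Query) -> bool:
    """True if the aggregate info alone rules out every embark in this world tile."""
    if q.want_sand and tile.no_sand:
        return True
    if q.want_waterfall and not tile.has_river:
        return True  # no river, so no waterfall either
    if not q.required_biomes <= tile.candidate_biomes:
        return True  # a required biome cannot occur here
    return False     # not ruled out: examine the mid level tiles (the yellow X case)
```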
One complication is that DF uses a data structure with "feature shells", i.e. 16x16 world tile blocks, and loading those takes a significant amount of time (about half of the total search time), which is why I made sure the world is processed one feature shell at a time. In addition, movement from tile to tile is done by simulating single world tile movement key input, as I don't know of any other way to get DF to load the feature shells. It would be possible to use the 10 tile movement key inputs instead to speed things up, as well as diagonal movement when the preliminary match set is sparse, but that adds a (possibly negligible) overhead to the case where you have to scan more or less every tile. Also of importance is that the movement pattern is complicated enough as it is (it took me a fair while to get it to work correctly). That doesn't rule out using the current pattern for the first scan and a different one that cuts down the number of steps for subsequent searches, once all the base data has been collected, though.
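A rough Python sketch of a full scan that handles one feature shell at a time using only single tile moves. This is not the plugin's actual movement pattern: it orders the shells in a snake so consecutive shells are adjacent, walks each shell's tiles in a snake, and connects consecutive targets with x-first single steps, which never leave the current or the next shell. Tiles may be passed over more than once during transits, but no shell is re-entered after it has been left. The 16 tile shell size comes from the description above; the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

SHELL = 16  # feature shell edge length in world tiles
Coord = Tuple[int, int]

def shell_order(shells_x: int, shells_y: int) -> List[Coord]:
    """Feature shells in snake order, so each shell is adjacent to the previous one."""
    order: List[Coord] = []
    for sy in range(shells_y):
        xs = range(shells_x) if sy % 2 == 0 else range(shells_x - 1, -1, -1)
        order.extend((sx, sy) for sx in xs)
    return order

def tiles_in_shell(sx: int, sy: int, width: int, height: int) -> List[Coord]:
    """World tiles of one shell in snake order, clipped to the world edges."""
    tiles: List[Coord] = []
    ys = range(sy * SHELL, min((sy + 1) * SHELL, height))
    for i, y in enumerate(ys):
        xs = list(range(sx * SHELL, min((sx + 1) * SHELL, width)))
        if i % 2:
            xs.reverse()
        tiles.extend((x, y) for x in xs)
    return tiles

def unit_steps(a: Coord, b: Coord) -> Iterator[Coord]:
    """Single tile moves from a to b, x first, then y (one key press each)."""
    x, y = a
    while x != b[0]:
        x += 1 if b[0] > x else -1
        yield (x, y)
    while y != b[1]:
        y += 1 if b[1] > y else -1
        yield (x, y)

def full_scan_path(width: int, height: int) -> List[Coord]:
    """Visit every world tile, one feature shell at a time, with single tile moves."""
    shells_x = -(-width // SHELL)   # ceiling division
    shells_y = -(-height // SHELL)
    path: List[Coord] = [(0, 0)]
    for sx, sy in shell_order(shells_x, shells_y):
        for tile in tiles_in_shell(sx, sy, width, height):
            path.extend(unit_steps(path[-1], tile))
    return path
```

Each element of the returned path after the first corresponds to one simulated single tile movement key press.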
A further complication is that the next version of the plugin takes "incursions" into consideration, i.e. bits of neighboring mid level tiles' biomes jutting into embarks, which would allow you to find a single tile embark with 9 biomes on it (assuming one existed in the world, which is highly unlikely). That logic, however, requires info from neighboring world tiles in all directions to find matching embarks at the edges of world tiles, but all the required background info is still collected on the first search, so a different search logic for subsequent searches would still be possible.
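The extra data requirement can be illustrated with a simplified Python model that assumes any of the mid level tiles adjacent to the embark rectangle (diagonals included) may jut into it; DF's actual rules for when an incursion happens are not modeled. Coordinates are global mid level tile indices (world tile coordinate times 16 plus the offset within it, since a world tile is 16x16 mid level tiles), and `biome_at` is a hypothetical lookup of already collected biome ids.

```python
from typing import Dict, Set, Tuple

MID = 16  # mid level tiles along each edge of a world tile
Coord = Tuple[int, int]  # global mid level tile coordinate (x, y)

def footprint(x0: int, y0: int, w: int, h: int, size: Coord) -> Set[Coord]:
    """Embark rectangle grown by one tile on every side, clipped to the world.
    The extra ring is where incursions could come from (simplified model)."""
    return {
        (x, y)
        for y in range(max(y0 - 1, 0), min(y0 + h + 1, size[1]))
        for x in range(max(x0 - 1, 0), min(x0 + w + 1, size[0]))
    }

def world_tiles_needed(x0: int, y0: int, w: int, h: int, size: Coord) -> Set[Coord]:
    """World tiles whose mid level data is needed to evaluate this embark."""
    return {(x // MID, y // MID) for x, y in footprint(x0, y0, w, h, size)}

def potential_biomes(biome_at: Dict[Coord, int], x0: int, y0: int, w: int, h: int,
                     size: Coord) -> Set[int]:
    """Biomes the embark may contain: its own tiles plus possible incursions."""
    return {biome_at[c] for c in footprint(x0, y0, w, h, size) if c in biome_at}
```

A single tile embark in the corner of a world tile already needs data from four world tiles, and with nine distinct biomes around a single tile the model yields the 9 biome case mentioned above.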