Topic: DFHack plugin embark-assistant (Read 100628 times)

Fleeting Frames · « **Reply #150 on:** November 13, 2019, 07:44:34 am »

The comment was in case there was idle processor time between embark-assistant and DF i.e. if embark-assistant waits a frame until DF has done its thing rather than reacting that millisecond. Of course, if there were, you could speed up search also by increasing fps, which I doubt is the case from looking at "do nothing" search.

PatrikLundell · « **Reply #151 on:** November 13, 2019, 08:36:44 am »

Given that DF isn't doing anything pre embark apart from displaying data and reacting to player key presses, it shouldn't really matter if we could save up to 10 ms (i.e. up to 7 seconds for a full size map at 100 FPS, which is the FPS I think we get, and half of that time on average). A save of 100 ms per movement would have been significant, though.
In all, a good idea that doesn't seem to be that relevant in this particular case.

Edit: Something different, but not worth double posting:
An implementation I haven't been happy with is the embark matching at the actual embark level. Currently all embark combinations are examined for a world tile that may have a match in it, with embarks rejected one at a time on the detection of a mismatch. A more intelligent approach would be to invalidate all rectangles that are invalidated by an absolute criterion mismatch (e.g. "no evil anywhere") and start embark rectangle evaluation with checking for this early rejection.
I can think of two approaches to achieve this:
- Proceed as now, but as soon as an absolute mismatch is found, all starting MLTs of rectangles that would include the offending MLT would be marked as rejected, and so be skipped after checking this flag.
- Pre-process all tiles individually and mark as in the previous alternative (skipping pre-processing of tiles invalidated), storing the collected data for each tile until it can be used for compound evaluation. This may be more complicated than the other alternative and require storage of more info, but might have slightly better performance because you don't do the full processing up until finding the rejection. However, the added administration might cancel out the gain.
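
A minimal sketch of the first approach, assuming a 16 x 16 grid of MLTs per world tile and embark rectangles identified by their starting (top-left) MLT. `RejectMap` and `mark_rejected` are made-up names for illustration, not existing plugin code:

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical sketch, not embark-assistant code.
// rejected[i][k] == true means no embark rectangle may start at MLT (i, k).
constexpr int kMltPerTile = 16;
using RejectMap = std::array<std::array<bool, kMltPerTile>, kMltPerTile>;

// Called when MLT (bad_i, bad_k) fails an absolute criterion such as
// "no evil anywhere". Every x_dim * y_dim rectangle containing that MLT
// starts somewhere in [bad_i - x_dim + 1, bad_i] x [bad_k - y_dim + 1, bad_k],
// clipped to the world tile.
void mark_rejected(RejectMap &rejected, int bad_i, int bad_k, int x_dim, int y_dim) {
    assert(bad_i >= 0 && bad_i < kMltPerTile);
    assert(bad_k >= 0 && bad_k < kMltPerTile);
    assert(x_dim >= 1 && y_dim >= 1);
    const int first_i = std::max(0, bad_i - x_dim + 1);
    const int first_k = std::max(0, bad_k - y_dim + 1);
    for (int i = first_i; i <= bad_i; ++i)
        for (int k = first_k; k <= bad_k; ++k)
            rejected[i][k] = true;
}
```

The embark loop would then test `rejected[i][k]` before doing any other work for a rectangle starting at (i, k).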

Rekov · « **Reply #152 on:** November 22, 2019, 03:32:59 pm »

Possible problem with waterfall detection. I tend to use the embark-assistant location finder because the options are so much more useful than built in search. I have noticed that sometimes it misses what would seem to be viable locations.

For example, I ran a search with the following values:

X Dimension: 4
Y Dimension: 4
Min River: Minor
Waterfall: Yes
Freezing: Partially Frozen

Then, I tested turning off each of the requirements one by one. This site, which definitely has a waterfall, isn't included in the search results if I have Waterfall: Yes, but does show up if I have Waterfall: N/A.

PatrikLundell · « **Reply #153 on:** November 23, 2019, 04:18:26 am »

Hm, I have a suspicion that there might be something related to the rivers joining that somehow scrambles the logic. If you could provide the save and a description of where the site is located I'll take a look and try to figure out what's going on.

Edit: I hope the statement about sometimes missing viable locations refers to the waterfalls. If there are other cases as well, those are other bugs that ought to be fixed, once they are made known.

Rekov · « **Reply #154 on:** November 23, 2019, 01:01:44 pm »

Yeah, I'm just talking about the finder not detecting all waterfalls.
I did a simpler search with just Min River: Stream, Waterfall: Yes

You can just generate a world, do this, and randomly trace along streams, rivers, etc, and you will find waterfalls that didn't get picked up by the search.

Here is the region this is from, but like I said, it's pretty easy to find instances like this in any world.

PatrikLundell · « **Reply #155 on:** November 24, 2019, 03:26:04 pm »

Thanks for the save: it's a lot easier when shown exactly what to look for and where, even though it was rather easy to find once I knew what to look for.

OK, I've found the problem: The "active" field of the horizontal/vertical river info isn't a boolean, but -1, 0, 1, and I checked for 1, so about half of the river tiles were missed. This is a river issue, not a waterfall one, but if the river isn't detected, neither is a waterfall in it.
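
To illustrate the kind of mistake this was, here is a small sketch. `RiverInfo` is a hypothetical stand-in for the real DF structure, and the "fixed" check assumes that any non-zero value means a river is present; only the -1/0/1 value range comes from the description above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-MLT river info; only the tri-state
// "active" value (-1, 0, 1) is taken from the post.
struct RiverInfo {
    std::int8_t active;
};

// The check as originally written: misses every river flagged with -1.
inline bool has_river_buggy(const RiverInfo &r) { return r.active == 1; }

// Assumed fix: treat any non-zero value as "river present".
inline bool has_river(const RiverInfo &r) { return r.active != 0; }
```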

While thinking about what the problem might be, I realized there was something else missing: for some weird reason DF doesn't generate info for some river tiles (at the Mid Level Tile level), but deduces their presence from neighbors (to the south and east, it turns out), and the logic wasn't known (at least not by me). I've tried to investigate it and think I've gotten it to work (but I still don't understand the logical reason behind it). Searching for single tile embarks with rivers on them should match the DF display, and does so (on my system) except for two edge cases I won't bother fixing:
- River sources/sinks to the North and to the West of the river are implicit, like the bends. However, some river sources/sinks are in lakes and some sinks in oceans. However, DF does generate rivers on ocean tiles, including sources (or if they're sinks). I don't know how to determine when not to generate river indications for those cases.
- DF generates river info for glaciers, but doesn't display any rivers, which makes logical sense as they'd be frozen all year around anyway. I expect that can indicate waterfalls as well, but you'd have to wait for the next inter glacial period to embark there if you want to enjoy them...

As far as I can see, there's no issue with joining rivers, at least my looking around with single tile river matches has worked correctly for all the cases I've looked at (randomly walking around the map and looking for mismatches), so that was a false lead.

RedDwarfStepper · « **Reply #156 on:** November 24, 2019, 07:19:39 pm »

@PatrikLundell: Just to let you know: I'm still at it.
The whole family caught a cold, which forced me first to do theoretical experiments and later on allowed for pen-and-paper programming.
I'll incorporate the waterfall and river fix...

PS: If you feel inclined to play around with the latest work-in-progress, you're invited to take a look/check out
https://github.com/bseiller/dfhack/tree/embark-assistant-index/plugins/embark-assistant
Currently it only feeds into the index, no queries so far. But that is what I'm working on in a standalone command line project, which allows for faster development as the startup time is much shorter.
Also the styles are not up to your standards - I started with it before your feedback - I change the names of members and functions as I go.

PatrikLundell · « **Reply #157 on:** November 28, 2019, 05:45:01 am »

Taking care of the real world takes precedence over the virtual one...

I'm just starting to look at what you're at, and probably misunderstand half of it, so the comments should be taken with that in mind.
- defs.h; inorganics: There are more than 256 of those, so they won't fit in uint8_t (assuming that's what's supposed to be stored).
- index.h: The plural of "index" is actually "indices".
- index.h: I don't see the point of creating maps of the names of various inorganics, as you can easily extract those from the DF structures. If it's some kind of backward indexing I'd just store the index of the corresponding inorganic (in the DF structure).
- index.h: As discussed earlier, it would make sense to merge the three (four?) inorganics vectors of maps into one, but it makes sense trying to get things to work first before trying to optimize them.

index.cpp:
- Index::setup: It seems you make the assumption all worlds have x and y dimensions that are the same. That doesn't hold (mine are 17 * 129, for instance).
- Index::createKey (and setup, maxKeyValue): Does the range matter? If it doesn't I'd just make the key out of the lower 12 bits of x and y, plus the lower 4 bits of i and k, with a sanity check to ensure the world dimensions don't exceed the limits (which they can't in vanilla, and it seems from another thread they can't approach those values). The next point indicates the value range does matter, though.
- Index::add: It seems this operation relies on the order in which the keys are added, and that order does not work well with either DF's feature shells or an efficient processing of world tiles without feature shells (unless there's some way to move directly from the last world tile of a row to the first world tile of the next one). A comment explaining what's special with 511 would be useful: I can't figure it out.
- Index::add: I'd change the code checking for duplicate keys to a one liner (which I usually avoid) and comment out the whole line when not used, rather than just the output. The compiler will have to perform the function call unless it can somehow deduce it doesn't have side effects even if there's nothing to do when true.
- Index::add: Is there a reason for the omission of code inside the candy check? The later storage of level?
- Index::add: I'm not sure what you're after with the (explicitly redundant) checks for various layer materials (not sure if it covers veins). If activated, it would exclude gems, for instance, and regardless, if someone has hacked in something, I don't see why it should be filtered out (And I just realized someone hacking in steel as a layer would get it filtered out if all inorganics were merged, but the issue would still be isolated to metals, and it would be possible to handle metals only separately).

RedDwarfStepper · « **Reply #158 on:** November 29, 2019, 08:09:16 pm »

Oh wow - thank you very much for taking the time to look into it in detail!
Ah, the code is still full of constructs that I installed to help me get my bearings and to (dis)prove my assumptions how DF and the plugin works.
A lot of those can and should be removed or properly commented if it makes sense to keep them for debugging purposes.
Now to your comments:

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

Taking care of the real world takes precedence over the virtual one...

I'm just starting to look at what you're at, and probably misunderstand half of it, so the comments should be taken with that in mind.

Very true, especially if the real world does not give you a choice about that

Having read your comments I can tell you: Even if one of them did not point out an error or misconception of mine it at least made clear that I need to externalize my thoughts (by comments!) and remove old code more eagerly. Especially since code turns into legacy code so fast, even if I'm myself the author.

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- defs.h; inorganics: There are more than 256 of those, so they won't fit in uint8_t (assuming that's what's supposed to be stored).

That seems true for my test world (but might even have been wrong there), but this was put there to try to improve performance where vector<bool> seemed slow - which it is in the debug build, as some optimizations are only made for release builds. Has been removed.

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- index.h: The plural of "index" is actually "indices".

Changed - hehe, in my mother tongue it would have been me to point this out

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- index.h: I don't see the point of creating maps of the names of various inorganics, as you can easily extract those from the DF structures. If it's some kind of backward indexing I'd just store the index of the corresponding inorganic (in the DF structure).

I'm lazy and wanted to be able to look the name up during debugging and have it ready when writing the "report" at the end of the survey/index phase.
As those names also are used as part of the file names when writing the indices onto the disk (Index::outputContents) I thought it might be nice to have them around so the writing process was not being made more complicated or slower by the sequential (?) look up. But the idea that that could be complicated might have its root in my narrow understanding of the DF structures.

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- index.h: As discussed earlier, it would make sense to merge the three (four?) inorganics vectors of maps into one, but it makes sense trying to get things to work first before trying to optimize them.

Exactly my thought - the fourth vector (inorganics) will replace the three others as soon as I'm sure that I'm not missing anything.

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- Index::setup: It seems you make the assumption all worlds have x and y dimensions that are the same. That doesn't hold (mine are 17 * 129, for instance).

Oh boy - that one is - I never even knew that is possible!
Fixed this by removing world_dims and using worldgen_parms.dim_x and worldgen_parms.dim_y instead

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- Index::createKey (and setup, maxKeyValue): Does the range matter? If it doesn't I'd just make the key out of the lower 12 bits of x and y, plus the lower 4 bits of i and k, with a sanity check to ensure the world dimensions don't exceed the limits (which they can't in vanilla, and it seems from another thread they can't approach those values). The next point indicates the value range does matter, though.

The range of the key values by itself does not really matter - apart from the fact that there is an upper limit (MAX(uint32_t)) - but the value is currently used to reserve memory for the vector keys_in_order (of addition that is), which in turn is used to help test performance. Roaring works best if the keys come in strict ascending order. It still works fine if they are "random", but in that case it performs better if it gets a little help now and then by calling runOptimize and shrinkToFit. That is what you stumbled upon in your next comment.
Sorry, I'm still a little slow and bit manipulation was never my strong suit - would taking 12 bits from x and y and 4 bits from i and k still lead to a 32bit key? Since that's the format Roaring consumes...

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- Index::add: It seems this operation relies on the order in which the keys are added, and that order does not work well with either DF's feature shells or an efficient processing of world tiles without feature shells (unless there's some way to move directly from the last world tile of a row to the first world tile of the next one). A comment explaining what's special with 511 would be useful: I can't figure it out.

As said above if the keys don't come in strictly ascending order helping Roaring by calling runOptimize and shrinkToFit improves the memory consumption during the survey/index phase.
511 seemed to be the offset from the last position of the previous 16*16 mid_level_tile survey run (i = 15 and k = 15) to the position of the current one (i = 0 and k = 0) if y is odd - which is wrong and already removed from the code - but bear with me for another moment.
Correct me if I'm wrong, but to efficiently process feature shells the iteration goes like this:

Code:

x=0,y=0              => x++                 x=15,y=0
>----->------>----->---->----->---->---->----
                                            ↓ y + 1
x=0,y=1              <= x--                 x=15,y=1
-----<------<-----<----<-----<----<----<----<
↓ y + 1
x=0,y=2              => x++                 x=15,y=2
>----->------>----->---->----->---->---->----
....

So every other/odd row is being processed "backwards", in descending x order, right?
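
Read literally, the diagram above corresponds to something like the following sketch. It only covers the simplified row-by-row "snake" drawn here, not the plugin's actual movement pattern across feature shells, and `snake_order` is a made-up name:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified sketch of the order drawn above: even rows run in ascending x,
// odd rows in descending x. Returns the visited (x, y) pairs in order.
std::vector<std::pair<int, int>> snake_order(int width, int height) {
    assert(width >= 0 && height >= 0);
    std::vector<std::pair<int, int>> order;
    order.reserve(static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    for (int y = 0; y < height; ++y) {
        for (int step = 0; step < width; ++step) {
            const int x = (y % 2 == 0) ? step : width - 1 - step;
            order.emplace_back(x, y);
        }
    }
    return order;
}
```

With keys built from (x, y), every odd row in this order produces descending keys, which is why such a traversal feeds an index out-of-order input.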
.... I just added the x,y,i,k coordinates to the output of the keys_in_order data... now I really understand what you meant when you said:

Quote from: PatrikLundell on September 15, 2019, 02:59:45 am

...
Also of importance is that the movement pattern is complicated enough as it is (it took me a fair while to get it to work correctly).
...

Boy oh boy!

The iteration spans 4 feature shells in large worlds before the movement pattern repeats, wow! Mad respect for that!
Anyway this feeds Roaring a lot out of order keys which can lead to temporary sub-optimal behavior memory-wise.
I'll have to look into ways of mitigating that. This might take the form of an option for the user: Fast search which consumes more memory temporarily until the end of the survey/index phase or slow search that optimizes/"defrags" often and thus needs less RAM.

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- Index::add: I'd change the code checking for duplicate keys to a one liner (which I usually avoid) and comment out the whole line when not used, rather than just the output. The compiler will have to perform the function call unless it can somehow deduce it doesn't have side effects even if there's nothing to do when true.

Again that's just a debugging artifact that helped me to get a better understanding of what happened - so yeah that can & will be removed.

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- Index::add: Is there a reason for the omission of code inside the candy check? The later storage of level?

Yes exactly, the empty block is already removed in my current working copy.

Quote from: PatrikLundell on November 28, 2019, 05:45:01 am

- Index::add: I'm not sure what you're after with the (explicitly redundant) checks for various layer materials (not sure if it covers veins). If activated, it would exclude gems, for instance, and regardless, if someone has hacked in something, I don't see why it should be filtered out (And I just realized someone hacking in steel as a layer would get it filtered out if all inorganics were merged, but the issue would still be isolated to metals, and it would be possible to handle metals only separately).

You mean

Code:

world->raws.inorganics[mineralIndex]->flags.is_set(df::inorganic_flags::SEDIMENTARY) ||
world->raws.inorganics[mineralIndex]->flags.is_set(df::inorganic_flags::IGNEOUS_EXTRUSIVE) ||
world->raws.inorganics[mineralIndex]->flags.is_set(df::inorganic_flags::IGNEOUS_INTRUSIVE) ||
world->raws.inorganics[mineralIndex]->flags.is_set(df::inorganic_flags::METAMORPHIC) ||
world->raws.inorganics[mineralIndex]->flags.is_set(df::inorganic_flags::SOIL)

?
Those I took from embark_assist::finder_ui::ui_setup (case fields::mineral_1) - once the inorganics are merged into one vector it can be dropped for sure - saying that when I experimented without this "filter" I got indices that had different names but the same content, for example galena and lead or garnierite and nickel - are those the pairings of the mineral containing a metal and the metal?

Ok, that was a lot - thanks again for taking the time to look into this very preliminary version of the code!
And for reading this wall of text till here

Now one more question: Is there a river size field in the DF structures on the level of embark tiles?
Or is it the same for all embark tiles of a region?

What happened in the meantime?
I spent quite some time obsessing over the performance and the memory consumption of the index.
Memory-wise I can say the current scope of the index (still missing some fields of the finder) is around 30MB serialized for a 257*257 region.
How much it actually consumes living in the RAM still eludes me - as the memory profiling does not work properly. This is complicated by the fact that an index created at run-time with out of order keys seems to be bigger than the same index that has been reloaded from serialized data - but I'll get there.
The standalone tests suggest that indexing during the survey/index phase shouldn't be a concern performance-wise, but I'll do integrated runtime checks just to be sure.

My next steps are roughly as follows:
- experiment with a variant of Index::add that takes the whole 16 x 16 mid_level_tiles at once instead of 1 mid_level_tile at a time.
- implementing the missing fields/indices (biome river_size, river_elevation, bad weather and effects,...), excluding incursions for now - I'll probably have some questions about some of those - but in essence some steps that currently reside in the matcher will have to move into the survey, continuous numerical fields will either live in a vector (e.g. temperature) or probably in a map-like structure if not every embark tile has/needs them (e.g. river_elevation). Getting most or all of them will allow me to make an informed decision if the memory consumption is still tolerable. Having an analogue to the region might be a way to mitigate the memory costs for all attributes that are the same for all embark tiles with the same x/y.
- get queries running - I have a pretty clear idea of how to do that so that it performs well or even excellently in most cases. If the results aren't fast this will be a deal-breaker. The first version of this will run the query phase only after the survey/index phase has finished (yeah I know, user feedback and responsiveness). That way I hope to avoid all kinds of problems I might need to solve later. But the second iteration could run as soon as one region is completely indexed. The third iteration - well see the next point.
- get the incursions in there - I think this will integrate nicely with the concept of the indexing, but I'll have questions for sure. Adding incursions complicates running queries during the survey phase, but I have a low priority mental process working in the background on how to know when all relevant neighbors of a region tile have been processed. Solving that would allow for a third iteration of the queries, that could produce results during the survey phase that are "incursion-ready".
- as an optional feature: currently I save the indices to disk for debugging and performance tests - but seeing how easy it is it might be a nice option to allow the users to reload the indices (automatically) on a subsequent load of the world map and allowing for fast searches right away. Adding a version hash would make it possible to tell if the data is compatible with the code. What do you think? Even automatically creating the indices in batch mode might be an option.

Currently I (still) think it is feasible to move all information into the index and thus having pure queries during the search phase, which will avoid any additional iterations/calculations on the level of the embark tiles. But I'm willing to compromise on that if there are cases that lead to enormous, disproportionate memory requirements or to hacky code (using an index-hammer on an algorithm-screw).
Also I haven't forgotten about your remark concerning the following possibility

Quote from: PatrikLundell on November 08, 2019, 07:48:12 am

I've been thinking, and believe it's a mistake to try to massage the current inorganics presence storage. All you really NEED is two bytes to store the first and last layer of the geo biome present for each MLT (in addition to the index of the geo biome itself, which is currently stored), as all the rest of the info is available from the geo biome itself (And since the two values are in the range 0-15, you can actually store both of them in a single byte). To get the first/last layers we could cut away a fair bit of the code from the modified Prospector code of the MLT processing (the removed code would still be needed elsewhere to extract the actual inorganics, though).
With that basic information stored, you can either process the geo biome each time you need the data, or you can try to pre process the geo biomes to speed up the information extraction.
- The layers potentially worn away by erosion are all soil layers, and DF never seems to generate more than 4 of those. It's possible to hack the geo biome to get more soil layers and/or deeper soil, and DF can erode up to 10 Z levels, if I remember correctly. The suggestion below doesn't actually make any use of this info, though.
- DF doesn't use more than 16 layers of the geo biome even if hacking has added more (DF stretches the last one to fill the gap to the magma sea if needed).
This means that one possible approach would be to make a bit array for each layer of each geo biome and then merge the ones you have in each MLT with OR operations (16 layers * 33 byte bit array * X geo biomes). Even if DF doesn't croak at a silly max size PSV world with a checkerboard layout (forcing each world tile to get its own geo biome), you'd still not use more than 35 MB to store the info in a more convenient format than the geo biomes themselves.
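
A rough sketch of what the quoted suggestion could look like, assuming 264 bits (33 bytes) per layer as in the figure above. All names here (`GeoBiomeLayers`, `pack_layers`, `inorganics_in_mlt`) are hypothetical and the layout is only one possible reading of the idea:

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr int kMaxLayers = 16;       // DF uses at most 16 geo biome layers
constexpr int kInorganicBits = 264;  // 33 bytes, the size quoted above

using InorganicSet = std::bitset<kInorganicBits>;
// One bit array per layer of a single geo biome (hypothetical layout).
using GeoBiomeLayers = std::array<InorganicSet, kMaxLayers>;

// Pack first/last layer (each 0-15) into one byte: high nibble = first layer.
inline std::uint8_t pack_layers(int first, int last) {
    assert(first >= 0 && first <= last && last < kMaxLayers);
    return static_cast<std::uint8_t>((first << 4) | last);
}

// OR together the bit arrays of all layers present in one MLT.
inline InorganicSet inorganics_in_mlt(const GeoBiomeLayers &layers, std::uint8_t packed) {
    const int first = packed >> 4;
    const int last = packed & 0x0F;
    InorganicSet result;
    for (int layer = first; layer <= last; ++layer)
        result |= layers[layer];
    return result;
}
```

Per MLT this stores only the packed byte plus the geo biome index; the per-layer bit arrays are shared by every MLT that uses the same geo biome.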

Okay, sleep. now.

PatrikLundell · « **Reply #159 on:** November 30, 2019, 05:23:44 am »

- 12 + 12 + 4 + 4 = 32, so that would result in a 32 bit key.
- Yes, that's what the search pattern looks like. It's sort of fractal in that the processing order between feature shells mimics the one within them (the order within world tiles doesn't have any movement cost restrictions, and thus is done in a more human oriented line by line order). I think it would be possible to get the key generation code to mimic it, i.e. to get keys generated in sequential order, but it would probably take a number of attempts to debug it (on the other hand testing would be simple: just indicate a failure if the next key isn't the previous one plus one). This is probably a case where the effort should be on the coder (i.e. you) rather than on the user...
- When it comes to the checks for flags, you've failed to take the two first lines in the block into your copy. I have to admit I don't remember why I did it that way, but looking at it (without researching it), it looks like the code selects:
  - Things that are present in specific environments (probably to catch the elusive Alluvial ones)
  - Things that are present inside specific materials
  - Things that are layer materials.
- The plugin code looks at ores and finds the metals those can produce, so yes, a metal ore will result in the adding of the ore to the mineral vector and the metal(s) to the metal one. This is the case where there'd be some trouble if raw editing adds a metal as a layer material, as you could then get an entry in the common vector from both paths. Note that DF itself distinguishes between native gold (the ore) and gold (the extracted metal). It wouldn't be unreasonable for the plugin to require raw editors to respect this logic if they expect the plugin to work correctly (but it still shouldn't crash).
- River size: The river structure contains a list of all world tiles the river flows through, and each of those has a "flow" field for the volume of water the river carries, and there's a logic for how that is translated into stream, minor, etc. (this is documented in the XML, but is obviously in the plugin code as well). In addition to this, DF translates the flow into an in-game tile width that's stored in the MLT structure as x_min/x_max and/or y_min/y_max for entry and/or exit points along the edges. However, there's no need to use this in the plugin. On top of this the world tile data (region_map, if I remember the name correctly) has a flag to indicate that a river is a brook (i.e. can be walked on top of) or a stream. Hacking can provide wide brooks, but DF only generates them when the flow is 0, if I remember correctly. Rivers follow the geography (Toady mentioned it in a talk recently) and presumably don't interact with regions beyond some internal DF calculation to compute how world tiles (and their biome parameters: at least rainfall, and probably drainage) contribute to the river's flow, and one region can be crossed by quite a few rivers. Oh, wait a minute: does the question about "region" refer to the world tile, rather than the regions in the DF structures? If it does, I'd suggest a change of terminology to avoid confusion.
- The current data structure has one level of data collected/summarized at the world tile level to allow for an early weeding out of world tiles that cannot have matching embarks. Info that is common for all MLTs ought to be stored there, if it's missing from there. Thus, I think the level you're looking for exists already.
- I'd be hesitant about saving matches to disk for later use for a number of reasons:
  - The disk would get cluttered with index files that would have to be removed manually.
  - The next DF version will have spreading evil. This will invalidate Evilness indexing in worlds where a fortress has been retired after a few decades.
  - There are tools for the current version that allow you to modify the world pre embark, invalidating index file contents in various ways.
  Thus, I'd check the label of this can of worms very carefully before determining they're tasty enough to open it...
- I don't know whether storing everything in indices (with the attendant lookup) is going to be faster than accessing pre processed geo biome info using first/last layer as the key pair. I do know it's going to require more memory, but I don't know the answer to the crucial question of whether it's going to require too much memory. The key gain with either approach is that you'd have to scan the world only once.
- Incursion handling with the index approach ought to be handled such that the relevant incursion info is integrated into the info for the MLT, i.e. it should be possible to store that the MLT has evil, neutral, AND good in it, for instance, that it has a partial aquifer coverage, and that it contains biome X, Y and Z. As far as I understand this isn't hard to do: all it would require is a few adjustments, but most of it seems to be ready for that (If I understand the indexing correctly, there's nothing blocking the current structure from adding the key for an MLT in more than one of the evilness indices, for instance).

RedDwarfStepper · « **Reply #160 on:** November 30, 2019, 05:31:16 pm »

Quote from: PatrikLundell on November 30, 2019, 05:23:44 am

- 12 + 12 + 4 + 4 = 32, so that would result in a 32 bit key.

That seems easy enough

I got 32 as a result too when adding those values, but wanted to be sure that I understood correctly.
Would that operation be reversible, meaning could x,y,i and k be extracted from the resulting key again?
That is a requirement of the current design, to allow the creation of the proper embark_assist::defs::match_results later on - that requirement would be relevant for the next point as well.
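
For what it's worth, a fixed bit layout like this is reversible as long as every field stays within its bit width. A minimal sketch, assuming the layout [x:12][y:12][i:4][k:4] from most to least significant bit; the field order and the names `TileCoord`, `create_key` and `decode_key` are illustrative choices, not the plugin's actual code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: [x:12][y:12][i:4][k:4], most significant bits first.
struct TileCoord {
    std::uint16_t x, y;  // world tile coordinates, each must be < 4096
    std::uint8_t i, k;   // MLT coordinates inside the world tile, each < 16
};

inline std::uint32_t create_key(const TileCoord &c) {
    // Sanity check: anything outside these limits would not round-trip.
    assert(c.x < 4096 && c.y < 4096 && c.i < 16 && c.k < 16);
    return (static_cast<std::uint32_t>(c.x) << 20) |
           (static_cast<std::uint32_t>(c.y) << 8) |
           (static_cast<std::uint32_t>(c.i) << 4) |
           static_cast<std::uint32_t>(c.k);
}

inline TileCoord decode_key(std::uint32_t key) {
    TileCoord c;
    c.x = static_cast<std::uint16_t>(key >> 20);
    c.y = static_cast<std::uint16_t>((key >> 8) & 0xFFF);
    c.i = static_cast<std::uint8_t>((key >> 4) & 0xF);
    c.k = static_cast<std::uint8_t>(key & 0xF);
    return c;
}
```

Which field ends up in the high bits decides the natural key order, so that choice would have to follow whatever iteration order the survey uses.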

Quote from: PatrikLundell on November 30, 2019, 05:23:44 am

... I think it would be possible to get the key generation code to mimic it, i.e. to get keys generated in sequential order, but it would probably take a number of attempts to debug it (on the other hand testing would be simple: just indicate a failure if the next key isn't the previous one plus one). This is probably a case where the effort should be on the coder (i.e. you) rather than on the user...

That's a fascinating idea! A key iterator that generates the key values analogous to the movement pattern of the iterator of the world tiles - hm, never would have thought about that one.
I'll have to let this one simmer for a while. It would be an optimization anyway. Did you create something like that before? And of course it would need to be reversible.

Quote from: PatrikLundell on November 30, 2019, 05:23:44 am

- When it comes to the checks for flags, you've failed to take the two first lines in the block into your copy.

I did, didn't I... I fixed that. But the duplicated index contents (galena and lead ...) happened when I deactivated the filter altogether. No filtering would probably be true for a merged inorganics vector as well. So that's something to consider when doing that.
Also I would prefer not to force modders/raw editors to do things in a certain way. But first things first - get it running, then have some test cases that might crash it or not...
First I'll keep 3 separate vectors just to avoid any new funny, confusing errors...

Quote from: PatrikLundell on November 30, 2019, 05:23:44 am

- River size: ...
Oh, wait a minute: does the question about "region" refer to the world tile, rather than the regions in the DF structures? If it does, I'd suggest a change of terminology to avoid confusion.

Yes, you are right - I'm currently very code-centric and was talking about region_tile_datum and the level of processing of those tiles. I'm aware of survey_rivers in survey.cpp and was wondering if the river size or something analogous is also available on the level of embark tiles (mid_level_tile) in the structures of details->rivers_horizontal/vertical. Or is the river_size of a region_tile_datum automatically the same for all related mid_level_tile that have a river, i.e. do they inherit it from their "parent" region_tile_datum?

Quote from: PatrikLundell on November 30, 2019, 05:23:44 am

- The current data structure has one level of data collected/summarized at the world tile level to allow for an early weeding out of world tiles that cannot have matching embarks. Info that is common for all MLTs ought to be stored there, if it's missing from there. Thus, I think the level you're looking for exists already.

Again, you mean region_tile_datum, don't you? If they can be addressed/found/iterated fast and easily during the matching/query phase then yes they would be the right place. Actually I'm not sure anymore why I thought there might be the need for a new structure on the same level as region_tile_datum - perhaps it will come back to me...

Quote from: PatrikLundell on November 30, 2019, 05:23:44 am

- I'd be hesitant about saving matches to disk for later use for a number of reasons:

Those are really some tasty worms you got there!
Disk storage and clutter: I was thinking about storing the index within the save folder, which might mitigate the cluttering of orphaned/derelict index files.
Spreading evil: I thought about indexing cities and neighbors, which can change over time (can they?) - for that, adding the current year to the index folder would help knowing if all mutable indices have to be rebuilt. That would be true for spreading evil as well.
Tools to change a world pre embark: You are evil

I - ah well - that is a real doozy.
I see your point here, I really do - but I feel it might add value to the plugin if the user has at least the choice to reuse the indices and thus does not have to spend another 5 to 10 minutes waiting the next time... but this is nothing that has to be decided now...

Quote from: PatrikLundell on November 30, 2019, 05:23:44 am

- I don't know whether storing everything in indices (with the attendant lookup) is going to be faster than accessing pre processed geo biome info using first/last layer as the key pair. I do know it's going to require more memory, but I don't know the answer to the crucial question of whether it's going to require too much memory. The key gain with either approach is that you'd have to scan the world only once.

Me neither, and yes, that key gain is what I'm after - perhaps those two approaches can be combined - again a possible optimization for when it works, that's pretty much what I wanted to say.

Quote from: PatrikLundell on November 30, 2019, 05:23:44 am

- Incursion handling with the index approach ought to be handled such that the relevant incursion info is integrated into the info for the MLT, i.e. it should be possible to store that the MLT has evil, neutral, AND good in it, for instance, that it has a partial aquifer coverage, and that it contains biome X, Y and Z. As far as I understand this isn't hard to do: all it would require is a few adjustments, but most of it seems to be ready for that (If I understand the indexing correctly, there's nothing blocking the current structure from adding the key for an MLT in more than one of the evilness indices, for instance).

Storing the same key in different indices is absolutely no problem, and a query (e.g. find an MLT with an evil biome) would be fine with it, as it does not care about the fact that the same key is in more than one complementary index. Moving all incursion information into the MLT would be the most elegant way for sure. But right now/at the beginning I could live with a solution that handled incursions as a special case that gets processed later on, when the neighboring MLT that belongs to the adjacent region_tile_datum is being surveyed - that's how I imagine it anyway - does that match the way it is currently done? (hehe, starting with the questions about incursion already).
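
To illustrate the point about one key living in several complementary indices, here is a minimal sketch. Everything in it is an assumption for illustration only: EvilnessIndices, add_tile, tiles_with_evil and has_no_evil are made-up names, and plain std::set containers stand in for whatever index structure the plugin actually uses.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for the indices: one sorted key set per evilness category.
struct EvilnessIndices {
    std::set<uint32_t> evil;
    std::set<uint32_t> neutral;
    std::set<uint32_t> good;
};

// Record an MLT key in every category present in the tile, including
// categories that only enter through incursions from neighboring tiles.
inline void add_tile(EvilnessIndices &idx, uint32_t key,
                     bool has_evil, bool has_neutral, bool has_good) {
    if (has_evil) idx.evil.insert(key);
    if (has_neutral) idx.neutral.insert(key);
    if (has_good) idx.good.insert(key);
}

// Query "find MLTs with an evil biome": only the evil index is consulted,
// so it does not matter that a key may also sit in the other indices.
inline std::vector<uint32_t> tiles_with_evil(const EvilnessIndices &idx) {
    return std::vector<uint32_t>(idx.evil.begin(), idx.evil.end());
}

// Absolute criterion "no evil anywhere" for a single MLT key.
inline bool has_no_evil(const EvilnessIndices &idx, uint32_t key) {
    return idx.evil.count(key) == 0;
}
```

With this layout an MLT that contains evil and neutral parts is simply added with both flags set, and each query looks only at the index it needs.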

Okay, tomorrow I might get to program again and break some more things - today there wasn't any time, but I really wanted to clarify some of my muddled thoughts.

PatrikLundell · « **Reply #161 on:** December 01, 2019, 04:53:09 am »

Building and extracting the parts from a key that's added up like the 32 bit one I suggested is rather trivial:
Build: X shifted 18 steps + Y shifted 8 steps + i shifted 4 steps + k (where a shift can be replaced by a multiplication, e.g. by 4096 * 64 for the 18 steps: I don't know if the compiler would be smart enough to realize that multiplications by constant powers of 2 can be replaced by shifts, but C is sufficiently close to assembly to have explicit shift operations).
Extract: You'd shift in the other direction and then mask out the extra bits at the top with an AND mask (or mask out the bits you don't want and then shift: the order doesn't matter).
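
A minimal sketch of the build/extract steps described above, in C++. The bit widths are an assumption inferred from the shift counts (14 bits for x, 10 for y, 4 each for i and k, adding up to 32); KeyParts, pack_key and unpack_key are illustrative names, not functions from the plugin.

```cpp
#include <cstdint>

// Assumed layout (inferred from the shift counts, not taken from the plugin):
// bits 18..31 = x (14 bits), bits 8..17 = y (10 bits),
// bits 4..7 = i (4 bits), bits 0..3 = k (4 bits).
struct KeyParts {
    uint16_t x;
    uint16_t y;
    uint8_t i;
    uint8_t k;
};

// Build: shift each part into place and combine.
// Inputs are assumed to fit their field; larger values would spill into the neighboring field.
constexpr uint32_t pack_key(uint16_t x, uint16_t y, uint8_t i, uint8_t k) {
    return (static_cast<uint32_t>(x) << 18) |
           (static_cast<uint32_t>(y) << 8) |
           (static_cast<uint32_t>(i) << 4) |
           static_cast<uint32_t>(k);
}

// Extract: shift back down, then mask off the bits that belong to higher fields.
constexpr KeyParts unpack_key(uint32_t key) {
    return KeyParts{
        static_cast<uint16_t>(key >> 18),           // top field, nothing above it to mask
        static_cast<uint16_t>((key >> 8) & 0x3FF),  // 10 bits
        static_cast<uint8_t>((key >> 4) & 0xF),     // 4 bits
        static_cast<uint8_t>(key & 0xF)};           // 4 bits
}

// Compile-time round trip check for one sample coordinate.
static_assert(unpack_key(pack_key(256, 256, 15, 15)).x == 256, "x survives the round trip");
static_assert(unpack_key(pack_key(256, 256, 15, 15)).y == 256, "y survives the round trip");
```

Because every field has its own bit range, the mapping is trivially reversible, which is the property needed to recreate the match results later.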

I haven't created a key matching the movement pattern (or anything like it), but it's certainly possible (the movement pattern encodes the logic in a convoluted way, but it would have to be teased out). The logic is certainly reversible (there's a 1:1 correspondence between coordinates and keys within the valid key range), although, again, it would require some work to get it right (or the right mind to handle that kind of problem, but it's not a mind I have).

Since galena is an ore that produces both lead and silver (according to the wiki), the metal extraction checks would mark lead and silver as metals that are present.
I think it's a wise approach to get things to work first and optimize later. Something that's blindingly fast but doesn't work is rather useless...

The reason it's called region_tile_datum is that I used to call the 16 * 16 embark tile area "region" (hence region manipulator), until realizing DF calls the middle pre embark map "region", and before Toady said, in some FotF response, that he'd called the tiles within a world tile Mid Level Tiles, so it's a historical inaccuracy where I partially, but not completely, reworked the naming.
If I recall correctly, different MLTs do not have to have the same river width: if a river changes width the change is gradual, i.e. the river widens or narrows over several MLTs if needed. There's also the case of joining rivers: they can be of different widths (DF never generates two rivers within the same world tile as far as I've seen, but joining is implicit from the entry points along the world tiles' edges).
The argument above shows that the current river width logic isn't fully correct, as the world tile width (as stored in the river structure) is used for all MLTs with rivers on them, rather than actually measuring the width(s) for each MLT. I don't think that's serious enough to require fixing immediately, though.
The data you're looking for on the MLT level is the x_max - x_min and y_max - y_min values that show the actual width in in-game tiles, which would have to be translated back to river size (I think the XML comment contains that info, but I don't quite remember. If it doesn't it shouldn't be too hard to investigate it by hacking flows at the flow change points and embarking to check the river widths).

Yes, the world_tile_data is a vector of vectors of region_tile_datum elements, and so is easily accessed via (x, y) coordinates.

Cities and neighbors aren't used by the plugin... I've made an attempt to determine neighbors (there's a thread about that), and the "final" resulting script seems to work, but there are still unknown factors at play, so the data isn't good enough for "real" use.
Neighbors can definitely change over time as sites are conquered and razed, though.
Currently Savagery changes over (a long) time: I've had worlds where I've located good embark locations (Savage) with a short history and then regenerated from the same seeds with my actual desired history length only to find the location's Savagery has been reduced (and it's known civs can "tame" savage areas to make it possible to settle in them).
The only factors I know change over time are Savagery and (soon) Evilness. There are no indications any of the others change, as far as I know (erosion applies only to world gen itself, not the history, for example).
The mother of all pre embark modifications (and just about any other modification) is gui/gm-editor, and I didn't write that gem. I won't really accept the description of "evil", but "devious" could be accepted...
I'm not sure people would embark/search for embarks in many sessions in the same world that often, but if proper warnings about its use are provided it could be of some use. Regenerating the Evilness/Savagery data is rather easy, as that's stored at the world tile level, so you don't actually need to read the MLTs to apply it (assuming you've stored the info of which world tiles provide the biomes of each MLT).

I'm not really happy about the half assed way incursions are handled currently. To do it properly, you really need to process the world twice: first to extract the "primary" MLT info, and then to add the relevant parts from neighboring MLTs that provide incursions. This is done on the first pass for all the fully interior MLTs, but the ones at the edges are processed only if their neighboring world tile MLTs have been processed, and there's currently no second pass to process those: that's done when/if a second search is made. If all the relevant MLT info is stored (rather than generated and discarded) it would be possible to do the task in two distinct passes, i.e. first gather primary MLT info and then a second pass to apply incursions. There's a UI issue with how to display progress, though. Possibly yellow X->light green X-> green X.

Edit: I think I've managed to cobble together functions to generate a key/extract indices from a key. The script tests that the parameters fed to the key generator match those returned from the index extractor, but it hasn't been tested to see if it actually manages to follow the survey pattern, only what I think the survey pattern is. Also, I haven't verified that the keys generated actually are within the expected value range.

Code:

function key_of (x, y, i, k)
  local world_last_x = df.global.world.world_data.world_width - 1
  local world_last_y = df.global.world.world_data.world_height - 1
  local fs_y_offset = math.floor (y / 16) * 16
  local y_offset
  local fs_left = y % 32 < 16
  local x_result = x
  local key

  if not fs_left then
    x_result = world_last_x - x
  end

  if (fs_left and x % 2 == 0) or
     (not fs_left and x % 2 == 1) then  --  Assumes the X dimension is an uneven number, but the search doesn't work if it isn't...
    y_offset = y % 16

  elseif world_last_y - math.floor (y / 16) * 16 < 16 then
    y_offset = y - world_last_y

  else
    y_offset = 15 - y % 16
  end

  key = i + k * 16 + (x_result) * df.global.world.world_data.world_height * 256 + (fs_y_offset + y_offset) * 256
  --  dfhack.println ("key_of:", x, y, i, k, key, x_result, fs_y_offset, y_offset)
  return key
end

-------------------------------

function indices_of (key)
  local world_last_x = df.global.world.world_data.world_width - 1
  local world_last_y = df.global.world.world_data.world_height - 1
  local x
  local y
  local i = key % 16
  local k = math.floor ((key % 256) / 16)
  local x_result = math.floor (key / (df.global.world.world_data.world_height * 256))
  local fs_y = math.floor (key / 256) % df.global.world.world_data.world_height
  local fs_y_offset = math.floor (fs_y / 16) * 16
  local y_offset = fs_y % 16
  local fs_left = fs_y % 32 < 16

  if fs_left then
    x = x_result

  else
    x = world_last_x - x_result
  end

  if (fs_left and x_result % 2 == 0) or
     (not fs_left and x_result % 2 == 1) then
    y = fs_y_offset + y_offset

  elseif fs_y_offset == math.floor (df.global.world.world_data.world_height / 16) * 16 then
    y = fs_y_offset + ((world_last_y % 16) - y_offset)

  else
    y = fs_y_offset + (15 - y_offset)
  end

  --  dfhack.println ("indices_of:", key, fs_left, x_result, fs_y, fs_y_offset, y_offset, "=>", x, y, i, k)

  return x, y, i, k
end

-------------------------------

function x ()
  local x_res
  local y_res
  local i_res
  local k_res
  for x_index = 0, df.global.world.world_data.world_width - 1 do
    dfhack.println (x_index)

    for y = 0, df.global.world.world_data.world_height - 1 do
      for i = 0, 15 do
        for k = 0, 15 do
          x_res, y_res, i_res, k_res = indices_of (key_of (x_index, y, i, k))
          if x_index ~= x_res or
             y ~= y_res or
             i ~= i_res or
             k ~= k_res then
            dfhack.println (x_index, y, i, k, "vs", x_res, y_res, i_res, k_res)
          end
        end
      end
    end
  end
end

x ()

RedDwarfStepper · « **Reply #162 on:** December 09, 2019, 04:25:52 pm »

I'm back, from some more germ SCIENCE real life felt it needed to test on me...
Hope that's it for a while with the experiments

Anyway - thanks for the key-position-script.
I'll try to incorporate it in the next session, whenever that will be.
But I uploaded one of my output files here
http://ul.to/242ekbya (beware: 556 MB unzipped!)
that should allow you to verify that the world tiles are being surveyed in the order you would expect.
When I drew the schematic of the iteration on November 29, it seemed to me that it always first went the long way in the x direction and then one step in the y direction.
That no longer seems to be the case, it seems the other way around now - there were multiple small errors in the way I decoded the naive key - which might have added up to that.
By now I process all 256 embark tiles within one loop, which helps speed things up (batching/buffering the adding to the indices), and it also allows for an easier way to generate the keys - I'll adapt your script once I'm there...

RedDwarfStepper · « **Reply #163 on:** December 17, 2019, 05:13:19 pm »

So I found the time to do some tests with the key generation that follows the iteration pattern of the survey.
Following is a sample of x,y,i,k positions (only the first and last of each x-y-group) and the associated key in CSV format, which also contains a marker if the key is not continuous:

Code:

x;y;i;k;key;
0;0;0;0;0;
0;0;15;15;255;
0;1;0;0;256;
0;1;15;15;511;
0;2;0;0;512;
0;2;15;15;767;
0;3;0;0;768;
0;3;15;15;1023;
0;4;0;0;1024;
0;4;15;15;1279;
0;5;0;0;1280;
0;5;15;15;1535;
0;6;0;0;1536;
0;6;15;15;1791;
0;7;0;0;1792;
0;7;15;15;2047;
0;8;0;0;2048;
0;8;15;15;2303;
0;9;0;0;2304;
0;9;15;15;2559;
0;10;0;0;2560;
0;10;15;15;2815;
0;11;0;0;2816;
0;11;15;15;3071;
0;12;0;0;3072;
0;12;15;15;3327;
0;13;0;0;3328;
0;13;15;15;3583;
0;14;0;0;3584;
0;14;15;15;3839;
0;15;0;0;3840;
0;15;15;15;4095;
### discontinuity ### 
1;15;0;0;8448;
1;15;15;15;8703;
1;14;0;0;8704;
1;14;15;15;8959;
1;13;0;0;8960;
1;13;15;15;9215;
1;12;0;0;9216;
1;12;15;15;9471;
1;11;0;0;9472;
1;11;15;15;9727;
1;10;0;0;9728;
1;10;15;15;9983;
1;9;0;0;9984;
1;9;15;15;10239;
1;8;0;0;10240;
1;8;15;15;10495;
1;7;0;0;10496;
1;7;15;15;10751;
1;6;0;0;10752;
1;6;15;15;11007;
1;5;0;0;11008;
1;5;15;15;11263;
1;4;0;0;11264;
1;4;15;15;11519;
1;3;0;0;11520;
1;3;15;15;11775;
1;2;0;0;11776;
1;2;15;15;12031;
1;1;0;0;12032;
1;1;15;15;12287;
1;0;0;0;12288;
1;0;15;15;12543;
### discontinuity ### 
2;0;0;0;16896;
....

You can see that as soon as the second column (x = 1) is being processed, the next key value (8448) differs by 4352 from the expected value of 4096.
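
The size of that gap can be reproduced outside DF. The sketch below repeats the arithmetic of the posted key_of with the world dimensions passed in as parameters (an illustrative change, the real code reads them from world_data). It assumes a world height of 33, which is what the sample implies (8448 = 33 * 256); the width is not stated and does not matter for the rows shown.

```cpp
#include <cstdint>

// Same arithmetic as the posted key_of, but with the world size as parameters
// so it can run without DF. world_width/world_height are illustrative parameters.
inline uint32_t key_of(int32_t world_width, int32_t world_height,
                       int32_t x, int32_t y, int32_t i, int32_t k) {
    const int32_t world_last_x = world_width - 1;
    const int32_t world_last_y = world_height - 1;
    const int32_t fs_y_offset = (y / 16) * 16;  // integer division already floors for y >= 0
    const bool fs_left = y % 32 < 16;
    const int32_t x_result = fs_left ? x : world_last_x - x;
    int32_t y_offset = 0;

    if ((fs_left && x % 2 == 0) || (!fs_left && x % 2 == 1)) {
        y_offset = y % 16;
    } else if (world_last_y - (y / 16) * 16 < 16) {
        y_offset = y - world_last_y;
    } else {
        y_offset = 15 - y % 16;
    }

    // The x term advances by world_height * 256 keys per column.
    return static_cast<uint32_t>(i + k * 16 + x_result * world_height * 256 +
                                 (fs_y_offset + y_offset) * 256);
}
```

With a height of 33, key_of(33, 33, 0, 15, 15, 15) gives 4095 and key_of(33, 33, 1, 15, 0, 0) gives 8448, matching the sample. Each column step advances the key by 33 * 256 = 8448, while only 16 rows (16 * 256 = 4096 keys) are visited before the sweep moves to the next column, so (33 - 16) * 256 = 4352 keys are skipped at every column change.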

I played around with my adaptation of your script a little but haven't had any epiphany till now. It also could be an error in the C++ implementation - do you see any (obvious) errors:

Code:

const uint32_t embark_assist::index::Index::key_of(int16_t x, int16_t y, uint8_t i, uint8_t k) {
    const int32_t world_last_x = world->world_data->world_width - 1;
    const int32_t world_last_y = world->world_data->world_height - 1;
    const int16_t fs_y_offset = std::floor(y / 16) * 16;
    int16_t y_offset = 0;
    const bool fs_left = y % 32 < 16;
    int16_t x_result = x;

    if (!fs_left) {
        x_result = world_last_x - x;
    }

    if ((fs_left && x % 2 == 0) || (!fs_left && x % 2 == 1)) {
        // Assumes the X dimension is an uneven number, but the search doesn't work if it isn't...
        y_offset = y % 16;
    }
    else if (world_last_y - std::floor(y / 16) * 16 < 16) {
        y_offset = y - world_last_y;
    }
    else {
        y_offset = 15 - y % 16;
    }

    const uint32_t key = i + k * 16 + (x_result)* world->world_data->world_height * 256 + (fs_y_offset + y_offset) * 256;
    // alternative approach just using y as factor instead of world->world_data->world_height, which results in more wrong values as it get "negated" by (fs_y_offset + y_offset) * 256
    //const uint32_t key = i + k * 16 + (x_result) * (y + 1) * 256 + (fs_y_offset + y_offset) * 256;
    // dfhack.println("key_of:", x, y, i, k, key, x_result, fs_y_offset, y_offset)
    return key;
}

Apart from that, it seems that the more coherent/continuous keys already resulted in a slightly faster (~10%) adding of the keys to the indices - but I'll have to verify that.
Next up: Queries.

PatrikLundell · « **Reply #164 on:** December 18, 2019, 08:34:40 am »

Yes, my logic is incorrect. The feature shell is its own little box of 16 * 16 * 16 * 16 = 65536 keys (when complete: the edges typically consist of a single row and/or column), so the feature shell part of x and the local part of x have to be treated separately, and the feature shell part of y similarly has to be split from the local part.
Back to the drawing board...
