Going over the labor optimizer issues with shearer/spinner.
Identified that roles based on skill only (or maybe a single preference) have really funky distributions that, even when run through an ECDF function, throw off the normalization for that role. A really low raw mean gets flipped after converting to ECDF, giving it a high mean versus the expected ~50%. (The mean of each ECDF conversion should fall between 75% and 50%, depending on sample size; from 2 samples upward, the mean approaches 50%.) A mean near 50% means the conversion successfully ordered the distribution's values around their median rank.
However, with shearer and spinner, the average went from ~.1 to ~.8 after ECDF conversion. The cause was a large number of tied values. Take the shearer role, which is based on the shearer skill: out of roughly 90 dwarves, only a few had any skill at all (some had a great deal), and the majority had none. (Funny enough, along the way we discovered and corrected a bug with negative raw role values.) So the distribution was lopsided (biased), but after ECDF conversion all those tied low values were mapped to a high 84%, and the rest of the values sat above that.
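To see why ties inflate the mean, here's a minimal sketch of an ECDF rating in Python (the skill numbers are made up for illustration; the real implementation is in rolestats.cpp and differs in details):

```python
def ecdf(values):
    """Empirical CDF rating for each value: fraction of values <= v."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]

# 90 dwarves: 76 with no shearer skill at all, 14 with some skill
skills = [0] * 76 + [3, 5, 7, 9, 11, 12, 13, 14, 15, 15, 16, 17, 18, 19]
ratings = ecdf(skills)
# Every unskilled dwarf lands at 76/90 ~= 84%, dragging the mean up to ~0.86
```

Because 0 is tied 76 times, ECDF(0) counts all 76 zeros as "less than or equal", so the least-skilled dwarves start at ~84% instead of near 0%.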
We figured that the raw distributions vary infinitely based on the variables involved with a role. So direct comparison between two roles is impossible without some normalization, because each distribution has its own unique shape (thanks, statistics) and signature. The lesson: raw values can't be compared directly with each other (unless you have two very similar roles; see below for how this is [possibly] achievable). That's essentially how the old optimizer worked, which is a fringe use case and isn't recommended with all roles; instead, it's recommended you make a custom optimization job with just those two roles and run it separately, or just sort the roles next to each other using their raw values in a grid view.
The inverted, skewed distributions of Shearer and Spinner were clearly unacceptable. What would be acceptable is scaling with a min-max method, because the raw median value of 0 (most likely, or whatever value dominates) was artificially bloated to above 50%. That's not a direct representation of the population: with a large number of dwarves at the low end (i.e. the median skill or value for that role is probably 0), the value 0 alone counts as 84% of the values. So right out of the gate, 84% of my 90 dwarves had a high rating. Clearly this had to be fixed.
But how?
Well, we decided on error correction. We decided a role counts as skewed if, after converting from raw to ECDF, its mean minus .5 (i.e. its offset from 50%) was off by more than 27.5%. In reality, means sit around 50%, but as the sample size goes down, the error can rise to an average as high as .75. So we derived the threshold from the smallest possible sample: with 2 distinct values, the ECDF ratings are 50% and 100%, which average to 150% / 2 = 75%. (If Min = Max, both values are set to 100%.)
So I tested what the highest mean of a skewed distribution looks like by running a few samples through ECDF. I ran a 1,1,2 and a 1,1,1,2 and measured their ECDF means; the smaller one (1,1,2) averaged about 77.777%, a difference of 27.777% from 50%.
The 1,1,1,2 sample averaged 81.25%, which differs from 50% by 31.25%.
So I set our threshold to 27.5%.
Which means even if I run an optimization plan on only 2 dwarves, it will still correctly identify skewed distributions.
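The threshold experiment above can be reproduced in a few lines (a sketch, not the actual rolestats.cpp code):

```python
def ecdf_mean(values):
    """Average ECDF rating: mean over v of (count of values <= v) / n."""
    n = len(values)
    return sum(sum(1 for w in values if w <= v) / n for v in values) / n

small = ecdf_mean([1, 1, 2])      # 7/9 ~= 0.7778 -> 27.78% above 50%
larger = ecdf_mean([1, 1, 1, 2])  # 13/16 = 0.8125 -> 31.25% above 50%
```

The worst case among the smallest samples is just over the 27.5% cutoff, so even a 2-or-3-dwarf plan trips the skew check.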
We also stated that if min = max, then all values are set to 100%.
hrmm, just realized something...
NOTE TO SELF: If Min = Max && Min = 0, all %'s equal 0%.
We identified skewed distributions with one more step: we checked the frequency of each value within the distribution. For any value, we can compare its ECDF rating to the role and identify how often that number appears in the distribution.
As a rule, if a distribution has such a repeated number, an ECDF conversion will boost that value by whatever percentage of the distribution it takes up (i.e. the count of that number compared to the set of dwarves selected by the labor optimization plan within the role). Example: 5 dwarves, and 3/5 have 0 skill in the labor while 2 have some value. Automatically, 0 is going to start at 3/5 = 60%. This gets worse when even more dwarves have no skill.
So we also check the percentage that a value is repeated within a role (i.e. the dataset of dwarves). If that percentage exceeds 50%, we run a simple (x - min) / (max - min) conversion instead, which preserves the low end (0%) as well as the top of the range (100%).
So now every role in the optimizer is equally distributed next to each other. This means almost every value directly relates to a correlated value within the grid you see, from highest to lowest. Their percents are then back-adjusted so they are all within ~.0001% of each other, offset by their ranking from highest to lowest raw output average (i.e. whatever the role editor spits out).
In other words:
The pre-adjustment sets all priorities next to each other and sorts them from highest to lowest average. Then a formula derives a straight-line slope that runs through the ordinal ranking of the distributions, which serves as the priority. How steep this line is is infinitesimally small; its sole purpose is to be as minimal as possible while still fixing the ordinal ranking of the distributions. That gives a higher-ranked distribution a clear advantage over the others on a line-by-line basis, and gives each distribution the opportunity to directly compare with all values left and right of itself (via the priorities set by the roles' means compared to each other).
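One way to picture the infinitesimally small slope (a hypothetical sketch; the epsilon magnitude and function names are my own, not from laboroptimizer.cpp):

```python
EPSILON = 1e-6  # assumed magnitude; just big enough to break ties

def apply_priority_offsets(roles):
    """roles: {name: (raw_mean, [normalized ratings])}.
    Rank roles by raw mean, then nudge each role's ratings down by an
    amount proportional to its rank, so ties across roles resolve in
    raw-mean order without disturbing any within-role ordering."""
    ranked = sorted(roles, key=lambda name: roles[name][0], reverse=True)
    adjusted = {}
    for rank, name in enumerate(ranked):
        _, ratings = roles[name]
        adjusted[name] = [v - rank * EPSILON for v in ratings]
    return adjusted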
The beauty is, it's dynamic to every selection you make within the optimizer, and it can now detect skewed distributions and compare them appropriately. Some people might not like this, as values of 1, 2, 5, 100, 1200 really translate to a straight-line increase of 20%, 40%, 60%, 80%, and 100% respectively, while another role might be a much tighter distribution like 1, 2, 3, 4, 5.
You also have control from one role to the other in adjusting priorities, which means you can push a distribution up against another when comparing them from top value to lowest. Think of it as shifting the median of one flat distribution against another flat distribution; it effectively weights that distribution's values higher or lower than the other's, so small values are recommended (though really strong values won't hurt anything). The beauty is, the auto-recommended values work best just in case the raw distribution average does matter when compared to another distribution (such as when all your melee dwarves had a lower raw average rating than, say, farming, but now you have awesome fighters who far exceed your farmers). So what's the difference between a run using equal priorities and one adjusted by the distributions' raw means? It slightly offsets one role next to another, but nothing that messes up the direct comparison between all roles. So each distribution is aligned next to the others when this optimizer runs. Before, roles that produced high raw % values (which are arbitrary) could always be selected first through no fault of their own. I think we really hit on equal representation of the data.
It will assign the role with the highest ordinal value first, but compared to the others left to right, down the distribution, for as long as it has roles it can fill. More or less, all the values are actually merged together into a superlist and compared from highest to lowest; the infinitesimally small priority adjustments apply a signature to each distribution, marking that it has a slightly lower mean than another, with an equally adjusted rating between each value (e.g. 90 dwarves = 1/90% difference between dwarves, times the infinitesimally small priority rating). Then it starts from the highest rating and assigns downwards.
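The merge-and-assign pass might look like this (a simplified greedy sketch of my own; it assumes one role per dwarf, which the real optimizer in laboroptimizer.cpp does not necessarily do):

```python
def assign(ratings, slots):
    """ratings: {(dwarf, role): adjusted rating}
    slots: {role: open positions}. Pour everything into one superlist,
    sort it highest to lowest, and hand out jobs greedily."""
    superlist = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    assigned, taken = {}, set()
    for (dwarf, role), rating in superlist:
        if dwarf not in taken and slots.get(role, 0) > 0:
            assigned[dwarf] = role
            taken.add(dwarf)
            slots[role] -= 1
    return assigned
```

Because every rating lives in one sorted superlist, the tiny per-role offsets are what decide which role wins when two ratings would otherwise tie.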
Some would argue that the optimizer ignores the differences in range.
Well, I would argue that it preserves the ordinal ranking between roles and is truly representative of the population you have at hand. To anyone saying it promotes one role over another by too little when it should be more: I'd argue it's truly representative of the dataset you have available. It's an empirical test of each distribution's density function, so it can be argued it truly represents your population. It treats every value as equally attainable, so it matches the density curve of the current distribution. So yes, it does recognize differences between roles; the thing to understand is that it measures the density function of each distribution curve of your dataset (which is built from the multivariable roles constructed in the role editor).
Therefore you flatten the distributions, account for skew, preserve their ordinal values, avoid promoting 0%, and compare them left to right like reading a newspaper.
I believe Splinterz is also going to include the ability to compare two roles within the grid view by their raw role rating. I thought this was a good idea, so a player retains the ability to compare two very similar roles side by side. Examples: armorsmith vs. weaponsmith, speardwarf vs. sworddwarf, etc., where maybe the only difference between the two roles is one additional variable (such as a skill or a preference). This gives the player a little more control over proper comparisons between two roles.
There's also a log that can be output (maybe only in beta?) showing the values from all these calculations.
You can check the source for how it operates here:
https://github.com/splintermind/Dwarf-Therapist/blob/master/src/laboroptimizer.cpp#L147
https://raw.githubusercontent.com/splintermind/Dwarf-Therapist/master/src/rolestats.cpp