Bay 12 Games Forum

Pages: 1 ... 43 44 [45] 46 47 ... 222

Author Topic: Dwarf Therapist (Maintained Branch) v.37.0 | DF 42.06  (Read 999715 times)

splinterz

  • Bay Watcher
    • View Profile
    • Dwarf Therapist Branch
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #660 on: June 08, 2014, 02:57:15 pm »

alright but if you could see the average raw ratings in the optimization plan, and if that average was modified by the priority when you changed it, wouldn't that be enough? why would you still need to see the raw ranges, and/or the raw * priority ratings for every dwarf?

the role groupings for comparison would take some work to set up, but might be a good idea. at the very least maybe comparing them all to each other wouldn't be a bad idea to start with, i'll have to look at it again. i mean at least you would know that your cheesemaker really is legendary, and not just the only dwarf with more than 10 xp.

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #661 on: June 08, 2014, 03:19:06 pm »

to be honest, there's a lot of information that is useful (like median, min, max, etc.), but you don't need that if you had the output of the roleset you wanted as a CDF display.

The purpose of priority is to scale role ratings against each other.  Without seeing the curve of things (i.e. full range of values within a greater set), one would merely be using the avg to scale priorities with each other.

Say I had one legendary dwarf, and the other 19 of my 20 dwarves have no skill.  The curve is skewed.  Without seeing the range of values, I wouldn't be able to see that.  By looking at [all] values within a CDF, I can see the scale of values compared with other values.  I could better see what I would need to do to scale roles against each other.

So, same example.  I could figure: okay, the average of 1 legendary (say level 20) and 20 no-skillers is like level 1.  But if I were to look at the median, I would realize it was 0.  That's a big difference.

By having a setup like this, I could see how priorities would adjust ordinal ranking.  With the 1 legendary, I could use the average, but then the other 19 dwarves wouldn't really get much representation.  I would find that I would have to up the priority considerably for it to have an effect on the other 19 dwarves.

If I saw the range, I would see a bunch of values that had low %'s, and 1 outlier, with a high percent.  I could exclude the outlier when calculating my priority, and scale the rest up to 100% of the highest priority I desired.

It's kind of confusing, but I think it would help tremendously.  It would allow a dynamic view of the population with two or more specific roles.  If all my miners had low raw %'s, I could see what the max % was when comparing it with, say, ambusher and woodcutter.  Say I'd want to scale the max percent between those 3 labors to 100%.  I would need to know the max %, and not just the mean.  However, the mean would be a huge first step.

It's a visual representation of data.  I think it would be useful, especially when dealing with non-normal curves of data.
« Last Edit: June 08, 2014, 03:32:48 pm by thistleknot »

splinterz

  • Bay Watcher
    • View Profile
    • Dwarf Therapist Branch
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #662 on: June 08, 2014, 04:20:16 pm »

i'm not questioning the transparency of what the optimizer is doing, i'm questioning where its visibility is the most useful. if you're going to go through each role for each dwarf, you're probably better off doing it manually, and it might not even be possible to get a plan that works precisely how you want it to.

but this actually gets me wondering if the way the priority is being applied couldn't be improved; maybe it's got to do more than simply scale ratings? i mean i'm terrible at the maths, but what effect would applying the priorities to adjust all the ratings, and then reapplying a cdf of the new ratings against each other have, and would it be an improvement?

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #663 on: June 08, 2014, 04:38:29 pm »

well, I think you would have to do some dry runs and compare.

If you ran a set of roles through a CDF into values from 0 to 100, I would recommend looking at what changing priorities would do to scale the results against each other.  That's the only way you're going to realize if it's useful or not.  I think it would be useful.  If I saw roles that were being suppressed visually, I could adjust the priority to make them stretch out and match my other roles.

Seeing all the values from a range of 0 to 100 for a specific labor optimization plan would allow me to do that.  I would see the values within my CDF results scale up and down against each other as I adjusted the priority.

Update: my math was wrong in my example scenario. One level-20 dwarf and 19 level-1s average to 1.95... Using the average alone doesn't show the skew...
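The corrected arithmetic is easy to check (a quick sketch using Python's statistics module on the hypothetical fort from the example):

```python
from statistics import mean, median

# hypothetical fort from the example: one level-20 legendary plus 19 level-1 dwarves
skills = [20] + [1] * 19

print(mean(skills))    # 1.95 -- the average is dragged up by the one outlier
print(median(skills))  # 1.0  -- the median reflects the typical dwarf
```

The gap between the two numbers is exactly the skew being described.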
« Last Edit: June 09, 2014, 08:08:04 am by thistleknot »

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #664 on: June 09, 2014, 08:40:58 am »

Although I'd prefer a grid view option, as it would let me see if dwarf A is better at weaponsmithing vs armorsmithing.

However, if the optimization plan listed the min, mode, median, and max (and maybe sdev) for each role used... I could tell how the distributions are skewed against each other and could adjust my priorities better.

This of course could be expanded to include a range involving standard deviations and quartiles... as values within 1 standard deviation of the mean cover something like 68%+ of all values...

There could even be a button that sets priorities based on the mean and standard deviation.

http://en.m.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule

We could auto adjust priorities so the values are adjusted to match say 95% of values

Update: to be honest I don't think the sdev would be useful in setting priority. The only thing priority could adjust is the overall avg of the output... Or overall max...
« Last Edit: June 09, 2014, 08:55:54 am by thistleknot »

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #665 on: June 09, 2014, 12:36:45 pm »

All this talk about listing min, mode/median, and max really gets at the issue of not understanding the ranges of the raw values output by the role ratings used in the labor optimizer.  Every role will have a different range, and the ranges determine where a role's %'s land in the overall mash of role %'s used in the optimization (i.e. it places all roles for an optimization plan into one super-list of %'s and counts down from the highest % to the lowest when assigning labors, so a specific role's range of output %'s determines where that role sits within this super-list).  So you can see that understanding each role's range of output %'s is important when trying to figure out priorities, and how one would want to adjust the ranges against each other.

Currently, priority can only target the max % of a specific role, or the mean.  It would be nice if one could stretch a role's range of %'s to match other roles' ranges of %'s.

Suppose I have a role where the range of raw %'s output by the role rating is 0 to 40% (say all my miners score in that band), compared with another role (say farming) whose range is from 20% to 60%...

That means when the labor optimizer runs from the highest % down to the lowest, my farming is going to start getting picked before it gets to miners (i.e. 60% is the max for farming, and 40% is the max for mining).

If I had the ability to adjust the role-rating scale somehow before the optimization run, I could give, say, mining a higher priority.  As is, priorities only scale %'s down in an equal fashion.  Up for discussion is whether transforming raw role-rating % ranges would be useful.  Say I wanted to expand the mining range from 0 to 40% up to 0 to 80%; that way mining will get picked first in the optimization plan over farming...

It's a whole other bag of issues I'm talking about, as how would one stretch the ranges?  It would be something similar to just listing the values as they are currently drawn, i.e. each role is run through its own CDF.  However, this isn't always optimal, as it gives every role a 50% mean.  What I would propose is mean manipulation prior to the optimization running.  Maybe even min/max manipulation, which would involve targeting new proposed means, mins, and maxes for the output raw role values just prior to optimization.

If these changes to ranges could be made (i.e. transforming them before they are fed into the optimizer) and visually output into grid views to show how the optimization plans would work, a player could fine-tune which roles are given priority when doing labor optimizations.

I could give you a better visual breakdown of the issue if I could get a spreadsheet output of some raw role ratings...  Then I could make some side-by-side candlestick ranges of roles to show the real issue.  You would get a perfect image of what I'm talking about: you would find that some roles' ranges (i.e. a candlestick depicting a role's range) are completely different from others.  This is important considering the way the optimizer works, starting with the highest % value and working its way down.  The ability to modify those candlestick ranges against each other would allow a player to fine-tune how the optimizer works.  By providing a mean, I could scale a range up/down, but it would be proportional to the adjustment.  Another option would be to simply +/- all the output role-rating values to move the range up/down.  Another option is to actually allow transforming the range (i.e. stretching it) to match another range of values (say another role's).
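For illustration, running one role's raw ratings through its own empirical CDF (the per-role normalization described above) can be sketched like this; `percentile_rank` is a hypothetical helper, not anything in Dwarf Therapist:

```python
from bisect import bisect_right

def percentile_rank(values):
    """Empirical CDF: the % of the population scoring at or below each value."""
    srt = sorted(values)
    n = len(srt)
    return [100.0 * bisect_right(srt, v) / n for v in values]

# every role gets stretched onto the same 0-100 band, whatever its raw range
print(percentile_rank([0.05, 0.10, 0.25, 0.40]))  # [25.0, 50.0, 75.0, 100.0]
```

Note how the top dwarf always lands at 100%, which is exactly why a per-role CDF erases the between-role range differences that the candlestick comparison is meant to expose.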
« Last Edit: June 09, 2014, 12:41:34 pm by thistleknot »

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #666 on: June 09, 2014, 01:02:20 pm »

Here's a display of the issue

http://imgur.com/XRFQCca

Currently, priority can scale up or down a range of raw %'s output by a role. It will either shrink the range and lower the mean, or expand the range and raise the mean.

Looking at the picture I had provided.

Ambusher %'s would be congregated towards the top of the labor optimization, followed by ranged (merely because it has a higher mean), then carpentry, mining, and melee; therefore, the role % ranges have an effect on the listing/order within the optimization plan.  Preferences (for roles that use them) also have a suppression effect on role ranges... as a dwarf will never have all the preferences for a role, a role with a lot of preferences will generally produce lower output values... so being able to see these kinds of box-and-whisker diagrams for ranges, and the ability to adjust them, would help alleviate this effect.

The problem is kind of mitigated by allowing the # of labors to set per dwarf.  Yes... ambusher (range) might be higher than say mining, but if you assign 6 to 7 labors per dwarf, the issue is mitigated by the fact that a dwarf can only be assigned to ambusher once.  However, when you start adding 20-30 roles into your optimization plan, you can get an idea that some ranges are going to be completely exhausted by the time you get to the bottom %'s of the optimization plan.  Hence, ambusher would have received a preference in assigning roles/labors to dwarves.

The issue can be mitigated if one can allow for modifying the ranges (note these ranges are based on the current population set only, btw, and should always be relative to the current population.  Trying to calculate all possible ranges would be impossible, as that's what the roles are meant to do: give us a %.  It's just common sense that 100% is never attainable).

A few suggestions:

A. The ability to raise/lower the entire range (i.e. a simple addition/subtraction).
B. The ability to stretch both sides of a range, i.e. raise or lower the portions above/below the mean.

So I think the ability to target the mean at a new center point, and the ability to stretch the min to a lower point, as well as stretch a max to a new higher point are some possible ideas.

Some issues I could see with this: moving a range too high or too low gives some values above 100% or below 0%, so I would think of doing something CDF-like with these values...  Kind of similar to how we cap attribute values percent-wise (i.e. when a dwarf is created with the maximum possible starting attribute value, we did some math where he is at 95%+, and the extra 5% was the max attribute he could grow to, or something like that; Splinterz knows what I'm talking about).

In other words, if a player could adjust the ranges, the ranges would have to be kept within 0 to 100% after the adjusting.  I think CDF's and SDEV's could be used to accomplish this, not exactly sure how, or... instead just scale the range to not allow values under 0% or over 100% after stretching.

There's a few things to consider though.  Like... should we care where the majority of the values fall within a range?  I used skinny lines to draw the max/min of a range as it extended from the 1st and 4th quartiles, and the box to draw the majority of the values (i.e. in between both quartiles).

So some math could be done to use the quartiles to target say 50% of the values within a range, and just move that portion, scaling everything beyond those points if it falls outside of 0 or 100%.

See

http://www.bbc.co.uk/schools/gcsebitesize/maths/statistics/representingdata3hirev4.shtml

alternatively, we could use means and sdev... but I think using medians and quartiles would work better.  Better yet, instead of quartiles, why not use 1/32-iles to grab 93.75% (i.e. the stick ends of the whisker diagram could be drawn to represent 1/32 of the data out from the box).
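The box-plot quantities being discussed (quartiles, IQR, 1.5-IQR fences, outliers) can be computed directly; this is a generic sketch using linear-interpolation quantiles, not DT's code:

```python
def box_stats(values):
    """Median, quartiles, and Tukey fences (outliers lie beyond 1.5 * IQR)."""
    srt = sorted(values)
    n = len(srt)

    def quantile(q):
        # linear interpolation between order statistics; assumes n >= 2
        pos = q * (n - 1)
        lo = int(pos)
        frac = pos - lo
        return srt[lo] + frac * (srt[min(lo + 1, n - 1)] - srt[lo])

    q1, med, q3 = quantile(0.25), quantile(0.50), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {"q1": q1, "median": med, "q3": q3, "iqr": iqr,
            "fences": (lower, upper),
            "outliers": [v for v in srt if v < lower or v > upper]}

print(box_stats([1, 2, 3, 4, 100])["outliers"])  # [100]
```

One legendary dwarf among mediocre ones shows up as an outlier beyond the upper fence, exactly the red-x case discussed later.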

some more info on whisker diagrams

http://www.barnstablecountyhealth.org/ia-systems/information-center/data-and-statistics/guide-to-box-whisker-diagrams
or...
http://www.bbc.co.uk/schools/gcsebitesize/maths/statistics/representingdata3hirev6.shtml

to add means to box-whisker diagrams...
http://peltiertech.com/WordPress/excel-box-and-whisker-diagrams-box-plots/

Update:

On further analysis, I'd have to throw my support behind median vs mean...

https://epilab.ich.ucl.ac.uk/coursematerial/statistics/summarising_centre_spread/measures_spread/comparing%20measures%20of%20spread.html

Quote
If the mean is not a meaningful summary of the centre of the data, then it follows that the standard deviation, which is calculated from distances around the mean, will not be a useful summary of the spread of the values.

Therefore, if distributional assumptions (data is symmetric) can be made and there are adequate numbers in the sample to check those assumptions (as a rule of thumb it is often said that a sample size of at least 20 would be adequate), then the mean and standard deviation should be used to quantify the centre and spread of the measurements.

Alternatively, if the data distribution is skew and/or the sample size is small then it is preferable to use the median and interquartile range to summarise the measurements.

https://epilab.ich.ucl.ac.uk/coursematerial/statistics/summarising_centre_spread/measures_centre/comparing_mean_median.html
« Last Edit: June 09, 2014, 02:27:04 pm by thistleknot »

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #667 on: June 09, 2014, 07:53:14 pm »

Thanks for the link splinterz.

and this guy for his template

http://www.vertex42.com/ExcelTemplates/box-whisker-plot.html

So I used the file you sent me of 97 dwarves... and mapped their ranges in these really nice whisker box charts that even show outliers!

Interpretation of the PDF is in the excel file, just in case you don't have excel

Spoiler (click to show/hide)

http://dffd.wimbli.com/file.php?id=8628

better link (v2); the original link has the excel sheet and the source url to the google doc

http://dffd.wimbli.com/file.php?id=8629

I sorted based on average and it produced an interesting ordering

http://dffd.wimbli.com/file.php?id=8630


v4

http://dffd.wimbli.com/file.php?id=8631

Some items to take note of were (it's a shame I can't figure out how to sort these by median rating...)

Spoiler (click to show/hide)

I think a good labor optimization plan would allow you to target these values (i.e. quartile & median targeting, I suppose).

If you note, outliers are listed as red x's... outliers are calculated as beyond 1.5 IQRs out from the quartiles.  I think, because these values are already based from 0 to 100, targeting these outlier fences for labor optimization would be a good approach.  Say we targeted 1.5 out from the 1st quartile (i.e. some value between 0% and the 1st quartile) and 1.5 out from the 3rd quartile (i.e. a value between the 3rd quartile and 100%); we could then target proper numbers for priority and mean adjustments.

Just an idea, but the graphic shows the extent of the problem with the current optimization plan.

I think showing these values in game would be a tremendous help if the player were allowed to alter the mean and scale a range down; however, if the ranges were scaled as proposed above with 1.5 IQRs and quartiles, I think the ranges could be scaled to each other for proper optimization.


Notes:

Spoiler (click to show/hide)

is there a way to derive what range the outliers are caught at?

Okay, found the solution to what the inner and outer fences (i.e. the outlier cutoffs) measure out to

Source: http://www.syntricity.com/datablog/-/blogs/thinking-outside-the-boxplot

"Inner fences represent mean +/- 2.698 standard deviations or 99.30% of the data, while outer fences represent mean +/- 4.7215 or 99.9998% of the data."
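Those quoted multipliers can be sanity-checked against the normal distribution with nothing but `math.erf` (the share of a normal population within k standard deviations of the mean is erf(k/sqrt(2))):

```python
from math import erf, sqrt

def within_k_sigma(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

print(round(within_k_sigma(2.698) * 100, 2))   # 99.3 -- matches the inner-fence figure
print(round(within_k_sigma(4.7215) * 100, 4))  # matches the outer-fence figure, ~99.9998
```

So the fence percentages quoted above hold only under a normality assumption, which is worth remembering given the skewed roles discussed here.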

Spoiler (click to show/hide)

Update:

So I think if you wanted to scale ranges around each other, this kind of math could be used.

If the lower inner fence = .35%, the upper inner fence = 99.65%.

We could somehow cap outliers based on these values... (if you wanted to get funky with things)

Update:

I'm thinking if you scaled all values of a role down to its inner-fence range, you'd get a fair breakdown of the range of a role.  You could place .35% at the lower fence (i.e. 1st Quartile - (IQR * 1.5)) and 99.65% at the upper fence (i.e. 3rd Quartile + (IQR * 1.5)).

I think there is more benefit in scaling these roles to each other on a similar scale before applying optimization.  That way 50% of a role's values are above the center, and 50% are below.  I know direct comparison between roles will be lost over the long outlook of a fort (as skills will most likely go up), but priority can be used to give a bias to more desired roles...

I'm not sure.  I think deriving these values off the raw role ratings, then using the median and Inter Quartile Range to derive acceptable inner and outer fences, gives us a range for transforming the #'s into standardized values from 0 to 100%.  For those values that FALL OUTSIDE the inner fences, I propose scaling them into 0 to .35% (below the lower fence) and 99.65% to 100% (above the upper fence), respectively.

This would mean the min value would be equal to 0% vs its raw role rating, and the max would be scaled to 100%, whereas the value defined as the upper limit (3rd Quartile + (IQR * 1.5)) would be = 99.65%.

This would do some weird things: it means outliers WOULD ALWAYS have values above 99.65% (or below .35%).  So in the pdf provided, I added up that there were 438 outliers out of 5400 values... that comes to 8.111%... hrmmm.... well, either way, that's ~92% of values within a comparable range while preserving some significant order.  Telling the user what the prior avg was, and allowing him to pick a priority based on this knowledge, I believe would be a good answer to the priority dilemma; the priority could be applied after the values are scaled to a new %.

Based on those values... I would say my thinking could use some peer review, because 8.111% is a lot larger than (100% - 99.65%)

I think targeting proper "centers" by stretching the values next to each other [in the manner previously described] gives the most valued number of combinations when comparing roles to each other, while allowing priorities to further weight down the values in a preference order (priority would be applied AFTER a % is derived with the previously described method).  This would tell a player that he is directly affecting the MEAN/MAX value when using priority.  I know it sounds complicated, but I think it's really doable, and preserves the best interests of the game as well as the player.  Direct role comparison isn't achieved here, but considering all labors are equally important (barring priority), it would allow for a fair comparison of values.

I'm not saying it's the best answer, but it's one I came up with.  I think trying to stretch the values any other way is too arbitrary; this allows a systematic approach to scale, and then uses priority to factor in (i.e. priority as a new layer of ordering on top).

It's also nice because, using the quartiles and median, I can use standard scaling methods (i.e. no need for standard deviation; it's a simple inflation or deflation of values that preserves the same order and ratios between values).
« Last Edit: June 09, 2014, 10:27:23 pm by thistleknot »

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #668 on: June 09, 2014, 10:57:37 pm »

So I think I have an idea.

Barring any ideas on how to determine skew, and how to use skew to normalize a distribution (i.e. using raw role ratings to accomplish it)...

I would propose we hard-skew the data back using the median and (interquartile range * 3), which has been shown to cover 92% of the values in the spreadsheet most recently posted (look at the #'s to the far right of the sheet of v4).

It involves heavy manipulation of raw role ratings, but it would accomplish a comparable dataset when comparing roles against each other for labor optimization.  I have an idea that would give a fair shake to 92% of the values, which means that the 8% of values that are outliers are being suppressed somewhat when comparing roles (be it low or high outliers, though there are more high outliers than low).

So I figured I could create bins...

Since all possible values are 0 to 100% (due to the role rating), we don't have to worry about large number conversions.

So...

Here is what I propose: convert the raw role rating value per the table below, on a per-role basis.

Note: (IQR = Inter Quartile Range)

0% to 0.35% = 0 to (1st Quartile - (IQR * 1.5))
0.35% to 25% = (1st Quartile - (IQR * 1.5)) to 1st Quartile
25% to 50% = 1st Quartile to Median
50% to 75% = Median to 3rd Quartile
75% to 99.65% = 3rd Quartile to (3rd Quartile + (IQR * 1.5))
99.65% to 100% = (3rd Quartile + (IQR * 1.5)) to 100

If the calculation for IQR * 1.5 leads to a value above 100 or below 0, then the value would be capped at 100/0, and the next bin would be skipped (simple logic check here).

At first I was worried about converting the data this way; I thought it might destroy the bell curve.  The distribution will be heavily transformed towards a bell curve, but it will retain its original shape between these key bins, stretched to match the forced curve.
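As a sketch (a hypothetical helper, not DT code), the bin table above is just a piecewise-linear rescale whose input breakpoints come from the role's own quartiles and fences, with the fences clamped into [0, 100] (which is what "skipping" a bin amounts to):

```python
def bin_transform(v, q1, med, q3):
    """Map a raw role rating v (0-100 scale) onto the fixed bins above."""
    iqr = q3 - q1
    xs = [0.0, max(0.0, q1 - 1.5 * iqr), q1, med, q3,
          min(100.0, q3 + 1.5 * iqr), 100.0]          # raw breakpoints
    ys = [0.0, 0.35, 25.0, 50.0, 75.0, 99.65, 100.0]  # fixed output bins
    for i in range(len(xs) - 1):
        if xs[i] <= v <= xs[i + 1]:
            if xs[i + 1] == xs[i]:   # degenerate (skipped) bin
                return ys[i]
            # linear interpolation within the enclosing bin
            return ys[i] + (v - xs[i]) * (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return 100.0

print(bin_transform(50, q1=25, med=50, q3=75))  # 50.0 -- the median maps to 50%
```

By construction every role's median lands at 50% and its quartiles at 25%/75%, which is the forced bell shape being described.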

Update:

A simple form would be to use the values output by roles currently (i.e. one role fed into a CDF; although I still recommend allowing grid views for roles that can be combined for CDF ratings, so one can compare two or more [similar] roles directly with each other), then notifying the player of the avg raw rating, and simply allowing him to factor down the rating, using the former average raw rating as guidance.

Update:

This sounds really crazy, but I think I can derive default priority values based on the old raw role-rating averages... giving a similar ordering of roles to the previous non-normalized layout.  I'll see if I can do a prototype and get an image up showing a whisker box of the transformation.
« Last Edit: June 09, 2014, 11:28:10 pm by thistleknot »

splinterz

  • Bay Watcher
    • View Profile
    • Dwarf Therapist Branch
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #669 on: June 10, 2014, 05:30:32 am »

i'm just going to reiterate what i've already mentioned in the pm to you, for the benefit of anyone else reading this.

i think this is becoming overly complex and will guarantee that nobody but a select few will ever use the optimizer. the biggest issue is that priority right now is an arbitrary number and there's no transparency as to what it's actually doing.

i proposed that we simply sort based on priority, rather than using it as a multiplier. you'd end up with a final list, sorted by priority, and then role rating. this makes the optimization plans instantly much more understandable because priority is actually determining what order the optimizer will fill jobs.
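That proposal is easy to picture: instead of multiplying rating by priority, build one list and sort it by (priority, rating). A minimal sketch with made-up dwarves and numbers:

```python
# hypothetical (dwarf, role, priority, rating) candidates
candidates = [
    ("Urist", "miner",  2, 0.40),
    ("Dodok", "farmer", 1, 0.60),
    ("Urist", "farmer", 1, 0.55),
    ("Dodok", "miner",  2, 0.35),
]

# priority decides the order jobs are filled; rating only orders within a priority
queue = sorted(candidates, key=lambda c: (c[2], c[3]), reverse=True)
for dwarf, role, prio, rating in queue:
    print(prio, role, dwarf, rating)
```

Every priority-2 job is filled before any priority-1 job, even though the farmers have higher raw ratings, which is the transparent behaviour being proposed.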

i think that the desire to have optimization gridviews, while interesting, is a workaround for the real problem explained above: you can't tell what priorities do at the moment, so you need a lot of extra information (averages, mean, median, mode, min, max, gridviews). now, i'm not opposed to changing things, but it's got to improve the current behaviour, and if possible add a minimum of complexity to the already complex optimization plans. even better would be if it can make things more transparent (i.e. priority does something expected).

dumping every thought you have into this thread as it comes to mind may help you think things through, but sifting through everything to find what you're really trying to say can be tedious. i'm also not a statistician nor a mathematician, so most of what you're posting means little, or nothing to me. it doesn't help me to see why this is better, why it makes priority more understandable, or how it will compare to the current method. the amount of effort you're putting into your posts is great but maybe split your posts into a brief explanation of what/how things change and the possible effects, and then show those details to support your findings.

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #670 on: June 10, 2014, 08:30:28 am »

yes... this stuff is getting too long.  I've been posting my research on here just in case anyone I invited had something to say.

I promise to go back and clean up the posts, as most of the questions/ideas I posited before are answered later.

Ideally, what I'm trying to address are the varying ranges that are fed into the optimizer.  So far, I have two ideas for normalizing the ranges (the benefit of normalization is that it makes all ranges relevant to each other).  One is to figure out how to do lambda estimation and use Box-Cox (a power transformation) to achieve the lowest possible standard deviation of a data set, or... to use the bin formula previously described (not as nice).

Then modify the priority of the roles based on the non-normalized raw averages for each role (this allows somewhat of a preservation of the differences between ranges).  Finally, a player can adjust those default suggested priority ratings.

Voila, simple.  The only complicated part is on the backend: normalizing the ranges.
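For the Box-Cox idea, lambda can be estimated with a simple grid search; this sketch picks the lambda whose transformed data has skewness closest to zero (one of several reasonable criteria, and purely illustrative):

```python
from math import log

def skewness(xs):
    """Sample skewness (third standardized moment)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return 0.0 if var == 0 else (sum((x - m) ** 3 for x in xs) / n) / var ** 1.5

def boxcox(x, lam):
    """Box-Cox power transform; requires x > 0 (shift the data first if not)."""
    return log(x) if lam == 0 else (x ** lam - 1) / lam

def best_lambda(xs, grid=None):
    """Grid-search the lambda that leaves the transformed data least skewed."""
    grid = grid if grid is not None else [i / 10 for i in range(-20, 21)]
    return min(grid, key=lambda lam: abs(skewness([boxcox(x, lam) for x in xs])))

ratings = [1, 2, 2, 3, 3, 3, 10, 20]   # right-skewed toy data
lam = best_lambda(ratings)
print(lam, skewness(ratings), skewness([boxcox(x, lam) for x in ratings]))
```

A right-skewed sample comes back with a lambda below 1 (a shrinking transform), which is what deskewing the shearer-style distributions calls for. (scipy.stats.boxcox does the lambda estimation properly, via maximum likelihood, if a dependency is acceptable.)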
« Last Edit: June 10, 2014, 08:33:03 am by thistleknot »

MeMyselfAndI

  • Bay Watcher
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #671 on: June 10, 2014, 10:57:04 am »

Personally? It's too complex.

And yet there's no way to do some of the things that I would view as "basic". Locking specific dwarves so that they won't be changed. Locking dwarves in a specific profession to specific burrows/workshops. Selecting exactly N dwarves to do something, or up to N.

I want a dwarf on several professions. Period. But there's no way to do that. You can jack up the priority, but that doesn't always work.

A simple "Assign dwarf X to this profession. Then assign the best N of what remains to this profession. Then assign the best M% of what remains to this profession. And so on." would be much better for my use. (Especially if it was either "Exactly N", "x% of what remains", or "x% of what remains, clamped to min/max")
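A staged optimizer like that could be sketched as a greedy pass over a shrinking pool (the names and the rule format here are made up for illustration):

```python
def staged_assign(dwarves, ratings, plan):
    """Fill professions in order; each rule draws the best of whatever remains.

    dwarves: the unassigned pool
    ratings: {(dwarf, profession): rating}, higher is better
    plan:    ordered (profession, ("exact", n) or ("percent", p)) rules
    """
    pool = list(dwarves)
    assignments = {}
    for prof, (kind, amount) in plan:
        # best remaining candidates for this profession first
        pool.sort(key=lambda d: ratings.get((d, prof), 0.0), reverse=True)
        take = amount if kind == "exact" else round(len(pool) * amount / 100)
        assignments[prof] = pool[:take]
        pool = pool[take:]
    return assignments

ratings = {("Urist", "miner"): 0.9, ("Dodok", "miner"): 0.8,
           ("Zefon", "miner"): 0.1, ("Catten", "miner"): 0.2,
           ("Dodok", "farmer"): 0.7, ("Zefon", "farmer"): 0.2,
           ("Catten", "farmer"): 0.5}
plan = [("miner", ("exact", 1)), ("farmer", ("percent", 50))]
print(staged_assign(["Urist", "Dodok", "Zefon", "Catten"], ratings, plan))
```

The "x% of what remains, clamped to min/max" variant would just be a min/max applied to `take`.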

I'm not sure if it would be better for others though.

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #672 on: June 10, 2014, 01:11:47 pm »

Okay, I gave up on trying to figure out how to do Box-Cox; however, I found that the CDF of raw role ratings normalizes the ranges to an acceptable level.

Here is the magic.

http://imgur.com/9T5kcs2

basically, what you're looking at:

Top image is how the raw role ratings look when ran through the optimizer.

Center image is how the CDF values are drawn inside Dwarf Therapist when looking at roles (see how nicely the ranges match!)

Bottom image bases priorities on the CDF values by using the ln of the means of the raw ranges... maintaining a sense of the original order seen in the 1st image, while preserving the ordinal mean differences between ranges :)  These would be in the form of "suggested" starting priorities.  I scaled the values back so they fell within a 0 to 1 range, but I can explain all that later.  Let me know if you guys like it.
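The ln-of-raw-means step can be sketched like so (the numbers are made up; the real means would come from the spreadsheet):

```python
from math import log

# hypothetical per-role averages of the raw (pre-CDF) role ratings, 0-1 scale
raw_means = {"ambusher": 0.60, "miner": 0.45, "farmer": 0.30, "shearer": 0.05}

# ln compresses the spread between the means, then a min-max rescale puts the
# suggested priorities back on a 0-to-1 range while preserving the order
logs = {role: log(m) for role, m in raw_means.items()}
lo, hi = min(logs.values()), max(logs.values())
priorities = {role: (v - lo) / (hi - lo) for role, v in logs.items()}
print(priorities)
```

With these inputs the shearer (raw mean 5%) lands at priority 0 and the ambusher at 1, matching the "suggested starting priorities" idea.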
« Last Edit: June 10, 2014, 01:14:54 pm by thistleknot »

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #674 on: June 10, 2014, 07:46:27 pm »

the next piece of work I would like to contribute to dwarf therapist is to find [distributions of] roles that are skewed and, instead of using a [normal] probability distribution function to figure out their %'s, try to use Box-Cox or a skewed normal distribution [using its skew values] to deskew them, and then derive a proper %.

I read I could use power transformations, or logs.  However, a great website I have saved on my phone said certain deskew methods should be used for certain data collection methods.  I consider this a "polluted" distribution (mainly for the optimizer), meaning that multiple mini-distributions make up the whole distribution, which makes things hard to compare.  However, generally speaking, when a distribution is skewed, it's skewed at the individual role level, and I could just deskew that role before throwing it into the optimizer.  The same goes for the way things are regularly drawn.  It would greatly enhance the drawing of the values, I would think.

I just checked some numbers, and most values are still fairly represented with the default suggested priorities... It would be nice to be able to flag outliers and preserve some of their initial value, because as seen with the shearer, his max score drops from 89% to like 16%.

However, that is also because the average score for shearer was like 5%, so when the priority was calculated (even after it was run through a natural log), it scaled down the range significantly.

The formula for identifying outliers, I believe, is calculating the Inter Quartile Range (i.e. 3rd Quartile - 1st), and then going 1.5 IQRs below the 1st Quartile and 1.5 IQRs above the 3rd Quartile.  According to my tests, this accounted for 92% of the values.

So... if outliers were preserved at their initial value, and then scaled down with everything else on a separate global priority... I'm not exactly sure how I'd scale that value down...  Yeah, I don't know how that would work...


On second thought, not everything is perfect.  I just checked the outliers between the cdf ratings and this new proposed method, and only 6% of values were outliers.

I think the focus should be on deskewing skewed distributions, and that would better address outliers.

Update:

I found a youtube tutorial on how to do lambda!

https://www.youtube.com/watch?v=sEZh8HCSaxk

it works for left/right tailed skews!

Here's how to do it!

Spoiler (click to show/hide)

will update with what constitutes a skewed distribution by measuring skew and kurtosis
Update: source: http://en.wikipedia.org/wiki/Data_transformation_%28statistics%29 (see above spoiler)

Update:

On final thought, it might just be easier to check if it's skewed and then just run a natural logarithm on it, vs trying to figure out the rest.  Or maybe... if there aren't enough data values, you could use a natural logarithm; and if you wanted to check the value of lambda for larger distributions, you could enable that check when there are enough data values...

If this method were used... the natural logarithm would be run on a skewed distribution's raw values BEFORE they are fed into a CDF; then, after the CDF, when priorities are calculated, the TRUE MEAN of the skewed distribution is taken into account, not the transformed natural-logarithm mean.
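That pipeline (log-deskew first, score with a CDF, keep the untransformed mean for priorities) might look like this sketch, fitting a normal CDF to the logged values; a parametric CDF is assumed here because a purely empirical CDF is unchanged by any monotone transform:

```python
from math import erf, log, sqrt

def deskewed_pct(raw):
    """Log-transform positive, right-skewed ratings, then score each value with
    a normal CDF fitted to the logged data. Returns the 0-100 scores plus the
    TRUE (untransformed) mean for the later priority calculation."""
    logs = [log(x) for x in raw]          # assumes all ratings > 0; shift first if not
    n = len(logs)
    mu = sum(logs) / n
    sd = (sum((v - mu) ** 2 for v in logs) / n) ** 0.5 or 1.0  # guard: all-equal data
    pct = [50.0 * (1.0 + erf((v - mu) / (sd * sqrt(2)))) for v in logs]
    return pct, sum(raw) / n

pct, true_mean = deskewed_pct([1, 2, 4, 8])
print([round(p, 1) for p in pct], true_mean)
```

Since the [1, 2, 4, 8] sample is exactly symmetric after the log, the scores come out mirrored around 50%, which is the deskewing effect being described.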

A post-test could be performed to see if running a distribution through a deskew function actually results in an appropriate skew rating...

Or... it could just be skipped altogether.

Another option is trying to derive a skewed normal distribution value using a distributions skew and kurtosis...

Or...


just not use any of these and keep it as it currently is proposed...

update:
turns out... deskewing shearer resulted in practically the same skew value using the natural logarithm... so instead, I just propose keeping it as is, but reminding the player that the labor optimizer does about 90% of the work.

Splinterz:

I'm not overly concerned about it, but you have negative numbers in your raw file you sent me.  So min was reporting as -1.88...

negative values are fine for most things, but for logarithms, they're bad.  It's recommended to scale negative values up before applying a natural logarithm.

Update

I think if a player wants to add a new role to a labor optimization plan using the newly proposed method, he should be shown the new average.  That way he can set his priority to it, or above/below it, when he wants to adjust the scale of the range that will be applied before labor optimization.
« Last Edit: June 10, 2014, 11:52:59 pm by thistleknot »