Thanks for the link, splinterz, and thanks to this guy for his template:
http://www.vertex42.com/ExcelTemplates/box-whisker-plot.html
So I used the file you sent me of 97 dwarves and mapped their ranges in these really nice box-and-whisker charts that even show outliers!
Interpretation of the PDF is in the Excel file, just in case you don't have Excel.
The ends of the whisker are set at 1.5*IQR above the third quartile (Q3) and 1.5*IQR below the first quartile (Q1). If the Minimum or Maximum values are outside this range, then they are shown as outliers. The normal convention for box plots is to show all the outliers, but to simplify this template, only the Min and Max outliers are shown.
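For anyone without Excel, here is a rough Python sketch of the same 1.5*IQR rule. The function names and the sample ratings are made up for illustration, and the quartile convention ("inclusive" interpolation) is my assumption; the template may compute quartiles slightly differently.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (q1, q3, lower_fence, upper_fence) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, q1 - k * iqr, q3 + k * iqr

def outliers(values, k=1.5):
    """Values strictly outside the k*IQR fences."""
    _q1, _q3, low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]

# Made-up role ratings, not taken from the actual spreadsheet.
ratings = [10, 12, 13, 14, 15, 16, 17, 18, 60]
print(tukey_fences(ratings))
print(outliers(ratings))
```

Unlike the template, this lists every value outside the fences rather than only the Min and Max.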
http://dffd.wimbli.com/file.php?id=8628
Better link (v2); the original link has the Excel sheet and the source URL to the Google doc.
http://dffd.wimbli.com/file.php?id=8629
I sorted based on average and it produced an interesting ordering.
http://dffd.wimbli.com/file.php?id=8630
v4
http://dffd.wimbli.com/file.php?id=8631
Some items to take note of (it's a shame I can't figure out how to sort these by median rating...):
Outlier offenders
Miner, 5 upper outliers
Shearer, 24
Stone & Rune Engraver, 8
Archaeologist, 10
Hunter, 11
Beast Dissection, 13
Prayer, 53
and
Nurse, 60
I think a good labor optimization plan would allow you to target these values (i.e. quartile & median targeting, I suppose).
If you note, outliers are marked as red x's; an outlier is any value more than 1.5 * IQR beyond a quartile. Because these values already run from 0 to 100, I think targeting these outlier thresholds for labor optimization would be a good approach. Say we targeted 1.5 * IQR below the 1st quartile (i.e. some value between 0% and the 1st quartile) and 1.5 * IQR above the 3rd quartile (i.e. a value between the 3rd quartile and 100%); then we could pick proper numbers for priority and mean adjustments.
Just an idea, but the graphic shows the extent of the problem with the current optimization plan.
I think showing these values in game would be a tremendous help if the player were allowed to alter the mean and scale its range down; however, if each range were scaled using the 1.5 * IQR and quartile approach proposed above, I think the ranges could be scaled to each other for proper optimization.
Notes:
I googled what the quartile range looks like in terms of standard deviations... so 50% of the values are within these two.
About 2/3 of 1 sdev = 1 quartile range.
q1 = -0.68 and q3 = 0.68
Is there a way to derive what range the outliers are caught at?
Okay, found what the inner and outer fences (i.e. the outlier cutoffs) measure out to.
Source:
http://www.syntricity.com/datablog/-/blogs/thinking-outside-the-boxplot
"Inner fences represent mean +/- 2.698 standard deviations or 99.30% of the data, while outer fences represent mean +/- 4.7215 or 99.9998% of the data."
Beyond the basic boxplot, however, is Dr. John Tukey’s exploratory data analysis (EDA) boxplot that includes the notion of “fences” and “outside values.” An outside value is a value which is below the lower or above the upper fence. Fine, but how are fences defined? First, note that the interquartile range (IQR) is defined as the difference between the 75th and 25th percentiles: That is, IQR = 75th percentile – 25th percentile. Also, note that there are at least two types of fences: inner and outer. Inner fences are defined as: lower inner fence = 25th percentile – 1.5*IQR and upper inner fence = 75th percentile + 1.5*IQR. Outer fences are defined as: lower outer fence = 25th percentile – 3*IQR and upper outer fence = 75th percentile + 3*IQR.
If you assume a Gaussian (normal) distribution, how can we interpret these fences? A Gaussian distribution’s 75th percentile corresponds to the mean + 0.6745 standard deviations, and its 25th percentile corresponds to the mean – 0.6745 standard deviations. This means the IQR represents 1.349 standard deviations. Inner fences represent mean +/- 2.698 standard deviations or 99.30% of the data, while outer fences represent mean +/- 4.7215 or 99.9998% of the data.
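The figures in that quote can be re-derived with Python's standard library. This is just a sanity check under the same Gaussian assumption; the variable names are mine.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

q3 = z.inv_cdf(0.75)  # 75th percentile, in standard deviations
q1 = -q3              # 25th percentile, by symmetry
iqr = q3 - q1

inner = q3 + 1.5 * iqr  # upper inner fence
outer = q3 + 3.0 * iqr  # upper outer fence

def coverage(limit):
    """Share of a standard normal lying within +/- limit."""
    return z.cdf(limit) - z.cdf(-limit)

print(f"Q3 = {q3:.4f} sd, IQR = {iqr:.3f} sd")
print(f"inner fence = {inner:.3f} sd, coverage = {coverage(inner):.2%}")
print(f"outer fence = {outer:.4f} sd, coverage = {coverage(outer):.4%}")
print(f"each tail beyond the inner fence = {z.cdf(-inner):.2%}")
```

The last line is where the 0.35% / 99.65% pair comes from: the roughly 0.70% outside the inner fences is split evenly between the two tails.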
Update:
So I think if you wanted to scale ranges around each other, this kind of math could be used.
If lower inner fence = 0.35%, then upper inner fence = 99.65%.
We could somehow cap outliers based on these values... (if you wanted to get funky with things)
Update:
I'm thinking if you scaled all values of a role down to its inner fence range, you'd get a fair breakdown of the range of a role. You could place 0.35% at the lower fence (i.e. Q1 - (IQR * 1.5)) and 99.65% at the upper fence (i.e. Q3 + (IQR * 1.5)).
I think there is more benefit in scaling these roles to each other on a similar scale before applying optimization. That way 50% of a role's values are above the center, and 50% are below. I know direct comparison between roles will be lost over a long outlook on a fort (as skills will most likely go up), but priority can be used to give a bias to more desired roles...
I'm not sure. I think deriving the median and interquartile range from the raw role ratings, and then using them to derive acceptable inner and outer fences, gives us a range for transforming the numbers into standardized values from 0 to 100%. For those values that FALL OUTSIDE of the inner fences, I propose scaling them from 0% to 0.35% (below the lower fence) and from 99.65% to 100% (above the upper fence), respectively.
This would mean the Min value would be equal to 0% instead of its raw role rating, and the Max would be scaled to 100%, whereas the value defined as the upper limit (Q3 + (IQR * 1.5)) would be = 99.65%.
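Here is a minimal sketch of that scaling as a piecewise-linear map, using the fence definition quoted earlier (Q1 - 1.5*IQR and Q3 + 1.5*IQR). The function names, the "inclusive" quartile method, and the sample ratings are assumptions of mine, not anything from the tool itself.

```python
from statistics import quantiles

LOW_PCT, HIGH_PCT = 0.35, 99.65  # fence positions proposed in the post

def _lerp(x, x0, x1, y0, y1):
    """Linear interpolation; returns y0 if the input span is empty."""
    if x1 == x0:
        return y0
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def scale_role(ratings):
    """Map one role's raw ratings onto 0..100%, preserving their order."""
    q1, _median, q3 = quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    low_val, high_val = min(ratings), max(ratings)
    scaled = []
    for r in ratings:
        if r < low_fence:
            # Lower outliers are squeezed into 0% .. 0.35%.
            s = _lerp(r, low_val, low_fence, 0.0, LOW_PCT)
        elif r > high_fence:
            # Upper outliers are squeezed into 99.65% .. 100%.
            s = _lerp(r, high_fence, high_val, HIGH_PCT, 100.0)
        else:
            # Everything between the fences is stretched over 0.35% .. 99.65%.
            s = _lerp(r, low_fence, high_fence, LOW_PCT, HIGH_PCT)
        scaled.append(s)
    return scaled

# Made-up ratings for one role; 60 is an upper outlier.
print(scale_role([10, 12, 13, 14, 15, 16, 17, 18, 60]))
```

One thing this makes visible: Min only lands on 0% (and Max on 100%) when it is actually an outlier. For a role with no outliers, every value falls between the fences, so nothing reaches the 0% or 100% ends.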
This would do some weird things: it means outliers WOULD ALWAYS have values above 99.65% (or below 0.35%). In the PDF provided, I added up 438 outliers out of 5400 values, which comes to 8.111%... hrmmm... well, either way, that's ~92% of values within a comparable range while preserving some significant order.
Telling the user what the prior avg was, and allowing him to pick a priority based on this knowledge, would I believe be a good answer to the priority dilemma; the priority could be applied after the values are scaled to a new %.
Based on those values... I would say my thinking could use some peer review, because 8.111% is a lot larger than (100% - 99.65%).
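To put a number on that gap, here is a quick check of the observed outlier share against what a normal distribution would put outside the inner fences (both tails). Only the 438 and 5400 come from the post; the rest is a sketch under the Gaussian assumption.

```python
from statistics import NormalDist

observed = 438 / 5400  # outlier count and total values quoted above

z = NormalDist()
q3 = z.inv_cdf(0.75)
inner = q3 + 1.5 * (2 * q3)   # upper inner fence; IQR = 2 * q3 by symmetry
expected = 2 * z.cdf(-inner)  # share beyond the inner fences, both tails

print(f"observed outlier share: {observed:.3%}")
print(f"expected if ratings were normal: {expected:.3%}")
print(f"ratio: {observed / expected:.1f}x")
```

A gap that large suggests the raw role ratings are probably not normally distributed (skewed or heavy-tailed), so 0.35% and 99.65% are better read as labels for where the fences get placed than as literal percentiles of the data.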
I think targeting proper "centers" by stretching the values next to each other [in the manner previously described] gives the most useful combinations when comparing roles to each other. Priorities could then further weight the values down in a preference order (priority would be applied AFTER a % is derived with the previously described method). This would tell a player that he is directly affecting the MEAN/MAX value when using priority. I know it sounds complicated, but I think it's really doable, and it preserves the best interests of the game as well as the player. Direct role comparison isn't achieved here, but considering all labors are equally important (barring priority), it would allow for a fair comparison of values.
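A tiny sketch of "priority applied AFTER the % is derived". How priority actually combines with the rating isn't specified anywhere above, so the multiplier in (0, 1] is purely my assumption, and the role percentages and priorities are hypothetical.

```python
def apply_priority(scaled_pct, priority):
    """Weight an already-scaled rating (0..100%) by a role priority.

    ASSUMPTION: priority is a plain multiplier in (0, 1]; the post does
    not say how priority should be combined with the rating.
    """
    if not 0.0 < priority <= 1.0:
        raise ValueError("priority must be in (0, 1]")
    return scaled_pct * priority

# Hypothetical scaled ratings for one dwarf, and hypothetical priorities.
scaled = {"Miner": 80.0, "Nurse": 90.0}
priority = {"Miner": 1.0, "Nurse": 0.5}

weighted = {role: apply_priority(pct, priority[role]) for role, pct in scaled.items()}
best = max(weighted, key=weighted.get)
print(best, weighted)
```

With a multiplier like this, lowering a role's priority lowers both its effective max and its effective mean by the same factor, which matches the idea that the player is directly affecting the MEAN/MAX.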
I'm not saying it's the best answer, but it's one I came up with. I think trying to stretch the values any other way is too arbitrary; this allows a systematic approach to scaling, and then priority can be used as a factor (i.e. priority adds a new layer of ordering on top).
It's also nice because, using quartiles and the median, I can use standard scaling methods (i.e. no need to use standard deviation; it's a simple inflation or deflation of values that preserves the same order and ratio between values).