The way the %'s are currently calculated is based on pre-defined models of how we believed the data falls into frequency bins.
Traits are mapped straight into a % using their frequency bins from
http://dwarffortresswiki.org/index.php/DF2012:Personality_trait
Attributes are a bit trickier. Initially we based it on their frequency bins as defined in
http://dwarffortresswiki.org/index.php/DF2012:Attribute (attributes have 6 distinct frequency bins); then Splinterz did the hard work of scanning mods that have different frequency bins and took the mods' varying castes into account (yeah, he did all that), so we got a truer representation of all possible values. However, we realized that attributes can increase (unlike traits!). So we had to come up with a way to scale the attributes from something like 5% to 95% based on the maximum possible value at creation. We decided to scale the % up based on what a dwarf can train up to: from whatever value you get within 5 to 95%, the % can go up a bit more based on the amount of attribute potential a dwarf can train up to (a dwarf can double his starting attribute through training). So we reserved something like 95% to 99.5% for what a dwarf can train up to. This is based on some tricky sigmoid-function math that takes the amount a dwarf can train up to and ensures we never exceed the 99.5% threshold; the last 0.5% is reserved for when a player cheats his dwarf's values even higher.
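To make that concrete, here's a minimal Python sketch of the shape of that scaling. Every constant here (including the sigmoid steepness) is a placeholder, not the actual math from the code:

```python
import math

def attribute_pct(bin_pct, train_potential):
    """bin_pct: the 5-95% rating from the frequency bins.
    train_potential: 0..1 fraction of the attribute the dwarf can still
    gain through training (a dwarf can roughly double its start value)."""
    headroom = 99.5 - bin_pct
    # The sigmoid keeps the bonus strictly below the headroom, so the
    # total approaches but never exceeds the 99.5% threshold.
    bonus = headroom * (2.0 / (1.0 + math.exp(-4.0 * train_potential)) - 1.0)
    return bin_pct + bonus
```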
Skills were a simple xp/max xp, but then I believe Maklak said we should be basing skills on the level of the skill rather than raw exp. However, some mods have training rates: some dwarves train skills faster than others. So we did the sigmoid magic again to scale the value up based on this training rate. You'll have to find the old post for it; I'm not going to go into it right now.
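Purely as an illustration of the idea (the real derivation is in that old post; every constant below is a made-up placeholder):

```python
import math

def skill_pct(level, train_rate=100.0, max_level=20.0):
    """level: the skill's level; train_rate: learning speed in percent
    (100 = normal; modded castes can be faster or slower)."""
    base = min(level / max_level, 1.0)  # level-based, per Maklak
    # Bounded sigmoid adjustment for training rate: at 100% the exponent
    # is exactly 1.0 (no change); faster learners rate a bit higher,
    # slower ones lower, and the result never leaves 0..1.
    k = 1.0 / (1.0 + math.exp(-(train_rate - 100.0) / 50.0))
    return 100.0 * base ** (2.0 * (1.0 - k))
```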
So then you get a 0 to 100% rating for each category: attributes, traits, skills.
Then there's preferences.
I'm not sure how Splinterz did preferences; I believe he was just going to go with a simple additive value on top of the other 3, but I'm not 100% sure. Preferences were hard to categorize.
The initial idea was to average the three values together using a weighted average based on the weights set for attributes, skills, and traits, with preferences either added at the end or being part of this weighting.
Not too sure.
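Something like this, with placeholder weights standing in for the user-set ones:

```python
def overall_pct(attr, skills, traits, prefs=None,
                w_attr=1.0, w_skills=1.0, w_traits=1.0, w_prefs=1.0):
    """Weighted average of the category ratings (all 0-100). Whether
    prefs join the weighting or get tacked on afterwards is still open;
    this sketch shows the weighted option."""
    total = attr * w_attr + skills * w_skills + traits * w_traits
    weight = w_attr + w_skills + w_traits
    if prefs is not None:
        total += prefs * w_prefs
        weight += w_prefs
    return total / weight
```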
I'm not sure if Splinterz will like a new idea I have to propose, but instead of doing all that hard fitting of data to pre-defined ranges as we have done, I was going to propose just scaling the values from 0 to 100% against the current fort's distribution of values. We did something like that initially, using statistics and a cumulative distribution function, but we found that the statistics assumed a normal distribution. However, I honestly believe that can be avoided by using an empirical cumulative distribution function (ECDF).
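The whole ECDF idea fits in a few lines; a sketch:

```python
import bisect

def ecdf(values):
    """Empirical CDF: each value's rating is simply the fraction of the
    current fort's values that are <= it (times 100 for a %). Makes no
    normality assumption, unlike the statistical CDF we tried first."""
    s = sorted(values)
    n = len(s)
    return [bisect.bisect_right(s, v) / n for v in values]
```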
But... I haven't heard from him on my new proposal of replacing the raw-frequency-category %'s with a method that uses the ECDF of the current population set.
The trickiest part with attributes and skills (versus, say, traits, which supposedly NEVER change from their embark value) is that attributes and skills change after a dwarf is created.
The great thing about attributes is that the amount a dwarf can "train" up to IS ALSO SET AT EMBARK, AND IS BASED ON THE INITIAL VALUE of the attribute. However, an attribute can also "decay" below its starting value, which means it can fall even below the lowest possible embark value (so an ECDF of all possible starting values would produce a percent from 0 to 100% AND STILL wouldn't be descriptive of all possible ranges a fort can have). Hidden in the attribute's starting value (which is stored in memory) is the amount a dwarf can increase or decay from. So when we hard-modelled the data, we had to incorporate that.
An ECDF of the current values could take that into account, but a similar comparison on initial value would have to be performed: the value's current ordinal position compared with the rest of the data in the set, and its max ordinal position [derived from the actual starting value of the attribute] compared with the rest of the data. This means a lot more would be stored in an ECDF conversion for an attribute. Keeping track of its initial, max, and current value gives three input variables that get converted into a new relative number, which is fed into the ECDF. I guess you would take the current formula, and just before you transform it into a %, you run those numbers through an ECDF function.
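A rough sketch of that three-input idea, reusing the ecdf() from above; combine() is just a stand-in for "the current formula just before the % transform", and the decay floor in it is my placeholder, not a confirmed constant:

```python
def combine(initial, maximum, current):
    # Where the current value sits inside this dwarf's own possible
    # range, from an assumed decay floor up to the trained ceiling.
    floor = initial * 0.5  # placeholder decay floor
    return (current - floor) / (maximum - floor) if maximum > floor else 0.0

def attribute_ratings(dwarves):
    """dwarves: list of (initial, maximum, current) attribute tuples."""
    return ecdf([combine(i, m, c) for (i, m, c) in dwarves])
```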
Skills could be done the exact same way, but with skewed data flagged before it's run. 0 values always [should] remain 0.
Flagging is possible because an ECDF conversion of a set of values produces a new mean: if abs(mean - .5) > .275, our error-check function flags the distribution as having an unacceptable new mean. Means should be within .25 of .5 to be considered normalized to each other (I actually tested this).
There's another test for skew: if the data has one value that is repeated across >50% of the set, the distribution is skewed. Simple formula: if (count of values equal to the median / count of data set) > 50%, the distribution is skewed, since any value repeated that often must be the median.
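Both checks in sketch form:

```python
from statistics import mean, median_low

def bad_mean_flag(ecdf_values):
    """After an ECDF conversion the mean should sit near 0.5; heavy
    ties drag it away, so |mean - 0.5| > 0.275 flags the set."""
    return abs(mean(ecdf_values) - 0.5) > 0.275

def repeated_value_flag(values):
    """If one value fills >50% of the set it must be the median, so
    counting matches against the (low) median is enough."""
    return values.count(median_low(values)) / len(values) > 0.5
```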
Then instead of the ECDF, we do a max-min conversion of the original skill exp (or the Maklak formula, doesn't matter) from 0 to 100%, as long as 0 still remains 0%. I.e. we do a (x - min) / (max - min) conversion. (One issue with this is that the conversion sets the lowest value to 0%, which was only intended for the value 0 itself.)
Update: I proposed running the non-0 values through their own ECDF (by removing the 0's), then re-inserting the 0 values into the list.
The great thing about the ECDF is that we can apply this same skewed-data concept to preferences: we can rank dwarves from the least # of matching preferences to the most and get a good 0 to 100% rank of which dwarves do or don't have any preferences. I'm still thinking on that one.
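A sketch of that update, plus the preference ranking, again reusing the ecdf() from above:

```python
def skill_ratings(values):
    """Run the non-0 values through their own ECDF, then slot the 0
    values back in as a hard 0%."""
    ranked = iter(ecdf([v for v in values if v != 0]))
    return [0.0 if v == 0 else next(ranked) for v in values]

def preference_ratings(match_counts):
    """Rank dwarves by how many preferences they match; the ECDF turns
    the counts straight into a 0-100% style rank."""
    return ecdf(match_counts)
```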
Yeah, I could talk all day about this, but I've got a lot of thinking to do about ECDF conversions right now.