I don't know if you're still in on the conversation or not, Maklak.
But Splinterz and I were discussing your original skill simulation formula, and we identified something.
Part of the formula works on the assumption that we should cap things at level 20.
However, other parts of the formula are not capped, resulting in simulated levels as high as 30 being returned for a raw level of 1,
which is causing havoc [well, in some examples] when using your formula: a level 8 and a level 18 get passed up by lower %'s when doing a labor optimization. I've verified it's specifically the skills and the skill rate weight; otherwise my weights work appropriately.
Splinterz pointed out that the simulated formula returns values above 20 (as high as 30 in my case).
So I thought: since your initial desire was to stop comparing once legendary is achieved, shouldn't the values reported by the function also be capped at level 20 in all instances?
Here's the reference code and the variables I was thinking about.
Are we basing sim_xp on 29000 or on the highest xp a player has? If it's the highest xp a player has, then it's reporting a value that isn't capped at 20.
Also: does get_double_level_from_xp cap at 20? If so, then one part of the formula isn't capped at 20 and the other part is.
In fact, any number in this formula that is later treated as a level should be capped at 20.
double simulate_skill_gain(int xp, int rate)
{
    if (xp >= max_xp)
        return 20.0 / 20.0;
    if (rate == 0)
        return get_double_level_from_xp(xp) / 20.0; // Obviously stays the same.

    int sim_xp = max_xp; // 29k seems like a good value to me.
    sim_xp = (sim_xp / 100.0) * rate; // This is how much XP will go towards skill learning.
    int total_xp = sim_xp;

    double ret = 0.0;
    //int curr_level = get_level_from_xp(xp);
    double curr_level = get_double_level_from_xp(xp);
    int curr_xp = xp;

    while ((sim_xp > 0) && (curr_level < 20.0))
    {
        int xp_gap = xp_levels[int(curr_level) + 1] - curr_xp; // How much XP till next level?
        if (xp_gap > sim_xp)
            xp_gap = sim_xp;
        ret += xp_gap * curr_level;
        curr_level++;
        curr_xp = xp_levels[int(curr_level)];
        sim_xp -= xp_gap;
    }

    if (sim_xp > 0)
        ret += 20 * sim_xp;
    ret /= total_xp;
    ret /= 20.0;
    return ret;
}
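Just to make the suggestion concrete, here's a rough sketch of the capped version I had in mind. It's not Maklak's or Splinterz's actual code, just my illustration: I'm assuming the usual 500/600/700... XP steps (so the table tops out at 29000), and capped_level_from_xp is my stand-in for a get_double_level_from_xp that's clamped at 20.

#include <algorithm>

// Assumed XP curve: each level costs 100 more XP than the previous one, starting at 500,
// which puts xp_levels[20] at 29000 (the "29k" legendary cap discussed above).
// If the real table differs, swap it in; the capping logic is the point here.
static int xp_levels[21];

static void init_xp_levels()
{
    for (int i = 0; i <= 20; ++i)
        xp_levels[i] = 500 * i + 50 * i * (i - 1);
}

// Stand-in for get_double_level_from_xp, already clamped at 20.
static double capped_level_from_xp(int xp)
{
    if (xp >= xp_levels[20])
        return 20.0;
    int lvl = 0;
    while (xp >= xp_levels[lvl + 1])
        ++lvl;
    double gap = xp_levels[lvl + 1] - xp_levels[lvl];
    return lvl + (xp - xp_levels[lvl]) / gap;
}

// Same idea as the snippet above, but sim_xp is always based on the fixed 29k threshold
// (never on the highest XP any dwarf happens to have), and every level-like quantity
// that feeds the result is capped at 20.
double simulate_skill_gain_capped(int xp, int rate)
{
    const int max_xp = xp_levels[20]; // 29000

    if (xp >= max_xp)
        return 1.0; // 20 / 20: already legendary, nothing left to compare.
    if (rate == 0)
        return capped_level_from_xp(xp) / 20.0;

    int sim_xp = static_cast<int>((max_xp / 100.0) * rate); // XP that goes towards skill learning.
    int total_xp = sim_xp;

    double ret = 0.0;
    double curr_level = capped_level_from_xp(xp);
    int curr_xp = xp;

    while (sim_xp > 0 && curr_level < 20.0)
    {
        int xp_gap = xp_levels[int(curr_level) + 1] - curr_xp; // XP until the next level.
        if (xp_gap > sim_xp)
            xp_gap = sim_xp;
        ret += xp_gap * curr_level;
        curr_level = std::min(curr_level + 1.0, 20.0);
        curr_xp = xp_levels[int(curr_level)];
        sim_xp -= xp_gap;
    }
    if (sim_xp > 0)
        ret += 20.0 * sim_xp; // Anything past legendary is weighted as level 20, never higher.

    ret /= total_xp;
    ret /= 20.0; // The result is a fraction of level 20, so it can never exceed 1.0.
    return ret;
}

(Call init_xp_levels() once first. The point is just that sim_xp and every level that feeds ret are pinned to the 29k / level-20 ceiling, so the returned fraction can never imply a level above 20.)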
However, I am also working on a transformation formula for skills in general: values above the median map to 50% to 100%, and values <= the median keep their old value multiplied by a factor that targets an overall mean of .5. This should address the issue of the >median skills all being grouped in the 95% to 100% range.
Update: I got a formula that transforms the lower range (<= median) of the post-ecdf/rank %'s.
These numbers work with the ecdf/rank % we derive, not the raws.
Here x is a cell (x,y) in our skills*dwarfs matrix.
It's based on: sumif(x > median) / total count (aka dwarfs * skills),
and the difference between that and the same formula computed after the >median values have been converted (which we derive first).
So this difference measures the % change in the upper (>median) values that needs to be carried over when transforming the lower (<= median) values; we double the difference [of the means], and call that factorValue.
For values <= median, the multiplier is then 1 / factorValue.
This preserves the ~.5 mean and spreads all the >median skills across 50 to 100%, while the values that were <= median end up in the 0 to 50% range.
I sent Splinterz a spreadsheet breakdown (sorry, I don't have matlab, closest thing I could get my hands on would be r-project, but I don't think it's necessary).
I think we could auto-check each category by checking if min = median, and if so, run this double transform.
Hell, variations could be used to transform in other ways if needed; for example, if min = 1st quartile, we could derive a way to fit each quartile range into its own distribution transform. Luckily, so far the mixture models that have shown up only have one extra spike in the data, at ~0 for skills and preferences, with the rest of the values above ~0. So we're essentially splitting the ecdf/rank % into two separate distributions that we then map into a 0 to 100% range.
An opposite transform could be used for negative skews if we ever encounter any.
In the end, this allows us to "flatten" the distribution across 50 to 100%, i.e. the %'s are evenly spread within that range just as they already are for attributes, and we're still centered around the median.
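For context, since everything below works on %'s rather than raws, here's roughly how a raw skill column turns into those ecdf/rank %'s. This is just a sketch with my own names (to_ecdf_rank_pct isn't anything from Dwarf Therapist); the actual sheet may break ties a bit differently:

#include <algorithm>
#include <vector>

// Convert a raw skill column into the combined rank%/ecdf% used above.
// For each value x: rank% = (ascending rank of x) / n, ecdf% = (count of values <= x) / n,
// and the result is the average of the two.
std::vector<double> to_ecdf_rank_pct(const std::vector<double>& raw)
{
    const double n = static_cast<double>(raw.size());
    std::vector<double> sorted(raw);
    std::sort(sorted.begin(), sorted.end());

    std::vector<double> out;
    out.reserve(raw.size());
    for (double x : raw)
    {
        // position of the first occurrence of x in the sorted list gives a 1-based ascending rank
        double rank_pct = (std::lower_bound(sorted.begin(), sorted.end(), x) - sorted.begin() + 1) / n;
        // ecdf: how many values are <= x
        double ecdf_pct = (std::upper_bound(sorted.begin(), sorted.end(), x) - sorted.begin()) / n;
        out.push_back((rank_pct + ecdf_pct) / 2.0);
    }
    return out;
}

With no ties the two %'s are identical; with a pile of identical values (like all the 0-skill dwarves) the rank% sits at the bottom of the tie while the ecdf% counts the whole pile, so averaging them is what gives that spike its single in-between % value, at least the way I'm handling ties in this sketch.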
Update:
Here's some pseudo code for the method.
//so skills are converted into %'s using rank% averaged with ecdf%; this becomes our SkillsECDF[] <list>.

//variables
list SkillsECDF[];
list overMedianSkillsECDF[];
list underMedianSkillsECDF[];
int totalSize = SkillsECDF[].sizeof();
float averageOfOverMedianOfTotal;
float averageOfUnderMedianOfTotal;
float lowerListFactor = 0.0;
bool runFlag = 0;

//test to see if we even need to perform this deskew?
//skillsECDF-Median = SkillsECDF[].median();
if ( ( SkillsECDF[].median() == SkillsECDF[].1stquartile() ) && ( SkillsECDF[].min() != SkillsECDF[].max() ) )
{
    runFlag = 1;
}

if (runFlag)
//split the distribution up in two (overMedianSkillsECDF & underMedianSkillsECDF)
{
    //sort the list based on % value (important for the last step); name references the dwarf's name
    SkillsECDF[].sort(name);

    //var: SkillsECDF[].median()
    float median = SkillsECDF[].median();

    //keeps track of size of overMedianSkillsECDF[]
    int y = 0;
    //keeps track of size of underMedianSkillsECDF[]
    int z = 0;

    //update overMedianSkillsECDF / underMedianSkillsECDF with the upper / lower values of SkillsECDF
    for (int x = 0; x < SkillsECDF[].sizeof(); x++)
    {
        //upper, overMedianSkillsECDF
        if ( SkillsECDF[x] > median )
        {
            overMedianSkillsECDF[y] = SkillsECDF[x];
            y++;
        }
        //lower, underMedianSkillsECDF
        if ( SkillsECDF[x] <= median )
        {
            underMedianSkillsECDF[z] = SkillsECDF[x];
            z++;
        }
    }

    //derive means of each new distribution against total count of SkillsECDF[]
    averageOfOverMedianOfTotal = overMedianSkillsECDF[].sum() / totalSize;
    averageOfUnderMedianOfTotal = underMedianSkillsECDF[].sum() / totalSize;

    {
        //local vars
        float upperListNewMeanofTotalOld = overMedianSkillsECDF[].sum() / totalSize;
        float upperListNewMeanofTotalNew;

        //transform the upper list
        for (int x = 0; x < overMedianSkillsECDF[].sizeof(); x++)
        {
            //countif(list, criteria) works the same as the excel function
            //rank(list, value, order) works the same as the excel function
            //average the rank% and the ecdf%, divide by 2 to scale to 0 to 50%, then add .5 to shift up to 50% to 100%
            overMedianSkillsECDF[x] =
                (
                    (
                        ( rank(overMedianSkillsECDF[], overMedianSkillsECDF[x], 1) / overMedianSkillsECDF[].sizeof() )
                        +
                        ( countif(overMedianSkillsECDF[], "<=" &overMedianSkillsECDF[x]) / overMedianSkillsECDF[].sizeof() )
                    ) / 2
                ) / 2 + .5;
        }

        //the new mean (of total) of the upper list tells us how much the lower list has to be scaled
        upperListNewMeanofTotalNew = overMedianSkillsECDF[].sum() / totalSize;
        lowerListFactor = 1 + ( (upperListNewMeanofTotalOld - upperListNewMeanofTotalNew) * 2 );

        //transform the lower list
        for (int x = 0; x < underMedianSkillsECDF[].sizeof(); x++)
        {
            underMedianSkillsECDF[x] = underMedianSkillsECDF[x] * lowerListFactor;
        }

        //replace the values of SkillsECDF[] with the values generated by underMedianSkillsECDF[] & overMedianSkillsECDF[] respectively
        {
            /*
            we then replace the values in SkillsECDF[]
            with the values we generated in underMedianSkillsECDF[] & overMedianSkillsECDF[],
            so they'll have to be mapped to a name pre/post conversion.
            */
            //remember the list was sorted prior to conversion, so this maps back based on name
            for (int x = 0; x < SkillsECDF[].sizeof(); x++)
            {
                //loop through the upper list
                for (int y = 0; y < overMedianSkillsECDF[].sizeof(); y++)
                {
                    if (SkillsECDF[x].name == overMedianSkillsECDF[y].name)
                    {
                        SkillsECDF[x] = overMedianSkillsECDF[y];
                    }
                }
                //loop through the lower list
                for (int z = 0; z < underMedianSkillsECDF[].sizeof(); z++)
                {
                    if (SkillsECDF[x].name == underMedianSkillsECDF[z].name)
                    {
                        SkillsECDF[x] = underMedianSkillsECDF[z];
                    }
                }
            }
        }
    }
}
Here's a pic of it in action:
http://imgur.com/rls1joX
and a link to my dffd sheet of it in action:
http://dffd.wimbli.com/file.php?id=8705
Btw, I double-tested this with two different data sets, one very large (~2400 values) and this one with ~35 values. Both times the mean landed within .005% of my 50% target.
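And in case anyone wants to poke at it outside a spreadsheet, here's a minimal standalone sketch of the deskew pass in plain C++. The names are mine, the median/quartile picks are quick-and-dirty, it assumes the input is already the ecdf/rank %'s, and the lower-list factor is the same 1 + 2*(old mean - new mean) as in the pseudo code above:

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Rescale a column of ecdf/rank %'s: values above the median get re-ranked into the
// 50-100% band, values at or below the median get multiplied by a factor that is
// meant to pull the overall mean back toward .5.
void deskew(std::vector<double>& pct)
{
    const double n = static_cast<double>(pct.size());

    std::vector<double> sorted(pct);
    std::sort(sorted.begin(), sorted.end());
    const double median = sorted[sorted.size() / 2]; // quick-and-dirty median/quartile, fine for a sketch
    const double q1 = sorted[sorted.size() / 4];

    // only run when the lower quarter is one big spike (e.g. all the 0-skill dwarves)
    // and the data isn't completely constant
    if (median != q1 || sorted.front() == sorted.back())
        return;

    // the > median values, sorted so we can re-rank them within their own half
    std::vector<double> upper;
    for (double v : pct)
        if (v > median)
            upper.push_back(v);
    std::sort(upper.begin(), upper.end());
    const double k = static_cast<double>(upper.size());

    const double upper_mean_old = std::accumulate(upper.begin(), upper.end(), 0.0) / n;

    // map each > median value into 50-100%: average its rank% and ecdf% within the
    // upper half, halve that to land in 0-50%, then shift up by .5
    double upper_sum_new = 0.0;
    for (double& v : pct)
    {
        if (v <= median)
            continue;
        double rank_pct = (std::lower_bound(upper.begin(), upper.end(), v) - upper.begin() + 1) / k;
        double ecdf_pct = (std::upper_bound(upper.begin(), upper.end(), v) - upper.begin()) / k;
        v = ((rank_pct + ecdf_pct) / 2.0) / 2.0 + 0.5;
        upper_sum_new += v;
    }
    const double upper_mean_new = upper_sum_new / n;

    // double the change in the upper mean (of the total) and push the <= median values the other way
    const double factor = 1.0 + (upper_mean_old - upper_mean_new) * 2.0;
    for (double& v : pct)
        if (v <= median)
            v *= factor;
}

int main()
{
    // toy column: lots of 0-skill dwarves plus a handful of skilled ones (already as ecdf/rank %'s)
    std::vector<double> pct = { 0.30, 0.30, 0.30, 0.30, 0.30, 0.30, 0.72, 0.81, 0.90, 0.99 };
    deskew(pct);
    for (double v : pct)
        std::printf("%.3f\n", v);
    return 0;
}

On this toy column the four skilled dwarves come out at .625 / .75 / .875 / 1.0 and the zero-skill spike gets multiplied by about 1.034; it's only meant to show the mechanics, not reproduce the spreadsheet numbers.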