So I've been working on at least 30 spreadsheets to finally end up with a proper Monte Carlo simulation of Dwarf Fortress statistics based on 40.13 (I pretty much stopped playing once I started on the MC).
BTW, I'm working at the edge of my mathematical understanding. I know my stats to a point, but I know nothing of weighted factoring. It does seem to preserve ranking, though: comparing a weighted factoring method with a weighted average method produces the same rank ordering.
I didn't want to spam the thread while I was basically learning how to use Excel to do Monte Carlo. Excel is great at a lot of things, and its scripting environment makes it awesome, but... it leaves a lot to be desired. For one, I haven't mastered index and match yet, although I do make extensive use of vlookup and hlookup (but that's just me atm; I just wish some of the functions I'd like were already built in).
So I got the Monte Carlo bootstrap wonder of a machination going, and pretty much emulated attributes and traits based on my inputted population proportions. Initially I was doing a min-to-max rand() while keeping values relevant around proportions of ranges, but then I found a way to do it around new centers using normal curves... I could use normal probability curves around a midpoint. Deriving the midpoints was easy; it was the standard deviations that would be hard.
So... by sorting each Attribute Column by value, I derived "moving average" midpoints
i.e...
ordered pair (a,b);
midpoint = (a+b) / 2;
range = b-a
Since the range is symmetric about the midpoint, generating normal probability curves around that point becomes a lot easier. I figured out my own way to derive a standard deviation, but a simpler way would be to take range/4, i.e. (b - a)/4.
That gives me what I need to generate a random # at that midpoint. I already had algorithms in place to give me lookup tables for probability, but in this case I didn't really need them. I only had to do (n being the count of elements in the vector I'm attempting to simulate):
randbetween(1, n - 1)
This is because each midpoint comes from a pair of consecutive values, and each value is re-used in the next pair, ultimately resulting in n - 1 pairs.
Once you have a randbetween, you do 2 vlookups based on it: one for the midpoint, and one for that midpoint's standard deviation. Then you do the Monte Carlo draw:
norminv(rand(), midpoint, stdev)
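For anyone who'd rather see it outside of Excel, here's a rough sketch of the same procedure in Python (the attribute values are made up for illustration; random.gauss plays the part of norminv(rand(), ...) and randrange plays the part of randbetween):

import random

# sorted attribute column (made-up example values)
values = sorted([750, 900, 1100, 1250, 1300, 1500, 1700, 2000])

# n - 1 consecutive pairs -> a midpoint and a stdev (range / 4) for each
pairs = list(zip(values, values[1:]))
midpoints = [(a + b) / 2 for a, b in pairs]
stdevs = [(b - a) / 4 for a, b in pairs]

def simulate_one():
    # randbetween(1, n - 1): pick one of the n - 1 pairs,
    # then norminv(rand(), midpoint, stdev) around it
    i = random.randrange(len(pairs))
    return random.gauss(midpoints[i], stdevs[i])

# bootstrap a population of 1000 "dwarves" for this attribute
sample = [simulate_one() for _ in range(1000)]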
I then proceeded to regen/bootstrap/MC anywhere from 10 to 1000 dwarves, for which I calculated:
s transforms, minmax transforms, rank-ECDF transforms, etc.
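To spell out what I mean by the last two (a minimal Python sketch, taking "minmax" as the usual rescale-to-[0, 1] and "rank ECDF" as the fraction of the sample at or below each value):

def minmax(xs):
    # rescale so the smallest value maps to 0 and the largest to 1
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def rank_ecdf(xs):
    # percentile rank: fraction of the sample at or below each value
    n = len(xs)
    xs_sorted = sorted(xs)
    return [sum(1 for y in xs_sorted if y <= x) / n for x in xs]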
I even set up skills. All in all, I generated a 24MB masterpiece of a Dwarf Fortress statistics simulator. I'm a little impressed with my incredible waste of time, but ultimately, I'm glad I learned how to do Monte Carlo in Excel.
Either way, I did have a point in compiling this simulation: to check out a Weighted Product vs a Weighted Average. I'm still not entirely 100% sure I'm doing it correctly (in fact I don't think I am, going by the wiki), but what I am doing is true to my understanding of how multiplying elements together works vs adding them.
Variability in Weighted Averaging vs Weighted Factoring, and how to address it

Near the end of the setup, I noticed that weighted factoring and weighted averaging had a proportional relationship to each other. However, something else stood out.
A weighted product seems to be affected more by the # of items it is factored against, because the result ultimately gets pulled below the lowest factor. This is not the case with weighted averages; an average spreads the loss proportionally, ending up with some inputs above and some below the new value.
This is where the parallel between weighted averages and weighted products occurs. The more elements in the adding or factoring stage, the more the variability is reduced in terms of upper and lower limits (for weighted averages). The output of a weighted average hovers around a midpoint, whereas a weighted product just keeps lowering the max: with factoring, the CEILING is literally lowered by whatever you factor against, and the output can never be higher than the lowest factor, while with averaging the effect is much milder.
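A quick numeric illustration of the difference (Python, with made-up element scores in [0, 1]; I'm treating "weighted factoring" as multiplying each score raised to its weight, which is my reading of it and not necessarily the wiki's):

scores  = [0.9, 0.8, 0.3]   # made-up element scores in [0, 1]
weights = [1.0, 1.0, 1.0]

# weighted average: hovers around the middle of the inputs
w_avg = sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# weighted product: every extra factor in [0, 1] drags the result further down,
# so it can never end up above the lowest factor
w_prod = 1.0
for w, s in zip(weights, scores):
    w_prod *= s ** w

print(round(w_avg, 3))   # 0.667 -- sits between the inputs
print(round(w_prod, 3))  # 0.216 -- below the lowest input, 0.3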
Point being... the variability is affected by the # of elements. Mitigation steps have already been put in place to make sure the [lack of] elements specified in a role doesn't become a factor: when a role is missing an aspect (such as when a global weight is disabled), a .5 value is substituted for any weighted average values involving the disabled weight. Without this, a role with only a skill specified would display a higher value because it didn't have any other variables holding it down.
Initially I thought my issue was specific to weighted factors, but then I realized that no, steps had already been taken to mitigate the same thing in the weighted average formula. So that's where I'm at right now: I have my weighted product demo ready, but there is a correlated relationship between the # of elements specified in a role and the variability in the min/max the role can achieve. There are a few ways to address this, but ultimately I think the roles might be biased against each other in variability by the # of elements assigned to them.
I think ultimately, the way to get a "fair" picture of elements between roles... is to "fake" the missing elements. One idea I had was to supply .5 for each missing element. I was attempting to take a
max count of the # of elements each "aspect" had in any role, an aspect being Attributes, Personality/Traits, Preferences, or Skills.
So I'd ultimately end up with
Attributes Max: ex: 5
Preferences Max: ex: 7
Skills Max: ex: 3
Traits Max: ex: 2
That way, one could instead emulate a .5 for each missing aspect/element... i.e. emulate however many elements would be needed to bring each role up to those maxes and equalize all roles.
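Here's a rough Python sketch of that padding idea (the role data and scores are made up, and padding with .5 up to the per-aspect max is just the scheme described above, not the actual formula anywhere):

# made-up roles: aspect -> list of element scores in [0, 1]
roles = {
    "Miner":  {"Attributes": [0.8, 0.6], "Skills": [0.9]},
    "Grower": {"Attributes": [0.7], "Skills": [0.5, 0.4, 0.6], "Traits": [0.3]},
}

aspects = ["Attributes", "Preferences", "Skills", "Traits"]

# max element count per aspect across all roles
max_counts = {a: max(len(r.get(a, [])) for r in roles.values()) for a in aspects}

def padded_average(role):
    # pad every aspect up to its max count with 0.5 placeholders,
    # so roles with fewer elements aren't advantaged, then average everything
    elements = []
    for a in aspects:
        scores = role.get(a, [])
        elements += scores + [0.5] * (max_counts[a] - len(scores))
    return sum(elements) / len(elements)

for name, role in roles.items():
    print(name, round(padded_average(role), 3))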