So it turns out all my work on the s transform and integrals has led me full circle.
It looks like an approximated integral (using the trapezoid rule) doesn't give me any difference in results from an initial minmax transform.
DT uses a slight variant of a minmax transform: it puts the mean at 50% by doing one minmax transform from the min up to the mean (mapped to 0 to 50%) and another from the mean up to the max (mapped to 50% to 100%).
Then a similar transform is done again to recenter on the median,
resulting in a [more or less] nicely distributed set of values.
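A minimal Python sketch of the two-pass recentering described above, assuming DT maps [min, mean] linearly onto [0, 0.5] and [mean, max] onto [0.5, 1], then repeats the same piecewise mapping around the median of the result. The helper names `piecewise_minmax` and `dt_like_transform` are hypothetical; this is not DT's actual code.

```python
from statistics import mean, median

def piecewise_minmax(values, center):
    """Hypothetical helper: map [min, center] -> [0, 0.5] and
    [center, max] -> [0.5, 1], so `center` lands exactly on 0.5."""
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        if v <= center:
            # Lower half: scale min..center onto 0..0.5
            span = center - lo
            out.append(0.5 if span == 0 else 0.5 * (v - lo) / span)
        else:
            # Upper half: scale center..max onto 0.5..1 (span > 0 here)
            span = hi - center
            out.append(0.5 + 0.5 * (v - center) / span)
    return out

def dt_like_transform(values):
    # Pass 1 (assumed): recenter so the mean lands at 50%
    step1 = piecewise_minmax(values, mean(values))
    # Pass 2 (assumed): recenter the result so its median lands at 50%
    return piecewise_minmax(step1, median(step1))

print(dt_like_transform([1, 2, 3, 10]))
```

Both halves are increasing, so the ordering of values is preserved. With an odd number of values the second pass puts the median exactly at 0.5; with an even count the median sits between two values and only lands near 0.5, which fits the "more or less" above.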
I was hoping that by approximating integrals I would be able to find a better "curve". It appears that using the trapezoid rule to approximate integrals produces percentages that are exactly the same as a minmax transform. I was struggling with why my integral's 50% point did not represent the mean, but rather the midrange: (max + min) / 2.
Now I know why: that is exactly what a minmax transform does (its 50% point is the midrange), and an approximated integral based on the trapezoid rule produces the same numbers.
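One way to see why the two line up, assuming the integrand is flat (a constant weight across the value range): a cumulative trapezoid-rule integral over the sorted values, divided by its total, is then exactly the linear minmax map, so its 50% point is the midrange rather than the mean. This is an illustrative sketch; `trapezoid_cdf` and its `weight` parameter are hypothetical.

```python
from statistics import mean

def minmax(values):
    """Classic minmax transform: min -> 0, max -> 1, linear in between."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def trapezoid_cdf(xs, weight=lambda x: 1.0):
    """Cumulative trapezoid-rule integral of `weight` over sorted xs,
    divided by the total so the last point is 1.
    `weight` is a hypothetical integrand; the default is flat."""
    total = 0.0
    cum = [0.0]
    for a, b in zip(xs, xs[1:]):
        # Area of one trapezoid between neighbouring points
        total += (b - a) * (weight(a) + weight(b)) / 2
        cum.append(total)
    return [c / total for c in cum]

values = sorted([2.0, 3.0, 5.0, 11.0])
print(minmax(values))         # both lines print the same list
print(trapezoid_cdf(values))

# The value that maps to 50% is the midrange, not the mean
midrange = (min(values) + max(values)) / 2   # 6.5
print(midrange, mean(values))                # mean is 5.25
```

Passing a non-constant `weight` gives a curve that is no longer linear, so under this assumption the trapezoid approach only departs from minmax when the integrand itself varies.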
So I just wasted a whole lot of time going full circle.
What I would like to focus on now is how to deal with datasets whose overall average doesn't equal 50%.
I fear that combining them together will slightly skew the overall weighted average of values.
The data is already normalized around 50% on a 50/50 split (except for extremely skewed datasets that contain a lot of null values, such as skills/preferences, but their mean is still ~0.5).
So I was thinking: since the drawing method has been updated to draw from the min/max of the weighted average outputs, with 50% = median, why not do something similar? Instead of adding up the percentages as if they run from 0 to 100%, subtract 0.5 from each percentage to get a set of values that run from ~-50% to ~+50%, with a perfect even average of 0.
Update: you know what, I don't think that would do anything. Subtracting 0.5 shifts every value by the same amount, so the average only lands on 0 if the dataset's mean was already 0.5. One would have to subtract the dataset's own mean instead, and that would have a different issue.
Since we end up doing a final transform on the output data anyway, and the numbers on the backend never quite reach 0 or 100% (hence the new drawing method), we can still work with these numbers, albeit with a slightly better mean-adjusted approach.
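A small Python sketch of the update's point, using made-up data whose mean is 0.3 and assuming the final transform is a minmax-style rescale: subtracting a fixed 0.5 only makes the average 0 when the mean is already 0.5, subtracting the dataset's own mean does make it 0, and any constant shift is erased again by the final rescale. The `shift` and `minmax` helpers are hypothetical.

```python
from statistics import mean

def shift(values, pivot):
    """Subtract the same constant from every value."""
    return [v - pivot for v in values]

def minmax(values):
    """Assumed final transform: linear rescale, min -> 0, max -> 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Made-up example data whose mean is 0.3 rather than 0.5
data = [0.1, 0.2, 0.2, 0.3, 0.7]

fixed = shift(data, 0.5)        # subtract 0.5: average becomes ~-0.2, not 0
own = shift(data, mean(data))   # subtract the dataset's own mean: average ~0

print(mean(fixed), mean(own))

# A final minmax-style transform cancels any constant shift,
# so on a single dataset neither subtraction changes the result.
unchanged = all(abs(a - b) < 1e-9 for a, b in zip(minmax(data), minmax(fixed)))
print(unchanged)
```

Within one dataset both subtractions are constant offsets, so under this assumption they can only matter when several datasets are pooled before the final transform.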