Are there cases that caused fatal injuries far more or less rapidly than the mean?
Yes, exactly. You can get basically anything, from the first hit being a lethal blow to the neck or head, to the attacker spending a long time wailing away totally ineffectually until the target passes out from exhaustion and gets the helmet pulled off. In this test the dispersion is particularly high because instead of using standardized strength, I used a distribution of strengths following the population distribution formula (setting the strength according to the actual formula used to randomize attributes, so that the 24 attackers are nearly perfectly representative of a large population of actual goblins). This means the strongest attackers are 5x stronger than the weakest, and a 5x difference in strength is absolutely huge in terms of ability to penetrate armor and deal damage through armor.
I also did a series where all the attackers had exactly average strength. This resulted in a much lower standard deviation, but it's badly unrepresentative of actual goblin attackers, which do vary from very weak to unquestionably strong.
I'm happy to give some frequency distributions. These are all for the "vs steel armor" case:
As you can see, some of the distributions are not that far from normal, though they do tend to have long tails / outliers. The exception is the whip, which looks closer to uniform (I think it's this way because of the strength distribution). Also, tbh the spear looks kind of like a lognormal distribution.
Here they are with the exhaustion outliers cut off and plotted on the same x-axis:
The spear does okay (well, it's still actually terrible) but the mean is being heavily affected by the long tail.
If we plot the frequency histograms for the mace and spear, excluding outliers:
We see the mace and spear have the same peak but quite different skews, with the mace more often getting faster kills.
Outliers are always tricky to account for. Excluding them makes the samples more normal, but the outliers are also real samples. You could justify excluding them on the grounds that those attackers are obviously no threat, but if the armor is rendering some of the enemies no threat, that's something that definitely should be accounted for and not discarded.
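To make the outlier trade-off concrete, here's a rough Python sketch comparing the mean and median with and without a cutoff. The blow counts are made-up numbers for illustration only, not my test data, and the `cutoff=50` threshold and the `summarize` helper are just assumptions for the example.

```python
import statistics

# Made-up blow counts for illustration only (NOT the actual test data).
# The last two values stand in for exhaustion-driven outliers.
blows_to_kill = [4, 5, 5, 6, 7, 7, 8, 9, 11, 14, 120, 260]

def summarize(samples, cutoff=None):
    """Return count, mean, median, and the share of samples kept under the cutoff."""
    kept = [s for s in samples if cutoff is None or s <= cutoff]
    return {
        "n": len(kept),
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "kept_fraction": len(kept) / len(samples),
    }

full = summarize(blows_to_kill)                # outliers included
trimmed = summarize(blows_to_kill, cutoff=50)  # outliers dropped
print(full)
print(trimmed)
```

With numbers like these the mean moves a lot when the tail is cut while the median barely changes, and `kept_fraction` keeps a record of how many fights were thrown away, so the "no threat" cases don't silently disappear.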
Finally, I aggregated all the data for each weapon and made frequency histograms with the same x and y axes, to visualize how easily the weapons deal with the armor:
Anyway, as I said in the warhammers thread:
Good questions and I'll give some thought on how best to design tests for armor.
Both the design of the test and the approach for analyzing the samples are definitely tricky, and it takes some feedback and iteration to get quality results. Exhaustion especially is a major source of abnormality. Exhaustion definitely does matter, but the 1v1 doesn't capture it in the same way as a real battle, where a dwarf might be fighting 1v5 and so would receive 5x as many (attempted) blows before becoming exhausted. So a good test design might be to have 1 dwarf facing 6 goblins with a representative strength distribution. Then if the weakest goblin isn't contributing, it just means the dwarf takes a little longer to die, instead of producing highly skewed samples.
Though ideally, it'd be nice to be able to account for armor that makes a dwarf *totally invulnerable*, something that "number of blows to kill" obviously can't, since it'd result in a mean trending towards infinity.
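One possible way around the infinite-mean problem, sketched in Python below, is to cap each fight at some number of blows and report statistics that still make sense when some fights never end in a kill: the fraction of fights that produce a kill within N blows, and a median that counts no-kill fights as "longer than anything observed". This isn't how I ran the tests above. The results list, the `MAX_BLOWS` cap, and both function names are hypothetical, just to show the idea.

```python
import math
import statistics

# Hypothetical results: blows needed to kill, with None meaning the fight
# hit the cap without a kill (the dwarf was effectively invulnerable).
MAX_BLOWS = 500  # assumed cap per fight, not from the original tests
results = [6, 9, 12, 15, 40, None, None, None]

def kill_fraction(samples, within):
    """Share of all fights that ended in a kill within `within` blows."""
    kills = sum(1 for s in samples if s is not None and s <= within)
    return kills / len(samples)

def censored_median(samples):
    """Median that ranks no-kill fights above every observed kill."""
    ranked = sorted(math.inf if s is None else s for s in samples)
    # median_low picks an actual sample, so no averaging with infinity.
    return statistics.median_low(ranked)

print(kill_fraction(results, within=20))         # kills within 20 blows
print(kill_fraction(results, within=MAX_BLOWS))  # kills at all
print(censored_median(results))
```

If more than half the fights end without a kill, `censored_median` comes out as infinity, which is at least an honest way of saying "this weapon usually can't get through this armor" instead of a mean dominated by wherever the cap happens to be.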