Bay 12 Games Forum

Topic: Armor Material Science: how do materials compare against goblin weapons? 50.07

Panando

  • Bay Watcher

Hypothesis

My hypothesis is that high-quality armor (steel or better) doesn't actually provide much advantage in goblin vs dwarf combat. This comes from weapon testing, which shows that even at equal material levels, edged weapon damage mostly gets converted to bludgeoning damage that bypasses armor.

Test Setup

Attacker: The Attacker is equipped with a weapon and is skilled. Their strength attribute follows a standardized distribution based on the strength attribute ranges for goblins: there are 24 attackers, with 3 in each strength interval, so basically 3 are weak, 3 are extremely strong, and so on in between. I felt it important to have some very strong attackers, since strength can make a large difference in the ability to penetrate armor when materials are equal.
Target: The Target is equipped with full metal armor and is proficient in combat skills but has normal stats, since otherwise they'd just be too hard to kill (the test would take way too long for a 1v1). They have a failed mood to make them a target dummy. Note that in a real game your professional military dwarves would be much harder to kill than in this test due to much higher attributes and skills, but they'd also often be very badly outnumbered.

I decided to test a fairly representative sample of goblin weapons:

Iron Spear
Iron Battle Axe
Iron Mace
Iron Whip

Armor Materials:

Copper
Bronze
Iron
Steel
Adamantine


For all tests there are 24 pairings per weapon, and the test is repeated 4 times, for 96 samples per weapon. The mean is the mean number of hits required to strike down the target. The lower and upper bounds are the 95% confidence interval for the mean, computed from the standard error of the mean (as an oversimplification, "it can be estimated this experiment had a 95% chance of capturing the true population mean within these bounds").
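
A minimal sketch of how each table row could be derived from raw hit counts, assuming the normal-approximation interval described above (mean plus or minus 1.96 times the standard error). The `summarize` helper and the sample values are hypothetical and are not the script actually used for these tests.

```python
import math
import statistics

def summarize(hits, z=1.96):
    """Return (mean, stdev, lower_bound, upper_bound) for a list of hit counts."""
    n = len(hits)
    mean = statistics.mean(hits)
    stdev = statistics.stdev(hits)        # sample standard deviation (n - 1)
    margin = z * stdev / math.sqrt(n)     # z * standard error of the mean
    return mean, stdev, mean - margin, mean + margin

# Made-up hit counts for illustration only; a real run has 96 per weapon.
example_hits = [4, 9, 11, 13, 13, 17, 22, 30]
mean, stdev, lower, upper = summarize(example_hits)
print(f"mean={mean:.1f} stdev={stdev:.1f} bounds=({lower:.1f}, {upper:.1f})")
```

Because the interval is built symmetrically around the mean, the mean always sits exactly halfway between the two bounds, even when the underlying samples are heavily skewed.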

Vs Copper Armor
| weapon          | mean | stdev | lower_bound | upper_bound |
|-----------------|------|-------|-------------|-------------|
| iron spear      | 13.0 | 7.4   | 11.5        | 14.5        |
| iron battle axe | 20.2 | 10.8  | 18.0        | 22.3        |
| iron whip       | 29.9 | 20.0  | 25.9        | 33.9        |
| iron mace       | 62.1 | 79.7  | 46.1        | 78.0        |

The spear slices effortlessly through the copper armor, and the copper doesn't give the axe much pause either.

Vs Bronze Armor
| weapon          | mean | stdev | lower_bound | upper_bound |
|-----------------|------|-------|-------------|-------------|
| iron whip       | 32.2 | 16.0  | 29.0        | 35.4        |
| iron mace       | 52.1 | 52.5  | 41.6        | 62.6        |
| iron battle axe | 88.7 | 38.5  | 81.0        | 96.4        |
| iron spear      | 95.3 | 67.3  | 81.9        | 108.8       |

The improvement over copper armor is dramatic: the spear went from a highly lethal murder device to being practically incapable of inflicting damage.

Vs Iron Armor
| weapon          | mean  | stdev | lower_bound | upper_bound |
|-----------------|-------|-------|-------------|-------------|
| iron whip       | 31.7  | 18.1  | 28.1        | 35.4        |
| iron mace       | 69.2  | 58.3  | 57.6        | 80.9        |
| iron battle axe | 89.0  | 43.1  | 80.4        | 97.6        |
| iron spear      | 100.3 | 69.1  | 86.5        | 114.1       |

The difference between bronze and iron is small, and generally well within error bounds, but material properties do suggest iron should be a little better.

Vs Steel Armor
| weapon          | mean  | stdev | lower_bound | upper_bound |
|-----------------|-------|-------|-------------|-------------|
| iron whip       | 29.8  | 18.4  | 26.1        | 33.5        |
| iron mace       | 86.3  | 51.5  | 76.0        | 96.6        |
| iron battle axe | 92.3  | 36.5  | 85.0        | 99.6        |
| iron spear      | 115.2 | 97.2  | 95.7        | 134.6       |

Steel seems to be slightly outperforming iron, especially against the mace.

Vs Addy Armor
| weapon          | mean  | stdev | lower_bound | upper_bound |
|-----------------|-------|-------|-------------|-------------|
| iron whip       | 30.6  | 18.3  | 27.0        | 34.3        |
| iron mace       | 90.9  | 65.0  | 77.9        | 103.9       |
| iron battle axe | 96.9  | 42.0  | 88.5        | 105.3       |
| iron spear      | 127.1 | 87.6  | 109.6       | 144.7       |

If you were hoping for dramatic improvements from addy, I hope you were also prepared for disappointment. Addy might be offering a very slight advantage over steel, though the whip simply does not care.
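
As a rough way to see which steps between materials stand out from the noise, the 95% bounds from the tables can be checked for overlap. This is a sketch using the iron spear rows only; `intervals_overlap` is a hypothetical helper, and overlapping intervals are just an eyeball check, not a formal significance test.

```python
# 95% bounds copied from the iron spear rows of the tables above.
spear_bounds = {
    "copper": (11.5, 14.5),
    "bronze": (81.9, 108.8),
    "iron": (86.5, 114.1),
    "steel": (95.7, 134.6),
    "adamantine": (109.6, 144.7),
}

def intervals_overlap(a, b):
    """True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Compare each material with the next one up the list.
materials = list(spear_bounds)
overlaps = {
    (m1, m2): intervals_overlap(spear_bounds[m1], spear_bounds[m2])
    for m1, m2 in zip(materials, materials[1:])
}
for pair, result in overlaps.items():
    print(pair, "overlap" if result else "no overlap")
```

Only the copper to bronze step gives intervals that do not overlap; every later step overlaps with its neighbour, which fits the reading that the gains beyond bronze are small.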

Remarks

Maces REALLY suck. That the Mace was barely better than the Battle Axe even when facing armor which is literally impenetrable to the Axe shows just how bad maces are. Maces consistently perform poorly. Granted, it would be expected that low contact area blunt weapons (war hammer and morningstar) would perform better, but the concept that bludgeoning weapons are good against armor is an oversimplification, since apparently bludgeoning with a battle axe is almost as effective as with a mace. Why exactly is that the case? Drilling into the logs explains why.

Generally speaking the targets were defeated by passing out from pain or exhaustion: once the target was exhausted and the attacker was not, the attacker would pull the target's helmet off and then finish them off. The mace has such low lethality that usually the fight is decided by exhaustion. The reason the mace could sometimes get a faster kill is that it could more often cause debilitating pain even through the armor, causing the target to pass out from pain, after which the helmet would be removed, allowing solid hits to the skull. The mace couldn't kill through the armor, but passing out from pain could happen faster than passing out from exhaustion. The spear, of course, was even more reliant on just waiting for the target to pass out from exhaustion.

Another fun thing from the logs was hundreds of reports like this:
Goblin 96 lashes Dwarf 96 in the upper body with his iron whip, bruising the muscle and chipping the left false ribs through the XXsmall adamantine breastplateXX!
The XXsmall adamantine breastplateXX breaks!

That is, not only was the whip destroying the dwarf through the armor, it was effortlessly destroying the armor too! Note that only the rigid armor was subject to being destroyed; the addy mail shirt was fine.

Conclusions

Armor offers truly minimal protection from goblin whips, with practically no difference between copper and addy, but whips will effortlessly rip armor to shreds, which really sucks in the case of addy armor since it has a bad melt ratio for recycling. Also, while iron or bronze offer dramatically better protection than copper, the advantage of steel or addy over iron is quite small, but probably real. The smallness of this advantage is because iron is already very good at deflecting any edged attack from an iron weapon, while bludgeoning damage largely bypasses armor anyway, at least when it comes to pulling joints and such.

At a strategic level, this means that priority should be given to issuing dwarves with at least bronze or iron armor. Upgrading to steel can be left at a very low priority since the advantages are not big. This contrasts with weapons: steel weapons are drastically better than iron weapons. In fact the difference between steel and iron is greater than the difference between iron and copper, so edged weapons that pretty much bounce off iron armor turn into lightsabers when made of steel.

There are very particular scenarios where steel armor would be more valuable. While steel weapons are rare in the hands of the enemies of dwarves, they are not unheard of. In particular, cavern dwellers can often be found with steel spears, and steel armor will offer dramatically improved protection from these.

In general: Iron or Bronze does an excellent job of protecting from the highly lethal edged weapons: spear, axe, sword etc, but armor of any material does a poor job of protecting from the less lethal bludgeoning weapons, although perhaps still good enough against most bludgeoning weapons (not whips) that the main threat becomes passing out from exhaustion.
« Last Edit: March 14, 2023, 04:18:40 am by Panando »
Punch through a multi-z aquifer in under 5 minutes, video walkthrough. I post as /u/BlakeMW on reddit.

Asdfe

  • Escaped Lunatic

Good to know. Save the steel for weapons and use bronze and iron armor, unless you've got steel to spare.

Panando

  • Bay Watcher

Yes.

Also something worth talking about is what pieces to prioritize steel for.

In testing, the breastplate, helm and greaves prove very difficult to penetrate with axes and swords (large contact area attacks), while the limbs and neck are rather easier to penetrate. An axe hack or sword slash will deflect a lot overall, but hits to the helm, breastplate and greaves nearly always deflect, while hits to the hands, feet and neck might only deflect half the time. This means that the armor pieces that protect the limbs and neck (gauntlets, high boots and mail shirt) should probably be prioritized. This doesn't really apply to the small contact area attacks (spear and sword stab), though these still have an easier time getting through a single layer of armor, such as the gauntlets, boots or helm, than through both a breastplate/greaves and mail shirt.

Overall then I'd be inclined to prioritize steel for gauntlets, high boots and mail shirt, then the helm, and finally the greaves and breastplate.

Superdorf

  • Bay Watcher
  • Soothly we live in mighty years!

Overall then I'd be inclined to prioritize steel for gauntlets, high boots and mail shirt, then the helm, and finally the greaves and breastplate.

Useful stuff!

I've been kinda reluctant to get back into DF after the Steam release, but all this !!SCIENCE!! has me sorta interested again... if only I had time :-X
well done, good sir*

*or madam
**or whatever
Falling angel met the rising ape, and the sound it made was

klonk
tormenting the player is important

Urist McNobody

  • Bay Watcher

There's something suspicious about this data.  I'm picking on one row in particular for exposition, but this anomaly applies to all of your reports.

| weapon          | mean | stdev | lower_bound | upper_bound |
|-----------------|------|-------|-------------|-------------|
| iron mace       | 62.1 | 79.7  | 46.1        | 78.0        |

The upper and lower bounds are from a 95% confidence interval, typically defined as mean +/- 2 standard deviations.  The reported mean is centered between the bounds (avg(lower, upper) == mean).  But the separation of the bounds implies a one-sigma standard deviation of about 8 ((upper - lower) / 4).  Where does 79.7 come from?

Panando

  • Bay Watcher

There's something suspicious about this data.  I'm picking on one row in particular for exposition, but this anomaly applies to all of your reports...

The lower bound and upper bound are based on the standard error of the mean. https://en.wikipedia.org/wiki/Standard_error#Standard_error_of_mean_versus_standard_deviation

Basically the standard error is estimated as the sample standard deviation divided by the square root of the sample size.

In some cases I've posted all the values dumped as in this post: http://www.bay12forums.com/smf/index.php?topic=181479.0 (margin_of_error = standard error * 1.96)
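
As a worked check on the iron mace vs copper row (mean 62.1, stdev 79.7, 96 samples), assuming the plain normal-approximation formula:

```latex
\[
\mathrm{SE} = \frac{s}{\sqrt{n}} = \frac{79.7}{\sqrt{96}} \approx 8.13,
\qquad
1.96 \times \mathrm{SE} \approx 15.9,
\qquad
62.1 \pm 15.9 \approx [46.2,\ 78.0]
\]
```

That agrees with the reported bounds of 46.1 and 78.0 up to rounding of the published mean and stdev, and the standard error of roughly 8 is the figure that falls out of dividing the width of the interval by 4.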

If the sample size is contextually small such as binomial distributions (like win/loss) I use appropriate methods (from scipy) to generate an unbiased standard error.
« Last Edit: March 14, 2023, 04:48:05 am by Panando »

Urist McNobody

  • Bay Watcher

OK, I'm familiar with RMSE.  Using standard deviation divided by sqrt(N) is perfectly reasonable when the samples are uncorrelated and gaussian-distributed.

I'm struggling with the level of dispersion required to calculate mean of 62 with sample standard deviation of 80 from positive numbers alone.  Is it highly clustered?  Are there cases that caused fatal injuries far more or less rapidly than the mean?

Panando

  • Bay Watcher

Are there cases that caused fatal injuries far more or less rapidly than the mean?

Yes, exactly. You can basically get anything from the first hit being a lethal blow to the neck or head, to the attacker spending a long time flailing away totally ineffectually until the target passes out from exhaustion and gets the helmet pulled off. In this test the dispersion is particularly high because instead of using standardized strength, I used a distribution of strengths following the population distribution formula (setting the strength according to the actual formula used to randomize attributes, such that the 24 attackers are nearly perfectly representative of a large population of actual goblins). This means the strongest attackers are 5x stronger than the weakest, and a difference of 5x in strength is absolutely huge in terms of ability to penetrate armor and deal damage through armor.

I also did a series where all the attackers had exactly average strength, this resulted in a much lower standard deviation, however it's badly unrepresentative of actual goblin attackers which do vary from very weak to unquestionably strong.
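
To illustrate how strictly positive hit counts can end up with a standard deviation larger than their mean, here is a small sketch with made-up numbers: mostly quick kills plus a handful of very long, exhaustion-style fights. These values are hypothetical and are not samples from the test.

```python
import statistics

# Hypothetical hit counts, not test data: 20 quick kills and 4 long
# exhaustion-style fights, mimicking a long right tail.
hits = [10] * 20 + [300] * 4

mean = statistics.mean(hits)
stdev = statistics.stdev(hits)   # sample standard deviation
print(f"mean={mean:.1f} stdev={stdev:.1f}")
```

A few extreme fights are enough to push the standard deviation well past the mean, even though no sample is anywhere near zero or negative.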

I'm happy to give some frequency distributions, these are all for the "vs steel armor" case:



As you can see some of the distributions are not that ab-normal, though do tend to have long tails / outliers. Except the whip which tries to be uniform (I think it's this way because of the strength distribution). Also tbh the spear looks kind of like a lognormal distribution.

And cut off the exhaustion outliers and plot using the same x-axis:



The spear does okay (well, it's still actually terrible) but the mean is being heavily affected by the long tail.

If we plot the frequency histograms for the mace and spear, excluding outliers:



We see the mace and spear have the same peak, but quite different skews, with the mace more often getting faster kills.

Outliers are always tricky to account for, because excluding them makes the samples more normal, but the outliers are also real samples. It could be argued that they should be excluded because sometimes the attackers are obviously no threat, but if the armor is rendering some of the enemies no threat, that's something that definitely should be accounted for and not discarded.

Finally: I aggregated all the data for each weapon and made frequency histograms with the same x and y axes, for a visualization of how easily the weapons are dealing with the armor:



Anyway, as I said in the warhammers thread:

Quote
Good questions and I'll give some thought on how best to design tests for armor.

Both the design of the test and the approach for analyzing the samples are definitely tricky and it takes some feedback and iteration to get quality results. Exhaustion especially is a major source of abnormality, exhaustion does definitely matter, but the 1v1 doesn't capture exhaustion in the same way as a real battle where a dwarf might be fighting 1v5, meaning the dwarf would have to receive 5x as many (attempted) blows before becoming exhausted. So like, a good test design might be to have 1 dwarf, facing 6 goblins, with a representative strength distribution, so then if the weakest goblin isn't contributing it just means the dwarf takes a little longer to die instead of having highly skewed samples.

Though ideally, it'd be nice to be able to account for armor that makes a dwarf *totally invulnerable*, something that "number of blows to kill" obviously can't, since it'd result in a mean trending towards infinity.
« Last Edit: March 14, 2023, 01:01:27 pm by Panando »

Schmaven

  • Bay Watcher
  • Abiding

If there is some way to create standardized target dummies in adventure mode, you could then test strikes to specific pieces of armor, but testing would take a lot longer for similar sample sizes if you had to personally direct each strike.

I hypothesize that certain pieces of armor might be just as (in)effective regardless of material (gauntlets?), whereas others would be a higher priority use of better metals, such as breastplates
« Last Edit: March 14, 2023, 08:38:57 pm by Schmaven »

Urist McNobody

  • Bay Watcher

Thank you, this is very helpful.  I thoroughly appreciate frequency histograms like this for non-Gaussian problems.  A cumulative probability distribution is also nice, since it allows you to recover something meaningful from the vertical axis independent of the number of bins.  Therefore the bin size can just be the natural discretization level of the data: 1 wide.  It also lets you answer strategic questions like "what percentage of the time will a defender die in N or fewer blows in this situation?"  To the extent that all of the cases under test have long fat tails** thanks to the strength distribution of a Standard Goblin Horde, the "core" of the cumulative distribution should still help to tease out meaningful details between the armor classes.  It's hard to tell just by eyeballing the frequency diagrams and trying to imagine what their integral looks like, but there does appear to be a shift to the right for the bronze->iron->steel armor sequence.
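
A minimal sketch of that cumulative view, with a bin width of 1 blow. The function name and the samples are hypothetical. Using `math.inf` to mark a target that was never killed is my own assumption about how an invulnerable defender might be recorded, not something the test did.

```python
import math
from collections import Counter

def kill_probability_by_blow(hits, max_blows):
    """Fraction of fights in which the target died in N or fewer blows,
    for N = 1..max_blows (bin width of 1)."""
    counts = Counter(hits)
    total = len(hits)
    cumulative = 0
    result = {}
    for n in range(1, max_blows + 1):
        cumulative += counts.get(n, 0)
        result[n] = cumulative / total
    return result

# Hypothetical samples; math.inf marks a target that was never killed.
samples = [1, 3, 3, 5, 8, 8, 8, 12, math.inf, math.inf]
cdf = kill_probability_by_blow(samples, max_blows=12)
print(cdf[3], cdf[8], cdf[12])
```

Read this way, the curve answers "died in N or fewer blows" directly, and a curve that flattens out below 1.0 shows the share of fights where the armor was never beaten, without dragging a mean towards infinity.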


** Those lashers, though.  If it wasn't for the fact that my silly dorfs have no respect for target priority, those guys just bubbled up to the top of the target priority list.