There is an age-old debate over which material is best for war hammers. Unfortunately, most tests use far too few samples to give any confidence that the resulting ranking is anything more than luck, though there has been some good recent mechanical analysis indicating that density should matter only a tiny bit.
So I decided to design a new test:
Test Design:
- An Arena with 100 cells, in each cell is a 1v1 between a Dwarf and a Goblin
- The Dwarf is armed with a War Hammer made of one of the following materials: Copper, Silver, Steel, Platinum or Lead. I tried to select materials with a broad range of properties. In total there are 20 dwarves with each kind of war hammer.
- The Goblin has full iron coverage: breast plate, mail shirt, helmet, 2x gauntlets, 2x high boots, greaves, plus a leather cloak. The Goblin has no shield or weapon, and has been given a failed mood to make it a "target dummy". This reduces noise and makes the tests much quicker to run: if two combatants disable each other, a test can take dozens of times longer to conclude.
- The Dwarf and Goblins are all completely standardized using DFHack, with all size differences, personality, preferences, traits etc eliminated. This is important, because with only 20 dwarves per weapon, if some of them were larger on average this would definitely skew the results and we would just end up measuring which group had the largest dwarves relative to the goblins: seriously, we would.
- The Dwarf is given 75% of maximum physical stats, making it "very strong and very agile". It is also a proficient hammer user.
- The game is saved, and the test is re-run 25 times. All combat reports are logged by DFHack, and the logs are analyzed by a Python script.
- In order to compare the weapons, we count the number of hammer bashes required to strike down the goblin. This is not intended to be representative of the number of hammer bashes to strike down a goblin in a real game, but is a useful metric to compare the performance of different materials.
- I can only see one limitation of measuring the number of hammer bashes: a heavier hammer might cause the dwarf swinging it to become exhausted faster. Because we measure actions rather than time (which is difficult to measure from logs), time spent exhausted is "invisible". The main test didn't take long enough for any dwarf to become exhausted, so its data isn't useful for measuring exhaustion rate, but I do look at this briefly in one of the quick tests.
Results:
500 samples per weapon. Number of hammer bashes required to strike down the ironclad Goblin:
(The lower bound and upper bound are for the 95% confidence interval for the mean)
| weapon | mean | stdev | confidence_level | margin_of_error | lower_bound | upper_bound |
|---------------------|-------|-------|------------------|-----------------|-------------|-------------|
| platinum war hammer | 16.80 | 9.81 | 0.95 | 0.86 | 15.94 | 17.66 |
| steel war hammer | 17.19 | 9.53 | 0.95 | 0.84 | 16.35 | 18.02 |
| copper war hammer | 18.14 | 10.18 | 0.95 | 0.89 | 17.25 | 19.03 |
| lead war hammer | 18.37 | 9.90 | 0.95 | 0.87 | 17.50 | 19.24 |
| silver war hammer | 18.39 | 9.53 | 0.95 | 0.84 | 17.55 | 19.22 |
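For anyone who wants to check the interval columns, here is a minimal sketch of how they can be reproduced from the mean, stdev and sample count. It assumes a normal approximation (critical value of about 1.96) and 500 samples per weapon; the function name is my own, and this is not necessarily how the original analysis script does it.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, stdev, n, confidence=0.95):
    """Return (margin_of_error, lower_bound, upper_bound) for a sample mean.

    Assumption: normal approximation, which is reasonable for large n.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev / sqrt(n)
    return margin, mean - margin, mean + margin


# Platinum war hammer row: mean 16.80, stdev 9.81, assumed n = 500.
margin, lower, upper = mean_confidence_interval(16.80, 9.81, 500)
print(f"{margin:.2f} {lower:.2f} {upper:.2f}")  # prints: 0.86 15.94 17.66
```

The margin shrinks with the square root of the sample count, which is why the 500-sample intervals are so much tighter than anything a handful of fights could give.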
The performance of all the different materials was extremely close, with the worst performer, the Silver War Hammer, requiring on average only about 9% more bashes to destroy the Goblin than the best performer, the Platinum War Hammer.
Looking at the 95% confidence intervals, we really can't conclude whether Copper or Silver performs better, because the intervals overlap almost totally, nor whether Steel or Platinum performs better. However, there is a reasonable possibility that Steel and Platinum outperform Silver.
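Eyeballing interval overlap can be backed up with a direct comparison of two means. The sketch below is a two-sample z-test built only from the summary numbers in the table; it assumes independent samples, 500 fights per weapon and a normal approximation, and the function name is hypothetical. It is an illustration, not part of the original analysis.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, two_sided_p) for the difference mean_b - mean_a.

    Assumption: independent samples, large enough for a normal approximation.
    """
    # Standard error of the difference between the two sample means.
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_b - mean_a) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Platinum vs Silver, and Copper vs Silver, using the table values.
z_ps, p_ps = two_sample_z(16.80, 9.81, 500, 18.39, 9.53, 500)
z_cs, p_cs = two_sample_z(18.14, 10.18, 500, 18.39, 9.53, 500)
print(f"platinum vs silver: z={z_ps:.2f}, p={p_ps:.4f}")
print(f"copper vs silver:   z={z_cs:.2f}, p={p_cs:.4f}")
```

Keep in mind that five materials give ten possible pairs, so a single small p-value picked out after the fact is weaker evidence than it looks on its own.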
One reason I chose to add lead (instead of iron) is that the material properties of lead are truly abysmal for weapons, being far worse than those of weapon-grade materials. This did not seem to matter in the slightest.
This test helps to demonstrate, yet again, just how close the performance of different kinds of dense metal is for war hammers.
Quick Tests:
I'm also going to add some quick tests, involving only 100 fights per weapon per test instead of 500. A small number of samples can help to identify trends if they are very strong. It will also help to demonstrate my point about the importance of large sample sizes.
Dwarves with max physical stats, legendary combat skills:
| weapon | mean | stdev | confidence_level | margin_of_error | lower_bound | upper_bound |
|---------------------|-------|-------|------------------|-----------------|-------------|-------------|
| steel war hammer | 12.66 | 7.64 | 0.95 | 1.50 | 11.16 | 14.16 |
| platinum war hammer | 13.13 | 8.16 | 0.95 | 1.61 | 11.52 | 14.74 |
| lead war hammer | 13.76 | 7.43 | 0.95 | 1.46 | 12.30 | 15.22 |
| copper war hammer | 14.58 | 8.04 | 0.95 | 1.58 | 13.00 | 16.16 |
| silver war hammer | 15.53 | 7.18 | 0.95 | 1.41 | 14.12 | 16.94 |
Dwarves with max physical stats, legendary combat skills, both Dwarves and Goblins 25% bigger than average:
| weapon | mean | stdev | confidence_level | margin_of_error | lower_bound | upper_bound |
|---------------------|-------|-------|------------------|-----------------|-------------|-------------|
| silver war hammer | 13.97 | 7.75 | 0.95 | 1.39 | 12.58 | 15.35 |
| platinum war hammer | 14.76 | 8.35 | 0.95 | 1.49 | 13.26 | 16.25 |
| copper war hammer | 15.07 | 8.49 | 0.95 | 1.52 | 13.55 | 16.59 |
| lead war hammer | 15.13 | 8.11 | 0.95 | 1.45 | 13.67 | 16.58 |
| steel war hammer | 16.07 | 7.78 | 0.95 | 1.39 | 14.67 | 17.46 |
Dwarves with feeble (35% of max) physical stats, lvl1 skills:
| weapon | mean | stdev | confidence_level | margin_of_error | lower_bound | upper_bound |
|---------------------|-------|-------|------------------|-----------------|-------------|-------------|
| steel war hammer | 84.52 | 45.58 | 0.95 | 8.93 | 75.59 | 93.45 |
| silver war hammer | 91.08 | 48.84 | 0.95 | 9.57 | 81.51 | 100.65 |
| platinum war hammer | 92.22 | 51.91 | 0.95 | 10.17 | 82.05 | 102.39 |
| lead war hammer | 93.33 | 44.34 | 0.95 | 8.69 | 84.64 | 102.02 |
| copper war hammer | 97.26 | 42.78 | 0.95 | 8.38 | 88.88 | 105.64 |
Compare with the 100% physical, high-skill dwarves: these dwarves took 6.7x longer to kill the Goblin. (Note: the same weak dwarves armed with Steel Picks would have no trouble striking down the ironclad goblins in about 15 hits.)
In this test the weak dwarves had trouble killing the Goblin before passing out from exhaustion. Some passed out multiple times, which made the tests take *much* longer to run to conclusion.
Number of times passed out from exhaustion:
| Weapon | mean | stdev | 95% CI |
|----------|------|-------|-------------|
| Steel | 1.05 | 0.95 | [0.63,1.47] |
| Lead | 1.25 | 1.08 | [0.78,1.72] |
| Silver | 1.37 | 1.26 | [0.82,1.92] |
| Copper | 1.40 | 1.35 | [0.81,1.99] |
| Platinum | 1.56 | 1.27 | [1.00,2.12] |
While the platinum war hammer users did pass out from exhaustion the most, there is extreme overlap between the confidence intervals. The platinum war hammer is roughly twice as heavy, so if there were a dramatic effect it should be more obvious.
Summary of quick tests:
You may have noticed the rankings jumped around wildly: Silver and Steel totally swapped places between the two "peak physical" tests, where the only change was making everyone 25% bigger. This is very likely not because Silver performs better against larger targets, but because the sample size is much too small to draw conclusions about which material is better; the only thing being tested is who got luckiest. However, the point of these tests was mainly to root out any really strong effect, like weak dwarves totally sucking with platinum war hammers or large, beastly strong dwarves being murder machines with platinum war hammers. No statistically significant effect was detected.
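To get a feel for how much of a gap luck alone can produce, here is a rough simulation sketch. It pretends all five hammers are exactly identical and asks how far apart the best and worst sample means land anyway. Everything in it is an assumption for illustration: the gamma distribution for bash counts, the true mean of 14 and stdev of 8 (roughly the size of the quick-test numbers), and the function name.

```python
import random


def luck_only_spread(n_fights, n_weapons=5, true_mean=14.0, true_sd=8.0,
                     trials=200, seed=1):
    """Average gap between best and worst sample mean when every weapon
    is identical, so any gap is pure sampling luck.

    Assumption: bash counts follow a gamma distribution with the given
    mean and stdev. This is a stand-in, not a fitted model.
    """
    rng = random.Random(seed)
    shape = (true_mean / true_sd) ** 2   # gamma shape parameter
    scale = true_sd ** 2 / true_mean     # gamma scale parameter
    total_gap = 0.0
    for _ in range(trials):
        sample_means = []
        for _ in range(n_weapons):
            fights = [rng.gammavariate(shape, scale) for _ in range(n_fights)]
            sample_means.append(sum(fights) / n_fights)
        total_gap += max(sample_means) - min(sample_means)
    return total_gap / trials


spread_100 = luck_only_spread(n_fights=100)
spread_500 = luck_only_spread(n_fights=500)
print(f"luck-only gap with 100 fights per weapon: {spread_100:.2f}")
print(f"luck-only gap with 500 fights per weapon: {spread_500:.2f}")
```

Comparing the printed gaps with the gaps between best and worst in the tables above gives a sense of how much of the quick-test ranking could be noise, and how much that noise shrinks at 500 fights per weapon.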
Conclusion:
The Null Hypothesis that "It doesn't matter what dense metal a war hammer is made from" holds up pretty well in this testing. There is no strong statistical evidence for any dense metal being better than another for striking down ironclad goblins.
However there is statistically weak evidence for Platinum and Steel being superior, a conclusion that would be supported by prior, albeit weaker, testing and some theoretical analysis based on accepted combat formulas.
As a note: in 50.07, bludgeoning weapons perform abysmally compared with steel edged weapons unless you are fighting fully steel-clad dwarves from a rival dwarven civilization. If you are trying to optimize artifact weapon creation for effectiveness (rather than value), you should ALWAYS try to have the dwarf use steel rather than platinum: if the outcome is an edged weapon, the result will be an extremely good weapon, and if the outcome is a bludgeoning weapon, it won't be measurably worse in steel than it would have been in platinum.