Having refined my techniques and scripts somewhat, I decided to run some hopefully solid tests of weapon quality.
My main goal is testing how much more damaging a weapon is at higher quality.
Some biases I am trying to avoid:
- There is team/unit order bias, where lower-numbered creatures get to move first and thus have an advantage in fights. For this reason I always have the creatures whose performance is being measured on team 1, and compare performance between reloads, with the equipment modified by DFHack.
- Even Arena-created creatures that have identical physical and mental attributes do not perform identically in battle, though I am improving my use of DFHack to make them more uniform. Until I'm confident I can create totally standard creatures, using the exact same creatures and reloading seems a good solution.
My hope is that by using Reload+DFHack I can have the best possible control over all parameters, being confident that only one parameter is being changed per test run.
Test 1: Pitting the worst enemies of dwarves against each other. 1000x Goblin vs Elephant fights.

- Goblin vs Elephant fight. The creatures are both totally standard Arena creatures, with no skills and no armor or clothing.
- The Goblin is simply equipped with an Iron Axe, which is set to either No Quality or Masterwork using DFHack.
- The Arena has 100 Goblin vs Elephant pairings in individual cells. The game is saved and reloaded 10 times, and each time it is allowed to run until there are 0 or 1 creatures left in each cell. In total there are 1000 fights for each quality level.
- I have a Python script which can digest the combat reports, split them into individual fights and generate statistics.
- We are interested in knowing the probability of killing the Elephant (we don't care whether the Goblin lives), and the number of hits it took to kill the Elephant.
Probability of killing the Elephant (CI is determined using the adjusted Wald technique):
| Weapon         | P(kill) | 95% CI         |
|----------------|---------|----------------|
| No-quality Axe | 0.310   | [0.282, 0.339] |
| Masterwork Axe | 0.371   | [0.342, 0.401] |
With the masterwork axe, the Goblin won about 20% more often (0.371 / 0.310 ≈ 1.20). The two 95% confidence intervals do not overlap, though only narrowly, so we can be reasonably but not overwhelmingly confident that the weapon quality made a difference.
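For anyone wanting to check the intervals, here is a minimal sketch of the adjusted Wald (Agresti-Coull) calculation. The kill counts of 310 and 371 are my assumption, inferred from the quoted probabilities and 1000 fights per quality level; the function name is just for illustration.

```python
import math

def adjusted_wald(successes, trials, z=1.96):
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion."""
    n_adj = trials + z * z                    # inflate the sample size by z^2
    p_adj = (successes + z * z / 2) / n_adj   # pull the estimate toward 0.5
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width

# Assumed kill counts: 0.310 and 0.371 of 1000 fights.
no_quality_ci = adjusted_wald(310, 1000)
masterwork_ci = adjusted_wald(371, 1000)
print("No-quality: [%.3f, %.3f]" % no_quality_ci)
print("Masterwork: [%.3f, %.3f]" % masterwork_ci)
```

Under those assumed counts this reproduces the intervals quoted above to three decimals.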
Considering only fights where the Elephant dies, the number of hacks required to defeat it:
| Weapon         | Mean ± SD   | 95% CI       |
|----------------|-------------|--------------|
| No-quality Axe | 47.5 ± 9.9  | [46.4, 48.6] |
| Masterwork Axe | 38.82 ± 5.1 | [38.3, 39.3] |
With the masterwork axe the Goblin required only 82% as many hacks to kill the Elephant. The hack counts provide statistically stronger evidence that the masterwork axe is providing an advantage, since the two confidence intervals are far apart, and the masterwork axe also seems to be more consistent (smaller standard deviation).
Overall the masterwork axe appeared to give approximately a 20% advantage to these average-attribute, unskilled Goblins.
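The hack-count intervals can be checked the same way. This is a sketch using the ordinary normal approximation for a mean; I am assuming the sample size for each weapon is the number of fights the Goblin won (310 and 371), since only those fights have a hack count.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a sample mean."""
    half_width = z * sd / math.sqrt(n)   # z times the standard error
    return mean - half_width, mean + half_width

# n is an assumption: the number of fights where the Elephant died.
no_quality = mean_ci(47.5, 9.9, 310)
masterwork = mean_ci(38.82, 5.1, 371)
print("No-quality: [%.1f, %.1f]" % no_quality)
print("Masterwork: [%.1f, %.1f]" % masterwork)
print("Hack ratio: %.2f" % (38.82 / 47.5))
```

With those sample sizes the output agrees with the quoted intervals to one decimal, and the ratio comes out at 0.82.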
Second Test: Revenge of the Goblins. 500x Axe Goblin vs Armored Dwarf fights.

This test is constructed a bit differently.
- Each of the 100 cells holds 1 Goblin vs 5 Dwarves. The Dwarves have all been rendered incapable of attacking by giving them a failed mood, so the Goblin is at no risk of losing. The creature AI means the Goblin fights each Dwarf in series, only moving to the next Dwarf once its current target is dead.
- The Goblin is armed with a Battle Axe and the Dwarves have full iron armor coverage.
- The weapon has its material, quality and sharpness changed using DFHack.
- There are 500 fights in total, but the Goblins do get tired and increase in skill during the combat. This is somewhat representative of dwarves slaughtering multiple enemies in battle.
We are measuring how many hacks with the axe it takes to kill each Dwarf.
| Weapon | Mean ± SD | 95% CI |
|----------------------|-------------|--------------|
| No quality Iron Axe | 22.7 ± 15.8 | [21.3, 24.1] |
| Masterwork Iron Axe | 21.4 ± 13.2 | [20.2, 22.5] |
| No quality Steel Axe | 8.4 ± 3.7 | [8.1, 8.7] |
| Masterwork Steel Axe | 8.2 ± 3.7 | [7.9, 8.6] |
There is a dramatic improvement going from the Iron to the Steel Axe: it requires far fewer hits and performs much more consistently.
Strangely, in this scenario weapon quality does not have a statistically significant effect. The masterwork weapons did perform slightly better, but the confidence intervals overlap heavily and the sample isn't big enough to draw conclusions.
Perhaps weapon:armor interactions do not account for weapon quality or sharpness: by the known formulas the weapon sharpness *should* matter but this really doesn't seem to be manifesting. Future testing should be able to expand knowledge in this area.
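To put a rough number on "not very statistically significant", here is a sketch of an unequal-variance two-sample test computed from the summary statistics in the table. Two assumptions of mine: each row is based on 500 fights, and the fights are treated as independent, which they are not quite, since each Goblin fights 5 Dwarves in a row while tiring and gaining skill.

```python
import math

def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unequal-variance two-sample test from summary statistics.

    Uses a normal approximation for the p-value, which is reasonable
    for samples of several hundred.
    """
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability
    return z, p

N = 500  # assumed: 100 cells x 5 dwarves per weapon type
iron = two_sample_z(22.7, 15.8, N, 21.4, 13.2, N)
steel = two_sample_z(8.4, 3.7, N, 8.2, 3.7, N)
print("Iron:  z = %.2f, p = %.2f" % iron)
print("Steel: z = %.2f, p = %.2f" % steel)
```

Under those assumptions z is about 1.4 for iron and about 0.85 for steel, both short of the 1.96 needed for significance at the 5% level.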
Weapon Durability

While being masterwork didn't measurably improve the damage of the weapons, what about their durability? In the process of chopping up 5 iron-armored dwarves, the iron battle axes would take some wear (though the steel axes didn't).
The wear level of the iron axes at the end of the fights (x and X are the in-game wear marks):
| Quality    | No wear mark | x  | X |
|------------|--------------|----|---|
| No quality | 88           | 10 | 2 |
| Masterwork | 92           | 8  | 0 |
The no quality axes did seem to take more wear and were the only ones to ever get to X wear, though it's hard to make any definite conclusions with this data. If I'd had the foresight I would've used DFHack to dump the exact durability values for all axes, which I think would have been conclusive.
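A quick way to see how inconclusive the wear counts are is a Fisher exact test on worn (x or X) versus unworn axes. This is only a sketch: collapsing x and X into one "worn" category is my simplification, and the function below is a hand-rolled illustration rather than anything from the original scripts.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    # Unnormalised integer weights avoid floating-point ties.
    weights = {k: comb(row1, k) * comb(row2, col1 - k)
               for k in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / sum(weights.values())

# Worn vs unworn: no quality 12 vs 88, masterwork 8 vs 92.
p = fisher_exact_two_sided(12, 88, 8, 92)
print("p = %.2f" % p)
```

The p-value comes out far above 0.05, which fits the impression that 100 axes per quality level is not enough to separate the two on wear marks alone.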
100x Peak Axe Goblin vs Zombie Elephant

Compared with previous experiments, for this one my techniques had further improved. It turns out that in Arena, while creatures are created with standardized physical attributes, their size is variable. I used DFHack to standardize the size of all combatants.
- Fights are 1v1 Axe Goblin vs Animated Dead Elephant.
- The Goblin has all physical attributes maxed, has Grand Master combat skills and is armed with an Iron Battle Axe.
- The Elephant is animated dead, with a failed mood to make it a target dummy. It is healed with DFHack `full-heal` because zombies seem to be substantially harder to destroy if some of their soft parts are missing, probably because the head has to take a certain amount of damage, and if the brain is missing that damage has to be done to the skull.
- I only did 100 fights per quality level. The results are clear enough with this number of fights.
Hacks required to destroy the Elephant:
| qual | mean | sdev | 95% CI |
|------|------|------|--------------|
| 0 | 77.4 | 28.0 | [71.9, 82.9] |
| 1 | 69.8 | 22.7 | [65.3, 74.2] |
| 2 | 64.3 | 18.9 | [60.6, 68.0] |
| 3 | 59.1 | 21.5 | [54.9, 63.3] |
| 4 | 51.4 | 22.7 | [46.9, 55.8] |
| 5 | 40.6 | 25.1 | [35.7, 45.5] |
In these trials, the Masterwork Axe required only 52% as many hacks to destroy the Animated Dead Elephant (speculation: against sufficiently tough opponents, damage is proportional to sharpness?). Furthermore, per quality level the improvement is fairly linear, with no obvious huge advantage for being masterwork rather than exceptional. The confidence intervals allow for quite definite statements about the benefits of quality.
Unlike in the armor trials, having to "mangle" the head of a large, durable, unarmored creature gives a huge advantage to the higher quality axes.
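The "fairly linear" trend can be checked with a plain least-squares fit of the mean hack counts against quality level. This is a sketch on the six table means only, so it ignores the spread within each quality level.

```python
quality = [0, 1, 2, 3, 4, 5]
mean_hacks = [77.4, 69.8, 64.3, 59.1, 51.4, 40.6]

n = len(quality)
mean_x = sum(quality) / n
mean_y = sum(mean_hacks) / n

# Ordinary least-squares fit of mean hacks against quality level.
sxx = sum((x - mean_x) ** 2 for x in quality)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(quality, mean_hacks))
syy = sum((y - mean_y) ** 2 for y in mean_hacks)

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)

print("slope = %.2f hacks per quality level" % slope)
print("intercept = %.1f hacks" % intercept)
print("R^2 = %.3f" % r_squared)
print("masterwork / no quality = %.2f" % (mean_hacks[5] / mean_hacks[0]))
```

The fit gives roughly 7 fewer hacks per quality level with an R² above 0.97, so a straight line describes the means well. The last step (exceptional to masterwork) is the largest single drop, but not by enough to stand out given the confidence intervals.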
Hammers

- The setup is the same as above, but I forgot to set the Goblin to grand master in hammer skill (this may not be the biggest deal since the zombies aren't trying to evade).
- The axe was transformed into a hammer by using DFHack. This may seem unnatural but in my testing transformed weapons work completely normally.
Bashes to defeat the elephant:
| qual | mean | sdev | 95% CI |
|------|------|------|--------------|
| 0 | 57.6 | 15.6 | [54.5, 60.6] |
| 5 | 56.7 | 16.5 | [53.5, 59.9] |
The masterwork hammer did not perform measurably better than the no quality hammer. Unfortunately, other preliminary tests with blunt weapons support the hypothesis that blunt weapons do not benefit from quality, at least in terms of damage inflicted upon a successful hit.
Note: Though I only count bashes in the above statistics, the total combat duration including misses and such was just as similar, so the masterwork hammers were not hitting more often either.
Conclusions so far

High quality edged weapons seem to provide a large advantage when hacking into flesh and bone, but little or no advantage against standard quality armor. This is perhaps most important when fighting zombies, where the extra sharpness nearly doubles the speed at which the zombie is sufficiently "mangled" to be struck down.
Blunt weapons do not seem to benefit from quality in terms of damage inflicted on a successful hit, nor does the masterworkness seem to alter the chance of hitting/missing a failed mood target dummy.
My speculation is that quality by itself does nothing in terms of damage, and that all benefits come from the extra sharpness of a higher quality weapon. By the formulas I feel that sharpness should be applied when cutting through armor, but it seems it isn't. I'd guess this is either a bug or a deliberate balancing mechanism, so that edged weapons are not excessively powerful relative to blunt weapons at high quality.
Future Testing

I'll probably do a weapons roundup. I am getting very fond of the "failed mood target dummy" and counting number of combat actions it takes to defeat the target dummy. I believe this greatly reduces the noise compared with fighting an opponent that can fight back. Also the question of "how quickly can an enemy be struck down so the next enemy can be attacked" is very useful. However, if masterwork weapons give accuracy bonuses I don't think a failed mood target would properly represent that as they don't seem to use their defensive abilities.