Bay 12 Games Forum

Author Topic: Scientific Testing of Weapon Quality (50.07)  (Read 2272 times)

Panando

  • Bay Watcher
    • View Profile
Scientific Testing of Weapon Quality (50.07)
« on: February 23, 2023, 11:55:05 am »

Having refined my techniques and scripts some, I have decided to do some hopefully solid tests of quality.

My main goal is testing how much more damaging the weapon is at higher quality.

Some biases I am trying to avoid:
  • There is team/unit order bias, where lower-numbered creatures get to move first and thus have an advantage in fights. For this reason I always have the creatures whose performance is being measured on team 1, and compare performance between reloads, with the equipment being modified by DFHack.
  • Even Arena-created creatures that have identical physical and mental attributes do not perform identically in battle, though I am improving my use of DFHack to make them more conformant. Until I'm confident I can create totally standard creatures, using the exact same creatures and reloading seems a good solution.
My hope is that by using Reload+DFHack I can have the best possible control over all parameters, being confident that only one parameter is being changed per test run.

Test 1: Pitting the worst enemies of dwarves against each other. 1000x Goblin vs Elephant fights.

  • Goblin vs Elephant fight. The creatures are both totally standard Arena, with no skills and no armor or clothing.
  • The Goblin is simply equipped with an Iron Axe, which is set to either No Quality or Masterwork using DFHack.
  • The Arena has 100 Goblin vs Elephant pairings in individual cells. The game is saved and reloaded 10 times, each instance the game is allowed to run until there are 0 or 1 creatures in each cell. In total for each quality level, there are 1000 fights.
  • I have a python script which can digest the combat reports, split them into individual fights and generate statistics.
  • We are interested in knowing the probability of killing the Elephant (we don't care whether the Goblin lives), and the number of hits it took to kill the Elephant.
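A rough sketch of the kind of bookkeeping such a script has to do, not the actual script. It assumes the combat reports have already been reduced to `(fight_id, kind)` tuples; that event format and the names `summarize_fights`, `"hit"`, `"target_died"` and `"attacker_died"` are hypothetical and are not the game's log format.

```python
from collections import defaultdict
from statistics import mean

def summarize_fights(events):
    """Aggregate per-fight events into kill probability and mean hits to kill.

    events: iterable of (fight_id, kind) tuples, where kind is one of
    "hit", "target_died" or "attacker_died" (a hypothetical, pre-parsed format).
    """
    hits = defaultdict(int)   # running hit count per fight
    hits_to_kill = {}         # fight_id -> hits landed before the target died
    fights = set()
    for fight_id, kind in events:
        fights.add(fight_id)
        if fight_id in hits_to_kill:
            continue          # target already dead, ignore later events
        if kind == "hit":
            hits[fight_id] += 1
        elif kind == "target_died":
            hits_to_kill[fight_id] = hits[fight_id]
    kill_probability = len(hits_to_kill) / len(fights) if fights else 0.0
    mean_hits = mean(hits_to_kill.values()) if hits_to_kill else None
    return kill_probability, mean_hits

# Two toy fights: in fight 1 the target dies after two hits, in fight 2 the attacker dies.
demo = [(1, "hit"), (2, "hit"), (1, "hit"), (1, "target_died"), (2, "attacker_died")]
demo_probability, demo_mean_hits = summarize_fights(demo)
print(demo_probability, demo_mean_hits)
```

Only fights where the target dies contribute to the hit count, which matches the "fights where the Elephant dies" restriction used for the hack statistics.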

Probability of killing the Elephant (CI is determined using adjusted Wald technique):
Code: [Select]
No-quality Axe: 0.310 : 95% CI [0.282, 0.339]
Masterwork Axe: 0.371 : 95% CI [0.342, 0.401]

With the masterwork axe, the Goblin won about 20% more often (0.371 vs 0.310). The two 95% confidence intervals do not overlap, but only barely (0.339 vs 0.342), so we can be reasonably, but not overwhelmingly, confident that the weapon quality made a difference.
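For anyone wanting to check the intervals, here is a minimal sketch of the adjusted Wald (Agresti-Coull) interval. The kill counts 310 and 371 are inferred from the posted probabilities over 1000 fights, and the function name is my own.

```python
import math

def adjusted_wald(successes, trials, z=1.96):
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion."""
    n_adj = trials + z * z                    # add z^2 pseudo-trials
    p_adj = (successes + z * z / 2) / n_adj   # add z^2 / 2 pseudo-successes
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Kill counts are inferred from the posted probabilities (assumption).
for label, kills in [("No-quality Axe", 310), ("Masterwork Axe", 371)]:
    low, high = adjusted_wald(kills, 1000)
    print(f"{label}: {kills / 1000:.3f} : 95% CI [{low:.3f}, {high:.3f}]")
```

With those inferred counts the output should agree with the posted intervals to three decimals.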

Considering fights where the Elephant dies, the number of hacks required to defeat it:
Code: [Select]
No-quality Axe: 47.5 ± 9.9 : 95% CI [46.4, 48.6]
Masterwork Axe: 38.82 ± 5.1 : 95% CI [38.3, 39.3]

With the masterwork axe the goblin required only 82% as many hacks to kill the elephant. The number of hacks provides statistically stronger evidence that the masterwork axe is providing an advantage, and the masterwork axe also seems to be more consistent.
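The intervals on the hack counts look like a plain normal-approximation interval on the mean. A minimal sketch, assuming the sample sizes are the number of won fights (310 and 371, inferred from the kill probabilities above, not stated directly):

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a sample mean."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# n is the assumed number of fights in which the Elephant died.
no_quality = mean_ci(47.5, 9.9, 310)
masterwork = mean_ci(38.82, 5.1, 371)
ratio = 38.82 / 47.5  # hacks needed with masterwork relative to no quality

print(f"No-quality Axe: [{no_quality[0]:.1f}, {no_quality[1]:.1f}]")
print(f"Masterwork Axe: [{masterwork[0]:.1f}, {masterwork[1]:.1f}]")
print(f"Masterwork needs {ratio:.0%} as many hacks")
```

Because the two intervals are far apart relative to their widths, the hack counts give stronger evidence than the win rates do.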

Overall the masterwork axe appeared to give approximately a 20% advantage to these average attribute unskilled Goblins.

Second Test: Revenge of the Goblins. 500x Axe Goblin vs Armored Dwarf fights.

This test is constructed a bit differently.

  • Each of the 100 cells holds 1 goblin vs 5 dwarves. The dwarves have all been rendered incapable of attacking by giving them a failed mood, so the Goblin is at no risk of losing. The creature AI means the Goblin fights each dwarf in series, only moving to the next dwarf once its current target is dead.
  • The Goblin is armed with a Battle Axe and the Dwarves have full iron armor coverage.
  • The weapon has its material and quality and sharpness changed by using DFHack.
  • There are 500 fights in total, but the Goblins do get tired and increase in skill during the combat. This is somewhat representative of dwarves slaughtering multiple enemies in battle.

We are measuring how many hacks with the axe it takes to kill each Dwarf.
Code: [Select]
| Weapon               | Mean ± SD   | 95% CI       |
|----------------------|-------------|--------------|
| No quality Iron Axe  | 22.7 ± 15.8 | [21.3, 24.1] |
| Masterwork Iron Axe  | 21.4 ± 13.2 | [20.2, 22.5] |
| No quality Steel Axe | 8.4 ± 3.7   | [8.1, 8.7]   |
| Masterwork Steel Axe | 8.2 ± 3.7   | [7.9, 8.6]   |

There is a dramatic improvement going from the Iron to the Steel Axe: it requires far fewer hits and performs much more consistently.

Strangely, in this scenario weapon quality does not have a statistically significant effect. The masterwork weapons did perform slightly better, but the sample isn't big enough to draw conclusions.
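One way to put a number on "not statistically significant" is a two-sample z-test on the table rows. This is a sketch under the assumption that each row summarizes 500 fights (which is consistent with the posted interval widths) and that the fights are independent, which is only roughly true since each Goblin fights five dwarves in a row.

```python
import math

def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Normal-approximation test for a difference in means (unequal variances)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

N = 500  # assumed fights per table row

iron = two_sample_z(22.7, 15.8, N, 21.4, 13.2, N)       # no quality vs masterwork iron
steel = two_sample_z(8.4, 3.7, N, 8.2, 3.7, N)          # no quality vs masterwork steel
material = two_sample_z(22.7, 15.8, N, 8.4, 3.7, N)     # iron vs steel, both no quality

for label, (z, p) in [("iron quality", iron), ("steel quality", steel), ("iron vs steel", material)]:
    print(f"{label}: z = {z:.2f}, two-sided p = {p:.3g}")
```

Under these assumptions both quality comparisons stay below the usual 1.96 threshold (z is roughly 1.4 for iron and 0.9 for steel), while the material comparison is far above it.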

Perhaps weapon:armor interactions do not account for weapon quality or sharpness: by the known formulas the weapon sharpness *should* matter but this really doesn't seem to be manifesting. Future testing should be able to expand knowledge in this area.

Weapon Durability

While being masterwork didn't measurably improve the damage of the weapons, what about their durability? In the process of chopping up 5 iron-armored dwarves, the iron battle axes would take some wear (though the steel axes didn't).

The quality of the axes at the end of the fights:
Code: [Select]
| Quality    |    | x  | X |
|------------|----|----|---|
| No quality | 88 | 10 | 2 |
| Masterwork | 92 | 8  | 0 |

The no quality axes did seem to take more wear and were the only ones to ever get to X wear, though it's hard to make any definite conclusions with this data. If I'd had the foresight I would've used DFHack to dump the exact durability values for all axes, which I think would have been conclusive.

100x Peak Axe Goblin vs Zombie Elephant

Compared with previous experiments, for this one my techniques had further improved. It turns out that in Arena, while creatures are created with standardized physical attributes, their size is variable. I used DFHack to standardize the size of all combatants.

  • Fights are 1v1 Axe Goblin vs Animated Dead Elephant.
  • The Goblin has all physical attributes maxed, has Grand Master combat skills and is armed with an Iron Battle Axe.
  • The Elephant is Animated Dead with a failed mood to make it a target dummy. It is healed with DFHack `full-heal`, because zombies seem to be substantially harder to destroy if some of their soft parts are missing. This is probably because the head has to take a certain amount of damage, and if the brain is missing that damage has to be done to the skull.
  • I only did 100 fights per quality level. The results are clear enough with this number of fights.

Hacks required to destroy the Elephant:
Code: [Select]
| qual | mean | sdev | 95% CI       |
|------|------|------|--------------|
| 0    | 77.4 | 28.0 | [71.9, 82.9] |
| 1    | 69.8 | 22.7 | [65.3, 74.2] |
| 2    | 64.3 | 18.9 | [60.6, 68.0] |
| 3    | 59.1 | 21.5 | [54.9, 63.3] |
| 4    | 51.4 | 22.7 | [46.9, 55.8] |
| 5    | 40.6 | 25.1 | [35.7, 45.5] |

In these trials, the Masterwork Axe required only 52% as many hacks to destroy the Animated Dead Elephant (speculation: against sufficiently tough opponents, damage is proportional to sharpness?). Furthermore, per quality level the improvement is fairly linear, with no obvious huge advantage for being masterwork rather than exceptional. The confidence intervals allow for quite definite statements about the benefits of quality.
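To make "fairly linear" a bit more concrete, here is a small least-squares fit of the posted means against quality level. It only uses the six means from the table; the helper name is my own and the fit ignores the per-level standard deviations.

```python
quality = [0, 1, 2, 3, 4, 5]
mean_hacks = [77.4, 69.8, 64.3, 59.1, 51.4, 40.6]

def least_squares(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = least_squares(quality, mean_hacks)
# How far each posted mean sits from the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(quality, mean_hacks)]

print(f"hacks ~ {intercept:.1f} {slope:+.2f} * quality")
print("residuals:", [round(r, 1) for r in residuals])
print(f"masterwork / no quality = {mean_hacks[5] / mean_hacks[0]:.0%}")
```

The fit comes out at roughly 7 fewer hacks per quality level, and every residual is smaller than the half-width of the corresponding 95% interval, so a straight line is at least not contradicted by these means.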

Unlike in the armor trials, having to "mangle" the head of a large, durable, unarmored creature gives a huge advantage to the higher quality axes.

Hammers

  • The setup is the same as above, but I forgot to set the Goblin to grand master in hammer skill (this may not be the biggest deal since the zombies aren't trying to evade).
  • The axe was transformed into a hammer by using DFHack. This may seem unnatural but in my testing transformed weapons work completely normally.

Bashes to defeat the elephant:
Code: [Select]
| qual | mean | sdev | 95% CI       |
|------|------|------|--------------|
| 0    | 57.6 | 15.6 | [54.5, 60.6] |
| 5    | 56.7 | 16.5 | [53.5, 59.9] |

The masterwork hammer did not perform measurably better than the no quality hammer. Unfortunately, other preliminary tests with blunt weapons support the hypothesis that blunt weapons do not benefit from quality, at least in terms of damage inflicted upon a successful hit.

Note: Though I only count bashes in the above statistics, the total combat duration including misses and such was just as similar. The masterwork hammers were not hitting more often or anything like that.

Conclusions so far

High quality edged weapons seem to provide a large advantage when hacking into flesh and bone, but little or no advantage against standard quality armor. This is perhaps most important when fighting zombies, where extra sharpness can as much as double the speed at which the zombie is sufficiently "mangled" to be struck down.

Blunt weapons do not seem to benefit from quality in terms of damage inflicted on a successful hit, nor does the masterworkness seem to alter the chance of hitting/missing a failed mood target dummy.

My speculation is that quality by itself does nothing in terms of damage, and all benefits come from the extra sharpness of a higher quality weapon. By the formulas I feel that sharpness should be applied when cutting through armor, but it seems it isn't. I'd guess this could either be a bug or a deliberate balancing mechanism to keep edged weapons from being excessively powerful relative to blunt weapons at high quality.

Future Testing

I'll probably do a weapons roundup. I am getting very fond of the "failed mood target dummy" and counting number of combat actions it takes to defeat the target dummy. I believe this greatly reduces the noise compared with fighting an opponent that can fight back. Also the question of "how quickly can an enemy be struck down so the next enemy can be attacked" is very useful. However, if masterwork weapons give accuracy bonuses I don't think a failed mood target would properly represent that as they don't seem to use their defensive abilities.
« Last Edit: February 23, 2023, 12:10:30 pm by Panando »
Logged
Punch through a multi-z aquifer in under 5 minutes, video walkthrough. I post as /u/BlakeMW on reddit.

Dwarf_Fever

  • Bay Watcher
    • View Profile
Re: Scientific Testing of Weapon Quality (50.07)
« Reply #1 on: February 23, 2023, 01:00:04 pm »

Fantastic info. Weapon quality does impart a "to hit modifier," so that's probably hard to quantify with certainty against catatonic targets without further tests.

I think it's been commonly assumed so far that material tier trumps weapon quality, but it's good to know that, once you take the armor out of the equation, quality does have significant impact on flesh.
Logged
"Whatever exists, having somehow come into being, is again and again reinterpreted to new ends, taken over, transformed, and redirected by some power superior to it; all events in the organic world are a subduing, a becoming master, and all subduing and becoming master involves a fresh interpretation, an adaptation through which any previous 'meaning' and 'purpose' are necessarily obscured or obliterated."

Panando

  • Bay Watcher
    • View Profile
Re: Scientific Testing of Weapon Quality (50.07)
« Reply #2 on: February 24, 2023, 08:59:46 am »

Fantastic info. Weapon quality does impart a "to hit modifier," so that's probably hard to quantify with certainty against catatonic targets without further tests.

This is certainly commonly believed, and Toady One did state with reference to 0.31 that Masterwork Weapons have a 2x modifier to the armor deflect roll. Assuming that is still the case, then masterwork should only provide a benefit against armored opponents, perhaps when armor user is higher than weapon skill.

500x Armored Human with Warhammer vs Armored Goblin

I tried a setup of 100 fully armored humans with a Shield and Iron Warhammer, vs 100 Fully Armored Goblins with no Shield or Weapon. All standard sizes and peak physical. All creatures were Grand Master Dodger and Armor User but with no offensive skills, in case this bonus only helps when disadvantaged. The Goblins absolutely trounced the Humans (you suck humans) despite having no shield or weapon. I tried it with the humans having all no quality gear, only a masterwork hammer, and all masterwork gear.

I ran 5x 100x 1v1, the outcomes in terms of surviving humans per trial:
Code: [Select]
|            | 1  | 2  | 3  | 4  | 5  | mean | sd  | 95% CI       |
|------------|----|----|----|----|----|------|-----|--------------|
| No Quality | 27 | 27 | 26 | 21 | 21 | 24.4 | 3.1 | [21.7, 27.1] |
| Mas Hammer | 23 | 19 | 24 | 21 | 26 | 22.6 | 2.7 | [20.2, 25.0] |
| All Master | 29 | 31 | 29 | 31 | 35 | 31.0 | 2.4 | [28.9, 33.1] |

The humans survived even less often with the masterwork hammers, though of course it's all well within the confidence intervals. On the other hand, the masterwork armor seemed to give them a statistically significant survivability advantage against the Goblins.
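The table's mean, sd and interval columns can be reproduced from the five trial counts. A minimal sketch: the posted intervals match a z value of 1.96, and the Student's t critical value for 4 degrees of freedom (2.776) is included as a more conservative alternative that is my addition, not what the table uses.

```python
import math
from statistics import mean, stdev

Z_CRIT = 1.96        # reproduces the posted intervals
T_CRIT_DF4 = 2.776   # two-sided 95% Student's t value for 4 degrees of freedom

def trial_summary(survivors, crit):
    """Mean, sample standard deviation and confidence interval of per-trial counts."""
    m = mean(survivors)
    s = stdev(survivors)
    margin = crit * s / math.sqrt(len(survivors))
    return m, s, (m - margin, m + margin)

trials = {
    "No Quality": [27, 27, 26, 21, 21],
    "Mas Hammer": [23, 19, 24, 21, 26],
    "All Master": [29, 31, 29, 31, 35],
}

for label, counts in trials.items():
    m, s, (z_low, z_high) = trial_summary(counts, Z_CRIT)
    _, _, (t_low, t_high) = trial_summary(counts, T_CRIT_DF4)
    print(f"{label}: {m:.1f} ± {s:.1f}  z: [{z_low:.1f}, {z_high:.1f}]  t: [{t_low:.1f}, {t_high:.1f}]")
```

With only five trials per row the t-based intervals are noticeably wider, and under that choice the No Quality and All Master intervals slightly overlap, so the armor result may be less clear-cut than the z-based intervals suggest.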

Unfortunately when analyzing the logs, I could find no statistically strong evidence that the armor was having any effect, in terms of deflections, misses or anything else. For example, if the masterwork armor was making the Human harder to kill, one would expect it to take longer for the Goblin to kill the Human. But in those fights where the Goblin won, the number of actions required to win the fight was about the same:

Code: [Select]
Total actions, fights where the Goblin won:
Normal Armor: 295.6 ± 74.8 : 95% CI [288.2, 303.1]
Master Armor: 301.9 ± 70.0 : 95% CI [294.5, 309.3]

Total actions, fights where the Human won:
Normal Armor: 199.2 ± 89.9 : 95% CI [182.8, 215.5]
Master Armor: 214.7 ± 99.4 : 95% CI [199.2, 230.1]

The Masterwork Armor didn't seem to make the human live longer, nor able to kill the Goblin faster.

500x Dwarves vs Goblins

In my next trial I pitted "Competent" Goblins against "Competent" Dwarves. I used a Competent skill level for everything in case the supposed to-hit bonus requires some amount of skill to use or be used against.

I only ran two trials: In the first everyone's gear was No Quality. In the second, the Dwarves' gear was entirely Masterwork.

Code: [Select]
|            | dwarves killed | death chance | 95% CI        |
|------------|----------------|--------------|---------------|
| No Quality | 219 / 500      | 0.438        | [0.395,0.482] |
| Masterwork | 228 / 500      | 0.456        | [0.413,0.500] |

The Masterwork gear performed slightly worse than the no quality gear, though the confidence intervals almost entirely overlap.
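The overlap can also be expressed as a pooled two-proportion z-test on the death counts from the table. This is a sketch using the normal approximation; the function name is my own.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

# Dwarves killed: 219 / 500 with no quality gear, 228 / 500 with masterwork gear.
z, p_value = two_proportion_z(219, 500, 228, 500)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

A z of roughly 0.6 is nowhere near the 1.96 needed for significance at the 95% level, so these two death rates are statistically indistinguishable.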

This is a double whammy against masterwork: the Dwarves had both a masterwork weapon and masterwork armor, while the Goblins had only a no quality weapon and armor. If either the weapon or armor quality mattered, that should have manifested in improved success for the Dwarves.

Furthermore, additional analysis of the logs in terms of outcomes per hit shows no statistical advantage for the masterwork equipment in terms of fight duration, deflections etc. The masterwork armor did not seem to allow the dwarf to live longer.

Conclusion so far

It would appear that blunt weapons gain no combat advantage from being masterwork, and that armor gains no combat advantage from being masterwork either. In fact, it would appear the ONLY instance where masterwork provides an advantage is when an edged attack strikes flesh and bone.

So far I have found zero evidence for masterwork weapons having any kind of to-hit advantage. It might just be that I have not found the peculiar scenario where this supposed bonus manifests as a performance improvement, but I am leaning towards dumping this bonus in the myths bin. AFAIK all statements from Toady One are from much older versions of the game, and there have since been substantial combat overhauls aiming at improved realism and "physics based" combat. Perhaps quality modifiers were hard to account for with physics based rules.

When there is an actual advantage, like edged attacks against flesh and bone, then it manifests very readily and doesn't require many trials to be quite confident of its existence. But if thousands or tens of thousands of trials are needed to be statistically confident that a masterwork bonus exists, then it's so small that it may as well not exist.
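The "thousands or tens of thousands of trials" intuition can be checked with the standard sample-size formula for comparing two proportions. A sketch under the usual assumptions (two-sided 5% significance, 80% power, independent fights); the effect sizes below are purely illustrative and are not measured values.

```python
import math
from statistics import NormalDist

def fights_per_group(p_base, p_alt, alpha=0.05, power=0.80):
    """Approximate fights needed per group to detect p_base vs p_alt."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_alt) ** 2)

# Illustrative (hypothetical) effect sizes on a 50% baseline death probability.
small_effect = fights_per_group(0.50, 0.48)   # 2 percentage points
large_effect = fights_per_group(0.50, 0.40)   # 10 percentage points
print(small_effect, large_effect)
```

Detecting a 2 percentage point shift needs on the order of ten thousand fights per group, while a 10 point shift needs only a few hundred, which is why real effects like edged weapons against flesh show up so readily.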
« Last Edit: February 24, 2023, 09:29:43 am by Panando »
Logged
Punch through a multi-z aquifer in under 5 minutes, video walkthrough. I post as /u/BlakeMW on reddit.

Eric Blank

  • Bay Watcher
  • *Remain calm*
    • View Profile
Re: Scientific Testing of Weapon Quality (50.07)
« Reply #3 on: February 24, 2023, 06:16:40 pm »

This is neat.

I'm curious about the fight between unarmed goblins and humans; since the goblins are unarmed, they're probably going to resort to wrestling at some point in the fight, which ignores armor's presence and quality, at least when a strangulation is performed. The most likely cause of death is strangulation anyway, short of passing out from pain and getting the helmet removed, or, occasionally, blood loss from broken joints. In adventure mode in 0.47.05, you can absolutely strangle a fully armored target. On top of that, from prior work people have done in the kisat dur thread in the adventure mode forum, being engaged in wrestling distracts the target in certain cases. I wouldn't be surprised if the goblins fared so well because their attempts to lock the humans' limbs distracted them from attacking and strangulation ignored the presence of armor. I wonder if arming the goblins, with any sort of weapon, would decrease their odds against the humans. What was the actual most common cause of death for the humans, and also for goblins?
Logged
I make Spellcrafts!
I have no idea where anything is. I have no idea what anything does. This is not merely a madhouse designed by a madman, but a madhouse designed by many madmen, each with an intense hatred for the previous madman's unique flavour of madness.

Panando

  • Bay Watcher
    • View Profile
Re: Scientific Testing of Weapon Quality (50.07)
« Reply #4 on: February 25, 2023, 02:42:24 am »

I'm curious about the fight between unarmed goblins and humans

Humans were killed 319 times by being "Struck down" (usually a punch to the head), 19 times by suffocation (mostly strangulation of the throat but a few spinal injuries), and 11 times by bleeding out. Proportions were about the same for the Goblins. Fists are basically just hammers after all.

The way these fights basically went: because the combatants had insane defensive skills relative to offensive skills, they missed each other almost all the time. The humans would get the odd lucky hit in with the war hammer; many bashes would get deflected by armor, but it still got the humans some kills.

Then the humans would pass out from exhaustion and the goblins would be able to get kills on them, mostly by unopposed punches to the head (helped by pulling helms off, though some kills were punches through the helm). Of course Goblins would also pass out from exhaustion, but seemed much less prone to it than the humans and didn't pass out in clear phases, perhaps swinging a hammer is more exhausting.

Then the humans would wake up again, and the Goblins would go back to missing nearly every time.

Then the humans would pass out from exhaustion again, and the goblins would get kills again.

And so on through 5 such phases of exhaustion until the battle was over.

Anyway, the goal of this test was seeing if the Masterwork Hammer would give the humans any advantage in overcoming the high defensive skills, and it didn't seem to. The Goblins were more effective than intended, but I found the result amusing. If I want non-failed mood targets to be ineffective I usually assign them a training weapon.
« Last Edit: February 25, 2023, 02:48:24 am by Panando »
Logged
Punch through a multi-z aquifer in under 5 minutes, video walkthrough. I post as /u/BlakeMW on reddit.

Hans Lemurson

  • Bay Watcher
    • View Profile
Re: Scientific Testing of Weapon Quality (50.07)
« Reply #5 on: February 25, 2023, 05:57:55 am »

I'm not sure if the test subjects passing out from exhaustion makes the testing conditions less valid or MORE valid.
Logged
Foolprooof way to penetrate aquifers of unlimited depth.  (Make sure to import at least 10 stones for mechanisms)
Toughen Dwarves by dropping stuff on them.  (Nothing too heavy though, and make sure to wear armor.)
Quote
"Urist had a little lamb
whose feet tracked blighted soot.
And into every face he saw
his sooty foot he put."

Panando

  • Bay Watcher
    • View Profile
Re: Scientific Testing of Weapon Quality (50.07)
« Reply #6 on: February 28, 2023, 05:34:02 pm »

I did yet another attempt to demonstrate the existence of the mythical quality bonus.

This setup involved 100x 1v1 duels between a Dwarf and a Goblin. Both were equipped with full iron armor and a copper War Hammer; I decided to use a worse material for the weapon than the armor in case that would coax the bonus into appearing.
Both Dwarf and Goblin had identical competent combat skills.

50 of the dwarves had their war hammer quality set to Masterwork. The other 50 dwarves, and the 100 goblins, all had the quality left at no quality. We are comparing the performance of the two groups of dwarves by counting how often they die. If the masterwork hammer performs better, it should allow the wielder to kill the goblin more quickly, reducing the chance of dying.

I reloaded the save 16 times, logged the combats, and counted the deaths.

The masterwork copper warhammer dwarves died 418 times, death probability = 0.5225
The no quality copper warhammer dwarves died 405 times, death probability = 0.50625

(reminder: dying means bad performance in combat)

95% confidence interval for death probability:
Masterwork: [0.490, 0.557]
No Quality: [0.472, 0.541]

Once again I must conclude that if there is any kind of accuracy/deflect bonus related to a masterwork weapon, it is extremely small. If it exists it should have a statistically significant impact over 800 fights, otherwise it may as well not exist. To be able to disappear into statistical noise like this, I am concluding that the bonus works out to no more than 2%, and I would like to contrast this with the sharpness bonus for edged weapons, which easily measures at providing a 20% to 100% advantage in combat.
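To get a feel for how large a bonus these 800 fights per group could still hide, here is a rough sketch of a normal-approximation interval on the difference in death probability. It uses the posted death counts and assumes the fights are independent; the function name is my own.

```python
import math

def difference_ci(x1, n1, x2, n2, z=1.96):
    """Wald interval for the difference of two proportions (p1 - p2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)

# Deaths out of 50 dwarves x 16 reloads = 800 fights per group.
diff, (low, high) = difference_ci(418, 800, 405, 800)  # masterwork minus no quality
print(f"difference = {diff:+.4f}, 95% CI [{low:+.3f}, {high:+.3f}]")
```

A negative value would mean the masterwork dwarves died less often. Read this way, the data rule out a large benefit, though on these numbers a benefit of a few percentage points in survival could still sit inside the interval.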

People love to quote a "2x" bonus for Masterwork. Let's look at deflections:

deflections/total bashes:
no quality: 1882/10234 = 18.4%
masterwork: 1812/9718 = 18.6%

So the MASTERWORK hammer bashes got deflected ever so slightly more often than the no quality hammer bashes. Yeah, that sure looks like a 2x bonus at work to me /s.
Logged
Punch through a multi-z aquifer in under 5 minutes, video walkthrough. I post as /u/BlakeMW on reddit.

Laterigrade

  • Bay Watcher
  • Is that a crab with
    • View Profile
Re: Scientific Testing of Weapon Quality (50.07)
« Reply #7 on: February 28, 2023, 08:26:03 pm »

That is a real pity. Intuitively, item quality should affect combat positively in most cases — maybe not to a massive extent, but still a noticeable one. Excellent scientific work though.
Logged
and the quadriplegic toothless vampire killed me effortlessly after that
bool IsARealBoy = false
dropping clothes to pick up armor and then dropping armor to pick up clothes like some sort of cyclical forever-striptease
if a year passes, add one to age; social experiment

Panando

  • Bay Watcher
    • View Profile
Re: Scientific Testing of Weapon Quality (50.07)
« Reply #8 on: March 02, 2023, 05:36:26 am »

That is a real pity. Intuitively, item quality should affect combat positively in most cases — maybe not to a massive extent, but still a noticeable one. Excellent scientific work though.

Yeah, I'd probably say a 20% boost to general combat effectiveness would be reasonably balanced, which is about what sharpness gives when fighting unarmored living targets. The extreme boost to performance against unarmored undead when wielding high contact area weapons seems to be due to undead still using more of a "hitpoints" model rather than dying due to destroyed organs (albeit it seems to be hitpoints per layer, with deeper layers needing to take enough damage). Twice the sharpness allows twice the "hitpoint damage" to be done, perhaps in some cases even more by allowing deeper layers to be reached.

Dwarf Fortress does have some challenges with respect to quality bonuses though, due to its simulator bent when dealing with combat: because combat is based on the laws of physics, bonuses can't just be made up purely for the sake of balance. In Dwarf Fortress it would seem even a pretty decent human blacksmith would make a "no quality" weapon, hence it can't be thought that "no quality" is bad. Dwarves are just exceptional at crafts. It's easy to imagine very refined metalworking techniques making extremely sharp swords that hold an edge very well, but it's much harder to justify how a hammer or mace could perform dramatically better than an appropriately shaped lump of metal on a handle, though slight liberties could be taken with the velocity multiplier.

Realism wise, one of the best ways to address quality would be making higher quality weapons more durable, if no quality weapons have a strong tendency to break in battle especially when wielded by absolute beasts of dwarves, you'd have strong incentive to use high quality weapons so your legendary hammerdwarf doesn't need to become a wrestler mid-battle. Weapons do currently break, especially edged weapons striking superior armor, it's pretty easy to break iron weapons by hitting steel-clad targets or even larger iron clad targets, but on the whole in fortress mode it's not a major consideration, partly because dwarves have steel and no-one else does, and steel weapons are practically indestructible when striking iron. At the moment the interface is quite a bit clunky when it comes to dealing with equipment durability, but I think it could be a good direction to go to give incentive to make masterworks.

In terms of combat effectiveness, it'd probably be best handled using the magic system, rather than trying to keep it about the laws of physics, just allow dwarven artisans to be so good that they can craft things that literally are better than the laws of physics allow, at least by a little.
« Last Edit: March 02, 2023, 05:39:59 am by Panando »
Logged
Punch through a multi-z aquifer in under 5 minutes, video walkthrough. I post as /u/BlakeMW on reddit.

Ftor

  • Escaped Lunatic
    • View Profile
Re: Scientific Testing of Weapon Quality (50.07)
« Reply #9 on: March 02, 2023, 06:29:08 pm »

Thank you for such interesting and extensive research!
Recently I created a thread about blunt artifact weapons made of non weapon-grade metals and was given a link here.
I tried to run a few tests in the arena, but I did not use DFHack and I lack the programming skills to do such extensive testing.
It is not hard to predict that a bladed weapon made of tin will be almost useless compared even to copper, but what about whips, maces, and warhammers of the highest craftsdwarfship? Is it possible to create artifact weapons using DFHack? Platinum and pig iron, for example, could in theory be better than silver.
Are you interested in doing more research in this direction?

Aside from that, I think that a masterwork weapon should not only be sturdier, but should also have better handling characteristics, be handier, and have better balance compared to a no quality specimen, if we assume that handle length and head weight and shape are the same between different quality weapons. That should translate at least to better control (read: accuracy) and maybe to the effort needed to swing a weapon (read: attacks per second and/or how much more tired the wielder becomes after each swing).
If we assume that different quality weapons have different parts (longer or shorter handle, heavier or lighter head, different shape of the head), then damage and armour penetration could differ to some extent.
At least, that is how it is in the real world.


Logged