I spent 4 hours testing all in-game metals for combat effectiveness. I started by picking two metals at random and pitting them against each other, then picked a third and pitted it against both to find out where it stood, and so on down the line. Each new metal was pitted against the best so far, and if it lost I kept moving it down the ranking until it won.
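The ranking procedure above is essentially an insertion sort driven by fights. Here is a minimal sketch of it, assuming a hypothetical `beats(a, b)` function that stands in for actually running a test and reporting whether metal `a` won; the function names and the toy `known_order` are mine, for illustration only.

```python
def insert_metal(ranking, metal, beats):
    """Insert `metal` into `ranking` (ordered worst to best).

    Start against the best metal so far and keep moving down
    until `metal` wins a fight, then slot it in just above the loser.
    """
    pos = len(ranking)
    while pos > 0 and not beats(metal, ranking[pos - 1]):
        pos -= 1
    ranking.insert(pos, metal)
    return ranking


def rank_metals(metals, beats):
    """Rank metals from worst to best using pairwise fights."""
    ranking = []
    for metal in metals:
        insert_metal(ranking, metal, beats)
    return ranking


# Toy stand-in for real fights: a fixed order decides the winner.
known_order = ["Copper", "Bronze", "Iron", "Steel"]


def toy_beats(a, b):
    return known_order.index(a) > known_order.index(b)


result = rank_metals(["Iron", "Copper", "Steel", "Bronze"], toy_beats)
print(result)
```

This only gives a clean order if the results are transitive, meaning no A > B, B > C, C > A loops.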
The tests were two groups of male dwarves arranged into two 5x5 squares with a 5-space gap between them. Very standardized.
Every dwarf was a Proficient Fighter, Proficient Swordsdwarf, Skilled Armour User, Skilled Shield User, Adequate Dodger.
Every dwarf was armed with a Short Sword, Shield, and Mail Shirt of the metal type being tested.
Why the mail shirt? The test was done primarily to test the effectiveness of the weapon, but certain material attributes are more valuable to armour than weapons, and I also wished to prevent a lucky insta-kill heart/spine stab. I wanted the fight to go on for at least 2-3 pages in the combat logs so I could determine the limb-slicing capabilities of the material.
After the fight, I took the proportion of dwarves that had won with a metal and ran it through a significance test. If a side won, but so narrowly that the result wasn't statistically significant, I ran the test again and again until it was.
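For anyone who wants to check a result themselves, here is one way such a significance check could look. This is a sketch only: it assumes a one-sided Fisher's exact test on survivors versus casualties per side, which is not necessarily the test used for these results, and the function name and the 0.05 cutoff are my own choices.

```python
from math import comb


def fisher_one_sided(surv_a, surv_b, n_a=25, n_b=25):
    """One-sided Fisher's exact test p-value.

    Probability of side A having at least `surv_a` survivors by chance,
    given the total number of survivors across both sides.
    """
    total_surv = surv_a + surv_b
    n = n_a + n_b
    denom = comb(n, n_a)
    numer = 0
    for k in range(surv_a, min(n_a, total_surv) + 1):
        # Ways to place k of the survivors on side A and the rest on side B.
        numer += comb(total_surv, k) * comb(n - total_surv, n_a - k)
    return numer / denom


# Hypothetical outcomes, not taken from the actual tests.
clear_win = fisher_one_sided(20, 5)
narrow_win = fisher_one_sided(13, 12)
print(clear_win < 0.05, narrow_win < 0.05)
```

A narrow win like 13 survivors against 12 comes out nowhere near significant under this test, which matches the need to re-run close fights.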
A dwarf who suffered an injury that would lead to a bleed-out, or that could not be repaired to working order after a reasonable stay in a hospital, is counted as a loss even if he did not die in the fight: a spinal injury, a heart injury, or a lost body part bigger than a finger or two. My rule of thumb: if the dwarf spent a year in the best dwarven hospital imaginable and either died anyway or survived without reasonably similar functionality, he's a casualty.
Margins:
In a 25 VS 25 fight, a victory by 10 or more is a Steep Margin, 5 to 9 is a Medium Margin, and fewer than 5 is a Small Margin. If the margin is very small, I may re-do the test until I'm satisfied.
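The margin labels can be written as a small helper. This sketch assumes the thresholds are read as 10 or more for Steep, 5 to 9 for Medium, and fewer than 5 for Small, and that the margin is the winner's lead in dwarves; the function name is hypothetical.

```python
def classify_margin(margin):
    """Label a victory margin (winner's lead in dwarves) in a 25 VS 25 fight."""
    if margin < 0:
        raise ValueError("margin is the winner's lead and cannot be negative")
    if margin >= 10:
        return "Steep"
    if margin >= 5:
        return "Medium"
    return "Small"


labels = [classify_margin(m) for m in (12, 10, 7, 5, 3)]
print(labels)
```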
I mainly ran this test so you know what metals to keep your weaponsmith away from when he goes into a mood.
Results, from least effective to most effective:
Lay Pewter (2 Tin, 1 Copper, 1 Lead)
Fine Pewter (3 Tin, 1 Copper)
Trifle Pewter (2 Tin, 1 Copper)
Nickel Silver (2 Nickel, 1 Zinc, 1 Copper)
Lead
Zinc
Nickel
Silver
Sterling Silver (3 Silver, 1 Copper)
Tin
Copper
Gold
Aluminum
Bismuth
Electrum (1 Silver, 1 Gold)
Rose Gold (3 Gold, 1 Copper)
Billon (1 Silver, 1 Copper)
Black Bronze (2 Copper, 1 Silver, 1 Gold)
Brass (1 Copper, 1 Zinc)
Platinum
Bismuth Bronze (2 Copper, 1 Tin, 1 Bismuth)*
Bronze (1 Copper, 1 Tin)*
Iron
Pig Iron
Steel
Adamantine
*It's worth noting that for Bronze VS Bismuth Bronze, I had to run the test SEVEN times before I could get a reasonably accurate outcome. I don't want anybody to see this and decide not to use the Bismuth they just mined out because they think it's not worth it. The difference is so small that the extra armour you'd get makes up for it: enough Copper and Tin bars to make 6 Bronze bars could make 8 Bismuth Bronze bars if you had Bismuth available. That swings the tide in favor of Bismuth Bronze in the real game.
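The bar count works out as follows, assuming each alloy batch yields one bar per input bar and uses the ratios from the list above (Bronze 1 Copper : 1 Tin, Bismuth Bronze 2 Copper : 1 Tin : 1 Bismuth). The helper names are made up for this sketch. Note that the 6 Copper and Tin bars are split 3 + 3 for Bronze but 4 + 2 for Bismuth Bronze.

```python
BRONZE = {"copper": 1, "tin": 1}
BISMUTH_BRONZE = {"copper": 2, "tin": 1, "bismuth": 1}


def bars_out(recipe, batches):
    """Alloy bars produced, assuming one bar out per input bar."""
    return sum(recipe.values()) * batches


def copper_tin_in(recipe, batches):
    """Copper plus Tin bars consumed by the given number of batches."""
    return (recipe.get("copper", 0) + recipe.get("tin", 0)) * batches


# 3 Bronze batches and 2 Bismuth Bronze batches both use 6 Copper/Tin bars.
bronze_total = bars_out(BRONZE, 3)
bismuth_bronze_total = bars_out(BISMUTH_BRONZE, 2)
print(copper_tin_in(BRONZE, 3), bronze_total)
print(copper_tin_in(BISMUTH_BRONZE, 2), bismuth_bronze_total)
```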
Another thing worth noting: the Black Bronze VS Bismuth Bronze and Black Bronze VS Brass tests were fairly close, but Platinum absolutely destroyed Black Bronze.
It's also, I think, worth adding [ITEMS_WEAPON] to the Brass entry, because it's much better than copper and nearly as good as bronze: I had to run the test twice to get an outcome.
If anybody wants to know the margin of a test, or thinks a re-test is in order, let me know.
I didn't see any instances of A > B and B > C but C > A, but I think you might if you test Blunt weapons.