Seems worthwhile to me, and it sounds like you already know what you should do. Errors obviously need to be fixed, and starting anew is much easier than trying to hunt them down, as you suggested. If you can make shorter words (and therefore shorter names) that still sound like they fit, that would help a lot with recognizing individuals in the fort. Both of those things seem to be pure benefits. Once that is in, it's a question of balancing more words against the longer loading times that come with them.
Yep, those are all reasons I wanted to do this rewrite sooooooo bad... It was really my first DF mod, and I hacked and slashed my way through a lot of stuff that I really should have slowed down and read.
I would be interested in seeing this utility app for translating into new languages. I have been working on my own race mod and have been dreading going through the 18,000-line language files to create a unique language.
Igfig's DFLang. Basically, you give it the existing language files and it builds the new language translation file from them. There's an option to work from a partial list of translation words, which is what I would do for dwarves: it takes the larger word file, sees which words are missing from the current translation file, and adds new translations to the dwarf file based on the words already present. It will also remove a translation if its word has been removed from the language_word.txt file. It can also produce an entire language from scratch, given a base of 100+ words you enter into a list. So just start typing what I would call a ramble: ignore punctuation and put in what you want the words to sound like. Boltgun used this to produce the succubus language by entering a couple hundred demon names from history. Can't decide? Get a random list of words from some known language (I'm thinking French for a saucy, sexy-sounding language, or Creole for a saucy, sexy, rugged-sounding language) and shove it into the program. It works great, but it has its limits: it will crash if there aren't enough syllables to cover the number of words in the word list within the designated length... which is what I was talking about above.
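For anyone curious what the "partial list" step amounts to, here is a rough Python sketch of the comparison. This is not DFLang's code. It assumes the word file uses [WORD:NAME] entries and the translation file uses [T_WORD:NAME:translation] entries, and the function name and sample data are made up for illustration.

```python
import re

# Assumed raw formats: [WORD:NAME] in the word file,
# [T_WORD:NAME:translation] in a translation file.
WORD_RE = re.compile(r"\[WORD:([^\]:]+)\]")
T_WORD_RE = re.compile(r"\[T_WORD:([^\]:]+):([^\]]*)\]")


def missing_and_stale(word_text, translation_text):
    """Return (words lacking a translation, translations whose word is gone)."""
    words = WORD_RE.findall(word_text)
    translated = {name for name, _ in T_WORD_RE.findall(translation_text)}
    missing = [w for w in words if w not in translated]
    stale = sorted(translated - set(words))
    return missing, stale


# Illustrative sample data only, not taken from any real raw file.
word_file = "[WORD:ABBEY]\n[WORD:ACE]\n[WORD:TROUT]\n"
dwarf_file = "[TRANSLATION:DWARF]\n[T_WORD:ABBEY:kulet]\n[T_WORD:OLDWORD:zzz]\n"

missing, stale = missing_and_stale(word_file, dwarf_file)
print(missing)  # ['ACE', 'TROUT'] still need translations
print(stale)    # ['OLDWORD'] would be removed
```

Generating the actual new translations from the syllables of existing ones is the part DFLang does for you; the sketch only shows the bookkeeping.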
In terms of feedback, I would much rather have the words in your second list than the first. I would rather have places and people named for concepts than for animals. Obviously we would want words like bird, rodent, lizard and whatnot, but I think specific animals (like blue jay, cockatiel, and rainbow trout) would be appreciated less than words like arcane, grace, and massacre. Basically, I guess I am saying you could cull a lot of specific animal names out of the dictionary.
I have a friend named Bluejay... her mom is Flower and her dad Sunbeam... hippy children from the American 60s... lol.
Interesting thought. In the original word list from vanilla DF there were several animals and plants, even some that don't actually occur in game. Of course, dwarves wouldn't be named after plants and animals, because their symbols don't match. You have to assign the animal to a symbol group, then select that symbol in the entity file with these tags:
[SELECT_SYMBOL:{target}:{symbol}] - prefer these symbols for target
[SUBSELECT_SYMBOL:{target}:{symbol}] - prefer these symbols as adjectives for target
[CULL_SYMBOL:{target}:{symbol}] - refuse to use these on these targets.
where {target} is one of ALL, REMAINING, BATTLE, BRIDGE, CIV, LIBRARY, MILITARY_UNIT, RELIGION, ROAD, SIEGE, SITE, TEMPLE, TUNNEL, VESSEL, WALL, WAR
So, for dwarves in Masterwork, all these symbol tags are used, in this order:
[SELECT_SYMBOL:WAR:NAME_WAR] - so war gets to be named with NAME_WAR symbols.
[SUBSELECT_SYMBOL:WAR:VIOLENT] - adjective for violent wars
[SELECT_SYMBOL:BATTLE:NAME_BATTLE]
[SUBSELECT_SYMBOL:BATTLE:VIOLENT]
[SELECT_SYMBOL:SIEGE:NAME_SIEGE]
[SUBSELECT_SYMBOL:SIEGE:VIOLENT]
[SELECT_SYMBOL:ROAD:NAME_ROAD]
[SELECT_SYMBOL:TUNNEL:NAME_TUNNEL]
[SELECT_SYMBOL:BRIDGE:NAME_BRIDGE]
[SELECT_SYMBOL:WALL:NAME_WALL]
[SELECT_SYMBOL:REMAINING:ARTIFICE] - all remaining targets get artifice (tools weapons some archaic stuff)
[SELECT_SYMBOL:REMAINING:EARTH] - same but with earth stuff (rocks, stones, mountains, hills, etc)
[CULL_SYMBOL:ALL:DOMESTIC] - get rid of all domestic terms
[CULL_SYMBOL:ALL:SUBORDINATE] - dwarves aren't subordinates
[CULL_SYMBOL:ALL:EVIL] - dwarves don't use evil symbols etc
[CULL_SYMBOL:ALL:FLOWERY]
[CULL_SYMBOL:ALL:NEGATIVE]
[CULL_SYMBOL:ALL:UGLY]
[CULL_SYMBOL:ALL:NEGATOR]
[SELECT_SYMBOL:TEMPLE:NAME_BUILDING_TEMPLE] - earlier artifice was added then all the stuff after removed... now add name_building_temple to temples
[SELECT_SYMBOL:LIBRARY:NAME_BUILDING_LIBRARY] - same for libraries.
So let's say OGRE has the symbols VIOLENT and UGLY. At first it's added to the adjective lists for wars etc., but then, because of [CULL_SYMBOL:ALL:UGLY], it's removed from all those lists, even though it is violent. Plants are generally tossed into FLOWERY, animals into NATURE or DOMESTIC, and food into DOMESTIC; some animals are both NATURE and VIOLENT (ape, lion, etc.). A lot of aquatic stuff (animals etc.) is marked with the symbol AQUATIC. Well, we never added that group, so unless the animal has some other symbol it won't show up for dwarves. What this does mean is that elves, who don't cull on DOMESTIC and do select on NATURE and FLOWERY, will end up with more flowery names. The first list also includes items, food, weapons and tools that would end up in the violent categories, artifice categories, etc. It also includes all the inorganic rocks, which go into the EARTH category, so you will have dwarves named Gneiss and Granite, Jet and Adamantine. It also contains career names, so, as with humans today, you will have dwarves who show up with names like Smith and Carpenter, Mechanic and Engineer, as these will be included in certain symbol groups.
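To make the select-then-cull behaviour concrete, here is a toy Python model of it. It only models the behaviour as described in this post, not the game's actual name generator; the function name is made up, the AXE and TROUT symbol assignments are hypothetical, and REMAINING is left out to keep it short.

```python
def allowed_words(word_symbols, tags, target):
    """Apply symbol tags in order and return the words left for a target.

    Toy model only: SELECT/SUBSELECT add every word carrying the symbol,
    CULL removes every word carrying it, and tags are processed top to bottom.
    """
    selected = set()
    for kind, tag_target, symbol in tags:
        if tag_target not in (target, "ALL"):
            continue  # tag applies to some other target
        members = {w for w, syms in word_symbols.items() if symbol in syms}
        if kind in ("SELECT_SYMBOL", "SUBSELECT_SYMBOL"):
            selected |= members
        elif kind == "CULL_SYMBOL":
            selected -= members
    return selected


# Hypothetical symbol assignments, for illustration only.
word_symbols = {
    "OGRE": {"VIOLENT", "UGLY"},
    "AXE": {"VIOLENT", "ARTIFICE"},
    "TROUT": {"AQUATIC"},
}

dwarf_tags = [
    ("SUBSELECT_SYMBOL", "WAR", "VIOLENT"),
    ("CULL_SYMBOL", "ALL", "UGLY"),
]

# OGRE is added as VIOLENT, then removed again as UGLY.
print(sorted(allowed_words(word_symbols, dwarf_tags, "WAR")))  # ['AXE']

# With the cull placed first there is nothing to remove yet, so OGRE survives.
print(sorted(allowed_words(word_symbols, dwarf_tags[::-1], "WAR")))  # ['AXE', 'OGRE']
```

TROUT never appears because no tag selects AQUATIC, which is the same reason aquatic animals don't show up in dwarf names.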
Anyway, until symbols are laid out, new words won't show up often in most names (randomly, yes). Once symbols are laid out, dwarves being called Trout would be rare, as long as trout isn't put in the EARTH symbol. Giving a word a symbol makes it more likely to be selected or culled during word selection, and the random picker prefers not to select outside the civ's preferred symbols.
Another cool feature of the symbol system is that you can create your own symbol groups. Let's say we do want the professions under one symbol group called DWARF_PROFESSIONS (or we could just call it PROFESSIONS). We can do that in the language_SYM file, then toss a [SELECT_SYMBOL:REMAINING:DWARF_PROFESSIONS] into the dwarf entity file. If we also mark some things, say cheesemaker, as DOMESTIC, then the cull of all domestic words that comes after the professions select would remove cheesemaker as an option. The main issue is order: if you cull before you add a symbol, it won't block cheesemaker...
The main thing is to add more options to the current symbols, plus potential new symbols for future additions and deletions. We could group numerals as numerals, then cull or add them (add them to a race of drones, so that everything is numbered...).
So the animals and plants would increase for the hippy... I mean elf children names. (The dwarven kingdom has reached a peace agreement with King Rainbow Trout Flowerchild.)
I guess I haven't seen the dictionary as an issue to be fixed, or seen big benefits versus the effort, but I don't tend to play other races either. Swear/vulgar words have popped up a few times in my fortresses with MW, and I find it immersion-breaking and not particularly hilarious (coming from someone with a pretty foul mouth), but I can see how some people find humour in it (YouTube let's plays etc.). I'd at least make those opt-in.
Really, I see it as a break in the monotony when odd names pop up (Industrialfreak once showed up on a randomly selected mechanic...). It makes MW different, as we don't end up with hundreds of Urists etc.
Yeah, most of the vulgar words and anatomy words are out. The fortress of fiery vaginas is really not a name I want showing up...
Now double entendre words? Yeah, those make for interesting names. Words that have one meaning, but also a subtle second meaning...
I could go either way with some of those words. I don't really notice them myself but that might be because I don't really play adventure mode. Maybe it would make sense to have them included, assign them to a "vulgar" symbol and then cull that symbol in the civilized races? I think that should keep races from using those in place names.
As to the benefits, I would say the largest one is more names being available. It can be really hard to remember which orc is which when there are so many with the same first name.
More of what I was talking about above: we can always cull the words from different groups. The biggest problem with the huge dictionary was that getting things into symbol groups was impossible. New symbol groups can be created, and a switch to turn those symbol groups off later will then be possible. Say, a [CULL_SYMBOL:ALL:EXPANDED_DICTIONARY] could turn off the use of the additional words altogether, with a switch adding the line to all the entity files. Similar for "vulgar" etc. Symbol groups such as SUCCUBUS_PREFER and SUCCUBUS_DETEST can also be added to cull or select specific words for a particular civilization.
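A switch like that could be as simple as a script that adds or removes one line per entity. Here is a minimal Python sketch, assuming each entity starts with an [ENTITY:...] line and that the cull should sit after the entity's last symbol tag so that it applies to everything selected before it. The function names are hypothetical and this is not how any existing Masterwork settings tool is known to work.

```python
SWITCH_TAG = "[CULL_SYMBOL:ALL:EXPANDED_DICTIONARY]"
SYMBOL_PREFIXES = ("[SELECT_SYMBOL:", "[SUBSELECT_SYMBOL:", "[CULL_SYMBOL:")


def _add_switch(block):
    """Insert the switch tag after the last symbol tag of one entity block."""
    rows = [i for i, ln in enumerate(block) if ln.strip().startswith(SYMBOL_PREFIXES)]
    if not rows:
        return block  # file header or an entity without symbol tags
    last = rows[-1]
    indent = block[last][: len(block[last]) - len(block[last].lstrip())]
    return block[: last + 1] + [indent + SWITCH_TAG] + block[last + 1 :]


def set_switch(entity_text, enabled):
    """Return entity_text with the expanded-dictionary cull turned on or off."""
    # Remove any existing copy first so repeated calls don't stack lines.
    lines = [ln for ln in entity_text.splitlines() if ln.strip() != SWITCH_TAG]
    if enabled:
        blocks, current = [], []
        for ln in lines:
            if ln.strip().startswith("[ENTITY:") and current:
                blocks.append(current)
                current = []
            current.append(ln)
        blocks.append(current)
        lines = [ln for block in blocks for ln in _add_switch(block)]
    return "\n".join(lines) + "\n"


# Illustrative entity text only.
sample = (
    "[OBJECT:ENTITY]\n"
    "[ENTITY:MOUNTAIN]\n"
    "\t[SELECT_SYMBOL:WAR:NAME_WAR]\n"
    "\t[CULL_SYMBOL:ALL:UGLY]\n"
    "\t[CREATURE:DWARF]\n"
)
print(set_switch(sample, True))
```

The same function with a different tag constant would cover a "vulgar" switch.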