Okay, here's what I'm going to do. I'm going to describe some of the changes I'm thinking of making, and you tell me which ones you're interested in.
First, though, here's how DFLang works right now:
Read the symbol file (language_SYM)
Generate as many 2-4 letter words as there are symbols
Randomly assign a word to each symbol. (This word is that symbol's root.)
Read the meanings file (language_words)
Generate as many words as there are meanings in the words file
For each meaning,
    Determine the symbols to which it belongs
    For each such symbol,
        Go through all the generated words
        If you find a word containing that symbol's root,
            Assign the word to this meaning
            Skip ahead to the next meaning
    If you don't find any words with any of the symbols' roots, assign a word at random.
When every meaning has been assigned a word, write the language file (language_DWARF)
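For reference, the assignment loop above could be sketched roughly like this in Python. This is only an illustration, not DFLang's actual code: the function name, the dict-based inputs, and the choice to remove a word from the pool once it has been used are all assumptions.

```python
import random

def assign_words(meaning_symbols, roots, words, rng=random):
    """Meaning-first assignment, as in the steps above.

    meaning_symbols: dict of meaning -> list of symbols it belongs to
    roots:           dict of symbol -> root string
    words:           list of generated words (at least one per meaning)
    """
    unused = list(words)  # assumption: each generated word is used only once
    result = {}
    for meaning, symbols in meaning_symbols.items():
        chosen = None
        for symbol in symbols:
            root = roots[symbol]
            # Take the first unused word that contains this symbol's root.
            chosen = next((w for w in unused if root in w), None)
            if chosen is not None:
                break
        if chosen is None:
            # No word contains any of the symbols' roots: pick one at random.
            chosen = rng.choice(unused)
        unused.remove(chosen)
        result[meaning] = chosen
    return result

example = assign_words(
    {"EMBER": ["FIRE"], "RIVER": ["WATER"]},
    {"FIRE": "ak", "WATER": "ul"},
    ["zulon", "rakith"],
)
```

Every meaning scans the whole word list once per symbol, and it settles for the first word that matches any one root, which is where both the inefficiency and the weak matching come from.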
This isn't terribly efficient or effective. My first change will be to make it start with a generated word, look for which roots it contains, and select a meaning that belongs to as many of those roots' symbols as possible. Change the order in which it searches, essentially. This will be quite simple to do, and it should have a small but noticeable effect on how well a word matches its associated symbols.
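A minimal sketch of that reversed search, again in Python and again with made-up names and data structures. Two things here are assumptions the description leaves open: ties go to whichever meaning comes first, and a word containing no roots at all simply takes the first meaning still unassigned (picking at random would be just as reasonable).

```python
def assign_meanings(words, roots, meaning_symbols):
    """Word-first assignment.

    words:           list of generated words
    roots:           dict of symbol -> root string
    meaning_symbols: dict of meaning -> list of symbols it belongs to
    """
    unassigned = dict(meaning_symbols)
    result = {}
    for word in words:
        if not unassigned:
            break
        # Which symbols have their root somewhere in this word?
        present = {symbol for symbol, root in roots.items() if root in word}
        # Pick the meaning that shares the most symbols with the word.
        best = max(unassigned, key=lambda m: len(present & set(unassigned[m])))
        result[best] = word
        del unassigned[best]
    return result

example = assign_meanings(
    ["akor", "zulon", "rakith"],
    {"FIRE": "ak", "LIGHT": "or", "WATER": "ul"},
    {"RIVER": ["WATER"], "TORCH": ["FIRE", "LIGHT"], "EMBER": ["FIRE"]},
)
```

Here "akor" contains the roots for both FIRE and LIGHT, so it goes to TORCH rather than to EMBER, which only matches one of them.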
Another change that wouldn't be too difficult would be to allow roots to be expressed as regular expressions. DFLang wouldn't generate them on its own, but you the user could add them to the roots file by hand. This should be pretty easy to implement--in fact, you might already be able to do it in a limited way--and it'll make it easier to find roots in a word, which will affect how many words match their associated symbols.
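To give an idea of what regular-expression roots would buy you, here is a small Python sketch. The patterns and the function name are invented for the example, and it says nothing about how the roots file would actually store them.

```python
import re

def symbols_in_word(word, root_patterns):
    """Return the symbols whose root pattern matches somewhere in the word."""
    return [symbol for symbol, pattern in root_patterns.items()
            if re.search(pattern, word)]

# Hypothetical roots: FIRE only counts at the start of a word,
# WATER accepts either "ul" or "ol", STONE is a plain literal root.
root_patterns = {"FIRE": r"^ak", "WATER": r"[uo]l", "STONE": "dur"}

first = symbols_in_word("akdur", root_patterns)
second = symbols_in_word("rakol", root_patterns)
```

A plain root like "dur" is already a valid pattern, so existing roots would keep working unchanged; the anchors and character classes are what let one root cover several spellings or positions.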
In a similar vein, I could change the format of the roots files to allow more than one root per symbol.
An idea I've been throwing around a bit recently would be to change roots so that they represent not just symbols, but specific meanings within those symbols. DFLang would choose one or more meanings from a symbol's list and decide that those are vital concepts in this culture's understanding of that symbol. Roots, in other words. Those meanings would automatically get those roots as their entire words. For example, the symbol FIRE might receive GLOW, FIRE, and CONFLAGRATION as its roots. All the other words in FIRE would translate to something-GLOW or CONFLAGRATION-whatever. (You would of course be able to choose these words yourself in the roots file.) This wouldn't improve DFLang's accuracy at all, but it would be kinda neat to know that APPLE literally means CANDLEBERRY in the dwarven tongue.
To that end, I could have DFLang write literal translations of words in the margins of the language files. If a word could be literally translated as something, it'll tell you what that translation is.
I could take that concept even further with something I like to call "dynamic false roots". What that means is if, in the course of language generation, DFLang generates a short word that's unrelated to any of the existing roots, it'll count that word as a root of its own when the time comes to write out the literal translations. This will make the individual translations more complete, and also more wacky.
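A rough Python sketch of how literal translations with dynamic false roots might work. Everything here is an assumption made for illustration: the names, the length cutoff for what counts as a "short" word, the left-to-right longest-match scan, and the hyphenated output format.

```python
def find_glosses(word, glosses):
    """Scan the word left to right, taking the longest root at each position."""
    found = []
    i = 0
    while i < len(word):
        match = max((r for r in glosses if word.startswith(r, i)),
                    key=len, default=None)
        if match is None:
            i += 1  # no root starts here; skip this letter
        else:
            found.append(glosses[match])
            i += len(match)
    return found

def literal_translations(lexicon, root_glosses, max_false_root_len=3):
    """lexicon: meaning -> generated word; root_glosses: root -> meaning."""
    glosses = dict(root_glosses)
    # Dynamic false roots: short words that contain no existing root
    # become roots of their own, glossed with their own meaning.
    for meaning, word in lexicon.items():
        is_short = len(word) <= max_false_root_len
        if is_short and not any(r in word for r in root_glosses):
            glosses.setdefault(word, meaning)
    translations = {}
    for meaning, word in lexicon.items():
        parts = find_glosses(word, glosses)
        if parts and parts != [meaning]:  # skip "FIRE means FIRE"
            translations[meaning] = "-".join(parts)
    return translations

lexicon = {"CANDLE": "kel", "BERRY": "ur", "APPLE": "kelur", "RIVER": "zomath"}
example = literal_translations(lexicon, {"kel": "CANDLE"})
```

With only the real root "kel", APPLE would translate as just CANDLE. Because "ur" is short and unrelated to any root, it gets picked up as a false root meaning BERRY, and APPLE comes out as CANDLE-BERRY.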
If you really like the idea of roots-as-words, I could try to make DFLang better at selecting relevant meanings to be a symbol's roots. I'm not sure of the exact details, but it would work by looking at language_words.txt to see how each word can be used in a name. If it can only be used in a couple of positions, then it's probably not general enough to be a good root.
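Since the exact details aren't settled, the following Python sketch is just one way the idea could go. It assumes the usage information has already been read into a dict of meaning -> set of positions; the position labels, the thresholds, and the function name are all placeholders.

```python
def pick_root_meanings(symbol_meanings, usages, min_positions=3, max_roots=3):
    """Choose the most generally usable meanings in a symbol as its roots.

    symbol_meanings: list of meanings belonging to one symbol
    usages:          dict of meaning -> set of name positions it can fill
    """
    # Drop meanings that can only be used in a couple of positions.
    candidates = [m for m in symbol_meanings
                  if len(usages.get(m, ())) >= min_positions]
    # Prefer the meanings that fit in the most positions.
    candidates.sort(key=lambda m: len(usages[m]), reverse=True)
    return candidates[:max_roots]

# Placeholder position labels, not the real tags from the words file.
usages = {
    "FIRE": {"FRONT", "REAR", "THE", "OF"},
    "GLOW": {"FRONT", "REAR", "THE"},
    "SCORCH": {"THE"},
}
example = pick_root_meanings(["SCORCH", "GLOW", "FIRE"], usages)
```

SCORCH only fits one position, so it is filtered out, and FIRE ranks ahead of GLOW because it fits more of them.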
Lastly, the biggest change I could make would be the rewrite of the generation script that I keep talking about. Moving from a regular Markov Model to a Variable-Order Markov Model would make DFLang better at generating realistic words (although I have no idea by how much), but would take a shitload of work to get right. Still, it would open up a lot of new possibilities: prefixes and suffixes, detection of natural roots (roots that already exist in the real-world language), generating words directly from roots instead of looking for roots in pregenerated words... maybe even some kind of context-free-grammar-based generation. It'd be pretty cool. But hard.
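To make the Markov part a little more concrete, here is a toy Python sketch of one simple reading of "variable-order": keep letter counts for every context length up to some maximum, and when generating, use the longest context that has been seen often enough, backing off to shorter ones otherwise. This is not necessarily what the rewrite would look like; the names, the start/end markers, and the thresholds are assumptions.

```python
import random
from collections import defaultdict

START, END = "^", "$"

def train(words, max_order=3):
    """Count which letter follows each context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        padded = START + word + END
        for i in range(1, len(padded)):
            for k in range(min(i, max_order) + 1):
                counts[padded[i - k:i]][padded[i]] += 1
    return counts

def generate(counts, max_order=3, min_count=2, max_len=12, rng=random):
    """Generate one word, preferring the longest well-attested context."""
    out = START
    while len(out) - 1 < max_len:
        for k in range(min(max_order, len(out)), -1, -1):
            options = counts.get(out[len(out) - k:])
            # Back off until the context has enough observations;
            # the empty context (k == 0) is always accepted.
            if options and (k == 0 or sum(options.values()) >= min_count):
                break
        letters, weights = zip(*options.items())
        letter = rng.choices(letters, weights=weights)[0]
        if letter == END:
            break
        out += letter
    return out[1:]

training = ["urist", "kadol", "ustuth", "kogan", "rakust"]
model = train(training, max_order=3)
sample = generate(model, max_order=3, rng=random.Random(0))
```

A fixed-order model would use the same context length everywhere; here common letter sequences get the benefit of long contexts while rare ones fall back to shorter, better-sampled ones.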
So, those are the ideas I've had. Your thoughts? Any ideas of your own?