So I haven't been working on this in a long time, ever since I tested out the variable-order Markov model approach and found that the words it generated weren't noticeably better than those from the regular fixed-order chain I was already using.
Not sure what you mean by "variable-order Markov model", but constructing words from phonemes - or a workable approximation thereof - instead of letters works remarkably better for creating believable words.
As an approximation, you can replace multi-letter constructs that usually form a single phoneme or a diphthong with a pseudo-letter before constructing the Markov chains, then reverse the process when outputting words. The list can be built automatically (whenever a specific two- or three-letter combination is very common in the input corpus) or by hand, as desired.
For English, such a list could start with "th", "ch", "sh", "ph", "qu", "ea", "au", "ee" and some doubling of consonants (especially sonorants like "ll", "rr" and "mm").
For German, that could be "pf", "sch", "eu", "ch", "ph", "qu", "tz" and again consonant doubling.
For Polish, that would be "ch", "cz", "sz", "rz", "dz", "dź", "dż" and some palatalised combinations with "i" ("si", "ci", "dzi", "ni", "zi" and so on).
Fantasy languages could use some of this too.
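To make the round trip concrete, here's a minimal Python sketch of the idea, assuming a tiny hand-picked English digraph list, single-character placeholders and an ordinary fixed-order chain; the names and the toy corpus are purely illustrative:

```python
import random
from collections import defaultdict

# Hand-picked digraphs treated as single pseudo-letters (any list like the
# ones above works; placeholders are arbitrary unused characters).
DIGRAPHS = {"th": "1", "ch": "2", "sh": "3", "qu": "4", "ea": "5", "ee": "6", "ll": "7"}
REVERSE = {v: k for k, v in DIGRAPHS.items()}

def encode(word):
    # Replace each digraph with its pseudo-letter before chain construction.
    for di, ch in DIGRAPHS.items():
        word = word.replace(di, ch)
    return word

def decode(word):
    # Reverse the substitution for the final output.
    return "".join(REVERSE.get(c, c) for c in word)

def build_chain(words, order=2):
    # Plain fixed-order Markov chain over the encoded alphabet.
    chain = defaultdict(list)
    for w in words:
        w = "^" * order + encode(w) + "$"
        for i in range(len(w) - order):
            chain[w[i:i + order]].append(w[i + order])
    return chain

def generate(chain, order=2, max_len=20):
    state, out = "^" * order, []
    while len(out) < max_len:
        nxt = random.choice(chain[state])
        if nxt == "$":            # end-of-word marker reached
            break
        out.append(nxt)
        state = state[1:] + nxt
    return decode("".join(out))

corpus = ["thatch", "cheese", "quell", "shear", "thrill"]  # toy example corpus
chain = build_chain(corpus)
print(generate(chain))
```

The same scheme works for the German or Polish lists above; only the DIGRAPHS dictionary changes.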
EDIT: Actually, I just ran the English translation of "War and Peace" through a little analyser, and the most common digraphs there are "th", "ng", "ou", "ea", "ll", "wh", "sh", "ch", "ow", "ss", "ai", "ee", "oo", "gh", "ay", "rr", "tt", "ts", "ff", "ck", "pp", "au", "qu", "oi", "aw", "nn", "ue", "ui", "eo", "mm" and "yi" - in that order.
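For what it's worth, that kind of count is easy to reproduce with a plain bigram counter; a rough sketch follows (the actual analyser presumably filtered the pairs further, e.g. dropping combinations like "he" or "er" that are frequent but never a single sound, and the corpus file name here is purely hypothetical):

```python
from collections import Counter
import re

def top_pairs(text, n=30):
    # Count every adjacent two-letter pair inside words, most common first.
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts.most_common(n)

# Hypothetical corpus file; any large plain-text corpus will do.
# with open("war_and_peace.txt", encoding="utf-8") as f:
#     print(top_pairs(f.read()))
```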