Bay 12 Games Forum


Topic: Language Creation and Development (kind of rambly)  (Read 512 times)

FearfulJesuit

  • Bay Watcher
  • True neoliberalism has never been tried
Language Creation and Development (kind of rambly)
« on: May 28, 2013, 11:59:14 am »

There's been some talk of expanding the current dwarven language to make it more "language-like", with its own morphology and syntax.

This suggestion expands on that. Namely, languages should be procedurally generated in year 0 of worldgen, and then, as the years roll by, change, developing into different dialects and languages in their own right. In theory, every civilization would start with its own language (though, for example, all goblin languages could be related, back in the mists of time), and destroying a civilization would destroy the language, though it might survive in small pockets of refugees or as borrowed words in neighboring languages.

Phonology

This is where we start. Now, the limitations of the character set mean that we can't create languages with huge numbers of sounds like Ubykh, but one with a medium number of sounds like English or even Russian would likely be doable. For the sake of illustration, let's create a dwarven language named Kulluq. A theoretical realistic phonology would look like this:

Code:
p t ch k q
m n ń ng
s sh h
v l ly r
y

i     u
   ə
   a

(That schwa, the vowel sound in "but", we'll write as <e>.)
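The inventory above is hand-picked. Since the idea is to generate languages procedurally in year 0 of worldgen, here is a minimal Python sketch of rolling an inventory from a seed. The sound pools, the name `generate_inventory`, and the plausibility rule (stops must cover at least three places of articulation) are all assumptions. They stand in for the real typological constraints on which sets of sounds can co-occur.

```python
import random
from typing import Dict, List

# Hypothetical pools of sounds; stops are tagged with their place of
# articulation so the plausibility check below can use it.
STOPS = {"p": "labial", "b": "labial", "t": "coronal", "d": "coronal",
         "ch": "palatal", "k": "velar", "g": "velar", "q": "uvular"}
NASALS = ["m", "n", "ń", "ng"]
FRICATIVES = ["s", "sh", "h", "v", "f", "z"]
LIQUIDS_AND_GLIDES = ["l", "ly", "r", "y", "w"]
VOWELS = ["i", "u", "e", "a", "o"]


def generate_inventory(seed: int) -> Dict[str, List[str]]:
    """Roll a phoneme inventory for one proto-language from a world seed."""
    rng = random.Random(seed)
    # Crude plausibility rule (an assumption, not a linguistic law): re-roll
    # the stops until they cover at least three places of articulation, so
    # all the stops can't pile up at one spot in the mouth.
    while True:
        stops = rng.sample(sorted(STOPS), k=rng.randint(3, 6))
        if len({STOPS[s] for s in stops}) >= 3:
            break
    return {
        "stops": sorted(stops),
        "nasals": rng.sample(NASALS, k=rng.randint(2, 4)),
        "fricatives": rng.sample(FRICATIVES, k=rng.randint(1, 4)),
        "liquids_and_glides": rng.sample(LIQUIDS_AND_GLIDES, k=rng.randint(2, 4)),
        "vowels": rng.sample(VOWELS, k=rng.randint(3, 5)),
    }


print(generate_inventory(seed=2013))
```

Because the generator only depends on the seed, the same world seed always gives the same proto-language, which is what you want for reproducible worlds.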

So far so good: we've created a list of the sounds (phonemes) in this language. (There are, incidentally, some rules about which sets of sounds can work and which ones can't. A language whose only stops are five different varieties of p can't exist.) Now we need to create some phonotactics, which are the rules that tell you which sequences of those sounds can be legit words and which can't. In English, for example, it's obvious to a native speaker that toog, crat and flane could be words (even though they aren't), while ngorf, mret or fsee couldn't be.

In Kulluq, we'll decide that no consonant clusters are allowed at the start of a word, that the only clusters allowed between vowels are double consonants like <ll> and <tt>, that sequences of two vowels are allowed, and that the only consonants allowed at the end of a word are /t ch k q/. Thus, sutu, timmiq, eniak and viittit could be words, but stalyu, sunta, elpu and tuktis could not be. All of this should be pretty easily encodable.
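As a rough check that these rules really are easy to encode, here is a minimal Python sketch of a Kulluq word validator. The function names are hypothetical, and it assumes "two vowels in a row" means at most two. It splits a word into sounds (treating <ch>, <ng>, <sh> and <ly> as single sounds), groups them into runs of vowels and consonants, and checks each run against the rules.

```python
from itertools import groupby
from typing import List, Optional

# Kulluq's sounds; the digraphs <ch>, <ng>, <sh>, <ly> each spell one sound.
VOWELS = {"i", "u", "e", "a"}
CONSONANTS = {"p", "t", "ch", "k", "q", "m", "n", "ń", "ng",
              "s", "sh", "h", "v", "l", "ly", "r", "y"}
FINALS = {"t", "ch", "k", "q"}
# Longest spellings first, so "ch" is read as one sound and not "c" + "h".
SPELLINGS = sorted(VOWELS | CONSONANTS, key=len, reverse=True)


def segment(word: str) -> Optional[List[str]]:
    """Split a word into Kulluq sounds, or return None on a foreign letter."""
    sounds, i = [], 0
    while i < len(word):
        for spelling in SPELLINGS:
            if word.startswith(spelling, i):
                sounds.append(spelling)
                i += len(spelling)
                break
        else:
            return None
    return sounds


def is_kulluq_word(word: str) -> bool:
    sounds = segment(word)
    if not sounds or not any(s in VOWELS for s in sounds):
        return False
    # Alternating runs of vowels and consonants, e.g. timmiq -> t|i|mm|i|q.
    runs = [(vowel, list(run)) for vowel, run in groupby(sounds, key=lambda s: s in VOWELS)]
    for index, (vowel, run) in enumerate(runs):
        at_start, at_end = index == 0, index == len(runs) - 1
        if vowel:
            if len(run) > 2:                      # at most two vowels in a row
                return False
        elif at_start or at_end:
            if len(run) > 1:                      # no initial or final clusters
                return False
            if at_end and run[0] not in FINALS:   # only t, ch, k, q word-finally
                return False
        elif len(run) > 2 or (len(run) == 2 and run[0] != run[1]):
            return False                          # medially: single or doubled only
    return True


for w in ["sutu", "timmiq", "eniak", "viittit", "stalyu", "sunta", "elpu", "tuktis"]:
    print(w, is_kulluq_word(w))  # first four print True, last four print False
```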

I'll get to vocabulary in a bit.

Morphosyntax

This is a bit trickier. Languages in the wild have a very large range of complexity and diversity in their morphosyntax systems. On one end, you have languages like Chinese, with almost no inflections whatsoever, but a very strict syntax. On the other end, you have languages like Inuktitut or the Algonquian languages, with very long and complex words, but pretty free word order. You have accusative languages like English. You have ergative languages like Basque. You have "dead old Indo-European" languages with case-marking and large but not too large verbal systems, like Latin, Greek or Sanskrit. You have the Semitic languages with their triconsonantal roots. Bantu languages with fifteen or sixteen genders. Languages like Finnish with fifteen or sixteen cases. The list goes on and on and on and on and on. (Don't worry if you don't understand what any of that meant; Toady does, which is important.)

Now, how do you encode that? What the computer will need is a simplified version of Chomsky's principles-and-parameters theory. This has been mostly debunked in linguistics as a whole, at least in its strong version, but it's probably the easiest way to encode this, and the end result will be mostly the same. Imagine that the game has a list of "switches", say thirty or so, and it uses the seed to randomly determine which switches are flipped in each language. You could have a switch, for example, labeled "person marking on verbs"; a language that has that switch ON will have "amo, amas, amat"-type verbs, while a language that has it OFF won't. (And there could be a subswitch determining whether the marker is a prefix or a suffix.) Then you go through this for a few dozen switches and their subswitches. Voilà: you have a simple sketch of the morphosyntax.
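A minimal Python sketch of the switch idea follows. The five switch names, the 50% chance of a switch being ON, and the `roll_grammar`/`Setting` names are all hypothetical; a real system would have a few dozen switches. ("amo, amas, amat" is Latin for "I love, you love, he/she loves", with person marked by a suffix.)

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical switches: each maps to the sub-switch values it can take
# when ON (an empty list means the switch has no sub-switch).
SWITCHES: Dict[str, List[str]] = {
    "person_marking_on_verbs": ["prefix", "suffix"],
    "case_marking_on_nouns": ["prefix", "suffix"],
    "plural_marking_on_nouns": ["prefix", "suffix"],
    "ergative_alignment": [],
    "verb_final_word_order": [],
}


@dataclass(frozen=True)
class Setting:
    on: bool
    sub: Optional[str] = None  # e.g. "prefix" or "suffix"; None when OFF or N/A


def roll_grammar(seed: int, p_on: float = 0.5) -> Dict[str, Setting]:
    """Flip every switch (and its sub-switch) at random from the world seed."""
    rng = random.Random(seed)
    grammar = {}
    for name, subs in SWITCHES.items():
        if rng.random() < p_on:
            grammar[name] = Setting(True, rng.choice(subs) if subs else None)
        else:
            grammar[name] = Setting(False)
    return grammar


for name, setting in roll_grammar(seed=2013).items():
    print(f"{name:26} {'ON ' if setting.on else 'OFF'} {setting.sub or ''}")
```

Treating each switch independently is the simplest choice; a more realistic version could make some switches depend on others (for example, only rolling an ergative marker when case marking is ON).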

Now the computer can go back to the phonology it made, and start filling in all those little suffixes and prefixes it's decided it needs.
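Filling in the affixes could look roughly like the Python sketch below, which is an assumption-laden illustration and not a spec. It reuses the Kulluq sounds and invents one distinct person marker per slot. The affix shapes are a hypothetical choice: suffixes are V or VC with a legal final consonant, and prefixes are CV, so attaching them never creates a consonant cluster at the boundary. The slot labels and the names `make_affixes` and `build_paradigm` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Kulluq's sounds, as in the inventory above.
CONSONANTS = ["p", "t", "ch", "k", "q", "m", "n", "ń", "ng",
              "s", "sh", "h", "v", "l", "ly", "r", "y"]
VOWELS = ["i", "u", "e", "a"]
FINALS = ["t", "ch", "k", "q"]


def make_affixes(rng: random.Random, slots: List[str], position: str) -> Dict[str, str]:
    """Invent one distinct affix per grammatical slot (e.g. per person).

    Suffixes are shaped V or VC (with C a legal final) and prefixes CV, so
    attaching them to a stem never creates a consonant cluster at the boundary.
    """
    affixes: Dict[str, str] = {}
    while len(affixes) < len(slots):
        if position == "suffix":
            form = rng.choice(VOWELS) + rng.choice(FINALS + [""])
        else:
            form = rng.choice(CONSONANTS) + rng.choice(VOWELS)
        if form not in affixes.values():
            affixes[slots[len(affixes)]] = form
    return affixes


def build_paradigm(stem: str, seed: int) -> Tuple[str, Dict[str, str]]:
    """Pick the prefix/suffix sub-switch, then fill in person markers."""
    rng = random.Random(seed)
    position = rng.choice(["prefix", "suffix"])
    affixes = make_affixes(rng, ["1sg", "2sg", "3sg"], position)
    forms = {slot: (stem + a if position == "suffix" else a + stem)
             for slot, a in affixes.items()}
    return position, forms


print(build_paradigm("timmiq", seed=2013))
```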

Language Change

But that language isn't going to stay static. Within a few decades, as the first new generations are born, it will start undergoing sound change.

The basic idea here is that a sound change will happen in every word that has the appropriate environment. The example I always like to use in illustrating sound change is the Polynesian languages. For example, in Hawaiian, Proto-Polynesian *t became k, and P-P *k became ʔ (a glottal stop). Thus, you get correspondences with the more conservative Tongan like this:

H. waʔa ~ T. vaka
H. kanaka ~ T. tangata
H. kolu ~ T. tolu

etc. etc.

It is not enough to say, however, that a Hawaiian k corresponds to a Tongan t. We must observe the principle of regular sound change: the Proto-Polynesian language had a phoneme *t, which was preserved in Tongan and shifted to k in Hawaiian.
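Here is a small Python sketch of regular, ordered sound change. It starts from the standard Proto-Polynesian reconstructions *waka 'canoe', *tangata 'person' and *tolu 'three' (with <ng> for the velar nasal). It then runs each word through two daughter languages' rule lists. The rule lists are simplified to the changes these three words need, and `apply_changes` is a hypothetical helper. Because `str.replace` rewrites every occurrence, each change applies to every word that contains the sound, which is what "regular" means here.

```python
# Standard Proto-Polynesian reconstructions, with <ng> for the velar nasal.
PROTO_POLYNESIAN = {"canoe": "waka", "person": "tangata", "three": "tolu"}

# Each daughter language is an ordered list of unconditioned changes.
# In Hawaiian, *k -> ʔ has to apply before *t -> k, or old *t would be
# swept along to ʔ as well.
HAWAIIAN = [("k", "ʔ"), ("t", "k"), ("ng", "n")]
TONGAN = [("w", "v")]


def apply_changes(word: str, changes: list) -> str:
    """Apply each change to every matching sound in the word, in order."""
    for old, new in changes:
        word = word.replace(old, new)
    return word


for gloss, proto in PROTO_POLYNESIAN.items():
    print(f"*{proto:8} H. {apply_changes(proto, HAWAIIAN):8} T. {apply_changes(proto, TONGAN)}")
# *waka     H. waʔa     T. vaka
# *tangata  H. kanaka   T. tangata
# *tolu     H. kolu     T. tolu
```

Swapping the order of the first two Hawaiian rules would turn *tolu into ʔolu, so a sound change applier has to keep its rules in chronological order.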

Another, more complex sound change, this time with an environment: the Proto-Indo-European language had a series of aspirated plosives, so called for the little puff of air, or /h/, that accompanied them. In Greek and Sanskrit, if two of them occurred in successive syllables of a word, the first one lost its aspiration (Grassmann's law). Witness:

PIE *dhrigs -> Greek thriks
PIE *dhrighes -> Greek trikhes
PIE *bhuō -> Greek phuō
PIE *bhebhuka -> Greek pephuka

Symbolically, you could write

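Ch -> C / _...Ch

(that is, an aspirated consonant C loses its aspiration when another aspirated consonant follows later in the word).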
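A minimal Python sketch of how a sound change applier could encode this, as ordered regex rules, is shown below. It assumes the Greek devoicing of the PIE voiced aspirates (bh, dh, gh became ph, th, kh) happens first. It also adds the voicing assimilation g -> k before s that the *dhrigs example needs. The Grassmann rule is written exactly as the notation says (any aspirate with another aspirate anywhere later in the word); the real law is restricted to successive syllables within a root. `GREEK_RULES` and `to_greek` are hypothetical names.

```python
import re

# Ordered (pattern, replacement) rules taking the PIE forms above to Greek.
GREEK_RULES = [
    # Voiced aspirates became voiceless aspirates: bh, dh, gh -> ph, th, kh.
    (r"([bdg])h", lambda m: {"b": "p", "d": "t", "g": "k"}[m.group(1)] + "h"),
    # Voicing assimilation needed for the first example: g -> k before s.
    (r"g(?=s)", "k"),
    # Grassmann's law, Ch -> C / _...Ch: an aspirate loses its aspiration
    # when another aspirate follows anywhere later in the word.
    (r"([ptk])h(?=.*[ptk]h)", r"\1"),
]


def to_greek(word: str) -> str:
    for pattern, replacement in GREEK_RULES:
        word = re.sub(pattern, replacement, word)
    return word


for word in ["dhrigs", "dhrighes", "bhuō", "bhebhuka"]:
    print(f"*{word} -> {to_greek(word)}")
# *dhrigs -> thriks
# *dhrighes -> trikhes
# *bhuō -> phuō
# *bhebhuka -> pephuka
```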

Using something like a sound change applier, the computer could simulate a thousand years' worth of sound changes. That's more than enough time in human terms to create different languages out of one proto-language; it might only create different dialects for dwarves, and goblin speech might hardly change at all. What the computer ought to do is simulate a sound change every 30-40 years or so in a certain area where the language is spoken (a cluster of hill dwarf settlements next to each other, for example), run the lexicon through it, then run the grammar through it (prefixes and suffixes attached to the word), and see what gets levelled and what doesn't. Bits of grammar should drop off on their own every so often, too.
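Here is a rough Python sketch of that simulation loop, under stated assumptions. Each region gets its own copy of the lexicon, and every 30-40 years one change is drawn at random from a hypothetical pool and run over the whole copy. The change pool, the glosses and the `simulate` name are illustrative only. Inflected forms are listed whole, so a suffix goes through the same changes as its stem; stems and inflected forms can then drift apart (timmiq may lose its final q while timmiqat keeps it), which is exactly where levelling would come in. Dropping bits of grammar is not modelled here.

```python
import random
import re
from typing import Dict, List, Tuple

# Hypothetical pool of sound changes, as (regex, replacement) pairs.
CHANGE_POOL: List[Tuple[str, str]] = [
    (r"k(?=i)", "ch"),                    # palatalization of k before i
    (r"q$", "k"),                         # word-final q merges with k
    (r"(?<=[aeiu])t(?=[aeiu])", "r"),     # t -> r between vowels
    (r"([ptkq])\1", r"\1"),               # degemination: tt -> t, qq -> q
    (r"^h", ""),                          # loss of word-initial h
    (r"e$", ""),                          # loss of word-final schwa
]


def simulate(lexicon: Dict[str, str], regions: List[str], years: int,
             seed: int, interval: Tuple[int, int] = (30, 40)) -> Dict[str, Dict[str, str]]:
    """Give each region its own copy of the lexicon and, every 30-40 years,
    pick one sound change at random and run the whole copy through it."""
    rng = random.Random(seed)
    dialects = {region: dict(lexicon) for region in regions}
    for region in regions:
        year = rng.randint(*interval)
        while year <= years:
            pattern, replacement = rng.choice(CHANGE_POOL)
            dialects[region] = {gloss: re.sub(pattern, replacement, form)
                                for gloss, form in dialects[region].items()}
            year += rng.randint(*interval)
    return dialects


# Hypothetical glosses; "stone-PL" is the stem plus a suffix -at.
LEXICON = {"stone": "timmiq", "stone-PL": "timmiqat", "hall": "sutu", "axe": "eniak"}
for region, words in simulate(LEXICON, ["hill_dwarves", "deep_dwarves"], 1000, seed=7).items():
    print(region, words)
```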

Sound change is going to trigger morphological and even syntactic change, too, and this will be much trickier to model. If a sound change destroys a distinction in morphology, that distinction might well be lost. This is a large part of the reason Latin has a case system and the modern Romance languages (Romanian aside) largely don't: sound changes, such as the loss of final -m and the levelling of final vowels, wiped out the distinctions between the case endings.
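A game could detect this kind of merger mechanically: run every form of a paradigm through the sound changes and see which grammatical categories now share a form. Below is a minimal Python sketch using the Classical Latin singular of rosa 'rose' and two simplified late Latin changes (loss of final -m and loss of vowel length). `merged_cases` is a hypothetical name, and the rule list is deliberately incomplete (it leaves rosae alone, for instance).

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Classical Latin singular of rosa 'rose' (ā marks a long vowel).
ROSA = {"nominative": "rosa", "genitive": "rosae", "accusative": "rosam", "ablative": "rosā"}

# Two simplified late Latin changes: final -m is lost, vowel length is lost.
CHANGES: List[Tuple[str, str]] = [(r"m$", ""), (r"ā", "a")]


def merged_cases(paradigm: Dict[str, str], changes: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Run every form through the changes and report which cases now collide."""
    by_form: Dict[str, List[str]] = defaultdict(list)
    for case, form in paradigm.items():
        for pattern, replacement in changes:
            form = re.sub(pattern, replacement, form)
        by_form[form].append(case)
    return {form: cases for form, cases in by_form.items() if len(cases) > 1}


print(merged_cases(ROSA, CHANGES))
# {'rosa': ['nominative', 'accusative', 'ablative']}
```

A merged set like this is a candidate for the game to drop the distinction altogether, or to let word order or prepositions take over its job, as happened in Romance.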

Much trickier is the creation of new morphology. This happens all the time in a process called grammaticalization. For example, English "going to" has been turned into a future auxiliary, "gonna". That's a future marker, and it's a piece of morphology like any other; in the future of English it might become a full prefix. That process will be much harder to model, but it's necessary if we don't want all our languages looking like Chinese in 1050.
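One simple way to start modelling this is to treat grammaticalization as a one-way cline, free word, then clitic, then affix, with the form eroding at each step. The Python sketch below does that under heavy assumptions. The three stages are taken as given, the erosion rule (drop a final vowel if another vowel remains) is invented for illustration, and "vakuti" is a hypothetical Kulluq future auxiliary attached to the example word eniak. What pushes a marker along the cline (in a fuller model, probably how often the construction is used) is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum

VOWELS = set("iuea")


class Stage(Enum):
    WORD = 1    # free auxiliary: "aux verb"
    CLITIC = 2  # leaning on the verb: "aux=verb"
    AFFIX = 3   # fused prefix: "auxverb"


def erode(form: str) -> str:
    """Hypothetical erosion rule: drop a final vowel if another vowel remains."""
    if len(form) > 1 and form[-1] in VOWELS and any(c in VOWELS for c in form[:-1]):
        return form[:-1]
    return form


@dataclass(frozen=True)
class Marker:
    form: str
    stage: Stage = Stage.WORD

    def advance(self) -> "Marker":
        """Move one step along the cline, eroding the form as it goes."""
        if self.stage is Stage.AFFIX:
            return self
        return Marker(erode(self.form), Stage(self.stage.value + 1))

    def attach(self, verb: str) -> str:
        if self.stage is Stage.WORD:
            return f"{self.form} {verb}"
        if self.stage is Stage.CLITIC:
            return f"{self.form}={verb}"
        return self.form + verb


future = Marker("vakuti")
for _ in range(3):
    print(future.stage.name, future.attach("eniak"))
    future = future.advance()
```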

Vocabulary

As I said, it's possible for the computer to model sound changes to our vocabulary, but other changes will happen, too. Words get borrowed from other languages (adapted to the language's phonology), displacing native words. If a language survives, for example, but its speakers are subjects of a foreign power for centuries, their language will borrow lots and lots of words from the conquerors' language. Sometimes words get lost and get replaced by compounds of other languages. Etc. etc.
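Adapting a borrowed word to the language's phonology could be sketched in Python as below. Several assumptions apply. Foreign sounds Kulluq lacks are swapped for a hypothetical nearest native sound, and unknown letters are dropped. Illegal clusters are broken up with an epenthetic schwa <e>. An illegal final consonant gets a schwa added after it. Runs of three vowels lose the extra vowel. The substitution table and the names `native_sounds` and `adapt_loanword` are illustrative, not a worked-out theory of loanword adaptation.

```python
from typing import Dict, List

VOWELS = {"i", "u", "e", "a"}
CONSONANTS = {"p", "t", "ch", "k", "q", "m", "n", "ń", "ng",
              "s", "sh", "h", "v", "l", "ly", "r", "y"}
FINALS = {"t", "ch", "k", "q"}
SPELLINGS = sorted(VOWELS | CONSONANTS, key=len, reverse=True)

# Hypothetical nearest-native substitutes for foreign letters Kulluq lacks;
# letters missing from this table are simply dropped.
SUBSTITUTES: Dict[str, List[str]] = {
    "b": ["p"], "d": ["t"], "g": ["k"], "f": ["p"], "z": ["s"], "c": ["k"],
    "j": ["y"], "w": ["v"], "x": ["k", "s"], "o": ["u"],
}
EPENTHETIC = "e"  # assumption: the schwa is used to break up clusters


def native_sounds(word: str) -> List[str]:
    """Split a foreign word into sounds, mapping non-Kulluq letters to substitutes."""
    sounds, i = [], 0
    while i < len(word):
        spelling = next((s for s in SPELLINGS if word.startswith(s, i)), word[i])
        i += len(spelling)
        if spelling in VOWELS or spelling in CONSONANTS:
            sounds.append(spelling)
        else:
            sounds.extend(SUBSTITUTES.get(spelling, []))
    return sounds


def adapt_loanword(word: str) -> str:
    out: List[str] = []
    for s in native_sounds(word.lower()):
        if s in VOWELS:
            if len(out) >= 2 and out[-1] in VOWELS and out[-2] in VOWELS:
                continue  # no runs of three vowels: drop the extra one
        elif out and out[-1] in CONSONANTS:
            # Only a doubled consonant right after a vowel may stay a cluster.
            if not (out[-1] == s and len(out) >= 2 and out[-2] in VOWELS):
                out.append(EPENTHETIC)
        out.append(s)
    # Words must end in a vowel or in a single t, ch, k or q.
    if out and out[-1] in CONSONANTS and (
            out[-1] not in FINALS or (len(out) > 1 and out[-2] in CONSONANTS)):
        out.append(EPENTHETIC)
    return "".join(out)


for loan in ["dastot", "strong", "abo"]:
    print(loan, "->", adapt_loanword(loan))
# dastot -> tasetut
# strong -> seterunge
# abo -> apu
```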

I feel like I've rambled a lot, but I've got a horrid cold and need to sleep. Hopefully this will at least provide food for thought.


@Footjob, you can microwave most grains I've tried pretty easily through the microwave, even if they aren't packaged for it.

Owlbread

  • Bay Watcher
Re: Language Creation and Development (kind of rambly)
« Reply #1 on: May 28, 2013, 01:07:25 pm »

For those who wish to discuss the creation of a basic Dwarven language, as opposed to planning for procedural language generation, another thread saw 25 pages of heated discussion on everything under the sun, from genders to tenses and cases. We actually had a basic language structure, the framework of which can still be built on if you are so inclined. The system outlined in this thread, though, is very interesting, given that it could lead to entire languages being created with their own families, dialects, all sorts. I am going to spend a lot of time reading it.
« Last Edit: May 28, 2013, 01:09:45 pm by Owlbread »

dwarfhoplite

  • Bay Watcher
  • Gentledwarves, prepare for Glory!
Re: Language Creation and Development (kind of rambly)
« Reply #2 on: May 28, 2013, 03:23:44 pm »

This is indeed interesting, but is it something that can be used in community content? The community is what I would like to see a language designed for.

Gargomaxthalus

  • Bay Watcher
Re: Language Creation and Development (kind of rambly)
« Reply #3 on: May 28, 2013, 03:58:55 pm »

Quote from: dwarfhoplite on May 28, 2013, 03:23:44 pm
This is indeed interesting, but is it something that can be used in community content? The community is what I would like to see a language designed for.

The point of this thread seems, to me, to be more about setting up a method of avoiding the current nonsense names that plague the game, rather than having an easily digestible medium for "flavorful" exchange. Of course, you could use the information here to create a language generator, and a language created by it could be shared. At the moment, Dwarf Fortress is impossible to take seriously because you can easily end up with an organization named "The Council of Flayedbonobo Phalaces", which makes little sense, is needlessly offensive, and ends up having no connection to anything else that the game generates. The OP's system would create a consistent set of naming conventions within an auto-generated world, unique to that specific world. This is an absolute necessity for many features that people want added to the game, and it must be implemented in some form, through some means, as soon as it is reasonable for Toady to do so.
Well lets see... at least half of what I say is complete bullshit. Hell the other half tends to be pretty sketchy...

OOOOHHHH,JUST SHUT UP AND LISTEN TO WHAT I HAVE TO SAY AND MAYBE I'LL GO AWAY!!!!!!!!!!


Deepblade

  • Bay Watcher
  • Tholtarmid
Re: Language Creation and Development (kind of rambly)
« Reply #4 on: May 29, 2013, 02:24:26 am »

Instead of generating a language from scratch, a "half random" approach might get the job done just as well, without making the languages too different for the community to use if they see fit.
The entire language, or certain words, would be the same every time for the major races. But the farther a settlement is from its civilization's capital city, the more words are changed.
In the case of multiple civs of the same race, it'd pick one to be the so-called racial capital, and the others would get their languages altered slightly, and then go from there. These changes would be most obvious on larger maps.

Once the world begins creating history, trade and war would be the only two things I'd have altering the language. With enough importing of goods, a Human city might refer to swords as a "dastot" instead of "thil". Or, with war, a Dwarven settlement conquered by humans may use "abo" to refer to man instead of "udos" after a generation or two, if there were enough survivors to let the language stick around there.
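The "half random" scheme can be sketched in a few lines of Python, under assumptions of my own. Each word in a settlement's copy of the language changes with a probability that grows linearly with its distance from the capital. The mutation is a hypothetical one-vowel swap. Trade or conquest replaces chosen words with the donor language's. The word pairs come from the post above; the names `shift_vowel`, `drift`, `borrow` and the `rate_per_tile` parameter are hypothetical.

```python
import math
import random
from typing import Dict, Iterable, Tuple

VOWELS = "aeiou"


def shift_vowel(word: str, rng: random.Random) -> str:
    """Hypothetical mutation: swap one random vowel for a different one."""
    spots = [i for i, c in enumerate(word) if c in VOWELS]
    if not spots:
        return word
    i = rng.choice(spots)
    new = rng.choice([v for v in VOWELS if v != word[i]])
    return word[:i] + new + word[i + 1:]


def drift(base: Dict[str, str], settlement: Tuple[float, float],
          capital: Tuple[float, float], rate_per_tile: float, seed: int) -> Dict[str, str]:
    """Change each word with a probability that grows with distance from the capital."""
    rng = random.Random(seed)
    p_change = min(1.0, math.dist(settlement, capital) * rate_per_tile)
    return {gloss: shift_vowel(word, rng) if rng.random() < p_change else word
            for gloss, word in base.items()}


def borrow(lexicon: Dict[str, str], donor: Dict[str, str],
           glosses: Iterable[str]) -> Dict[str, str]:
    """Replace the given words with the donor's (through trade or conquest)."""
    taken = set(glosses)
    return {g: donor[g] if g in taken and g in donor else w for g, w in lexicon.items()}


human_city = {"man": "abo", "sword": "thil"}
print(drift(human_city, settlement=(40, 12), capital=(0, 0), rate_per_tile=0.01, seed=1))
print(borrow(human_city, {"sword": "dastot"}, ["sword"]))  # {'man': 'abo', 'sword': 'dastot'}
```

A linear distance rule is just the simplest choice; travel time or trade volume between settlements might be a better driver in practice.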

Quote from: Gargomaxthalus on May 28, 2013, 03:58:55 pm
...At the moment Dwarf Fortress is impossible to take seriously because you can easily end up with an organization named "The Council of Flayedbonobo Phalaces" which makes little sense, is needlessly offensive, and ends up having no connection to anything else that the game generates....

The ridiculously awesome names are more about the current robot not choosing words or phrases appropriate to the cause/actions of the group/individual. So, as it stands, a language generator alone won't make things less ridiculously awesome until the robot has a catalog of which words/phrases are appropriate to which situation, as well as adjectives and adverbs.
Thankfully unfortunately there are no taboo words in the language lists, yet. So, no "The Council of Flayed Bonobo Phalli", the game doesn't allow adjectives on "of X" words either. About the closest we can get to filthy is titles/groups with the word "clam" in it, such as "The Beloved Lustful Devourer of Clams". Or, the word "dogs", such as "The Amazing tight style of dogs". More comprehensive naming is on Toady's list IIRC though.

Side note: You can modify the raw files labeled "language" to fill out your "language bank", if that's your thing.



Deepblade's Standardized Creature Parts, for when you're pissed about all the different types of animal products there are.