Bay 12 Games Forum

Topic: THIS POST SHOULD BE DELETED.

Amostubal

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #30 on: March 24, 2017, 04:07:25 pm »

This project is currently on hold. I appreciate feedback and I'll look through any criticism of the project, but until I get a few other things straight, I don't have time to work on it. Frankly, I'm considering doing one of two things with it:

1. Release my most recent version and leave it at that point. It's just massive, and a lot of work for one person.

2. Restart with a much simpler version: basically, run a page for word requests.

Urist, are you sure? I thought they were all still there... If not, then I have further reason to back completely away from the current version and restart the project.
Legendary Dwarf Fortress
Legendary Discord Group
"...peering into the darkness behind the curtains, evokes visions of pixies being chased by dragons while eating cupcakes made of coral iced with liquid fire while their hearts burn out with unknown plant substances..." - a quote from the diaries of Amostubal

GoblinCookie

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #31 on: March 24, 2017, 04:42:07 pm »

I am planning to integrate your mod into my next release. I had dreamed of one day doing something similar to what you've done, once I had finished adding more creatures, but it seems you did it first. That is how I found out about bazaars being symbolic of pure goodness.

Amostubal

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #32 on: March 24, 2017, 05:04:25 pm »

If it is, it's an error. That was part of the problem I had: with 13,000 words, you quickly lose track of the random errors that creep in. Just remove or add words in language_sym.txt as you see fit. I'll remove it in any future additions.
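If you would rather script that edit than make it by hand, here is a minimal Python sketch. It assumes the symbol file's usual layout, where each `[SYMBOL:...]` header is followed by its `[S_WORD:...]` entries. The function name and the `GOOD`/`TRADE` symbols are hypothetical, chosen only to echo the bazaar example above.

```python
import re

def remove_symbol_word(raw_text: str, symbol: str, word: str) -> str:
    """Drop [S_WORD:<word>] from the [SYMBOL:<symbol>] section only."""
    kept = []
    current = None  # the [SYMBOL:...] section the current line belongs to
    for line in raw_text.splitlines():
        header = re.match(r"\s*\[SYMBOL:([^\]]+)\]", line)
        if header:
            current = header.group(1)
        elif current == symbol and line.strip() == f"[S_WORD:{word}]":
            continue  # skip the unwanted entry, keep everything else
        kept.append(line)
    return "\n".join(kept)

# GOOD and TRADE are hypothetical symbol names used only for this example.
sample = "[SYMBOL:GOOD]\n\t[S_WORD:BAZAAR]\n\t[S_WORD:ANGEL]\n[SYMBOL:TRADE]\n\t[S_WORD:BAZAAR]"
print(remove_symbol_word(sample, "GOOD", "BAZAAR"))
```

Only the entry under the named symbol is removed, so the same word can keep its other symbolism.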

GoblinCookie

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #33 on: March 26, 2017, 06:01:07 am »

The biggest problem is that you have filled the word file with mistakes related to I and E in plural forms. 
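Assuming the I/E mistakes are the usual consonant + y → -ies plural rule (berry → berries, not berrys), a quick checker can flag them. This Python sketch reads `[NOUN:singular:plural]` tokens, compares each plural with a simple rule-based guess, and lists the mismatches for a human to review. The rule set is deliberately small and hypothetical, not a full English pluralizer.

```python
import re

VOWELS = set("aeiou")

def expected_plural(singular: str) -> str:
    """Rule-based plural covering only the common regular cases."""
    if singular.endswith("y") and len(singular) > 1 and singular[-2] not in VOWELS:
        return singular[:-1] + "ies"   # berry -> berries
    if singular.endswith(("s", "x", "z", "ch", "sh")):
        return singular + "es"         # box -> boxes
    return singular + "s"              # abbey -> abbeys

def flag_plurals(words_raw: str) -> list[tuple[str, str, str]]:
    """Return (singular, plural in file, rule-based plural) for every mismatch."""
    flagged = []
    for singular, plural in re.findall(r"\[NOUN:([^:\]]+):([^\]]+)\]", words_raw):
        guess = expected_plural(singular)
        if plural != guess:
            flagged.append((singular, plural, guess))
    return flagged

sample = "[NOUN:berry:berrys]\n[NOUN:abbey:abbeys]\n[NOUN:ox:oxen]"
for row in flag_plurals(sample):
    print(row)
```

ox/oxen is flagged too, which is the point: the output is a review list, not an automatic fix.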

Amostubal

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #34 on: March 26, 2017, 11:27:26 am »

Yes. That's part of the reason I want to start over.

History of the mod (in a spoiler; you can skip it if you want, but it will answer 99% of the questions about its oddities).

So here's my conundrum... Do I:
a. Continue with what I've got, deal with the issues one by one, and waste the next six months repairing a broken script; or
b. Restart from the beginning and wipe it clean, which means deciding whether to:
b1. Return to the top-down approach and write a better Lua parser that scans the dictionary and produces a better script from the get-go, one that catches word forms and plurals and flags potentially problematic words earlier in the process so that I can deal with them sooner; or
b2. Restart with a bottom-up approach, where I add the words people request and create each one individually with the proper word forms.

I'm going to try to begin work on the script again, but to do so I need to decide on a course of action. While I decide, I think I'll work on cleaning up the original dictionary file, which had several errors in it to begin with (duplicate entry sections and broken multi-line entries). At that point I can assess the viability of option b1. If it is viable, it would give me a better starting script and advance the process a lot faster.
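The two cleanup problems named here, duplicate entry sections and broken multi-line entries, can be handled in one pass. This is a minimal Python sketch assuming a hypothetical plain-text layout (one entry per line, with continuation lines indented); the real dictionary's format may differ.

```python
def clean_entries(lines: list[str]) -> list[str]:
    """Re-join broken multi-line entries, then drop exact duplicates.

    Assumed format (hypothetical): a new entry starts in column 0, and a
    line that starts with whitespace is the broken tail of the entry above.
    """
    entries: list[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and entries:
            entries[-1] += " " + line.strip()  # glue the tail back on
        else:
            entries.append(line.strip())
    # dict.fromkeys keeps the first occurrence of each entry, in order.
    return list(dict.fromkeys(entries))

raw = [
    "Abbey (n.) A monastery",
    "    ruled by an abbot.",
    "Abbey (n.) A monastery ruled by an abbot.",  # duplicated section
    "Axe (n.) A chopping tool.",
]
for entry in clean_entries(raw):
    print(entry)
```

Joining before de-duplicating matters: the broken copy only matches its intact duplicate once its tail has been re-attached.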

Anyway, if you have any advice, go for it. I think I've decided to open a GitHub repository for this project and start moving my files there, so I can see the whole process in one place and track how my changes move through it. Maybe I'll post a link here so others can see it and help out.

Button

  • Bay Watcher
  • Plants Specialist
Re: Amostubal's Expanded Dictionary
« Reply #35 on: March 27, 2017, 10:18:25 am »

Personally, as a programmer, I think what you should do is scrap it and start over. I know it always feels terrible to "abandon" what you've put so many hours into, but I've also seen firsthand the unmaintainability and general crappiness of proofs of concept (PoCs) that get expanded into final products.

Also, with a complete rewrite, you can create a solution that lets you use hand-made new words and fill in the blanks with the dictionary! Win-win!
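One way to read "hand-made words first, dictionary fills the blanks" is a field-by-field merge. A minimal Python sketch, assuming each word is a dict of parts of speech keyed by a word ID; the structure and names are hypothetical.

```python
def merge_words(hand_made: dict[str, dict], generated: dict[str, dict]) -> dict[str, dict]:
    """Hand-made fields win; dictionary-generated fields only fill the gaps."""
    merged = {}
    for word_id in sorted(set(hand_made) | set(generated)):
        entry = dict(generated.get(word_id, {}))  # start from the generated entry
        entry.update(hand_made.get(word_id, {}))  # hand-made fields override it
        merged[word_id] = entry
    return merged

generated = {
    "BERRY": {"noun": ("berry", "berrys"), "adj": "berried"},  # script made a typo
    "AXE": {"noun": ("axe", "axes")},
}
hand_made = {"BERRY": {"noun": ("berry", "berries")}}
print(merge_words(hand_made, generated))
```

Merging per field rather than per word means a hand fix to one plural does not throw away the generated adjective.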
I used to work on Modest Mod and Plant Fixes.

Always assume I'm not seriously back

GoblinCookie

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #36 on: March 27, 2017, 03:17:18 pm »

He doesn't really need to bother, because I am currently trawling through the whole thing, debugging it and removing all the errors in his program, plus getting rid of particularly redundant words and adding a few more in. I am presently at the letter E, which is about a third of the way in, and should be finished fairly soon, as in by early next week. I will then post my fixed version here for him to use, and get on with further modifying the file as part of symbolizing all the words for my next release.

Amostubal

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #37 on: March 27, 2017, 04:59:36 pm »

Wow, guys, so what you're saying is that all I needed was to push a Button and find a GoblinCookie, and the massive project would be finished? I appreciate it.

We can post it as the new current version.

I'm also doing a from-scratch rewrite. I'm already at the declutter phase of the original dictionary and will have a modified, uncluttered dictionary in a week. Side by side with that, I'm writing the scripts that will parse the modified dictionary and produce the scripts. The funny thing is that the first time around I ignored about 99% of what was in the original dictionary. Most of the bad spellings would have been fixed by the definitions following the words. I also didn't realize a few things about word tokens: I could have tossed all the ADJ:X-ing words from the get-go, as they are covered by their verb forms. [STANDARD_VERB] provides participles as adjectives, as if they had THE_COMPOUND_ADJ and FRONT_COMPOUND_ADJ. Tags in the definitions also provide plural forms, identify N.Plurals, and mark all sorts of things I didn't bother to look for. I'm currently running a fix script that targets information identifying an entry as potentially bad (the original dictionary was 30k entries deep, and I deleted from it manually, rushing through it mostly by hand). My goal is to finish the deletions by next weekend.
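The point about ADJ:X-ing words can be checked mechanically. Taking the post's statement that `[STANDARD_VERB]` already supplies participles as adjectives, this Python sketch lists `[ADJ:...]` forms that duplicate a standard verb's past or present participle. The five-form `[VERB:...]` layout follows the vanilla word file; the function name is hypothetical.

```python
import re

def redundant_participle_adjs(words_raw: str) -> list[str]:
    """List [ADJ:...] forms that repeat a [STANDARD_VERB] verb's participles.

    A [VERB:...] token holds five forms: present, third-person present,
    past, past participle, present participle.
    """
    participles = set()
    adjectives = []
    for block in words_raw.split("[WORD:")[1:]:  # one chunk per word entry
        verb = re.search(r"\[VERB:([^\]]+)\]", block)
        if verb and "[STANDARD_VERB]" in block:
            forms = verb.group(1).split(":")
            if len(forms) == 5:
                participles.update((forms[3], forms[4]))
        adjectives += re.findall(r"\[ADJ:([^\]]+)\]", block)
    return sorted(a for a in adjectives if a in participles)

sample = """[WORD:BEND]
\t[VERB:bend:bends:bent:bent:bending]
\t[STANDARD_VERB]
[WORD:BENDING]
\t[ADJ:bending]
[WORD:BLUE]
\t[ADJ:blue]"""
print(redundant_participle_adjs(sample))
```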

The truth is, I now know a thousand things that I didn't know when I first started this project. I'm incorporating all of it into the version I want to have up by the 15th of next month, if no other issues come up in my other projects. I'm doing this one on GitHub. I'll post a link to the repository when I have the modified dictionary up this week, so that people can start pointing out what needs to be kicked out as I work down from there.

Thanks for your assistance, guys. And GoblinCookie, even if you don't finish, I'd like to see what you end up with.

GoblinCookie

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #38 on: March 28, 2017, 08:00:01 am »

Language-wise, things are not simple enough that you can just write a script and expect it to make sense of the sheer complexity of word tenses and so on. Like it or not, to get a good outcome you have to trawl through every word in your word file to make sure everything is fine, because lots of words do not follow the 'rules' that most of them do.
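The usual way to reconcile this with scripting is a rule function backed by a hand-maintained exception table. A minimal Python sketch, with a deliberately tiny, hypothetical `IRREGULAR_PAST` table:

```python
# Hand-maintained exceptions: the part no script can work out on its own.
IRREGULAR_PAST = {"bend": ("bent", "bent"), "go": ("went", "gone"), "run": ("ran", "run")}

def past_forms(verb: str) -> tuple[str, str]:
    """Return (past, past participle): exception table first, then regular rules."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        regular = verb + "d"             # bake -> baked
    elif verb.endswith("y") and len(verb) > 1 and verb[-2] not in "aeiou":
        regular = verb[:-1] + "ied"      # carry -> carried
    else:
        regular = verb + "ed"            # walk -> walked
    return regular, regular

for verb in ("walk", "bake", "carry", "go", "stop"):
    print(verb, past_forms(verb))
```

Anything the rules get wrong, such as stop → stoped, has to go into the table by hand, which is exactly the trawling described above.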

Another problem is that these are *not* words as they actually appear in the game. What we are doing is translating words into another language and using the symbolic tokens to actually make use of them. A lot of the time you need more than one copy of the same word, because the same English word does not always mean the same thing.

Often a noun means something different symbolically from the verb or adjective with the same spelling. You basically have to iron out the quirks of the English language by hand in order to create a universal language that can support a large number of other languages. A computer basically cannot do that. The spelling errors it introduced are trivial compared to the time-consuming process of Googling all the different meanings of a word to check whether the verb, noun and adjective really mean the same thing, and duplicating words where needed.
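Splitting one English spelling into several words by meaning could look like the sketch below: each sense gets its own word ID and its own symbols, and both the word-file and symbol-file text are generated from that. This assumes the game accepts two `[WORD:...]` entries with the same spelling under different IDs. The IDs and symbol names are hypothetical, and the usage tokens a real entry needs (such as `[FRONT_COMPOUND_NOUN_SING]`) are left out.

```python
from dataclasses import dataclass

@dataclass
class Sense:
    word_id: str              # unique token per meaning (hypothetical naming)
    singular: str
    plural: str
    symbols: tuple[str, ...]  # hypothetical symbol names

def render(senses: list[Sense]) -> tuple[str, str]:
    """Return (word-file text, symbol-file text) for a list of senses."""
    word_blocks = []
    by_symbol: dict[str, list[str]] = {}
    for sense in senses:
        word_blocks.append(f"[WORD:{sense.word_id}]\n\t[NOUN:{sense.singular}:{sense.plural}]")
        for symbol in sense.symbols:
            by_symbol.setdefault(symbol, []).append(sense.word_id)
    symbol_blocks = [
        f"[SYMBOL:{symbol}]\n" + "\n".join(f"\t[S_WORD:{w}]" for w in ids)
        for symbol, ids in sorted(by_symbol.items())
    ]
    return "\n".join(word_blocks), "\n".join(symbol_blocks)

senses = [
    Sense("BAT_ANIMAL", "bat", "bats", ("NIGHT", "ANIMAL")),
    Sense("BAT_CLUB", "bat", "bats", ("VIOLENT",)),
]
words_txt, symbols_txt = render(senses)
print(words_txt)
print(symbols_txt)
```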

Amostubal

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #39 on: March 28, 2017, 09:09:12 pm »

Of course, a lot of the subtlety of the language can be lost with bad scripting... but the bad thing is that a lot of what upsets me about the current version came from my own hand. I shoved many words together trying to reduce the script, in ways that were bad. The original dictionary actually had all the rules for spelling, words were better separated by meaning, and really all it needed, more than anything, was for the bad words to be culled... which is what I'm doing right now with the rewrite. Before, I culled many things without checking whether they were even in the game, then left in a lot of other stuff for variety that was really just clutter. We don't need cabin, house, cottage, etc. when just house would be enough... Further attempts to cull the script turned into a fiasco.

If you want to see the original dictionary, I've uploaded it somewhere you can download it.

The entire thing was a multistage project. Here is how it should have been done:

1. Cull all bad words from the original dictionary, period. This results in a modified dictionary. I messed up here, because I let people rush me and goad me into getting the job done fast. Culling properly at this stage would have avoided the rash decisions I took in the later stages.

2. Turn the modified dictionary into a script dictionary: a dictionary that a parser can easily read to create the word files, so that manually editing each file separately isn't necessary. Better scripts at this point would have caught the spelling errors by using information nested in the definitions (it literally has tags for when a different past tense and past participle should be used, or when a word requires an x to xx transformation). By my estimate, that would have caught 95% of the spelling errors... Other scripts at this point could identify invalid adjectives: adjectives made unnecessary because a past/past participle verb form is already present.

3. Create scripts that parse the script dictionary into the proper files. I had actually perfected this part pretty well, a little too well in some respects: I provided way too many options, which made the script dictionary extremely hard to read and caused a lot of the later failures at culling properly and adding symbolism.

4. MANUALLY select the symbol tables. I made several attempts to add symbolism through scripting, and here you are absolutely right: it's nigh impossible to accomplish... I don't own a Watson supercomputer.
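Step 2's idea, letting tags from the definitions drive the word forms, might look like this minimal Python sketch. The original dictionary's actual tag names aren't given here, so `double_final` (for the x to xx note) and the `past`/`past_participle` overrides are hypothetical stand-ins. The output follows the five-form `[VERB:...]` layout.

```python
from typing import Optional

VOWELS = "aeiou"
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh", "o")

def verb_token(base: str, double_final: bool = False,
               past: Optional[str] = None,
               past_participle: Optional[str] = None) -> str:
    """Build a five-form [VERB:...] token from a base verb plus per-word flags.

    double_final stands in for a hypothetical "x to xx" tag (stop -> stopped);
    past / past_participle stand in for hypothetical irregular-form tags.
    """
    if base.endswith("y") and len(base) > 1 and base[-2] not in VOWELS:
        third, regular_past, ing = base[:-1] + "ies", base[:-1] + "ied", base + "ing"
    elif base.endswith("e"):
        third, regular_past = base + "s", base + "d"
        ing = base + "ing" if base.endswith("ee") else base[:-1] + "ing"
    else:
        stem = base + base[-1] if double_final else base
        third = base + ("es" if base.endswith(SIBILANT_ENDINGS) else "s")
        regular_past, ing = stem + "ed", stem + "ing"
    past = past or regular_past
    past_participle = past_participle or past
    return f"[VERB:{base}:{third}:{past}:{past_participle}:{ing}]"

print(verb_token("stop", double_final=True))
print(verb_token("carry"))
print(verb_token("go", past="went", past_participle="gone"))
```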

But through it all, when I say I target selections for removal and then delete manually, I mean rather mundane scripts that scan the dictionary for terms such as "abbr." (the dictionary's term for an abbreviation); then I check the line and remove it if it really is an abbreviation. My targeting script shows me when a line is a duplicate line, has a duplicate start word, contains "abbr.", or contains various other dictionary terms for words that generally need to be removed. I work down each section, remove the tagged lines, and rerun the script on the new file until each target returns zero tagged lines. Rinse and repeat with a new target. If I don't think a line deserves deletion, I remove the offending target term instead (in other words, when targeting "American", I cut "American" out of all the plant and animal definitions that do exist in the vanilla DF scripts). This works better than searching all the entries line by line by hand, as some definitions are truly massive... I can focus on one thing at a time. When I run out of targets, I scan a section (100 or so lines) and find another handful of targets for removal.
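The targeting script described here is simple enough to sketch. This Python version assumes one dictionary entry per line and reports, for each check, which lines it tags: duplicate lines, duplicate start words, and any target term. `abbr.` and `American` come from the post; the function name and sample entries are hypothetical.

```python
from collections import Counter

TARGETS = ("abbr.", "American")  # terms mentioned in the post; extend as needed

def tag_lines(lines: list[str], targets=TARGETS) -> dict[str, list[int]]:
    """Map each check to the 1-based line numbers it flags."""
    line_counts = Counter(lines)
    first_words = Counter(line.split()[0] for line in lines if line.split())
    report: dict[str, list[int]] = {"duplicate line": [], "duplicate start word": []}
    report.update({target: [] for target in targets})
    for number, line in enumerate(lines, start=1):
        words = line.split()
        if not words:
            continue
        if line_counts[line] > 1:
            report["duplicate line"].append(number)
        if first_words[words[0]] > 1:
            report["duplicate start word"].append(number)
        for target in targets:
            if target in line:
                report[target].append(number)
    return report

sample = [
    "Abbey (n.) A monastery.",
    "Abbey (n.) A monastery.",
    "Abt. (abbr.) About.",
    "Acacia (n.) An American tree.",
]
for check, hits in tag_lines(sample).items():
    print(f"{check}: {len(hits)} tagged, lines {hits}")
```

Rerun it after each pass of manual deletions until every count reaches zero, then add a new target.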

GoblinCookie

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #40 on: March 29, 2017, 11:19:41 am »

What you needed to do was go through the dictionary and consider, offhand, whether each word actually adds any meaning that isn't already in. Then you create a symbols file and see what goes where (my advice is to add a lot more symbols). Then you go through the words in every symbol category to detect any redundant words that are still in, since they will all end up with exactly the same symbolism. Then go through the dictionary again to see whether any of your words have multiple meanings according to your symbols, and make new duplicate words accordingly with different symbolic meanings.
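The "detect redundant words in every symbol category" step can be partly automated: words that end up with exactly the same symbol set are the candidates. A minimal Python sketch using the cabin/house/cottage example from earlier in the thread; the symbol names are hypothetical.

```python
from collections import defaultdict

def redundant_groups(word_symbols: dict[str, set[str]]) -> list[list[str]]:
    """Return groups of 2+ words that carry exactly the same symbols."""
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for word, symbols in word_symbols.items():
        groups[frozenset(symbols)].append(word)
    return sorted(sorted(words) for words in groups.values() if len(words) > 1)

# DOMESTIC and DEFENSE are hypothetical symbol names.
word_symbols = {
    "HOUSE": {"DOMESTIC"},
    "CABIN": {"DOMESTIC"},
    "COTTAGE": {"DOMESTIC"},
    "FORTRESS": {"DOMESTIC", "DEFENSE"},
}
print(redundant_groups(word_symbols))
```

The script only proposes groups; deciding which word survives is still a human call.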

Once you have the human stuff worked out, then write your program to turn the words you have already symbolized, and any new words you have made up, into actual words that the game uses; anything a computer can do isn't the hard part of the problem. The really big problem with automating this is when the adjectives, nouns and verbs do not have the same symbolic meaning, so it might be a good idea to separate verbs, adjectives and nouns before recombining them one way or another.
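Separating the parts of speech and recombining them could work like this sketch: each (spelling, part of speech) pair carries its own symbol set, parts that share a symbol set merge into one word, and a different symbol set becomes a separate numbered word. The ID scheme (`FIRE`, `FIRE_2`) and the symbol names are hypothetical.

```python
from collections import Counter

def recombine(entries: list[tuple[str, str, frozenset]]) -> dict[str, list[str]]:
    """entries: (spelling, part of speech, symbols), one per part of speech.

    Parts of speech that share a spelling AND a symbol set are merged into
    one word; a different symbol set gets its own numbered word ID.
    """
    groups: dict[tuple[str, frozenset], list[str]] = {}
    for spelling, pos, symbols in entries:
        groups.setdefault((spelling, symbols), []).append(pos)
    words: dict[str, list[str]] = {}
    seen = Counter()
    for (spelling, _symbols), parts in groups.items():
        seen[spelling] += 1
        base = spelling.upper()
        word_id = base if seen[spelling] == 1 else f"{base}_{seen[spelling]}"
        words[word_id] = sorted(parts)
    return words

# FIRE and VIOLENT are hypothetical symbol names.
entries = [
    ("fire", "noun", frozenset({"FIRE"})),
    ("fire", "verb", frozenset({"VIOLENT"})),  # "fire" as in dismiss or shoot
    ("fire", "adj", frozenset({"FIRE"})),
]
print(recombine(entries))
```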

Quote from: Amostubal
4. MANUALLY select the symbol tables. I made several attempts to add symbolism through scripting, and here you are absolutely right: it's nigh impossible to accomplish... I don't own a Watson supercomputer.

Correction: you do not own a strong AI.

Amostubal

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #41 on: March 29, 2017, 08:20:54 pm »

All of that is true. So here is my proposal:
I'm going to keep seeking out useless words in the original dictionary, then post it to my repository (most likely in sections, since GitHub won't display really large or really long files without downloading them). Then, if you want to work with me, I'm more than willing to make this a collaboration. Symbol selection was by far my hardest issue. Besides that, yes, there were several severe issues, but they mostly came down to tweaking scripts and, in some cases, splitting words that needed to be split.
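Posting the dictionary in sections could be as simple as bucketing entries by first letter. A minimal Python sketch; the `dictionary_A.txt` file naming is hypothetical.

```python
from collections import defaultdict
from pathlib import Path

def split_by_letter(entries: list[str]) -> dict[str, list[str]]:
    """Bucket entries by their first letter (non-letters go to 'OTHER')."""
    sections: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        first = entry[:1].upper()
        sections[first if first.isalpha() else "OTHER"].append(entry)
    return dict(sorted(sections.items()))

def write_sections(sections: dict[str, list[str]], out_dir: Path) -> None:
    """Write one small file per section, e.g. dictionary_A.txt (hypothetical naming)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for key, lines in sections.items():
        (out_dir / f"dictionary_{key}.txt").write_text("\n".join(lines) + "\n")

print(split_by_letter(["abbey (n.)", "axe (n.)", "blue (a.)", "-ness (suffix)"]))
```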

I really think I just bit off more than I could chew for my first serious modding experience in DF. But at least it's very dwarfy... a go-big-or-go-home sort of deal.

Amostubal

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #42 on: March 29, 2017, 08:46:42 pm »

Or we could start a list of useful "word groups" together and figure out what needs to be created and brought into the original word set to make it more complete.

Personally, the words most needed in the original set were these:
Nouns:
all standard vanilla DF plants and animals
all standard vanilla materials (metals, ores, stones, clays, etc.)
all standard vanilla items (missing alcohol names, missing armor names, etc.)

Verbs:
Lots of verbs... We would have to make a list of verbs and start picking out good ones not in the original.

Adjectives:
...So much could be added; the question is, should it be?

Prefixes, suffixes, etc.:
Not so much; they make things weird from time to time.

In retrospect, the original list is ~2,100 words. I wouldn't mind seeing lists of nouns beyond what I mentioned, along with lists of verbs, adjectives, prefixes and suffixes that are not in the original set, and selecting from them until we reach, say, ~5k words... With a selection of over 35k, we should have no problem doing this. It would limit the initial scope of the work... Just a suggestion.

I could write scripts that identify the tags for word "types", pull those words out of the original dictionary, and push them into alphabetical lists pretty easily. Then an initial cull of duplicates gives you a selection table of words. From there we plug the words we want into symbol groups and build the word file and symbol file from that. After that, we check the spelling and word forms and move forward.
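A minimal Python sketch of that extraction step, assuming a hypothetical entry format of `Headword (pos.) definition` and hypothetical part-of-speech tags (the real dictionary's tags may differ). Collecting into sets performs the initial duplicate cull; sorting gives the alphabetical lists.

```python
import re
from collections import defaultdict

# Hypothetical entry format: "Headword (pos.) definition ..."
ENTRY = re.compile(r"^(?P<word>[A-Za-z-]+) \((?P<pos>[a-z. ]+)\)")
# Hypothetical part-of-speech tags mapped to the lists we want.
POS_LISTS = {"n.": "nouns", "v. t.": "verbs", "v. i.": "verbs",
             "a.": "adjectives", "prefix": "prefixes", "suffix": "suffixes"}

def selection_table(lines: list[str]) -> dict[str, list[str]]:
    """Sort headwords into alphabetical, de-duplicated lists per word type."""
    table: dict[str, set[str]] = defaultdict(set)  # a set does the duplicate cull
    for line in lines:
        match = ENTRY.match(line)
        if match and match.group("pos") in POS_LISTS:
            table[POS_LISTS[match.group("pos")]].add(match.group("word").lower())
    return {kind: sorted(words) for kind, words in table.items()}

sample = [
    "Axe (n.) A chopping tool.",
    "Axe (v. t.) To chop with an axe.",
    "Abbey (n.) A monastery.",
    "Axe (n.) A battle weapon.",
    "Blue (a.) Of the color of the sky.",
]
print(selection_table(sample))
```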

The main thing is identifying the "word groups" we want to add, the symbols for those words, and progressing from there... In my opinion, the current list is too cluttered.

That's a new direction I hadn't considered... Let me know what you think. I'm in the middle of a work week right now, so I can't really do much on this until Saturday.

GoblinCookie

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #43 on: March 30, 2017, 07:53:02 am »

Symbols = types, really. Whatever you decide to call them, the underlying problem with using a dictionary remains: the dictionary contains a lot of words whose adjective, noun and verb functions do not mean the same thing. It also has a large number of nouns, adjectives and verbs that mean quite different things but are the exact same word in English.

Instead, you could more or less ignore the dictionary completely. You could start with the Dwarf Fortress universe and the objects that are in it (or presumably are in it, or will be, based on the dev page), adding each of them as a noun. Then you can go through the verbs that would exist by the same logic, adding a verb function to existing words where appropriate. Then you can finish off with the adjectives and prefixes, again adding an adjective/prefix function to existing words where appropriate.
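The noun-first, bottom-up approach maps naturally onto a word record whose parts of speech are filled in over several passes. This Python sketch emits only the basic `[WORD:...]`, `[NOUN:...]`, `[VERB:...]` and `[ADJ:...]` tokens; the usage tokens a real entry needs are omitted, and the class and word choices are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Word:
    """One entry; each part of speech is optional and can be added later."""
    word_id: str
    noun: Optional[tuple[str, str]] = None                 # (singular, plural)
    verb: Optional[tuple[str, str, str, str, str]] = None  # five [VERB:...] forms
    adj: Optional[str] = None

    def to_raw(self) -> str:
        lines = [f"[WORD:{self.word_id}]"]
        if self.noun:
            lines.append("\t[NOUN:{}:{}]".format(*self.noun))
        if self.verb:
            lines.append("\t[VERB:{}]".format(":".join(self.verb)))
        if self.adj:
            lines.append(f"\t[ADJ:{self.adj}]")
        return "\n".join(lines)

# Pass 1: things that exist in the world become nouns.
lexicon = {w.word_id: w for w in (Word("AXE", noun=("axe", "axes")),
                                  Word("FORGE", noun=("forge", "forges")),
                                  Word("IRON", noun=("iron", "irons")))}
# Pass 2: add a verb function to existing words where it fits.
lexicon["FORGE"].verb = ("forge", "forges", "forged", "forged", "forging")
# Pass 3: add an adjective function to existing words where it fits.
lexicon["IRON"].adj = "iron"

print("\n".join(word.to_raw() for word in lexicon.values()))
```

Because later passes only attach functions to existing records, a missing concept reported by players becomes one new `Word` rather than a re-run of the whole pipeline.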

If you forget to add anything, or don't know it exists, your fanbase can remind you and you can easily add it in. You can get around the foibles of the English language by translating the Dwarf Fortress world's presumed concepts into words, which are then translated into English words or hyphenated phrases.

Amostubal

  • Bay Watcher
Re: Amostubal's Expanded Dictionary
« Reply #44 on: March 30, 2017, 07:28:51 pm »

Or we could start up a list of useful "word groups" together and figure out what needs to be created to bring into the original word set to make it more complete.

personally the most needed words in the original set was these:
Nouns:
all standard vanilla DF plants and animals
all standard vanilla materials (Metals, ores, stones, clays, etc.)
all standard vanilla items(missing alcohol names, missing armor names, etc)

Verbs:
Lots of verbs... would have to make a list of verbs and start picking out good ones not in the original.

Adjectives:
.... so much could be added, the question is should it be added?

Prefixes, suffixes, etc.:
Not so many; they make things weird from time to time.

In retrospect, the original list is ~2,100 words. I wouldn't mind seeing lists of nouns beyond what I mentioned, along with lists of verbs, adjectives, prefixes and suffixes that are not in the original set, and selecting from them until we reach, say, ~5k words... With over 35k to choose from, we should have no problem doing this. It would also limit the initial scope of the work... Just a suggestion.

I could pretty easily write scripts that identify the tags for word "types", pull those words out and push them into alphabetical lists from the original dictionary. Then an initial cull of duplicates, and you have a selection table of words. From there we plug the words we want into symbol groups and extrapolate the word file and symbol file from that. After that we check the spelling and word forms and move forward.
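A rough sketch of that first script in Python, assuming the vanilla language_words.txt layout where each [WORD:ID] token is followed by [NOUN:...], [VERB:...], [ADJ:...] or [PREFIX:...] tokens for that word. The function name and sample text are hypothetical; usage flags such as [FRONT_COMPOUND_NOUN_SING] are simply not matched by the pattern.

```python
import re
from collections import defaultdict

# Assumed layout (modeled on vanilla language_words.txt): [WORD:ID] starts an
# entry, followed by [NOUN:...], [VERB:...], [ADJ:...] or [PREFIX:...] tokens.
# Usage flags like [FRONT_COMPOUND_NOUN_SING] are not in the alternation, so
# the pattern skips them.
FORM_RE = re.compile(r"\[(WORD|NOUN|VERB|ADJ|PREFIX):([^\]]*)\]")


def words_by_form(raw_text):
    """Return {form: alphabetized list of unique WORD ids that have that form}."""
    groups = defaultdict(set)  # sets drop duplicate entries automatically
    current = None
    for tag, args in FORM_RE.findall(raw_text):
        if tag == "WORD":
            current = args.strip().upper()
        elif current is not None:  # ignore form tokens that precede any WORD
            groups[tag].add(current)
    return {form: sorted(ids) for form, ids in groups.items()}


if __name__ == "__main__":
    sample = (
        "[WORD:ABBEY]\n\t[NOUN:abbey:abbeys]\n\t\t[FRONT_COMPOUND_NOUN_SING]\n"
        "\t[ADJ:abbey]\n[WORD:ABIDE]\n\t[VERB:abide:abides:abode:abode:abiding]\n"
        "[WORD:ABBEY]\n\t[NOUN:abbey:abbeys]\n"
    )
    print(words_by_form(sample))
```

The duplicate ABBEY entry in the sample collapses to a single NOUN listing, which is the "initial cull on duplicates" step.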

The main thing is identifying the "word groups" we want to add and the symbols for those words, and then making progress... in my opinion, the current list is too cluttered.

Yeah... see, I can pull full lists of all the materials in the game from the raws... What does a dwarf call cat's-eye gems? What do dwarves call a moose? Or sandstone? Or quartzite, limestone, etc.? That's one of the issues I originally had with language in the game when I started... there are a lot of words, but so many things a dwarf normally deals with don't have words. There isn't even a word for dwarf... What do dwarves call themselves? What do elves call dwarves? What do humans call elves?

Anyway, let's see if this next bit prints:
I made a full list of all the words from the original, by word form (adjective, noun, verb, prefix).

It was a nice piece of script work. My other big issue with the original files is that they were piecemeal to begin with. Words were put in a list, then new words were added, then more words, and each time the list wasn't re-alphabetized, so checking whether a word already exists is a lot more work than most people would want to do. Anyway, I've got a concept now...
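Once the list is alphabetized, the "does this word already exist?" check gets cheap: a binary search (Python's standard bisect module) finds a word in a handful of steps instead of scanning the whole file. A minimal sketch, with hypothetical helper names, assuming words are compared as upper-case WORD ids:

```python
import bisect


def contains_sorted(sorted_words, word):
    """Binary search in an alphabetized list: O(log n) instead of a full scan."""
    i = bisect.bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


def merge_new_words(existing, candidates):
    """Return (merged alphabetized list, candidates that were actually new)."""
    merged = sorted(set(existing))
    added = []
    for w in candidates:
        w = w.strip().upper()  # normalize to the WORD id convention (assumed)
        if w and not contains_sorted(merged, w):
            bisect.insort(merged, w)  # insert while keeping alphabetical order
            added.append(w)
    return merged, added
```

A plain Python set would answer the membership question just as fast; bisect is used here because it also keeps the merged list in alphabetical order for checking by eye.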

I'm going to pull all the creature/item/material/etc. raws into a folder, then search for all the "name" tokens and collect those into one list. Then I'll check them off against the original list, which gives me the initial additions to the language files. I'll write a separate script to target all the verbs and adjectives in the dictionary and collect them into a list. That gives a starting point that mixes both methods, because honestly I've been working too hard on the top-down approach... there are some 35k entries, and I would estimate that 75%+ are completely useless or already covered in the original list. I'm throwing out the concept of full automation; it's just an over-the-top pipe dream. Pulling out all the verbs and adjectives is easy: it's just a collection of all the lines marked -adj., adj., -v. and v. in the dictionary. Around half of those should be culled from the get-go... but that way I'm not checking every item/plant/animal/material/etc. I come across in the dictionary, which is over half of the dictionary to begin with.
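A sketch of both halves of that plan in Python. Assumptions to flag: creature/item/plant raws carry their name as the first field of a [NAME:...] token, the raws are plain .txt files readable as Latin-1, and the dictionary marks parts of speech as adj., -adj., v. or -v. with a space or hyphen in front. Material raws may use other name tokens and would need their own pattern. All function names are hypothetical.

```python
import re
from pathlib import Path

# First field of a [NAME:singular:plural:adjective] token (assumed layout).
# The leading "\[" means tokens like [CASTE_NAME:...] are not picked up.
NAME_RE = re.compile(r"\[NAME:([^:\]]+)")
# "adj." or "v." preceded by start of line, whitespace or a hyphen.
POS_RE = re.compile(r"(?:^|[\s\-])(adj|v)\.")


def raw_names(raw_dir):
    """Collect lower-cased NAME values from every .txt file under raw_dir."""
    names = set()
    for path in Path(raw_dir).rglob("*.txt"):
        text = path.read_text(encoding="latin-1")
        names.update(m.strip().lower() for m in NAME_RE.findall(text))
    return names


def missing_from_original(names, original_words):
    """Names with no matching word in the original list, alphabetized."""
    known = {w.lower() for w in original_words}
    return sorted(n for n in names if n not in known)


def adj_and_verb_lines(dictionary_lines):
    """Bucket dictionary lines by the parts of speech they are marked with."""
    buckets = {"adj": [], "v": []}
    for line in dictionary_lines:
        for pos in set(POS_RE.findall(line)):
            buckets[pos].append(line.rstrip("\n"))
    return buckets
```

Because the pattern requires a space or hyphen before adj/v, entries marked adv. are not caught by the v. check.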

I think I'm becoming more focused. I have one more day at the office, then I'm off for 4 days, and my goal is to have the final list of words to add done in 3 days: Sunday, 72 hours from now. From there, a week to finish spellchecking and assigning symbols. Then 4 days to get the files written, since it's just a process of plugging in the appropriate tokens and running them through a downloaded language creator... boom, vanilla DF expanded within 2 weeks of the rewrite. After that it's just a process of adding in words people request and building an additional word table from Masterwork DF (since I originally started this mod for that community), released 2 weeks after that, max.
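The "plugging in the appropriate tokens" step could look roughly like this. It assumes a language_words-style entry is [WORD:ID] followed by one indented token per form, that the symbol file groups words as [SYMBOL:NAME] followed by [S_WORD:ID] lines, and that each file opens with its name and [OBJECT:LANGUAGE] as in the vanilla raws. Vanilla entries also carry usage flags under each form (for example [FRONT_COMPOUND_NOUN_SING]), which this sketch leaves out. Function names and the sample symbol are hypothetical.

```python
FORM_ORDER = ("NOUN", "VERB", "ADJ", "PREFIX")


def word_entry(word_id, forms):
    """forms like {"NOUN": ["moose", "moose"]} -> one [WORD:...] block."""
    lines = [f"[WORD:{word_id}]"]
    for tag in FORM_ORDER:
        if tag in forms:
            lines.append(f"\t[{tag}:{':'.join(forms[tag])}]")
    return "\n".join(lines)


def symbol_entries(groups):
    """{"SYMBOL_NAME": [word ids]} -> [SYMBOL:...] blocks with [S_WORD:...] lines."""
    out = []
    for sym in sorted(groups):
        out.append(f"[SYMBOL:{sym}]")
        out.extend(f"\t[S_WORD:{w}]" for w in sorted(set(groups[sym])))
    return "\n".join(out)


def raw_file(file_stem, blocks):
    """Assumed header: file name on the first line, then [OBJECT:LANGUAGE]."""
    return "\n\n".join([file_stem, "[OBJECT:LANGUAGE]", *blocks]) + "\n"
```

Keeping the form order fixed (NOUN, VERB, ADJ, PREFIX) means regenerated files diff cleanly against earlier versions.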

Schedule:
April 2nd - finished word list.
April 9th - finished spellcheck and symbol integration.
April 13th - files written and languages created. Vanilla DF complete.
April 27th - repeat steps and Masterwork DF complete.

I think that's a rather loose time frame. With a decent, objective plan, I bet I can do it faster... but I want it done right no matter what.

« Last Edit: March 30, 2017, 07:56:41 pm by Amostubal »