Topic: Proposal: a standard format for mods in a diff/patch Mod Starter Pack (Read 42410 times)

thistleknot · « **Reply #240 on:** August 22, 2014, 01:06:20 pm »

Quote from: Button on August 22, 2014, 12:44:44 pm

Quote from: MagiX on August 22, 2014, 06:56:57 am
What about writing a custom json/xml/whatever style parser that puts these things into (multi-level) dict structures and then comparing the dict structures? This should look like that:
...

great work man!

Yeah, if it could do caste tokens

and maybe even alphabetize?

and basically follow the important stuff in Dirst's post
http://www.bay12forums.com/smf/index.php?topic=142295.msg5595599#msg5595599

then I think you would have a great way to do diff comparisons between two objects. One doesn't even have to use diff at this point.

One could just note the sequential order of the tokens between
CommonAncestor
ModB
ModC

Note the sequential, per-instance differences (additions, removals, replacements) in tokens between two objects, and track those changes.

Something like

Diff(CommonAncestor, ModC) applied to ModB.
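A minimal sketch of the "diff the ancestor against ModC, apply the result to ModB" idea, assuming each object has already been parsed into a dict of tag name → arguments. That assumption throws away tag order and repeated tags, which matters for creatures, so this is an illustration only; `changes` and `apply_changes` are hypothetical names and the tag values are illustrative.

```python
from typing import Dict, Optional, Tuple

Args = Tuple[str, ...]
Obj = Dict[str, Args]              # simplification: one value per tag, order ignored
Delta = Dict[str, Optional[Args]]  # None means "tag removed"


def changes(ancestor: Obj, mod: Obj) -> Delta:
    """Tags the mod added, removed, or replaced relative to the common ancestor."""
    delta: Delta = {}
    for tag in ancestor.keys() | mod.keys():
        after = mod.get(tag)
        if ancestor.get(tag) != after:
            delta[tag] = after   # None means the mod removed the tag
    return delta


def apply_changes(target: Obj, delta: Delta) -> Obj:
    """Apply one mod's delta on top of another mod's version of the object."""
    merged = dict(target)
    for tag, value in delta.items():
        if value is None:
            merged.pop(tag, None)
        else:
            merged[tag] = value  # silently overrides any change the target made
    return merged


ancestor = {"BODY_SIZE": ("0", "0", "1000"), "PET": ()}
mod_b = {"BODY_SIZE": ("0", "0", "2000"), "PET": ()}
mod_c = {"BODY_SIZE": ("0", "0", "1000"), "LARGE_PREDATOR": ()}

# Diff(CommonAncestor, ModC) applied to ModB:
print(apply_changes(mod_b, changes(ancestor, mod_c)))
# {'BODY_SIZE': ('0', '0', '2000'), 'LARGE_PREDATOR': ()}
```

ModB's BODY_SIZE change survives because ModC never touched that tag. If both mods changed the same tag, ModC would silently win here; a real tool would want to report that as a conflict.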

King Mir: yeah, I'll look into lexers for Python, sure.

King Mir · « **Reply #241 on:** August 22, 2014, 01:09:43 pm »

Quote from: Button on August 22, 2014, 12:44:44 pm

I was messing around with formats for defining legal raw objects of various types. Mainly what I found is that XML isn't great for it, because it doesn't deal gracefully with tags that are allowed in any order. It might be best to define a custom format if we want to go that far.

What about using attributes for tags like that?

Button · « **Reply #242 on:** August 22, 2014, 01:13:41 pm »

Quote from: Merkator on August 22, 2014, 12:59:52 pm

Button: sounds great. I was thinking about something like that myself.
It may be a much better way.
Storing objects is not much of a problem.

But remember tag order and the whole [CASTE] thing.
What kind of data structure do you use? A list with a tuple for each token, or an OrderedDict with tuples or OrderedDicts as values?

all_objects is a dict(string object_type,dict(string object_name,RawObject)).

RawObject is a custom class. Among other things, it stores the filename, the object name and type, and the complete text of that raw object as an (ordered) list of strings (including comments).

I figure that as processing gets more in-depth, we'll be storing more and more data about each raw object, and it sucks to increase the complexity of a data structure that's being repeatedly searched by other sections of the code when you could hide that complexity inside an object instead.
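A minimal sketch of that layout, assuming Python 3.7+ dataclasses. The field names and the example filename are illustrative guesses, not Button's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RawObject:
    """One top-level raw object (hypothetical field names)."""
    filename: str      # raw file the object came from
    object_type: str   # e.g. "CREATURE"
    object_name: str   # e.g. "DWARF"
    # Full text of the object in file order, comments included.
    lines: List[str] = field(default_factory=list)


# all_objects[object_type][object_name] -> RawObject.
# The dicts only speed up lookup; order lives inside each RawObject.
AllObjects = Dict[str, Dict[str, RawObject]]


def add_object(all_objects: AllObjects, obj: RawObject) -> None:
    """File an object under its type and name, creating the type dict if needed."""
    all_objects.setdefault(obj.object_type, {})[obj.object_name] = obj


all_objects: AllObjects = {}
add_object(all_objects, RawObject(
    filename="creature_standard.txt",
    object_type="CREATURE",
    object_name="DWARF",
    lines=["[CREATURE:DWARF]", "\t[CASTE:FEMALE]", "\t\t[FEMALE]"],
))
print(all_objects["CREATURE"]["DWARF"].lines[0])  # [CREATURE:DWARF]
```

New per-object data can later be added as fields on RawObject without changing the shape of `all_objects` that the rest of the code searches.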

thistleknot · « **Reply #243 on:** August 22, 2014, 01:25:20 pm »

I was looking at lexers...

but... I was thinking we need a parser, no?

http://pyparsing.wikispaces.com/

found pyparsing

http://stackoverflow.com/questions/1651487/python-parsing-bracketed-blocks

Code:

>>> from pyparsing import nestedExpr
>>> txt = "{ { a } { b } { { { c } } } }"
>>>
>>> nestedExpr('{','}').parseString(txt).asList()
[[['a'], ['b'], [[['c']]]]]
>>>

lexer research:

If not that, then lexer-wise I've found:
Pygments
http://pygments.org/docs/lexers/
Ply
http://www.dabeaz.com/ply/
ex: http://www.dabeaz.com/ply/example.html

and a big list
https://wiki.python.org/moin/LanguageParsing

King Mir · « **Reply #244 on:** August 22, 2014, 01:28:31 pm »

You can't store objects in a map/dict when order matters. Use a list or array. That matters for creature tokens, because of variations like animal people. Alphabetizing is a problem for the same reason.

Button · « **Reply #245 on:** August 22, 2014, 01:39:23 pm »

Quote from: thistleknot on August 22, 2014, 01:06:20 pm

Quote from: Button on August 22, 2014, 12:44:44 pm
Quote from: MagiX on August 22, 2014, 06:56:57 am
What about writing a custom json/xml/whatever style parser that puts these things into (multi-level) dict structures and then comparing the dict structures? This should look like that:
...

Yeah, if it could do caste tokens

Caste tokens are going to be a ways off no matter how we slice it, since only creature objects use them and we'll need a mapping file listing which tokens are caste-level and which are creature-level.

Quote

and maybe even alphabetize?

The work required to alphabetize objects against each other is essentially already done. Alphabetizing tags within an object would be significantly harder, since we'd need to know which tags could be reordered safely.

Dirst · « **Reply #246 on:** August 22, 2014, 01:43:04 pm »

Quote from: King Mir on August 22, 2014, 01:28:31 pm

You can't store objects in a map/dict when order matters. Use a list or array. That matters for creature tokens, because of variations like animal people. Alphabetizing is a problem for the same reason.

This generalizes for most raw files... there is a top-level object (like CREATURE or REACTION) that is usually, but not always, self-contained. The biggest exception is that base creatures must exist "earlier in the raws" than any variants of that creature. Raw files of a certain type are parsed in alphabetical order based on the internal name at the first line of the file, and tags within a file are parsed in order of appearance.
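A rough sketch of that load-order rule, assuming a dict of filename → file text for files of one object type. Using plain, case-sensitive string ordering is an assumption about how the game compares the internal names; `load_order` is a hypothetical name.

```python
from typing import Dict, List


def load_order(raw_files: Dict[str, str]) -> List[str]:
    """Sort filenames by the internal name on each file's first line.

    Assumes raw_files maps filename -> file text for ONE object type, and that
    plain (case-sensitive) string order matches the game's order.
    """
    def internal_name(filename: str) -> str:
        lines = raw_files[filename].splitlines()
        return lines[0].strip() if lines else ""

    # Filename breaks ties so the result is deterministic.
    return sorted(raw_files, key=lambda f: (internal_name(f), f))


raw_files = {
    "a_mod.txt": "creature_zz_mod\n[OBJECT:CREATURE]\n",
    "creature_standard.txt": "creature_standard\n[OBJECT:CREATURE]\n",
}
print(load_order(raw_files))  # ['creature_standard.txt', 'a_mod.txt']
```

The file named `a_mod.txt` still loads second, because only the internal name on line one counts. A write-out step could use the same ordering to check that a base creature lands earlier than its variants.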

Most other dependencies are handled by using separate object types. Materials are their own thing, body plans are their own thing, etc. and order makes no difference if you're "calling" an object from a different object type. For example, creatures can "call" any tissue template and then modify it locally without any concern for what order the tissue templates are in.

The one object that hasn't been pulled out into its own files is the SYNDROME type. I expect that to happen, eventually.

King Mir · « **Reply #247 on:** August 22, 2014, 01:54:33 pm »

Quote from: thistleknot on August 22, 2014, 01:25:20 pm

I was looking at lexers...

but... I was thinking we need a parser, no?

http://pyparsing.wikispaces.com/

found pyparsing

http://stackoverflow.com/questions/1651487/python-parsing-bracketed-blocks
Code:

>>> from pyparsing import nestedExpr
>>> txt = "{ { a } { b } { { { c } } } }"
>>>
>>> nestedExpr('{','}').parseString(txt).asList()
[[['a'], ['b'], [[['c']]]]]
>>>

lexer research:

If not that, then lexer-wise I've found:
Pygments
http://pygments.org/docs/lexers/
Ply
http://www.dabeaz.com/ply/

and a big list
https://wiki.python.org/moin/LanguageParsing

So for a bit of background:
A lexer converts text into a sequence of tokens or glyphs. Using a regular expression specified for each kind of token, it would create a function that takes a raw file and puts out a token structure every time it's called. Effectively, it is a stream of tokens. This is something that I think a tool may help with for DF raws.
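A minimal regex-based lexer sketch: one master pattern built from named groups, scanned with `re.finditer`. The token kind names are hypothetical. Because `lex` is a generator, each `next()` hands back one token, matching the "stream of tokens" description. Text outside brackets (comments) comes out as TEXT tokens, which a later stage can ignore.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token kinds. Order matters: QUOTED is tried first so that a
# quoted ':' becomes one token instead of a separator.
TOKEN_SPEC = [
    ("QUOTED", r"'.'"),       # a quoted single character, e.g. ':'
    ("LBRACKET", r"\["),
    ("RBRACKET", r"\]"),
    ("COLON", r":"),
    ("TEXT", r"[^\[\]:']+"),  # names, numbers, and comment text
    ("APOSTROPHE", r"'"),     # a lone ' that is not part of a quoted char
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) tokens one at a time: a stream of tokens.

    Every character matches some alternative, so nothing is silently skipped.
    """
    for match in MASTER_RE.finditer(text):
        yield match.lastgroup, match.group()


for token in lex("[TILE:1:3:':':4]"):
    print(token)
```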

A parser processes the output of a lexer according to the rules of the language grammar. So it has a grammar file that maps the structure of a DF raw to snippets of code that are run whenever a particular token sequence is encountered. It could read in a token, figure out what level the token is at (caste, creature, object, etc.), and have separate code for each, including cases like when a caste-level token is encountered at creature level.

Of course, to parse the raws you need both. However, grammar-wise, the raws of DF are very simple. So what I'm saying is, you don't necessarily need a parser generator, and it may be simpler not to use one, unless, that is, it comes bundled with the lexer anyway and is easy to use because of that. So definitely look at lexers. Maybe look at parsers too while you're at it; often they are bundled together.
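Since the grammar is that simple, a hand-written pass may be enough. A sketch of the object level only, assuming tags have already been turned into (name, args) tuples and assuming the tag that opens each object has the same name as the file's [OBJECT:...] header. That holds for files like CREATURE, INORGANIC and REACTION, but not for every type (item files use tags like ITEM_WEAPON), so a real tool would need a per-type table. `group_objects` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Tag = Tuple[str, Tuple[str, ...]]    # e.g. ("CREATURE", ("DWARF",))
RawObj = Tuple[str, str, List[Tag]]  # (object_type, object_name, body_tags)


def group_objects(tags: List[Tag]) -> List[RawObj]:
    """Split one file's tags into top-level objects (object level only).

    Assumes the tag that opens each object has the same name as the file's
    [OBJECT:...] header, e.g. [OBJECT:CREATURE] followed by [CREATURE:DWARF].
    """
    objects: List[RawObj] = []
    object_type: Optional[str] = None
    for name, args in tags:
        if name == "OBJECT":
            object_type = args[0]                 # file-level header
        elif name == object_type:
            objects.append((name, args[0], []))   # a new object starts here
        elif objects:
            objects[-1][2].append((name, args))   # belongs to the current object
        # Tags before the first object are dropped in this sketch.
    return objects


tags = [
    ("OBJECT", ("CREATURE",)),
    ("CREATURE", ("DWARF",)), ("CASTE", ("FEMALE",)), ("FEMALE", ()),
    ("CREATURE", ("ELF",)), ("CASTE", ("MALE",)),
]
for obj_type, obj_name, body in group_objects(tags):
    print(obj_type, obj_name, body)
```

Caste-level handling would be a second pass over each CREATURE body.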

EDIT:
On second thought, look at parsers too. If you're thinking of writing a parser, you should know what tools exist.

Button · « **Reply #248 on:** August 22, 2014, 01:55:16 pm »

Quote from: Dirst on August 22, 2014, 01:43:04 pm

Quote from: King Mir on August 22, 2014, 01:28:31 pm
You can't store objects in a map/dict when order matters. Use a list or array. That matters for creature tokens, because of variations like animal people. Alphabetizing is a problem for the same reason.
This generalizes for most raw files... there is a top-level object (like CREATURE or REACTION) that is usually, but not always, self-contained. The biggest exception is that base creatures must exist "earlier in the raws" than any variants of that creature. Raw files of a certain type are parsed in alphabetical order based on the internal name at the first line of the file, and tags within a file are parsed in order of appearance.

Don't worry, King Mir, the dict is just for object lookup. Order is preserved within the object.

I didn't realize that base creatures had to be earlier in the raws than their variations; I'll make a note of that for write-out logic.

thistleknot · « **Reply #249 on:** August 22, 2014, 02:03:04 pm »

Quote from: Button on August 22, 2014, 01:55:16 pm

Quote from: Dirst on August 22, 2014, 01:43:04 pm
Quote from: King Mir on August 22, 2014, 01:28:31 pm
You can't store objects in a map/dict when order matters. Use a list or array. That matters for creature tokens, because of variations like animal people. Alphabetizing is a problem for the same reason.
This generalizes for most raw files... there is a top-level object (like CREATURE or REACTION) that is usually, but not always, self-contained. The biggest exception is that base creatures must exist "earlier in the raws" than any variants of that creature. Raw files of a certain type are parsed in alphabetical order based on the internal name at the first line of the file, and tags within a file are parsed in order of appearance.

Don't worry, King Mir, the dict is just for object lookup. Order is preserved within the object.

I didn't realize that base creatures had to be earlier in the raws than their variations; I'll make a note of that for write-out logic.

I believe that's what the RawExplorer author was saying.

King Mir · « **Reply #250 on:** August 22, 2014, 02:11:59 pm »

Do we really want to auto-alphabetize everything generally? Some things like gems are kinda convenient if sorted by worth. On the other hand, stockpiles are clearer when things are in alphabetical order.

Arguably, second-level tags whose order doesn't affect the game at all should be put in a dict/map to make it easy to check for changes to the same token.

Dirst · « **Reply #251 on:** August 22, 2014, 02:21:01 pm »

Quote from: King Mir on August 22, 2014, 02:11:59 pm

Do we really want to auto-alphabetize everything generally? Some things like gems are kinda convenient if sorted by worth. On the other hand, stockpiles are clearer when things are in alphabetical order.

Arguably, second-level tags whose order doesn't affect the game at all should be put in a dict/map to make it easy to check for changes to the same token.

Everything? No.

For the dict/map idea, is there a reason to stop at two levels? For most structures, it'd be sufficient to soak up all subtokens and just keep them with the parent. Castes, however, are a major pain in the ass. Or, they would be if Dwarves had asses.

The only pseudo-simple solution I can think of is to treat caste declarations and caste selections as milestones within the CREATURE object. A similar tag on the other side of such a milestone is considered a different tag rather than a duplicate one. This will keep the caste-level tags from overwriting each other without requiring an exhaustive list of caste-level tags.
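A small sketch of the milestone idea, assuming a creature's body is already a list of (name, args) tuples. Which tags count as milestones is an assumption here (CASTE, SELECT_CASTE, SELECT_ADDITIONAL_CASTE). Each tag is keyed by (most recent milestone, tag name), so the same tag on different sides of a milestone lands under a different key instead of overwriting. The BODY_SIZE values are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tag = Tuple[str, Tuple[str, ...]]
Key = Tuple[str, str]   # (most recent milestone, tag name)

# Assumption: these are the tags that act as milestones inside a CREATURE.
MILESTONES = {"CASTE", "SELECT_CASTE", "SELECT_ADDITIONAL_CASTE"}


def index_by_milestone(body: List[Tag]) -> Dict[Key, List[Tuple[str, ...]]]:
    """Key each tag by the milestone it follows, so tags on different sides of
    a caste declaration/selection never overwrite each other."""
    index: Dict[Key, List[Tuple[str, ...]]] = defaultdict(list)
    milestone = ""   # "" means creature level, before any caste tag
    for name, args in body:
        if name in MILESTONES:
            milestone = ":".join((name,) + args)   # e.g. "CASTE:FEMALE"
        else:
            index[(milestone, name)].append(args)  # a list keeps repeated tags
    return dict(index)


body = [
    ("BODY_SIZE", ("0", "0", "1000")),   # illustrative values
    ("CASTE", ("FEMALE",)),
    ("FEMALE", ()),
    ("CASTE", ("MALE",)),
    ("MALE", ()),
    ("SELECT_CASTE", ("FEMALE",)),
    ("BODY_SIZE", ("0", "0", "900")),
]
for key, values in index_by_milestone(body).items():
    print(key, values)
```

No list of caste-level tags is needed: the two BODY_SIZE tags simply end up under different milestones.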

thistleknot · « **Reply #252 on:** August 22, 2014, 02:32:00 pm »

Actually, if objects are matched, there's no need for alphabetizing. That was more of a concern when using a diff approach on an entire file, comparing two mods that significantly reordered their files.

Merkator · « **Reply #253 on:** August 22, 2014, 02:47:07 pm »

For parsing there are literally tons of tools and libs.
Even re may be all we need. The only problem I found was stuff like
[TILE:1:3:':':4]
It's just a sample. In C I would just parse char by char. In Python too.
If not for these ugly little things, the whole parser could be almost one line, like

Code:

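# naive: a quoted ':' argument gets split apart, e.g. [TILE:1:3:':':4]
# gives ('TILE', ['1', '3', "'", "'", '4'])
l = l.strip().strip('[').strip(']').split(':')
token = (l[0], l[1:])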

Damn, I would love to know how you'd solve this...
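Taking "even re may be all we need" literally, one option is to match arguments instead of splitting on ':'. A sketch, assuming one tag per string and that a quoted character is always exactly one character between single quotes; `parse_tag` is a hypothetical name.

```python
import re
from typing import List, Tuple

# An argument is either a quoted single character like ':' or a run of
# non-':' characters. findall skips the ':' separators themselves.
ARG_RE = re.compile(r"'.'|[^:]+")


def parse_tag(line: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: split one "[NAME:arg:arg]" tag into (NAME, args)."""
    line = line.strip()
    if not (line.startswith("[") and line.endswith("]")):
        raise ValueError(f"not a single tag: {line!r}")
    parts = ARG_RE.findall(line[1:-1])
    return parts[0], parts[1:]


print(parse_tag("[TILE:1:3:':':4]"))  # ('TILE', ['1', '3', "':'", '4'])
```

The quoted character keeps its quotes so the tag can be written back out unchanged. One limitation: `findall` silently drops empty arguments, such as the middle of [X::Y].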

King Mir · « **Reply #254 on:** August 22, 2014, 04:41:43 pm »

Quote from: Merkator on August 22, 2014, 02:47:07 pm

For parsing there are literally tons of tools and libs.
Even re may be all we need. The only problem I found was stuff like
[TILE:1:3:':':4]
It's just a sample. In C I would just parse char by char. In Python too.
If not for these ugly little things, the whole parser could be almost one line, like
Code:
l = l.strip('[').strip(']').split(':')
token = (l[0], l[1:])
Damn, I would love to know how you'd solve this...

You solve this by writing a proper lexer that converts a string into a list of tokens. Then write a state machine for each kind of token. So in your example the list of tokens starts like this: ['[',KEYWORD="TILE",':','1',':' ....
(Keywords actually can just be enum values, but strings in general can't, so you need the extra data for them.)
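A sketch of the state-machine half, assuming a lexer has already produced (kind, value) pairs with the hypothetical kinds LBRACKET, RBRACKET, COLON, TEXT and QUOTED. It uses string kinds rather than enum values for brevity, and treats anything outside brackets as a comment to skip.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # (kind, value); kind names are hypothetical


def tags_from_tokens(tokens: Iterable[Token]) -> List[Tuple[str, List[str]]]:
    """Turn a token stream into (tag_name, args) pairs with a small state machine."""
    tags: List[Tuple[str, List[str]]] = []
    state = "OUTSIDE"          # OUTSIDE = comment text between tags
    name: str = ""
    args: List[str] = []
    for kind, value in tokens:
        if state == "OUTSIDE":
            if kind == "LBRACKET":       # anything else out here is a comment
                state, name, args = "NAME", "", []
        elif kind == "RBRACKET":         # closes the tag from NAME or ARG state
            tags.append((name, args))
            state = "OUTSIDE"
        elif kind == "COLON":            # every ':' starts a new, empty argument
            args.append("")
            state = "ARG"
        elif state == "NAME":
            name += value
        else:                            # state == "ARG": TEXT, QUOTED, ...
            args[-1] += value
    return tags


tokens = [
    ("TEXT", "comment text "),
    ("LBRACKET", "["), ("TEXT", "TILE"), ("COLON", ":"),
    ("TEXT", "1"), ("COLON", ":"), ("TEXT", "3"), ("COLON", ":"),
    ("QUOTED", "':'"), ("COLON", ":"), ("TEXT", "4"), ("RBRACKET", "]"),
]
print(tags_from_tokens(tokens))  # [('TILE', ['1', '3', "':'", '4'])]
```

Because each ':' opens a new, initially empty argument, a tag like [X::Y] keeps its empty middle argument.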
