Bay 12 Games Forum

Pages: 1 ... 14 15 [16] 17 18 ... 22

Author Topic: Proposal: a standard format for mods in a diff/patch Mod Starter Pack  (Read 41637 times)

King Mir

  • Bay Watcher
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #225 on: August 22, 2014, 06:45:37 am »

Code: [Select]
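import difflib

# Assumes vanilla_raw_folder, mod_raw_folder, mixed_raw_folder and file
# are already defined, as in the earlier script.
context_lines = 2
patch_path = mixed_raw_folder + file + '.patch'

with open(vanilla_raw_folder + file) as vanilla, open(mod_raw_folder + file) as mod:
    diff = difflib.unified_diff(vanilla.readlines(), mod.readlines(), n=context_lines)
    # Mode 'w' overwrites any stale patch, so no os.remove() is needed first
    # (an empty patch is written if the two files are identical).
    with open(patch_path, 'w') as patch:
        patch.writelines(diff)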

Creating a unified patch file with a few lines of context (two lines matches within but not between objects) fixes the [pet] issue, but I can't work out how to apply a unified patch with python.  Argh.
You can get difflib.unified_diff() to produce a unified diff, but applying or merging one isn't provided in the library.
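
Since difflib has no patch function, here is a minimal sketch of applying a unified diff by hand. It assumes the diff came from difflib.unified_diff() for a single file and that the target is exactly the file the diff was made from: there is no fuzzy matching, and any mismatch raises an error. The names apply_unified_diff and HUNK_HEADER are made up for this example.

```python
import difflib
import re

# Matches hunk headers such as "@@ -1,4 +1,5 @@" (the counts are optional).
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_unified_diff(original_lines, diff_lines):
    """Apply a unified diff strictly; raise ValueError if it does not fit."""
    result = []
    pos = 0          # index of the next unconsumed line in original_lines
    in_hunk = False
    for line in diff_lines:
        match = HUNK_HEADER.match(line)
        if match:
            start = int(match.group(1))
            count = 1 if match.group(2) is None else int(match.group(2))
            # An empty range names the line *before* the hunk.
            hunk_start = start if count == 0 else start - 1
            if hunk_start < pos:
                raise ValueError("overlapping hunks")
            result.extend(original_lines[pos:hunk_start])  # untouched lines
            pos = hunk_start
            in_hunk = True
        elif not in_hunk:
            continue  # the "---" / "+++" file header before the first hunk
        elif line.startswith((" ", "-")):
            # Context and removed lines must match the original exactly.
            if pos >= len(original_lines) or original_lines[pos] != line[1:]:
                raise ValueError("patch does not match at line %d" % (pos + 1))
            if line.startswith(" "):
                result.append(line[1:])
            pos += 1
        elif line.startswith("+"):
            result.append(line[1:])
    result.extend(original_lines[pos:])  # everything after the last hunk
    return result


vanilla = ["[CREATURE:GECKO]\n", "[NAME:gecko]\n", "[SIZE:1]\n", "[END]\n"]
modded = ["[CREATURE:GECKO]\n", "[NAME:gecko]\n", "[PET]\n", "[SIZE:1]\n", "[END]\n"]
patch = list(difflib.unified_diff(vanilla, modded, n=2))
patched = apply_unified_diff(vanilla, patch)
```

This only replays a patch onto the file it was made from. Combining two mods' patches still needs a real merge step with conflict detection.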

Quote
<Python 3.x compatible version>
Well, it no longer freaks out about the print statement  :)  Unfortunately it also outputs the contents of the vanilla file  :(

My testing, though I don't follow the various opcodes, shows that the output_file_temp returned by do_merge_seq() is the same as the contents of the vanilla file.
It should return 1 though. If it returns 1, then the output is garbage; it detected a conflict and gave up. It prints out vanilla because it got to the end of vanilla before finding a conflict. To see that it returned 1, run "echo $?" immediately after running it. You can run it like this:
python mergemod.py mod_file.txt vanilla_file.txt target_file.txt ; echo $?

But I need to test it more: find out why it's complaining about maximum recursion depth for thistleknot, and check it further for correctness.

MagiX

  • Bay Watcher
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #226 on: August 22, 2014, 06:56:57 am »

I haven't read the entire thread, just the last few pages... quite a discussion going on here :)

What about writing a custom json/xml/whatever style parser that puts these things into (multi-level) dict structures and then comparing the dict structures? It would look something like this:
Code: (vanilla) [Select]
vanilla_dict={"Creature":{"Giant_leopard_gekko":{all the key/value pairs from vanilla here}}}
Code: (Pet) [Select]
Mod_1={"Creature":{"Giant_leopard_gekko":{all the key/value pairs from vanilla here,"PET":''}}}
Code: (Add animal) [Select]
Mod_2={"Creature":{"Giant_leopard_gekko":{all the key/value pairs from vanilla here},
"Desert_tortoise":{all the key/value pairs from new animal here}}}
So for every key, one could check if the value (i.e. a new dict) is the same or not and if it is not the same, one can do this recursively. A simple 2 dict comparison can be found here
We thus have some options:
  • Stuff that is unchanged is copied to the mixed mod dict
  • Stuff that is simply added (as the [PET] tag or the new creature) will be added to the mixed mod dict
  • Stuff that is changed --> check if the same key/value pair is changed in both mods --> yes: problem; no: copy the change to the mixed mod dict
  • Stuff that is removed in the mod --> remove from mixed mod dict
and as a final step, one should parse the mixed mod dict into a file.
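
The four rules above can be sketched as a recursive three-way merge over nested dicts. This is only an illustration under the assumption that raws have already been parsed into plain dicts (which ignores tag order). The names merge_dicts, MergeConflict and _MISSING are invented for the example.

```python
_MISSING = object()  # sentinel meaning "key is absent from this dict"


class MergeConflict(Exception):
    """Both mods changed the same key in different ways."""


def merge_dicts(base, mod_a, mod_b, path=()):
    """Three-way merge of nested dicts against a common base (vanilla)."""
    merged = {}
    # Visit every key once, keeping first-seen order.
    for key in dict.fromkeys([*base, *mod_a, *mod_b]):
        v0 = base.get(key, _MISSING)
        va = mod_a.get(key, _MISSING)
        vb = mod_b.get(key, _MISSING)
        if va == vb:
            value = va  # unchanged, same change in both, or removed in both
        elif va == v0:
            value = vb  # only mod B added, changed or removed this key
        elif vb == v0:
            value = va  # only mod A added, changed or removed this key
        elif isinstance(va, dict) and isinstance(vb, dict):
            # Both changed the same nested dict: look one level deeper.
            nested_base = v0 if isinstance(v0, dict) else {}
            value = merge_dicts(nested_base, va, vb, path + (key,))
        else:
            raise MergeConflict("/".join(str(p) for p in path + (key,)))
        if value is not _MISSING:  # a removal simply leaves the key out
            merged[key] = value
    return merged


vanilla_dict = {"Creature": {"Giant_leopard_gekko": {"NAME": "gecko", "SIZE": "1"}}}
mod_1 = {"Creature": {"Giant_leopard_gekko": {"NAME": "gecko", "SIZE": "1", "PET": ""}}}
mod_2 = {"Creature": {"Giant_leopard_gekko": {"NAME": "gecko", "SIZE": "1"},
                      "Desert_tortoise": {"NAME": "tortoise"}}}
mixed = merge_dicts(vanilla_dict, mod_1, mod_2)
```

Here mixed ends up with the [PET] gecko from the first mod and the new tortoise from the second. Two mods setting SIZE to different values would raise MergeConflict instead.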

King Mir

  • Bay Watcher
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #227 on: August 22, 2014, 07:42:27 am »

Separating it into two levels like that does solve that particular problem. But you can't just use a dict, because you need to preserve the order of some tags.

But using json or xml may mean we can find a good diff/merge tool, that is aware of how the order of some things doesn't matter and some things do. XML is more powerful than JSON here, because the same xml tag can have both attributes which are unordered, and nested tags that are ordered.
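
To illustrate the attribute/child distinction, here is a small sketch using Python's xml.etree.ElementTree. The XML layout (a creature element with flag attributes and caste children) is purely hypothetical, not an agreed format. The comparison treats attributes as an unordered mapping and children as an ordered list.

```python
import xml.etree.ElementTree as ET


def same_element(a, b):
    """Compare two elements: attribute order is ignored, child order is not."""
    if a.tag != b.tag or a.attrib != b.attrib:  # attrib is a plain dict
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(same_element(x, y) for x, y in zip(a, b))


# Hypothetical encoding: order-free tags as attributes, ordered parts as children.
one = ET.fromstring(
    '<creature id="GECKO" pet="" size="1"><caste id="FEMALE"/><caste id="MALE"/></creature>')
two = ET.fromstring(
    '<creature size="1" pet="" id="GECKO"><caste id="FEMALE"/><caste id="MALE"/></creature>')
three = ET.fromstring(
    '<creature id="GECKO" pet="" size="1"><caste id="MALE"/><caste id="FEMALE"/></creature>')

attributes_reordered_equal = same_element(one, two)
children_reordered_equal = same_element(one, three)
```

Reordering the attributes leaves the elements equal, while swapping the two caste children does not.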

PeridexisErrant

  • Bay Watcher
  • Dai stihó, Hrasht.
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #228 on: August 22, 2014, 07:49:51 am »

Either sounds good, both are beyond my current skills. 

Go for it, and I'll keep writing documentation and design ideas for stuff I can't code yet :P

MagiX

  • Bay Watcher
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #229 on: August 22, 2014, 08:34:48 am »

you need to preserve the order of some tags.
Is there some kind of rule for that? After just briefly scanning some of the files, I haven't seen a "clear" pattern, besides indentation and even that does not seem to be consistent in all cases.

beyond my current skills. 
Time to learn something new :)
I have no clue how to approach this either; I just thought about it and figured, why not share my idea?

King Mir

  • Bay Watcher
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #230 on: August 22, 2014, 08:47:30 am »

Well, the first step would be to find an XML merge tool. You might also try to write an XSLT script that does such a merge and properly identifies conflicting merges; writing an XSLT script may be easier than writing a merge algorithm from scratch in Python. Without such a tool, taking a round trip through XML is pointless.

IMO, a from-scratch Python script that does merging on a 2+ level structure is probably the best way to go eventually.

Anyway, I'm going to keep working on my merge algorithm for now. And maybe add more boilerplate.

King Mir

  • Bay Watcher
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #231 on: August 22, 2014, 08:52:25 am »

you need to preserve the order of some tags.
Is there some kind of rule for that? After just briefly scanning some of the files, I haven't seen a "clear" pattern, besides indentation and even that does not seem to be consistent in all cases.
I'm not a modder, so I don't know the details, but some tags clearly suggest that order matters for them, like [GO_TO_END]. Other tags, like [PET] can be put anywhere after the creature token.

Go for it, and I'll keep writing documentation and design ideas for stuff I can't code yet :P
Design is good. There's a lot of fairly straightforward stuff that needs to be done to manage everything. You need to be able to specify the list of mods. You need to be able to delete the output when merging fails. You probably want to figure out which two mods conflict when merging, which requires extra analysis. And of course the GUI: designing and stubbing out the GUI can help plan what features you want even if they aren't immediately implemented.
« Last Edit: August 22, 2014, 09:03:21 am by King Mir »

Putnam

  • Bay Watcher
  • DAT WIZARD
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #232 on: August 22, 2014, 09:30:04 am »

you need to preserve the order of some tags.
Is there some kind of rule for that? After just briefly scanning some of the files, I haven't seen a "clear" pattern, besides indentation and even that does not seem to be consistent in all cases.
I'm not a modder, so I don't know the details, but some tags clearly suggest that order matters for them, like [GO_TO_END]. Other tags, like [PET] can be put anywhere after the creature token.

[GO_TO_END] is pretty much only there because castes are not declared at the start. Castes and non-creature tokens embedded in creatures (i.e. tissues and materials) are the only things where positioning matters.

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #233 on: August 22, 2014, 11:19:18 am »

So I've been thinking. The reason we get these a vs. b vs. c token replacements between mods (a being our base/common ancestor) is what I would call "collisions": both mods try to add to the same line (neither replaces anything; both are additive).

I was thinking we could somehow mark them in the patch files in a post-processing step. Maybe we can modify the changed token to be ]###newobject or ]#oldobjectToken

Since I figured out how diff3 works, I've been reusing kdiff3 in a whole new way.

Most of these collisions are resolved by inserting b before c or c before b. If we could somehow incorporate that logic as a post-processor command in the patch files, maybe an ]###add or ]#replace or ###Del on each add or replace line (in a patch file), we might be able to resolve this issue.
« Last Edit: August 22, 2014, 11:42:54 am by thistleknot »

Merkator

  • Bay Watcher
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #234 on: August 22, 2014, 11:29:10 am »

I think we'll end up with a full-featured parser. Anyone here with some knowledge of Haskell and Parsec? ;)

BTW, I wrote my small diff parser and ended up with 100 LOC.

I'll post it once I've done some bug testing and cleaned up this piece of... I mean beauty. :P

Dirst

  • Bay Watcher
  • [EASILY_DISTRA
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #235 on: August 22, 2014, 11:40:28 am »

you need to preserve the order of some tags.
Is there some kind of rule for that? After just briefly scanning some of the files, I haven't seen a "clear" pattern, besides indentation and even that does not seem to be consistent in all cases.
I'm not a modder, so I don't know the details, but some tags clearly suggest that order matters for them, like [GO_TO_END]. Other tags, like [PET] can be put anywhere after the creature token.

[GO_TO_END] is pretty much only there because castes are not declared at the start. Castes and non-creature tokens embedded in creatures (i.e. tissues and materials) are the only things where positioning matters.
There are four kinds of order dependence in the raws, with an example for each at the end.

1. The header (filename and OBJECT: declaration) needs to come first in a file.
2. Variations need to be defined after the base creature.
3. Several tokens accept a list of subtokens to build a structure.  The structure closes when the parser hits the first token that isn't a valid subtoken in that context.
4. Castes are a special case of 3.  Everything that appears before the first caste declaration is applied to ALL castes, nothing closes a caste structure except another caste declaration, and a caste declaration can be re-opened later in the same creature.

Example of 1: the creature_standard and [OBJECT:CREATURE] at the top of a file.
Examples of 2: a giant kea can't be defined until a kea is already defined, an olm man can't be defined before an olm (tiger men are an exception... they are made from scratch rather than being a tiger variant).
Examples of 3: each CREATURE sucks up all tags until it hits another CREATURE tag, and a SYNDROME sucks up all tags until it runs out of syndrome-defining or creature-effect-defining tags.
Example of 4: the intelligent creatures tend to have a lot of definition up front, briefly split into MALE and FEMALE castes, then select all castes again to finish up.

The easiest way to handle this is to hardcode in 1 and, to handle 3 and 4, treat order within a top-level object as if it is critical. Case 2 is the one that prevents us from alphabetizing things.

One way to handle that is to do a two-level sort.  Base creatures are listed alphabetically, then all variations are listed alphabetically.  The logic could be re-used later if we want to alphabetize gems within categories or something weird like that.
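
A rough sketch of that two-level sort in Python. It assumes creatures have already been parsed into a dict of ID to token strings, and that a variation can be recognised by a [COPY_TAGS_FROM:...] token; both the input shape and the function names are assumptions made for this example.

```python
def is_variation(tokens):
    """Treat a creature as a variation if it copies tags from another one."""
    return any(token.startswith("[COPY_TAGS_FROM:") for token in tokens)


def two_level_sort(creatures):
    """Return creature IDs: base creatures A-Z first, then variations A-Z.

    creatures maps a creature ID to its list of raw token strings.
    """
    # False sorts before True, so bases come first; ties break on the ID.
    return sorted(creatures, key=lambda cid: (is_variation(creatures[cid]), cid))


creatures = {
    "GIANT_KEA": ["[COPY_TAGS_FROM:BIRD_KEA]", "[APPLY_CREATURE_VARIATION:GIANT]"],
    "OLM_MAN": ["[COPY_TAGS_FROM:OLM]"],
    "OLM": ["[NAME:olm:olms:olm]"],
    "BIRD_KEA": ["[NAME:kea:keas:kea]"],
    "TIGER_MAN": ["[NAME:tiger man:tiger men:tiger man]"],  # built from scratch
}
order = two_level_sort(creatures)
```

The same key function could take a different first element (a gem category, say) to reuse the logic elsewhere. A variation built on another variation would need more than two levels.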

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #236 on: August 22, 2014, 12:18:19 pm »

That is exactly the info we need to build a raw structure.

So...

I didn't have much luck tweaking my script to replace blank lines with [token] and then back-update with blank lines...

but... I did get the command down to 1 line in a for loop

ParseRawsv4a.bat
Code: [Select]
echo off
REM put tokens on their own line | REM remove tabs | remove all blanklines
for /f %%a in ('dir /b *.txt') do sed -e "s/\[[^][]*\]/\n&\n/g" %%~na.txt | sed -r "s/\t//g" | sed -e "s/^ *//; s/ *$//; /^$/d; s/\r//; /^\s*$/d" > %%~na.out |type %%~na.out
REM cleanup
ren *.out3 *.txt
erase *.out
echo on

I think this flatten should only be applied to the items within the [objects] folder.  Things that affect speech and text seem to be read per line vs per token.
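
For anyone who would rather avoid sed on Windows, here is a rough Python equivalent of the flatten step: one token per line, tabs and carriage returns removed, lines trimmed, blank lines dropped. The function name flatten_raw is made up, and this is an approximation of the sed pipeline rather than a byte-for-byte port.

```python
import re

# A token is "[" ... "]" with no nested brackets, same as the sed pattern.
TOKEN = re.compile(r"\[[^\]\[]*\]")


def flatten_raw(text):
    """Return the non-empty lines of text with every token on its own line."""
    text = TOKEN.sub(lambda m: "\n" + m.group(0) + "\n", text)
    text = text.replace("\t", "").replace("\r", "")
    stripped = (line.strip() for line in text.split("\n"))
    return [line for line in stripped if line]


sample = ("creature_standard\r\n\r\n[OBJECT:CREATURE]\r\n"
          "\t[CREATURE:DWARF] comment [PET][SIZE:1]\r\n")
flat = flatten_raw(sample)
```

Text outside the brackets (the filename header and comments) is kept as separate lines, as in the batch version.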

I was hoping to address how the whitespace removal might affect the case where two mods add trailing tokens at the end of objects.  For the dictionary/match approach PE was attempting, I assumed the whitespace that trails any token additions at the end of objects would be relevant and should NOT be deleted, but idk.  Either way, I had a bit of trouble.

There is one way to do it, but then I'd have to use cygwin...
http://stackoverflow.com/questions/11393616/replace-string-that-contains-crlf

Button

  • Bay Watcher
  • Plants Specialist
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #237 on: August 22, 2014, 12:44:44 pm »

What about writing a custom json/xml/whatever style parser that puts these things into (multi-level) dict structures and then comparing the dict structures? This should look like that:

Hey guys, sorry for not keeping you up to date on my postprocessor, blah blah work blah blah food poisoning. What I have so far works a lot like this but without needing XML (yet).

So far what I have is code to read a raw file and parse it into raw objects. The code is on my home computer, but here's the pseudocode as I remember it:

Spoiler (click to show/hide)

The idea is that we parse each mod into a collection of raw objects, indexed by object type and object name. This catches duplicate raws during loading.
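
A minimal sketch of such a collection in Python, not the actual postprocessor code. It assumes each object starts with a token named after the file's [OBJECT:...] type (true for CREATURE, not for every object type), and it keeps each object's tokens as an ordered list of tuples. The names load_raw_objects and DuplicateRawObject are invented here.

```python
import re

TOKEN = re.compile(r"\[([^\]\[]*)\]")


class DuplicateRawObject(Exception):
    """The same (type, name) pair was defined twice."""


def load_raw_objects(text, collection=None):
    """Add the objects of one raw file to collection, keyed by (type, name)."""
    collection = {} if collection is None else collection
    object_type = None
    current = None  # token list of the object currently being read
    for match in TOKEN.finditer(text):
        parts = tuple(match.group(1).split(":"))
        if parts[0] == "OBJECT" and len(parts) > 1:
            object_type, current = parts[1], None
        elif object_type is not None and parts[0] == object_type and len(parts) > 1:
            key = (object_type, parts[1])
            if key in collection:
                raise DuplicateRawObject(key)
            current = collection[key] = [parts]
        elif current is not None:
            current.append(parts)  # order within an object is preserved
    return collection


raw = ("creature_test\n[OBJECT:CREATURE]\n"
       "[CREATURE:GECKO]\n\t[NAME:gecko][PET]\n"
       "[CREATURE:TORTOISE]\n\t[NAME:tortoise]\n")
objects = load_raw_objects(raw)
```

Loading a second file that defines [CREATURE:GECKO] into the same collection raises DuplicateRawObject, which is the duplicate check described above.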

It can easily be expanded into comparing each collection of raw objects to each other. Mods which add new raw objects would be trivial with this setup. Mods which remove or make changes to existing objects would require additional handling, but there's plenty of room for it.

I was messing around with formats for defining legal raw objects of various types. Mainly what I found is that XML isn't great for it, because it doesn't deal gracefully with tags which are allowed in any order. Might be best to define a custom format if we want to go into it that far.

Merkator

  • Bay Watcher
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #238 on: August 22, 2014, 12:59:52 pm »

Button: sounds great. I'd thought about something like that myself.
It may well be a much better way.
Storing objects is not much of a problem.

But remember the tag order and the whole [CASTE] thing.
What kind of data structure do you use? A list with a tuple for each token, or an OrderedDict with tuples or OrderedDicts as values?


King Mir

  • Bay Watcher
    • View Profile
Re: Proposal: a standard format for mods in a diff/patch Mod Starter Pack
« Reply #239 on: August 22, 2014, 01:05:35 pm »

I think we'll end up with a full-featured parser. Anyone here with some knowledge of Haskell and Parsec? ;)

Good point.

I have some experience with parsers and parser generators, but DF raws are so primitive that a parser generator seems overkill. There's very little grammar to DF raws, and checking the grammar is not important at all. On the other hand, maybe a lexer tool would be worthwhile. Thistleknot, you might want to look into this: whether there's a "lex" or lexer generator tool that compiles into Python or a portable language. Maybe ANTLR, if it has a sufficiently documented Python generator. It might make reading and de-serializing raws easier.
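
For comparison, a hand-written lexer with Python's re module is only a few lines. This sketch (lex_raw and unlex are made-up names) yields both tokens and the free text between them, so a file can be written back out unchanged. It assumes a token is anything between "[" and "]" with fields separated by ":".

```python
import re

# Either a complete [TOKEN:ARG:...] or a run of text (or a stray "[").
LEX = re.compile(r"\[(?P<token>[^\]\[]*)\]|(?P<text>[^\[]+|\[)")


def lex_raw(text):
    """Yield ("token", (name, arg, ...)) and ("text", str) items in order."""
    for match in LEX.finditer(text):
        if match.group("token") is not None:
            yield ("token", tuple(match.group("token").split(":")))
        else:
            yield ("text", match.group("text"))


def unlex(items):
    """Serialize lexed items back into raw text."""
    return "".join("[" + ":".join(value) + "]" if kind == "token" else value
                   for kind, value in items)


sample = "creature_test\n\n[OBJECT:CREATURE]\n\n[CREATURE:GECKO] a comment\n\t[PET][SIZE:1]\n"
items = list(lex_raw(sample))
tokens = [value for kind, value in items if kind == "token"]
```

Because the text items keep all whitespace and comments, unlex(items) reproduces the input exactly, which matters if merged raws should keep their original layout.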