Topic: Proposal: a standard format for mods in a diff/patch Mod Starter Pack (Read 42423 times)

King Mir · « **Reply #75 on:** August 17, 2014, 10:04:54 am »

You can change "^[^[]*" to "\s*" so it just removes leading whitespace, but not leading comments. Otherwise you would have to treat the first line specially. I suspect the blank lines are just the lines where it removed comments; it doesn't strip empty lines, so the comments are removed, but the line stays.

And yeah, I forgot for windows you'll need \n to be \r\n or I guess `r`n (if ` is powershell's escape character).

Actually, if replace isn't a line editor like sed, then you may just omit the first expression. The second will remove trailing comments only, and the last one will handle everything else. Maybe I should look up the documentation of replace instead of guessing...

thistleknot · « **Reply #76 on:** August 17, 2014, 10:14:41 am »

"^[^[]*"

I searched for that string, closest I found is
"s/^[^[]*//"

in
sed -e "s/^[^[]*//" -e "s/][^\[]*$/]/" -e "s/][^[]*\[/]\n\[/g"

so

this?
sed -e "s/\s//" -e "s/][^\[]*$/]/" -e "s/][^[]*\[/]\n\[/g"
yep

Still leaves the comments adjacent to tokens

try this as input data

Code: [Select]

item_gloves

[OBJECT:ITEM]

###test###
	[ITEM_GLOVES:ITEM_GLOVES_GAUNTLETS]###test###
[NAME:gauntlet:gauntlets]
###test###[ARMORLEVEL:2]
[UPSTEP:1]
###test###[SHAPED]
[LAYER:ARMOR]###test######test###
[COVERAGE:100]
[LAYER_SIZE:20]
[LAYER_PERMIT:15]
[MATERIAL_SIZE:2]
[SCALED]
[BARRED]
[METAL]
[LEATHER]
[HARD]

I'm looking at the regular expressions document on python's page
https://docs.python.org/2/howto/regex.html

may have an answer here
http://stackoverflow.com/questions/6125098/how-to-match-any-non-white-space-character-except-a-particular-one

King Mir · « **Reply #77 on:** August 17, 2014, 10:32:02 am »

So instead of mucking around with shell scripts, here's a python script that will remove all internal comments (but not leading or trailing comments), and put each tag on its own line:

Code: [Select]

import re
import sys

print re.sub("][^[]*\[","]\n[",sys.stdin.read())

thistleknot · « **Reply #78 on:** August 17, 2014, 10:39:05 am »

Code: [Select]

C:\Games\Dwarf Fortress\github comparisons\BasedOnVanillaRaws\BasedOnVanillaRaws
\test_SafetoDeleteMe>py
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (In
tel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> import sys
>>>
>>> print re.sub("][^[]*\[","]\n[",sys.stdin.read())
  File "<stdin>", line 1
    print re.sub("][^[]*\[","]\n[",sys.stdin.read())
           ^
SyntaxError: invalid syntax
>>>

Yeah, I'm just trying to nail down the "correct" parse method so I can do some diff comparisons and attempt some merges, but comments are a big deal. They add signature data to the file by acting as a contextual signature, and they are harmless and uniform if put on their own line, like tokens are.

Preserving comments also preserves commented-out tokens, as well as file_name information.

The problem I've noticed with most past solutions is that they all check whether two brackets are next to each other, but fail to separate non-bracket comments from token lines.

This is important because if a comment is adjacent to a token, a diff will read that as more than it should, because diff compares whole lines. For tokens, we only want the token read, not some comment[token] or [token]comment. It should be

comment
[token]

or

[token]
comment

King Mir · « **Reply #79 on:** August 17, 2014, 10:43:07 am »

Here's an explanation of what's going on with "^[^[]*"

The initial ^ matches the beginning. In sed this means the beginning of a line, because sed applies each expression one line at a time. In Python, ^ by default matches only the beginning of the whole string/file. I don't know about powershell.

The [^[]* matches a run of non-[ characters. The outer brackets form a character set, and the ^ inverts it. The set contains the one member [. So with the *, it matches all characters up until the first [.

So it removes leading comments.
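
To see the difference between the two meanings of ^, here is a small Python sketch. The two-line sample text is made up for illustration; `re.MULTILINE` is the standard flag that makes ^ match at the start of every line, which is roughly what sed gets by working one line at a time.

```python
import re

# Hypothetical sample: a comment sits in front of the token on each line.
text = "leading comment[OBJECT:ITEM]\nnote[NAME:gauntlet:gauntlets]\n"

# Without flags, ^ anchors only at the very start of the string,
# so only the first leading comment is removed.
whole_string = re.sub(r"^[^[]*", "", text)

# With re.MULTILINE, ^ also anchors right after every newline,
# so the leading comment on each line is removed (sed-like behaviour).
per_line = re.sub(r"^[^[]*", "", text, flags=re.MULTILINE)

print(whole_string)
print(per_line)
```

Note that [^[] also matches a newline, so on a line with no token at all the pattern would run on into the following lines.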

thistleknot · « **Reply #80 on:** August 17, 2014, 10:50:08 am »

would this match any character followed by ]?

And I DON'T want to remove comments (ATM)

/[\s\[]/

King Mir · « **Reply #81 on:** August 17, 2014, 11:01:07 am »

That's Python 3, where print is a function and needs parentheses. Adding ( after print, and ) at the end should fix that.

So you want to preserve comments but put them on their own line. Ok. This unfortunately generates an extra newline in the simple case.

Code: [Select]

import re
import sys
# The final [ must be escaped in the pattern; the replacement needs no escape.
print(re.sub(r"]\s*([^[]*?)\s*\[","]\n\\1\n[",sys.stdin.read()))

Or you can do it line by line, like sed.
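
A line-by-line version might look like the sketch below. It is only one way to do it: the names `PIECE`, `split_line` and `reformat` are made up here, and it assumes a token never contains a ] of its own. Cutting each line into pieces, instead of substituting, avoids the extra blank line.

```python
import re

# A piece is a whole [TOKEN], a run of non-"[" text (a comment),
# or a stray "[" that never closes. Assumes no "]" inside a token.
PIECE = re.compile(r"\[[^\]]*\]|[^\[]+|\[")

def split_line(line):
    """Return the tokens and comments of one line, each as its own string."""
    pieces = [p.strip() for p in PIECE.findall(line)]
    return [p for p in pieces if p]

def reformat(text):
    """Put every token and comment on its own line, keeping blank lines."""
    out = []
    for line in text.splitlines():
        # A line with nothing on it stays as one blank line.
        out.extend(split_line(line) or [""])
    return "\n".join(out) + "\n"

sample = (
    "###test###[ITEM_GLOVES:ITEM_GLOVES_GAUNTLETS]###test###\n"
    "[ARMORLEVEL:2][UPSTEP:1]###test###\n"
    "\n"
    "[SHAPED]\n"
)
print(reformat(sample), end="")
```

Leading and trailing whitespace around each piece is dropped, so an indented token ends up flush left.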

thistleknot · « **Reply #82 on:** August 17, 2014, 11:05:14 am »

I have no idea how to use this script in windows.

I tried creating a .py file with it and running it as

py test.py *.txt

or even py text.py

or just running the commands in a py console

but it just sits there blank looking at me as if I'm supposed to feed it data
I tried typing in *.txt...
still blank

King Mir · « **Reply #83 on:** August 17, 2014, 11:07:58 am »

Quote from: thistleknot on August 17, 2014, 10:50:08 am

would this match any character followed by ]?

And I DON'T want to remove comments (ATM)

/[\s\[]/

That matches any whitespace or [, so no.

All characters followed by ] is /[^\]]*]/ (the final ] matches itself). But you don't want to touch the inside of the token. so you're more likely to match everything before [ which is /[^[]*/
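
The three patterns in this exchange can be compared side by side in Python. The sample line and variable names are invented for the demonstration.

```python
import re

line = "note [ARMORLEVEL:2] trailing"

# [\s\[] is a character class: it matches ONE character that is
# either whitespace or "[", wherever such a character occurs.
one_char_hits = re.findall(r"[\s\[]", line)

# [^\]]*] matches everything up to and including the first "]".
through_bracket = re.match(r"[^\]]*]", line).group()

# [^[]* matches everything before the first "[".
before_bracket = re.match(r"[^[]*", line).group()

print(one_char_hits)
print(repr(through_bracket))
print(repr(before_bracket))
```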

King Mir · « **Reply #84 on:** August 17, 2014, 11:13:58 am »

Quote from: thistleknot on August 17, 2014, 11:05:14 am

but it just sits there blank looking at me as if I'm supposed to feed it data
I tried typing in *.txt...
still blank

You need to feed it input with <raw_file.txt, and write the output to file with >outputfile.txt. so the command might look like "python pyscript.py <creature_birds.txt >creature_birds.test.txt". You can also paste the script directly into the terminal with python -c 'script_body', again specifying input and output files with < and >.

This has the effect of feeding standard input from the input file, and standard output to the output file.
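
The script appears to hang when run on its own because it is waiting for standard input. A rough sketch of the same idea with the streams made explicit follows; the helper `run` and the in-memory streams are stand-ins invented for illustration, and the substitution is a simplified placeholder.

```python
import io
import re
import sys

def run(source, sink):
    """Read everything from source, split adjacent tokens, write to sink."""
    sink.write(re.sub(r"\]\[", "]\n[", source.read()))

# On the command line the two streams would be sys.stdin and sys.stdout,
# which "<in.txt" and ">out.txt" attach to files:
#     run(sys.stdin, sys.stdout)
# Here in-memory streams play the part of the redirected files.
fake_input = io.StringIO("[ARMORLEVEL:2][UPSTEP:1]\n")
fake_output = io.StringIO()
run(fake_input, fake_output)
sys.stdout.write(fake_output.getvalue())
```

Typing *.txt at the blank prompt does not help, because that text is simply read as input data, not as a file name.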

EDIT: The "unexpected end of regular expression" error is because I forgot to escape the last [. I fixed it, but apparently not in the script I posted. Look at it now.

thistleknot · « **Reply #85 on:** August 17, 2014, 11:19:22 am »

i'm working on the sed thing tbh with all the issues I'm having with python

Code: [Select]

C:\Games\Dwarf Fortress\github comparisons\BasedOnVanillaRaws\BasedOnVanillaRaws
\test_SafetoDeleteMe>py test.py < item_gloves.txt
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print(re.sub("]\s*([^[]*?)\s*[","]\n\\1\n[",sys.stdin.read()))
  File "C:\Python34\lib\re.py", line 175, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python34\lib\re.py", line 288, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python34\lib\sre_compile.py", line 465, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Python34\lib\sre_parse.py", line 746, in parse
    p = _parse_sub(source, pattern, 0)
  File "C:\Python34\lib\sre_parse.py", line 358, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Python34\lib\sre_parse.py", line 484, in _parse
    raise error("unexpected end of regular expression")
sre_constants.error: unexpected end of regular expression

C:\Games\Dwarf Fortress\github comparisons\BasedOnVanillaRaws\BasedOnVanillaRaws
\test_SafetoDeleteMe>

try2

Code: [Select]

C:\Games\Dwarf Fortress\github comparisons\BasedOnVanillaRaws\BasedOnVanillaRaws
\test_SafetoDeleteMe>python test.py <item_gloves.txt >test.txt
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print(re.sub("]\s*([^[]*?)\s*[","]\n\\1\n[",sys.stdin.read()))
  File "C:\Python34\lib\re.py", line 175, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python34\lib\re.py", line 288, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python34\lib\sre_compile.py", line 465, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Python34\lib\sre_parse.py", line 746, in parse
    p = _parse_sub(source, pattern, 0)
  File "C:\Python34\lib\sre_parse.py", line 358, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Python34\lib\sre_parse.py", line 484, in _parse
    raise error("unexpected end of regular expression")
sre_constants.error: unexpected end of regular expression

I don't know. i've spent too much time on this already. I could have just finished up my own manual parsing of tokens in c by now.

I have no idea how regular expressions work. There were some fancy suggestions to use readahead, and I think negative readahead (that were mentioned on stackexchange)

This matches any character followed by a [

but... I'm not sure how to use sed to replace *[ with *NewLine[

sed -e "s/[^[]*//"

Here's the sample file

Code: [Select]

item_gloves

[OBJECT:ITEM]

###test###[ITEM_GLOVES:ITEM_GLOVES_GAUNTLETS]###test###
[NAME:gauntlet:gauntlets]###test###
[ARMORLEVEL:2][UPSTEP:1]###test###
[SHAPED]
[LAYER:ARMOR]###test######test###
[COVERAGE:100]
###TEST
[LAYER_SIZE:20]
[LAYER_PERMIT:15]
[MATERIAL_SIZE:2]
[SCALED]
[BARRED]
[METAL]
[LEATHER]
[HARD]

desired output

Code: [Select]

item_gloves

[OBJECT:ITEM]

###test###
[ITEM_GLOVES:ITEM_GLOVES_GAUNTLETS]
###test###
[NAME:gauntlet:gauntlets]
###test###
[ARMORLEVEL:2]
[UPSTEP:1]
###test###
[SHAPED]
[LAYER:ARMOR]
###test######test###
[COVERAGE:100]
###TEST
[LAYER_SIZE:20]
[LAYER_PERMIT:15]
[MATERIAL_SIZE:2]
[SCALED]
[BARRED]
[METAL]
[LEATHER]
[HARD]

I suppose since I'm using a wildcard, I have to store a variable...

SED s/abc/xyz/g filename

That means replace every abc with xyz in the whole file.

King Mir · « **Reply #86 on:** August 17, 2014, 11:30:23 am »

The issue is not with python, I just had a bug in my regular expression.

sed just gets you succinctness at the cost of portability. This is how to substitute with sed:

Code: [Select]

sed s/replace_me/replace_with/g <from_file.txt >to_file.txt

And this is the python script (which can be run from file)

Code: [Select]

import re
import sys

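# Iterate over standard input one line at a time, as sed does.
for line in sys.stdin:
  # Each line keeps its trailing newline, so write it back without adding another.
  sys.stdout.write(re.sub("replace_me","replace_with",line))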

(If run from the terminal be sure to use single quotes around the code or escape the double quotes.)

Of course, change replace_me and replace_with as appropriate.

thistleknot · « **Reply #87 on:** August 17, 2014, 11:31:32 am »

sed is gnuwin32 though. It was like a 256kb download

I need to stop spamming this thread and sort through the answers I have atm and see if I can come up with something using sed
http://word.mvps.org/faqs/general/usingwildcards.htm

King Mir · « **Reply #88 on:** August 17, 2014, 11:44:13 am »

Quote from: thistleknot on August 17, 2014, 11:19:22 am

I don't know. i've spent too much time on this already. I could have just finished up my own manual parsing of tokens in c by now.

I have no idea how regular expressions work. There were some fancy suggestions to use readahead, and I think negative readahead (that were mentioned on stackexchange)

Regular expressions are worth learning for their own sake, so don't think of it as wasted time.

Readahead and negative readahead (usually called lookahead; the versions that check the preceding text are called lookbehind) constrain the match with a required prefix or suffix, without that prefix or suffix being part of the match. You'd have to look up the syntax for it.

There are online tools for testing regular expressions, like http://regexpal.com/. That might be easier than running a script. Note that sed matches line by line, like my for loop in python, whereas some tools will match on the whole file.
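
For what it's worth, here is how lookahead and lookbehind (what the thread calls readahead) might be applied to this problem in Python. This is a sketch, not a tested solution for every raw file: the name `BREAK` is made up, and it assumes line endings are plain \n, which is what Python gives you when reading a file in text mode.

```python
import re

# Zero-width positions where a line break is missing:
#   (?<=\])(?=[^\n])  just after "]" when something other than a newline follows
#   (?<=[^\n])(?=\[)  just before "[" when something other than a newline precedes
# Nothing is consumed, so the surrounding characters are left untouched.
BREAK = re.compile(r"(?<=\])(?=[^\n])|(?<=[^\n])(?=\[)")

text = (
    "###test###[NAME:gauntlet:gauntlets]###test###\n"
    "[ARMORLEVEL:2][UPSTEP:1]\n"
)
result = BREAK.sub("\n", text)
print(result, end="")
```

Because the pattern only matches where a newline is absent, tokens that are already on their own line do not pick up an extra blank line.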

thistleknot · « **Reply #89 on:** August 17, 2014, 12:01:32 pm »

well I may have something here

thanks to crowdsourcing & 0xdeadbeef

http://stackoverflow.com/questions/25350991/hold-variable-in-regular-expression-using-sed

sed -e "s/\]\(.\)/]\n\1/g;s/ *\[/[/g;s/\(.\)\[/\1\n[/g" *.txt

it does what I need for comments, but there is one additional newline (on windows, but on stack, the guy is on linux, and I don't think it made a newline for them)

sample output based on input:

Code: [Select]

item_gloves

[OBJECT:ITEM]

###test###
[ITEM_GLOVES:ITEM_GLOVES_GAUNTLETS]
###test###
[NAME:gauntlet:gauntlets]
###test###
[ARMORLEVEL:2]

[UPSTEP:1]
###test###
[SHAPED]
[LAYER:ARMOR]
###test######test###
[COVERAGE:100]
###TEST
[LAYER_SIZE:20]
[LAYER_PERMIT:15]
[MATERIAL_SIZE:2]
[SCALED]
[BARRED]
[METAL]
[LEATHER]
[HARD]
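
One plausible reading of the extra blank line, sketched in Python instead of sed: if . is allowed to match a newline (the GNU sed manual says it does), the third substitution sees the newline that the first one just inserted before [UPSTEP:1] and adds another. The function `emulate` and its flag are invented for this sketch, and it assumes the capture groups in the command above are \(.\).

```python
import re

def emulate(line, dot_matches_newline=True):
    """Apply the three substitutions in order to one line, as sed would."""
    flags = re.DOTALL if dot_matches_newline else 0
    # 1. Break the line after "]" when any character follows it.
    line = re.sub(r"\](.)", "]\n\\1", line, flags=flags)
    # 2. Drop spaces in front of "[".
    line = re.sub(r" *\[", "[", line)
    # 3. Break the line before "[" when any character precedes it.
    line = re.sub(r"(.)\[", "\\1\n[", line, flags=flags)
    return line

adjacent = "[ARMORLEVEL:2][UPSTEP:1]###test###"
print(repr(emulate(adjacent)))
print(repr(emulate(adjacent, dot_matches_newline=False)))
```

With the flag on, the first result contains a doubled newline between the two tokens, matching the sample output; with it off, the doubling disappears.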
