So, this is basically more evidence that I really need to spend some time learning how to use the regular expression library rather than hand-writing my own parsers.
It seems so. This was the first script I've done with the re module, even though I've seen it in action many times.
re.split(r':|\][\s]*\[', line)
I'm not sure if you're already familiar with regular expressions, but let's break it down anyway. The first argument to the split function reads as, "(break at the character ':') OR (break at a ']' followed by any amount of whitespace followed by a '[')." The matched separators themselves are dropped from the result.
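Here's a small sketch of what that split does. The sample line is made up for illustration, since the real input format isn't shown in this post.

```python
import re

# Made-up sample line; the real input format is an assumption.
line = "key:[one] [two]  [three]"

# Split at ':' or at ']' + optional whitespace + '['.
parts = re.split(r":|\][\s]*\[", line)
print(parts)  # ['key', '[one', 'two', 'three]']
```

Note that the outermost '[' and ']' are still attached to the first and last bracketed items, which is why a cleanup step is needed afterwards.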
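Now I'm worried that a newline between brackets can bring unintended results, since \s matches newlines too, so a ']' at the end of one line and a '[' at the start of the next would count as a separator. It shouldn't be a problem when each line of the file is processed separately, like in this example...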
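To show the newline concern concretely, here is a sketch with a made-up two-line string. The stricter pattern using '[ \t]*' is only one possible way to rule it out, not something the original script does.

```python
import re

# Made-up input that spans two lines.
text = "[a]\n[b]"

# \s matches the newline, so the split crosses the line boundary.
loose = re.split(r"\][\s]*\[", text)
print(loose)   # ['[a', 'b]']

# Allowing only spaces and tabs keeps the two lines apart.
strict = re.split(r"\][ \t]*\[", text)
print(strict)  # ['[a]\n[b]']
```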
return re.sub("'", '', re.sub(r'^[\s]*\[', '', re.sub(r'\][\s]*$', '', item)).strip())
This doesn't need to be written as a one-liner, but I've understood that there's some kind of 'coolness' factor involved in doing just that.
string = re.sub(r'\][\s]*$', '', source)
string = re.sub(r'^[\s]*\[', '', string)
string = string.strip()
return re.sub("'", '', string)
The first line matches a ']' at the very end of the string, together with any whitespace that follows it, and replaces it with nothing. The second line matches a '[' at the very start of the string, together with any whitespace before it, and replaces it with nothing. The third line removes whitespace around what's left, and the final line matches every single quote and removes it. The '[\s]*' means 0 or more whitespace characters; the brackets around \s aren't needed, so '\s*' would do the same thing.
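Wrapped in a function, the four steps look roughly like this. The name clean_item and the sample inputs are made up for the sake of the example.

```python
import re

def clean_item(source):
    """Strip the outer brackets, surrounding whitespace and single quotes."""
    text = re.sub(r"\]\s*$", "", source)  # trailing ']' plus any whitespace after it
    text = re.sub(r"^\s*\[", "", text)    # leading '[' plus any whitespace before it
    text = text.strip()                   # whitespace left at either end
    return re.sub("'", "", text)          # every single quote, wherever it is

print(clean_item("  ['hello world' ]  \n"))  # hello world
print(clean_item("['it's']"))                # its
```

The second call shows a side effect worth knowing about: quotes inside the text are removed as well, not only the ones wrapping it.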
items.insert(0, re.findall('^(\t*)', line)[0])
This pattern matches any number of tab characters at the beginning of the string. The findall() method returns a list, and because the pattern contains a group in parentheses, the list holds the text captured by that group. Since '^' only matches at the start of the string, the list always has exactly one item: a string with all the tab characters found at the beginning of the source string (an empty string if there are none). That string is then inserted at the beginning of the results so that indentation can be tracked.
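A quick sketch of the tab capture on its own. The helper name leading_tabs and the sample data are made up, and the re.match variant is just an alternative I'd consider, not what the script uses.

```python
import re

def leading_tabs(line):
    """Return the run of tab characters at the start of line ('' if none)."""
    return re.findall(r"^(\t*)", line)[0]

# Made-up sample: a line indented with two tabs and its already-parsed items.
line = "\t\tkey:[one] [two]"
items = ["key", "one", "two"]

items.insert(0, leading_tabs(line))
depth = len(items[0])
print(depth)  # 2

# An equivalent without findall: match() is already anchored at the start.
same = re.match(r"\t*", line).group(0)
```

Because the pattern can match zero tabs, both versions return an empty string for an unindented line instead of failing.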