Bay 12 Games Forum

Author Topic: MNML: a minimalistic structured data format  (Read 1036 times)

Nadaka

  • Bay Watcher
    • http://www.nadaka.us
MNML: a minimalistic structured data format
« on: August 21, 2012, 06:22:42 pm »

I am creating this thread so I don't spam up the if self.isCoder(): post() #Programming Thread

MNML

Preface:

What is bad about XML:
1: Verbosity. Each element name appears twice, once in the opening tag and once in the closing tag, and each occurrence is enclosed in its own pair of angle brackets.
2: Whitespace. XML has fiddly rules for which whitespace is and is not significant, and implementations interpret those rules inconsistently (thank you, Microsoft).
3: Validity. An entire document must be well formed or be discarded entirely. Combine this with the single root element requirement and you have a format unsuitable for streaming data sources.
4: There is more, but screw it for the moment.

What is bad about JSON:
JSON has difficulty serializing certain types of objects. In JavaScript every array is an object, and any object can be used like an associative array. JSON notation has difficulty dealing with this because it does not allow setting named attributes on arrays, nor does it allow putting values into an object as if it were an array, even though both are possible and indeed frequent in JavaScript.
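
A minimal sketch of the array half of that problem, assuming plain JavaScript and the standard JSON.stringify / JSON.parse functions (the variable names and values are made up for illustration):

```javascript
// An array that also carries a named property, which is legal in JavaScript.
const inventory = ["pick", "axe", "barrel"];
inventory.owner = "Urist";

// JSON.stringify serializes only the indexed elements of an array.
const encoded = JSON.stringify(inventory);
console.log(encoded); // ["pick","axe","barrel"]

// After a round trip the named property is gone.
const decoded = JSON.parse(encoded);
console.log(decoded.owner); // undefined
```

The usual workaround is to wrap the array in an object with separate "items" and "attributes" fields, which works but changes the shape of the data.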

Introduction:

M    Minimalistic (Model Driven?)
N    Non-xml
M    Markup
L    Language


The actual definition of this acronym is subject to change. The goal of MNML is to create a simple, efficient, and effective language to replace XML and/or JSON as a data transport and storage specification in web and other software development. I will use everyday English to describe the language explicitly, and I will also provide a formal grammar.

The design philosophies I am aiming for are simplicity and compactness.

Spoiler: Overview: (click to show/hide)


Spoiler: Formal Grammar: (click to show/hide)



javascript implementation:
pending...

java implementation:
pending...


Recent Changes:
Removed alternate escape sequences from strings because they create a problem when converting strings between escaped and unescaped values. When a string transitions back and forth between escaped and unescaped, it should always be the same when in the same state, and allowing mixed use of escapes prevents that.
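
A rough sketch of that round-trip problem in JavaScript. The escape table here is hypothetical and is not MNML's actual escape set: it assumes \n, \t and \\ as the canonical escapes and invents \l as an alternate spelling of newline purely for illustration.

```javascript
// Hypothetical canonical escapes (NOT MNML's real table): raw char -> escaped form.
const CANONICAL = { "\n": "\\n", "\t": "\\t", "\\": "\\\\" };
// Hypothetical alternate spelling of newline, accepted only when alternates are allowed.
const ALTERNATES = { "\\l": "\n" };

// Replace each special raw character with its single canonical escape.
function escapeText(raw) {
  return raw.replace(/[\n\t\\]/g, (ch) => CANONICAL[ch]);
}

// Turn escape sequences back into raw characters.
function unescapeText(escaped, allowAlternates) {
  return escaped.replace(/\\([\s\S])/g, (match, ch) => {
    if (ch === "n") return "\n";
    if (ch === "t") return "\t";
    if (ch === "\\") return "\\";
    if (allowAlternates && match in ALTERNATES) return ALTERNATES[match];
    throw new Error("unknown escape: " + match);
  });
}

const canonical = "line1\\nline2";    // escaped text using the canonical escape
const viaAlternate = "line1\\lline2"; // escaped text using the alternate spelling

const roundTripCanonical = escapeText(unescapeText(canonical, false));
const roundTripAlternate = escapeText(unescapeText(viaAlternate, true));

console.log(roundTripCanonical === canonical);    // true
console.log(roundTripAlternate === viaAlternate); // false
```

With one escape per character the escaped form survives the trip unchanged. With an alternate in play, the escaped text comes back spelled differently even though the unescaped value is identical.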

Removed the escape sequence for collapse space because it is problematic. MNML doesn't really have a concept of "display", and if display means unescaping and merging two escaped words, it runs into the same consistency issue that faces strings with mixed escapes. This may be able to make a comeback by using one of Unicode's 'skinny' whitespace characters as the unescaped value for \c, though it may have inconsistent results on systems using fixed-width fonts.

Removed the escape sequence for Unicode characters because it is unnecessary.

Simplified the whitespace rule because the new rule is both simpler and more compact.

Feedback:
Comments: I am growing dissatisfied with comments. The '(' and ')' characters are common enough that they will require a lot of escaping, yet I do not want to remove the concept of comments.

I've been working fairly steadily for the last 3 or 4 days to produce and unit test a decent Java implementation, and I should have that up soon.

I am also looking for any other comments or criticisms.
« Last Edit: August 21, 2012, 06:55:06 pm by Nadaka »
Take me out to the black, tell them I ain't comin' back...
I don't care cause I'm still free, you can't take the sky from me...

I turned myself into a monster, to fight against the monsters of the world.

JanusTwoface

  • Bay Watcher
  • murbleblarg
    • jverkamp.com
Re: MNML: a minimalistic structured data format
« Reply #1 on: August 21, 2012, 10:33:22 pm »

I guess my biggest question would be why? There are already dozens of markup languages:


Surely something in there can scratch that itch you've got going there. Which, come to think of it, I'm not really sure what your itch is. You don't like that XML is so verbose. Got that. And JSON has issues with some types of objects. It's work-around-able, but fair enough. It seems, though, that you want something capable of data transport rather than just marking up documents.

Have you looked at YAML? It's a lot less verbose than XML and can handle all sorts of arbitrary objects. (And it's actually a superset of JSON).

Spoiler: Example YAML (click to show/hide)

Alternatively, consider s-expressions. They are the basis of languages like LISP and Scheme, where data and code have the same format. Basically, you can have something like this:

Spoiler: Example s-expression (click to show/hide)

So you get the same nice nested structure of JSON/XML but without the redundant closing tags of XML and a bit more flexibility than JSON gives you.




Side note: I really don't mean this to be harsh, and if you're just in it to see if you can make a markup language with parsers / generators, well, more power to you. It's a neat project. Just don't spend too much time reinventing such a popular wheel. :)
You may think I'm crazy / And I think you may be right
But life is ever so much more fun / If you are the crazy one

My blog: Photography, Programming, Writing
Novels: A Sea of Stars, Confession

Thief^

  • Bay Watcher
  • Official crazy person
Re: MNML: a minimalistic structured data format
« Reply #2 on: August 22, 2012, 04:27:28 am »

What about SGML?

It's what both HTML and XML are based on, and it lets you do things like this:
<TITLE/abcdef/
and this:
<TITLE>abcdef</>
which are both equivalent to (and considerably shorter than):
<TITLE>abcdef</TITLE>
Dwarven blood types are not A, B, AB, O but Ale, Wine, Beer, Rum, Whisky and so forth.
It's not an embark so much as seven dwarves having a simultaneous strange mood and going off to build an artifact fortress that menaces with spikes of awesome and hanging rings of death.

Nadaka

  • Bay Watcher
    • http://www.nadaka.us
Re: MNML: a minimalistic structured data format
« Reply #3 on: August 22, 2012, 11:56:37 am »

YAML: I am not fond of whitespace being significant because it makes human reading and writing more error prone.
S-Expressions: there is no way to distinguish between named elements (attributes) and members of a collection, and comments are sketchy.
SGML: it suffers from the same issues as XML and is further complicated by having multiple representations for the same data.

Also: because I can, and I don't really get to use most of my education at work. And it might be useful to someone else.


JanusTwoface

  • Bay Watcher
  • murbleblarg
    • jverkamp.com
Re: MNML: a minimalistic structured data format
« Reply #4 on: August 22, 2012, 12:06:44 pm »

YAML: I am not fond of whitespace being important because it makes human reading and writing more error prone.
That's a fair enough point and actually part of the reason I personally don't use it. :)

S-Expressions: no way to distinguish between named elements (attributes) and members of a collection and comments are sketchy.
What's the difference? It all really depends on how you want to structure your data.

And what are comments really? :) They're either actually part of the data / important to understanding the data (in which case why can't you encode them as any other data, just with a 'comment' tag of some sort) or they aren't, in which case why are they even there?

And comments in s-expressions (if you're dealing with Scheme) are really easy as well. You can comment out a single line with ; , an entire block with #| ... |# , and a single s-expression with #; (the commented-out expression can contain any amount of information and span multiple lines).

Nadaka

  • Bay Watcher
    • http://www.nadaka.us
Re: MNML: a minimalistic structured data format
« Reply #5 on: August 23, 2012, 08:34:00 pm »

As for comments in s-expressions, I was going off the basic concept, not any particular language implementation. I didn't see a specification for comments.

When dealing with arbitrary data structures you can, for instance, have fields labeled "comment" that are real data and shouldn't be discarded as a comment. Assuming that such fields are comments would result in an inherently ambiguous grammar.
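
To make that concrete, here is a small JavaScript sketch. The "drop every field named comment" convention and the dropCommentFields helper are hypothetical, invented only to show what goes wrong; nothing like them is part of MNML or of s-expressions.

```javascript
// Hypothetical convention: any field named "comment" is treated as a comment and dropped.
function dropCommentFields(value) {
  if (Array.isArray(value)) return value.map(dropCommentFields);
  if (value !== null && typeof value === "object") {
    const kept = {};
    for (const [key, child] of Object.entries(value)) {
      if (key !== "comment") kept[key] = dropCommentFields(child);
    }
    return kept;
  }
  return value; // strings, numbers, booleans and null pass through untouched
}

// Here "comment" is reader feedback, i.e. real data, not an annotation.
const blogPost = {
  title: "Embark report",
  comment: "Nice fortress!",
};

const stripped = dropCommentFields(blogPost);
console.log(JSON.stringify(stripped)); // {"title":"Embark report"}
```

The reader's feedback is silently lost, because nothing in the data itself says whether a given "comment" field was meant as data or as an annotation.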

However, I would not need separate comment reserved chars if I eventually added a way to mark elements with some kind of metadata. But that is opening a whole new can of worms, and I don't want to go there just yet.

On a different note: I will be adding an escape sequence for null (\0), because I can't just assume that every collection won't have null values in it.
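
A minimal JavaScript sketch of why a dedicated null marker is needed. It takes \0 as the null escape as stated above and assumes, for this sketch only, that a literal backslash in text is written doubled. The encodeMember and decodeMember helpers are hypothetical and do not come from any MNML implementation.

```javascript
const NULL_TOKEN = "\\0"; // a backslash followed by a zero

// Encode one collection member: null becomes the marker, text gets its backslashes doubled.
function encodeMember(value) {
  if (value === null) return NULL_TOKEN;
  return value.replace(/\\/g, "\\\\");
}

// Decode one well-formed token produced by encodeMember.
function decodeMember(token) {
  if (token === NULL_TOKEN) return null;
  return token.replace(/\\\\/g, "\\");
}

// A collection holding text, a null, an empty string, and the literal text backslash-zero.
const members = ["ale", null, "", "\\0"];
const roundTrip = members.map(encodeMember).map(decodeMember);

console.log(JSON.stringify(roundTrip) === JSON.stringify(members)); // true
```

Because literal backslashes are doubled, the text consisting of a backslash and a zero never collides with the null marker, and null stays distinct from the empty string.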