Oh man, so sorry to hear about getting laid off, that sucks.
I'm sure you'll find something though - and put this project in your resume! Participating in open-source projects shows you're enthusiastic about what you do, and companies like that! Best of luck in your search!!!
Your proposal handles something I've been thinking about for the past week or so.
Would it help to have the BSON stuff encapsulated inside objects that should be converted to BSON? I put up a branch with a rough implementation of what I'm talking about - basically every object in the Graph that isn't a basic datatype (although technically we could overload those too if we wanted to go crazy with it) has a BSONable interface with a single getBSON() method, plus a string for its JSON tag.
The general idea I've been following is that we wouldn't want to go crazy with it. The more crazy we go, the harder it is to port to other languages, and we run the risk of having LOTS of different core Agora implementations (for the server, for the lib, for the graph), each with a different set of supported features. That would make Agora development and communities pretty divided.
getBSON() works like this - it works through each BSONable object, calling its getBSON() method to add to either a BasicBSONObject or BasicBSONList (depending obviously on if it's something like a list of nodes/edges or a single property like information about a post). Then it reads through its own data, adding things like ID numbers, sources, etc - whatever isn't a BSONable object. Finally, it returns the constructed BSONObject. That's it.
Serialisation, unfortunately, is actually harder than it looks at first.
I think your system is going into infinite serialisation loops when you have a loop in the graph, like A -> B and B -> A (this is going to be super common). Could you check?
If we want to use this method, we'll still need to use something similar to the JAgoraLib serialisation implementation, which is sending the attacks with origin/target NodeIDs rather than Nodes themselves. That should avoid going into an infinite serialisation loop, where A serialises B which serialises A and so forth.
Of course, then you need to deserialise all Nodes, and THEN all Edges, using Node IDs to get to the actual Node.
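To make the ID-based approach concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `FlatGraphCodec` and its `Node` class are hypothetical stand-ins rather than the real JAgoraLib classes, and plain `Map`s stand in for BSON documents. Edges carry only origin/target IDs, so a cycle like A -> B -> A cannot recurse, and decoding rebuilds all nodes before resolving any edge.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the real graph classes; plain Maps stand in for BSON documents.
class FlatGraphCodec {

    static class Node {
        final String id;
        final List<Node> targets = new ArrayList<>(); // nodes this node attacks
        Node(String id) { this.id = id; }
    }

    /** Encodes nodes and edges as two flat lists; edges carry IDs only, so cycles cannot recurse. */
    static Map<String, Object> encode(List<Node> nodes) {
        List<String> nodeIds = new ArrayList<>();
        List<Map<String, String>> edges = new ArrayList<>();
        for (Node n : nodes) {
            nodeIds.add(n.id);
            for (Node t : n.targets) {
                Map<String, String> edge = new HashMap<>();
                edge.put("origin", n.id);
                edge.put("target", t.id);
                edges.add(edge);
            }
        }
        Map<String, Object> doc = new HashMap<>();
        doc.put("nodes", nodeIds);
        doc.put("edges", edges);
        return doc;
    }

    /** Decodes in two passes: all nodes first, then edges resolved through the ID index. */
    @SuppressWarnings("unchecked")
    static Map<String, Node> decode(Map<String, Object> doc) {
        Map<String, Node> byId = new HashMap<>();
        for (String id : (List<String>) doc.get("nodes")) {
            byId.put(id, new Node(id));
        }
        for (Map<String, String> edge : (List<Map<String, String>>) doc.get("edges")) {
            Node origin = byId.get(edge.get("origin"));
            Node target = byId.get(edge.get("target"));
            if (origin == null || target == null) {
                throw new IllegalArgumentException("Edge references unknown node: " + edge);
            }
            origin.targets.add(target);
        }
        return byId;
    }
}
```

Encoding a two-node graph where A attacks B and B attacks A terminates after one pass over the nodes, and decoding gives back two nodes that point at each other again.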
It's probably easier and more extensible to have an external serialiser. That's how Python does it (pickle and marshal), as well as BSON, and other systems. BSON, for instance, has Lazy and Basic coders/decoders, which are useful in different situations.
Then, to deserialize, you feed any object a BSONObject. In the object's constructor, you retrieve the entries that matter to that object - if it fails, it only fails for that object, and you have the specific BSONObject it used available telling you why. But what's really beautiful (I think) is that it uses BSONObjects to build most of its entries - that is, you call 'new PostInfo(BSON.get(PostInfo.JSONTag))', and PostInfo is built from a BSONObject stored in what the node has received. It's also easy to send partial information back and forth about an object - just leave it out of the BSONObject you're building (using a separate method from getBSON obviously, maybe getBSONEdges() for instance) and add methods in the exact same class that merge it into an existing object. And if you wanted to add something weird down the line like a PostInfo to an edge, all you'd need to do is add it as a field, modify that method, and add it to the BSON-based constructor.
Could you explain the PostInfo stuff a bit more? I didn't really understand how you use it. The idea of having partial information sent to the client is awesome, but I'm worried that it won't have that much use because the server can't send stuff to the client without the client explicitly asking for it first. Since communication is always started by the client, why doesn't it just always get all graph information, rather than partial graph information?
A few extra additions of note - EdgeID/NodeID are gone, replaced by a master ID interface. That can be changed back easily by changing out constants JSON_ID/JSON_SOURCE with unique tags for each BSONable if it causes problems for the database. I suspect that to be the case now, thinking about how this would have to be moved into/from the database - I'm sorry. There's also a nice Property abstract for things like Post information, user data, etc. Might be a useless distinction at this point.
Yeah, this probably screws with the database a bit too much. Why would we want a single ID class though? Node IDs and Attack IDs are pretty different in nature. I mean, we could just have an ID be a String, but it probably just makes it harder to understand the code (e.g. what is this id used for? Is it a Node ID, or a thread ID, or an attack ID)?
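As a rough illustration of why separate ID types can be easier to read than a bare String, here is a small sketch. The shapes are assumptions made for the example only (a node ID wrapping a string, an attack ID made of an origin and a target node ID); the actual NodeID and attack ID classes in the project may look quite different.

```java
import java.util.Objects;

// Hypothetical sketch: these ID classes are illustrative, not the project's actual ones.
class TypedIds {

    /** Identifies a single node. */
    static final class NodeID {
        final String value;
        NodeID(String value) { this.value = Objects.requireNonNull(value); }
        @Override public boolean equals(Object o) {
            return o instanceof NodeID && ((NodeID) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    /** Identifies an attack; assumed here to be defined by its two endpoints. */
    static final class AttackID {
        final NodeID origin;
        final NodeID target;
        AttackID(NodeID origin, NodeID target) {
            this.origin = Objects.requireNonNull(origin);
            this.target = Objects.requireNonNull(target);
        }
        @Override public boolean equals(Object o) {
            return o instanceof AttackID
                    && ((AttackID) o).origin.equals(origin)
                    && ((AttackID) o).target.equals(target);
        }
        @Override public int hashCode() { return Objects.hash(origin, target); }
    }

    // The signature documents which kind of ID is expected; passing an AttackID will not compile.
    static String describeNode(NodeID id) {
        return "node " + id.value;
    }
}
```

With distinct types, a method signature says which kind of ID it wants, and mixing them up becomes a compile error instead of a runtime surprise.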
I do think this is a little more extendable on the client/server side than the current system of adding BSON serializers/deserializers to a master list in JAgoraLib. I don't know for sure if the database can handle this kind of nested structure - hence the branch. I didn't want to assume it was useless after doing all of the work to convert it. I can't contribute much more at the moment - got laid off, again, after only five days back on the job. Stressful times! Hopefully this is at least somewhat helpful, and not a diversion from actual work - I should probably have been working on a client but to be fair at the time I was working on it, it didn't look like the server was functional =P
Agreed with the first part! We should extract the (de)serialisers from JAgoraLib and make them into their own thing, perhaps as part of AgoraLib, but still not inside JAgoraGraph, for the reasons I stated above.
I've been thinking a bit more about how AgoraLib would actually be used and honestly, I don't think I've reached a good solution.
I believe our best bet to maintain Agora as a nice tight-knit project rather than a million different implementations with different features is relying heavily on the content column of the Node table in the database, which is meant to store flexible BSON content. So, JAgoraLib ONLY handles serialisation of the very basic graph structure, like nodes, attacks, the posters of nodes/attacks. It ignores the BSON content of a node - but sends it over the network (which is trivial). It's up to the user to parse the BSON content of nodes.
The extensibility is obtained by using and parsing the BSON content that comes from the database. This is actually exactly what you have, but for the content, not the graph itself! Let me try to make the two proposals more obvious. I've highlighted the differences.
Proposal #1 (master branch):
- User requests thread
- AgoraLib contacts server, asks for thread
- Server receives request, queries database for all arguments/attacks in that thread
- Server converts database response into AgoraGraph, using server-side library (client never interfaces with DBs)
- Server-side AgoraLib serialises AgoraGraph into BSON, and sends it over network
- Client-side AgoraLib deserialises BSON from network into AgoraGraph
- The user's/application's JAgoraNode subclasses parse the Content (BSON) of nodes into whatever.
Proposal #2 (BSONable branch):
- User requests thread
- AgoraLib contacts server, asks for thread
- Server receives request, queries database for all arguments/attacks in that thread
- Server converts database response into AgoraGraph, using server-side library (client never interfaces with DBs)
- Server-side AgoraGraph serialises itself into BSON
- Server-side AgoraLib sends BSON over network
- Client-side AgoraLib reads BSON from the network
- User/application-side AgoraGraph constructs itself from BSON
So really, the only difference is in who serialises what. I think the best solution is actually a Proposal #3 where we separate the serialiser into its own class. Perhaps serialisation is done into byte[], rather than BSON, allowing us to easily change the network protocol itself?
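Here is a minimal sketch of what such a Proposal #3 serialiser could look like. The names `GraphSerialiser` and `LineSerialiser` are hypothetical, and the "graph" is reduced to a list of node IDs written one per line purely to keep the example short; a BSON-backed implementation could sit behind the same interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an external serialiser; none of these names exist in the project.
class SerialiserSketch {

    /** The graph type G knows nothing about the wire format; callers only ever see byte[]. */
    interface GraphSerialiser<G> {
        byte[] serialise(G graph);
        G deserialise(byte[] data);
    }

    /** Toy format: node IDs, one per line, UTF-8 encoded. Assumes IDs contain no newlines. */
    static class LineSerialiser implements GraphSerialiser<List<String>> {
        @Override
        public byte[] serialise(List<String> nodeIds) {
            return String.join("\n", nodeIds).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public List<String> deserialise(byte[] data) {
            String text = new String(data, StandardCharsets.UTF_8);
            if (text.isEmpty()) {
                return new ArrayList<>();
            }
            return new ArrayList<>(Arrays.asList(text.split("\n")));
        }
    }
}
```

Since the network code would only pass `byte[]` around, changing the wire format would mean writing another `GraphSerialiser` rather than touching the graph classes.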
Then the other part would work somehow like this:
public class ApplicationNode extends JAgoraNode {
    ...
    public void loadContent() {
        this.x = rand();
        this.y = rand();
        // this.content is a member of JAgoraNode, populated from the content column of the DB
        this.text = (String) this.content.get("textArgument");
        // The image is optional, so only load it when a URL is present
        String imageURL = (String) this.content.get("imageURL");
        if (imageURL != null) {
            this.image = Image.load(imageURL);
        }
    }
    ...
}
So this implementation would be able to handle arguments having some piece of text, as well as one attached image. That all comes from what's inside the content column of the database. We could extend it to have a YouTube video, several images, etc etc etc. Anything that you want - but that's application-side. The Agora Project itself could officially support some basic format for Content, but then it's up to the app developers what else they want to use. If some "content" feature starts becoming popular, we could add it to the official "content" description.
Also, do you guys think attacks should also have content?