That XML file is a beast. I used a large file text editor to look at it, seeing if I could figure it out. There are so many historical events that you cannot just open the file and learn by browsing. You might have a thousand of this event or that event in a row, way more than you would want to just read.
To crack this file, I think you have to write a program to pick out clues: something that pulls out all the tags from historical events, plus all of the data in those tags that is neither numerical nor a name (I don't believe there are names in historical events). Once you have a condensed list of possibilities, you might be able to recreate the legends associations, *maybe* (the XML dump does say it is incomplete).
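A rough sketch of that kind of clue-picking program, in Python. It streams the file so the whole thing never has to sit in memory, counts every tag that appears inside an event, and tallies the distinct non-numeric values per tag. The element name `historical_event`, the function names, and the sample data are all assumptions made for illustration; check them against the real dump before relying on this.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def is_number(text):
    """True for plain integers such as '12' or '-1'."""
    return text.lstrip("-").isdigit()


def summarize_events(source, event_tag="historical_event"):
    """Count the child tags of every event and tally their non-numeric values.

    `source` is a filename or a binary file object. `event_tag` is an
    assumed element name, not something confirmed by the thread.
    """
    tag_counts = Counter()
    text_values = defaultdict(Counter)
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != event_tag:
            continue
        for child in elem:
            tag_counts[child.tag] += 1
            value = (child.text or "").strip()
            if value and not is_number(value):
                text_values[child.tag][value] += 1
        elem.clear()  # drop the event's children once they have been counted
    return tag_counts, text_values


# Made-up sample in the rough shape of the dump, for illustration only.
SAMPLE = b"""<df_world>
<historical_events>
<historical_event><id>0</id><year>1</year><type>change hf state</type><state>settled</state></historical_event>
<historical_event><id>1</id><year>2</year><type>change hf state</type><state>wandering</state></historical_event>
<historical_event><id>2</id><year>2</year><type>hf died</type><cause>old age</cause></historical_event>
</historical_events>
</df_world>"""

tag_counts, text_values = summarize_events(io.BytesIO(SAMPLE))
for tag, values in text_values.items():
    print(tag, dict(values))
```

The per-tag tallies are the "condensed list of possibilities": a thousand identical events in a row collapse into one line with a count next to it.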
It would be great to be able to trace back a person's ancestry, not just royalty.
Yeah, the events are interesting, but the sheer volume of text and the difficulty of actually getting meaning from the information between two "historical_events" tags mean that if it is attempted, it'll be far down on the list.
But what you're describing is exactly how my program works now with the sites portion of the XML. It pulls out the id/name/type/coordinates, matches each one to a site from the world history (matching on both id and name, just as a safety), and then adds the coordinates onto that already-created site. That way I can show where civs are on the map, even though that information doesn't actually exist directly in either file.
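A minimal sketch of that matching step, not the actual program. It assumes each site in the XML is a `site` element with `id`, `name`, `type` and `coords` children (with `coords` written as `x,y`), and that the sites already built from the world history are plain dicts with `id` and `name` keys. Those names, the case-insensitive name comparison, and the sample data are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def read_xml_sites(source):
    """Return {site_id: {"name", "type", "coords"}} from the XML dump."""
    sites = {}
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "site":
            continue
        x, y = (int(part) for part in elem.findtext("coords").split(","))
        sites[int(elem.findtext("id"))] = {
            "name": elem.findtext("name"),
            "type": elem.findtext("type"),
            "coords": (x, y),
        }
        elem.clear()
    return sites


def attach_coords(history_sites, xml_sites):
    """Add coords to sites already built from the world history.

    A site only gets coordinates when both the id and the name agree,
    which is the "just as a safety" check.
    """
    for site in history_sites:
        match = xml_sites.get(site["id"])
        if match and match["name"].lower() == site["name"].lower():
            site["coords"] = match["coords"]
    return history_sites


# Made-up sample data, for illustration only.
SAMPLE = b"""<df_world><sites>
<site><id>1</id><type>fortress</type><name>boatmurdered</name><coords>10,20</coords></site>
<site><id>2</id><type>hamlet</type><name>somewhere else</name><coords>3,4</coords></site>
</sites></df_world>"""

history_sites = [
    {"id": 1, "name": "Boatmurdered", "owner": "CivA"},
    {"id": 2, "name": "A Different Name", "owner": "CivB"},
]
attach_coords(history_sites, read_xml_sites(io.BytesIO(SAMPLE)))
```

Site 1 picks up its coordinates; site 2 is left alone because the names disagree even though the ids match.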
The same sort of thing is done in the worldhistory file, but it's more complicated. For getting who someone's parent is, for example:
I assign each person an array holding the age of each of their children at the time that person died.
I then go through every person. If they inherited the throne from their father or mother (I don't handle inheritance from a paternal or maternal grandparent, which DOES happen), I look for someone who had a child born at the right time to be this person. If the two are from the same civ + race, I assume the child is theirs.
Example:
John - died 200, children ages at death: 20, 30, 40 - CivA
Bob - inherited from father, born 180. - CivA
Since Bob was born in 180 and inherited from his father, he could (and almost certainly does) match the 20yo who survived John in 200.
This ignores the fact that someone born in 180 could technically still be 19 when John died in 200, depending on the time of year, but it appears the game doesn't take that into account either, so it doesn't matter.
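The John/Bob example can be sketched in Python like this. It is a simplified illustration of the matching rule described above, not the actual program: the `Person` fields and the `find_parent` name are made up, the race value is added only so the civ + race check has something to compare, and the first matching candidate wins without any tie-breaking.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    civ: str
    race: str
    born: Optional[int] = None
    died: Optional[int] = None
    # Age of each surviving child in the year this person died.
    child_ages_at_death: List[int] = field(default_factory=list)
    # "father", "mother", or None if the throne was not inherited that way.
    inherited_from: Optional[str] = None


def find_parent(heir, people):
    """Return the first person whose surviving children include the heir's age."""
    if heir.inherited_from not in ("father", "mother") or heir.born is None:
        return None
    for candidate in people:
        if candidate is heir or candidate.died is None:
            continue
        if (candidate.civ, candidate.race) != (heir.civ, heir.race):
            continue
        # Whole years only, the same simplification as described above.
        age_at_death = candidate.died - heir.born
        if age_at_death in candidate.child_ages_at_death:
            return candidate
    return None


john = Person("John", "CivA", "dwarf", died=200, child_ages_at_death=[20, 30, 40])
bob = Person("Bob", "CivA", "dwarf", born=180, inherited_from="father")
people = [john, bob]

parent = find_parent(bob, people)
print(parent.name if parent else "no match")
```

Bob was born in 180 and John died in 200, so Bob's age comes out as 20, which is one of the ages in John's list, and John is returned as the parent.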
In terms of ancestry that isn't royalty, I'll have to attempt to gen a world where "cull unimportant figures" is "off" and see what happens. So far I've actually never attempted that, but if it's in the files (worldhistory?) I'll do it eventually.