The single largest problem is changes to vanilla raws. These are sometimes necessary for a variety of reasons, and detecting them with general algorithms is difficult at best. The worst example is probably reactions--if you add a new reaction, it must be merged into entity_default if you want dwarves to use it. Multiple mods that add new reactions thus cannot be easily merged with standard mod installation (i.e. merging the mod folder in).
PyLNP has a mod merger included but it's, uh, not nearly good enough. I've thought about coming up with some sort of bespoke algorithm for mod merging, but it's not something that can hold my attention for long enough to do--it requires keeping a list of every token every object has. That is essentially the same thing as writing an entire raw parser from scratch, primarily due to the complexities of creature modding, syndromes and related object-in-object stuff.
My first thought as to how to do this: discourage modding vanilla creatures entirely, except by way of creature variations. Warn that making in-place mods to vanilla creatures, rather than using creature variations to make all changes, will lead to unexpected behavior, and don't even bother attempting to merge creature changes. Creatures are seriously that complex--the subtleties of creature-level vs caste-level tokens, materials, tissues and the like make it nigh-infeasible to merge multiple mods that change one creature unless such changes are entirely in the form of added creature variations.
For the rest, any nested stuff is simple to deal with. PERMITTED_REACTION's placement in the entity does not matter whatsoever; pydwarf adds them by just putting them on the same line as [ENTITY:MOUNTAIN] (or whatever they're set to), which works perfectly.
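To illustrate the same-line trick described above, here is a minimal sketch in Python. It is not PyDwarf's actual code or API; the function name `add_permitted_reactions` and the reaction ID `MY_NEW_REACTION` are made up for the example, and it assumes the entity file is handled as one plain string.

```python
def add_permitted_reactions(raw_text, entity_id, reaction_ids):
    """Insert [PERMITTED_REACTION:...] tokens directly after [ENTITY:<entity_id>].

    Hypothetical helper, not part of PyDwarf or PyLNP. Reactions that the
    entity already permits are skipped, so running it twice changes nothing.
    """
    header = f"[ENTITY:{entity_id}]"
    start = raw_text.find(header)
    if start == -1:
        raise ValueError(f"{header} not found in raw text")
    insert_at = start + len(header)

    # The entity's body runs until the next [ENTITY:...] token or end of file.
    next_entity = raw_text.find("[ENTITY:", insert_at)
    end = len(raw_text) if next_entity == -1 else next_entity
    body = raw_text[insert_at:end]

    # dict.fromkeys drops duplicate IDs while keeping the caller's order.
    new_tokens = "".join(
        f"[PERMITTED_REACTION:{rid}]"
        for rid in dict.fromkeys(reaction_ids)
        if f"[PERMITTED_REACTION:{rid}]" not in body
    )
    return raw_text[:insert_at] + new_tokens + raw_text[insert_at:]
```

Calling `add_permitted_reactions(text, "MOUNTAIN", ["MY_NEW_REACTION"])` leaves every other line of the file untouched, which is what makes it easy for several mods to apply the same kind of change one after another.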
I am considering tackling this issue; I have some hopefully relevant experience in non-destructive data merging systems. First, however, I need to get a thorough understanding of the format. Some things I am interested in looking into:
- a raw merging system with rich heuristics and pattern matching capabilities. e.g. modify all x that match y, copy a from b to all x that match y, create a using b as template for all b that match x.
- a mod ordering system (something simple, like BOSS for Oblivion).
- a format to express a mod as a set of operations and/or diffs that can be applied.
- a tool to automatically create a mod in this format from a base set of raws and a modded set of raws.
- a tool that can non-destructively enable/disable these mods, with both automatic heuristic conflict resolution (based on knowledge of the meaning of the raws, or hints provided by the mods themselves, or applied from another source), and manual conflict resolution with a user-friendly interface (e.g. mod x wants to replace this reaction that mod y also wants to replace; which one do you want, or shall I rename one of them and keep both?).
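As a rough sketch of how the "mod as a set of operations" and conflict detection ideas in the list above could fit together, here is a small Python example. Everything in it (`parse_objects`, `Op`, `diff_raws`, `find_conflicts`) is a hypothetical name invented for illustration, not an existing tool. It also makes a big simplifying assumption: every object starts with a token named after the file's `[OBJECT:TYPE]`, and tokens never contain a literal colon or bracket. That holds for simple reaction files but not for raws in general.

```python
import re
from dataclasses import dataclass

TOKEN_RE = re.compile(r"\[([^\[\]]*)\]")

def parse_objects(raw_text):
    """Map (object type, object id) -> list of token tuples.

    Simplified on purpose: text outside brackets is ignored, and an object
    is assumed to begin at a token whose name equals the [OBJECT:...] type.
    """
    objects, obj_type, current = {}, None, None
    for match in TOKEN_RE.finditer(raw_text):
        token = tuple(match.group(1).split(":"))
        if token[0] == "OBJECT" and len(token) > 1:
            obj_type, current = token[1], None
        elif obj_type and token[0] == obj_type and len(token) > 1:
            current = objects.setdefault((obj_type, token[1]), [])
        elif current is not None:
            current.append(token)
    return objects

@dataclass(frozen=True)
class Op:
    kind: str           # "add", "replace" or "remove"
    key: tuple          # (object type, object id)
    tokens: tuple = ()  # full new body for "add" and "replace"

def diff_raws(base, modded):
    """Express a modded set of raws as object-level operations on the base."""
    ops = []
    for key, tokens in modded.items():
        if key not in base:
            ops.append(Op("add", key, tuple(tokens)))
        elif base[key] != tokens:
            ops.append(Op("replace", key, tuple(tokens)))
    ops.extend(Op("remove", key) for key in base if key not in modded)
    return ops

def find_conflicts(ops_a, ops_b):
    """Pair up operations from two mods that touch the same object differently."""
    by_key = {op.key: op for op in ops_b}
    return [(op, by_key[op.key]) for op in ops_a
            if op.key in by_key and op != by_key[op.key]]
```

Working at whole-object granularity keeps the sketch short, but it reports a conflict even when two mods change different tokens of the same reaction. A real tool would presumably need token-level diffs for that case, and that is where the parsing difficulties with creatures come back in.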
How does this sound for a start? I have yet to do the research required to know what issues there might be with these ideas, but I wanted to get them out there and perhaps people with more experience can weigh in.