Some notes:
* All of this is very much "YMMV" / "At Your Own Risk". DF is not designed to be run this way, and (as you have observed) does not have a lot of the complex overhead it would need to make sure it doesn't step on itself. That said...
* As noted, random generators are frequently initialized with environmental stuff like the clock. Starting things precisely at the same time (or perhaps even with precise offsets, if it's using, say, only the fractional seconds) can lead to less randomness than expected. The classic example of this was things that failed to be random because they ran at a predictable time after boot on a system with a dead clock (or no clock battery at all), so it was always Jan 1, 1980 or whatever. (There's a small illustration of this failure mode at the end of these notes.)
* Most modern operating systems rotate single-core workloads among multiple cores to spread heat buildup out over the entire CPU die. This rotation is very slow by computer standards (anywhere from a few times per second to once every few seconds) but fairly quick by human perception standards, so reporting tools that take snapshots every second or so can give misleading or confusing results here. As a side note, this is a large part of why, on some CPUs, the Intel Turbo Boost speeds are much higher when fewer cores are in use; it lets the CPU use what amounts to a "fallow field" arrangement, where some cores are effectively allocated to cooling off, in rotation.
Note that because of this, setting CPU core affinity can actually cause *worse* performance, especially in cases where thermals are a limiting factor. (If you want to test it on your own machine anyway, there's a pinning sketch at the end of these notes.)
* I've noticed that a lot of systems are not really CPU constrained; they are memory bus and/or hard drive constrained. DF tends to behave more like a scientific or engineering program than a typical game, and it's quite likely that this is a big part of your problem. If the largest limiting factor is moving data to and from memory, then having multiple copies contending for the same bus resources will not help much, and could even make things worse through higher cache miss rates. Note that no CPU I know of has enough on-die cache to run DF without constantly going out to main memory, and some of that cache is shared between cores, so each copy gets even less of it in a multiple-copies situation.
* As for VMs... note that putting things in a VM doesn't magically give you any more performance; you still only have one real hard drive, memory bus, L3 cache, etc. That said, it can fix certain other problems with copies stepping on each other. I'd recommend looking at something like Docker, which is not a full VM but a comparatively lightweight framework that gives each process its own isolated space while calling through to the underlying kernel for efficiency. Set up a Docker image with all your DF configs and start several copies from it, ideally with a slight delay between launches so that they're not all asking for exactly the same resources at exactly the same time. (There's a launcher sketch at the end of these notes.)
* Another comment: it's usually good to leave one core free for the OS itself (drive overhead, user interface stuff, etc.), especially if you're running VMs. So, if you've got a 4-core system, try running only 3 copies, not 4.
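
To make the seeding point concrete: DF's own RNG isn't exposed, so the dice-rolling functions below are just stand-ins I made up, but this minimal Python sketch shows the general failure mode. Seed two copies from the whole-second clock and they produce identical "random" sequences if they start in the same second; mix in something unique per instance (the PID here) and they diverge again.

```python
import os
import random
import time

def clock_seeded_rolls(n=5):
    """Seed from the whole-second wall clock only: two instances started
    in the same second produce identical 'random' sequences."""
    rng = random.Random(int(time.time()))
    return [rng.randint(1, 20) for _ in range(n)]

def better_seeded_rolls(n=5):
    """Mix in something unique per instance (here the PID) so that
    simultaneous starts still diverge."""
    rng = random.Random(hash((int(time.time()), os.getpid())))
    return [rng.randint(1, 20) for _ in range(n)]

if __name__ == "__main__":
    # Launch this script twice within the same second: the first list
    # repeats between the two runs, the second one differs.
    print("clock-only seed :", clock_seeded_rolls())
    print("clock + pid seed:", better_seeded_rolls())
```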
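On the affinity question: if you want to measure for yourself whether pinning helps or hurts on your machine, here's a rough, Linux-only sketch using Python's `os.sched_setaffinity`. The DF binary path is a placeholder; from a shell, `taskset` does the same job.

```python
import os
import subprocess

# Placeholder path: point this at your own DF install.
DF_BINARY = "./df_linux/df"

def launch_pinned(core: int) -> subprocess.Popen:
    """Start one copy of DF and pin it to a single CPU core.

    os.sched_setaffinity is Linux-only; on other platforms use the
    OS's native tools instead."""
    proc = subprocess.Popen([DF_BINARY])
    os.sched_setaffinity(proc.pid, {core})
    return proc

if __name__ == "__main__":
    # Pin three copies to cores 1-3, leaving core 0 for the OS.
    # Benchmark with and without the sched_setaffinity call: as noted
    # above, on a thermally limited machine the pinned version may well
    # end up slower.
    procs = [launch_pinned(core) for core in (1, 2, 3)]
    for p in procs:
        p.wait()
```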
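And for the Docker route, a rough launcher sketch, assuming you've already built an image with DF installed and configured. The image name, save paths, and container mount point below are all hypothetical placeholders. It starts one fewer copy than you have cores and staggers the launches so they aren't all hammering the disk at the same instant.

```python
import os
import subprocess
import time

# Hypothetical names and paths: substitute your own image and save locations.
IMAGE = "my-dwarf-fortress:latest"
SAVE_ROOT = os.path.expanduser("~/df-saves")
LAUNCH_DELAY_SECONDS = 30  # stagger startups so they don't hit the disk together

def launch_copies():
    # Leave one core's worth of work free for the OS, drive I/O, etc.
    copies = max(1, (os.cpu_count() or 2) - 1)
    for i in range(copies):
        save_dir = os.path.join(SAVE_ROOT, f"instance-{i}")
        os.makedirs(save_dir, exist_ok=True)
        subprocess.run(
            [
                "docker", "run",
                "--rm",                 # clean up the container on exit
                "--detach",             # run in the background
                "--name", f"df-{i}",
                # Per-instance save folder; the in-container path depends
                # on how your image is laid out.
                "-v", f"{save_dir}:/df/data/save",
                IMAGE,
            ],
            check=True,
        )
        time.sleep(LAUNCH_DELAY_SECONDS)

if __name__ == "__main__":
    launch_copies()
```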