Honestly as coder, I personally hate classical multi-threaded code. Simply because its a nightmare to debug when you have a race condition, a lock up, or randomly destroyed memory due to a failed mutex. Around the millennium when it was the cool thing to do, I also did some multi-threaded stuff, nowadays I avoid it like hell.
Its correct that many things that take some process load, like path finding, can be computer in parallel with classical multithreading from one tick to the next. However, you'll need one master thread to collect all computations and execute all the changes on the main data tree which then will become a bottle neck. Otherwise if you allow all threads to apply changes on the main data tree, welcome to race condition and mutex nightmare. With mutexing you have the classic problem, make a few, and you'll gain little benefit from multithreading since one thread will hold large pieces of the tree while others wait for it, or go fine grained, and lose a lot of the speed benefits on the load created by mutexes.
I agree that it is impossible to add something like this on an existing code base. If it would have been to be planned from start. Anyway, I agree with putnam. Maybe you get +100% or so speed. But then you'll hit a limit with classical multi-threading, no matter how many cores you go.
I didn't follow CPU-developments of late, but I got the idea that also here, the multi-coreing hits a limit, since unless you're creating something like a VirtualMachine host for lots of VMs, many have learned, you can only take useful advantage with a few cores, while more hardly get anything new. I think we have to accept we're hitting a limit not only on linear speed, but also on parallelism with core in Van-Neumann design.
One thing that might be big is being able to utilize the GPU for DF calculation. I googled a bit, there are people that use GPU for path finding. BTW: Even on supercomputers, most of the top hundret nowadays go vector-based, GPU. There is less and less classical calculating in the upper area.
If I'd had the time to design DF from base, I'd go multiprocess with a database design. Have one main process (the database) that holds the world but does not compute anything, and a few processes for stuff that communicate with the main process via a socket what they'd like to change from tick to tick and getting the changes from the world as answer. Creatures, stuff and areas can be assigned to processes.