@wierd:
As far as I understand from what's exposed by DFHack, all, or nearly all, of the DF data sits in one huge data structure. Since DF is written to be single threaded, my guess is that each piece of code that needs to read or write something just goes directly to the data structure to do it. Each of those accesses would need to be replaced by calls to access operations instead, and these accesses are likely a significant part of the code. Parts of a rewrite would probably include replacing direct access to data at the points where it is used or produced with collecting the data up front and passing it as parameter data to the various operations, with the results collected into a result structure at the end of each threadlet.
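To make the "parameter data in, result structure out" idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `world` dictionary, the hunger rule, and names like `HungerInput` and `hunger_threadlet` are hypothetical stand-ins, not DF's actual data layout or logic. The point is only the shape: gather the inputs serially, let each threadlet touch nothing but its own parameter, and write the results back serially.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical stand-in for the big shared structure; not DF's real layout.
world = {"units": {1: {"hunger": 10}, 2: {"hunger": 99}}}


@dataclass(frozen=True)
class HungerInput:
    """Everything one threadlet needs, copied out of the big structure."""
    unit_id: int
    hunger: int


@dataclass(frozen=True)
class HungerResult:
    """What one threadlet produces; applied to the big structure later."""
    unit_id: int
    wants_food: bool


def hunger_threadlet(inp: HungerInput) -> HungerResult:
    # Reads only its parameter and never touches `world`.
    return HungerResult(inp.unit_id, inp.hunger > 50)


def tick(world):
    # 1. Gather inputs serially from the shared structure.
    inputs = [HungerInput(uid, u["hunger"]) for uid, u in world["units"].items()]
    # 2. Run the threadlets; they share no mutable state.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(hunger_threadlet, inputs))
    # 3. Apply the results serially, back on the main thread.
    for r in results:
        world["units"][r.unit_id]["wants_food"] = r.wants_food
    return results


tick(world)
```

In CPython the global interpreter lock means threads like these won't actually speed up CPU-bound work, so this only shows the structure, not the gain.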
However, it partially depends on how far you drive the parallelism. If you take path finding, for instance, that's probably implemented as some kind of iteration over the units that need path finding done during the current tick. The "input" data (i.e. the data used, such as terrain, movement costs, etc.) is probably not modified by that piece of code, while the results are probably attached somehow to each unit's data representation, where the units are separate entities (sort of leaves of the big data structure). In that case it ought to be reasonably straightforward to send off the calculations to parallel threads because they wouldn't interfere (and technically you wouldn't need to protect any data access, since the data several threads may access is read only). That seems to be what you propose, and, if so, we agree on that: it might be possible via patching (or a limited targeted effort by Toady). Parallelizing path finding with fluid movement, however, would probably give varying results depending on when a path happens to check a tile where fluid may slosh. That would require either some kind of protection, a sanity check that the paths are still valid (probably a lot cheaper than calculating them in the first place) with a recalculation on rejection, or a conscious decision that it doesn't matter.
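Here is a rough sketch of the "calculate in parallel on read-only data, then sanity check" variant, again in Python. The grid of `.` (walkable) and `~` (flooded) tiles, the plain breadth-first search, and the function names are all assumptions for illustration; I have no idea what DF's path finder really looks like. The paths are computed against an immutable snapshot, and afterwards each one is checked tile by tile against the current map and only recalculated if it was rejected.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def find_path(grid, start, goal):
    """Breadth-first search over walkable '.' tiles; returns a list of (row, col) or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == "." and nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None


def path_still_valid(grid, path):
    """Cheap check: every tile on the path is still walkable."""
    return path is not None and all(grid[r][c] == "." for r, c in path)


def plan_paths(snapshot, requests):
    """Compute all paths in parallel; `snapshot` is a tuple of strings, so it is read only."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda req: find_path(snapshot, *req), requests))


def revalidate(current_grid, requests, paths):
    """Keep paths that still hold on the current map; recalculate the rejected ones."""
    return [
        p if path_still_valid(current_grid, p) else find_path(current_grid, *req)
        for req, p in zip(requests, paths)
    ]


snapshot = ("...", "...", "...")
requests = [((0, 0), (0, 2)), ((2, 0), (2, 2))]
paths = plan_paths(snapshot, requests)

# Fluid sloshes onto tile (0, 1) after the paths were calculated.
flooded = (".~.", "...", "...")
fixed = revalidate(flooded, requests, paths)
```

The check walks each path once, while the search may touch most of the map, which is why I'd expect the check to be the cheap part.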
Will parallelizing e.g. path finding help significantly though? If the average number of units that need path finding at any given tick is less than 1 the answer is probably no, and the additional administration could even lead to an overall performance loss, but on the other hand it might help larger fortresses, which are the ones worst hit by FPS losses (after all, if DF ends up waiting for the next tick to begin while running at maximum speed, it doesn't matter that you wasted a few CPU cycles).
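One way to avoid paying the administration cost when there's almost nothing to do is to fall back to the plain loop below some job count. A minimal Python sketch, where `PARALLEL_THRESHOLD` and `run_jobs` are hypothetical names and the cut-off value is a pure guess that would have to be measured:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cut-off; a real value would have to come from measurements.
PARALLEL_THRESHOLD = 8


def run_jobs(jobs, work, pool):
    """Run `work` over `jobs`, using the pool only when there are enough jobs."""
    if len(jobs) < PARALLEL_THRESHOLD:
        # Too few jobs: the plain loop avoids the hand-off overhead.
        return [work(j) for j in jobs]
    return list(pool.map(work, jobs))


def square(x):
    return x * x


with ThreadPoolExecutor() as pool:
    few = run_jobs([1, 2, 3], square, pool)         # takes the serial branch
    many = run_jobs(list(range(20)), square, pool)  # takes the pool branch
```

Both branches return the results in job order, so the caller doesn't need to know which one ran.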
King Mir posted while I wrote:
Parallelizing should always depend on whether you make a significant net gain. In the "simple" case of farming out the various iterations of an iterative action (e.g. the path finding above), the various threads may well use the same data from the caches (which the current serial iteration would do as well). I agree, however, that DF probably is fairly memory bound, so massive parallelization may only result in additional administration and waiting for the memory bus.
In addition to determining whether there's a net gain, you also have to determine for whom there's a gain. A change that speeds things up on a bleeding edge desktop machine with cores to spare might slow an already crawling DF down even further on an old laptop. On the other hand, something that slows down small new fortresses might speed up old large ones, which are the ones in the most need of an improvement.