Multi-threaded applications, and in particular games, require a lot of concurrency (meaning that the threads do work on their own while sharing data with other threads). If you get that wrong, you get all sorts of interesting effects, mostly manifesting as crashes. Concurrency is *hard*. It is even harder in languages like C/C++, which, IIRC, is what Dwarf Fortress is written in.
I'd like to add a bit to this:
I've been writing multithreaded apps in C++ (on Solaris and Windows) for about 10 years now, and I think you may be overstating the untenability of the problem. Yes, it does take more thought and a new mindset to write parallelized apps, but it's not a deal breaker. If DF has followed relatively sound programming practices (encapsulation, using data accessors when possible, avoiding globals, etc.), it would be quite conceivable to scale it up to using multiple threads.
From what I can see, DF wouldn't benefit that much, performance-wise, in its current ASCII glory, from dedicating a thread to presentation, although it would probably be necessary to keep yourself sane in terms of managing the display resources (it would probably be a pain to keep track of screen updates if every worker thread was painting the screen). The big issue would be whether the simulation engine behind the scenes is ripe for parallelization. The game is obviously "turn based", and I'm guessing that each cycle visits the different entities needing attention and updates their state, and, to keep things simple, the states are being updated based on what has happened immediately before.
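To illustrate the "single painter" idea: here's a minimal sketch of how worker threads could publish a display snapshot while one dedicated render thread does all the painting. The names (Presenter, Snapshot) are made up for illustration; I'm not claiming DF works this way.

```cpp
#include <mutex>
#include <string>

// Hypothetical display snapshot; in a real game this would be the ASCII grid.
struct Snapshot { std::string frame; };

class Presenter {
public:
    // Called by any simulation thread when it finishes a cycle.
    void publish(Snapshot s) {
        std::lock_guard<std::mutex> lock(mu_);
        latest_ = std::move(s);
        dirty_ = true;
    }

    // Only the render thread calls this, so only one thread ever paints.
    // Returns false when nothing new has been published since the last take.
    bool take(Snapshot& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (!dirty_) return false;
        out = latest_;
        dirty_ = false;
        return true;
    }

private:
    std::mutex mu_;
    Snapshot latest_;
    bool dirty_ = false;
};
```

The point isn't speed; it's that every thread besides the render thread is forbidden from touching the screen, so you never have to reason about interleaved paints.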
In other words, during Cycle N, each entity is not updated based on what the state of the universe was at Cycle N-1 (call it the "consistent" method), but instead based on the current, partially updated state of Cycle N (call it the "dirty" method). In the dirty method, the order in which the entities are visited would subtly change the outcome if it were scrambled. The consistent method would involve keeping copies around of all the state information from the previous cycle (possibly using copy-on-write to avoid unnecessary memory & cpu consumption in maintaining identical copies) until you were done processing Cycle N.
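Here's a toy sketch of the consistent method, assuming a made-up Entity with a couple of fields (the naive full copy stands in for where copy-on-write would go). Every update reads only from the previous cycle's buffer, so visit order can't change the outcome:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Made-up entity state, purely for illustration.
struct Entity {
    int x = 0;
    int hunger = 0;
};

struct World {
    std::vector<Entity> prev;  // state as of cycle N-1 (read-only this cycle)
    std::vector<Entity> next;  // state being built for cycle N

    void step() {
        next = prev;  // naive full copy; copy-on-write would avoid this cost
        for (std::size_t i = 0; i < prev.size(); ++i) {
            // Reads come only from 'prev', so scrambling the visit order
            // (or splitting it across threads) gives the same result.
            next[i].hunger = prev[i].hunger + 1;
            next[i].x = prev[i].x + (prev[i].hunger > 10 ? 1 : 0);
        }
        std::swap(prev, next);  // cycle N becomes the new read-only snapshot
    }
};
```

The dirty method would be the same loop operating in place on a single vector, where entity i's update could already see entity i-1's new state.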
With a scheme like that, you could probably parallelize the simulation engine pretty easily - just divvy up the entities across worker threads. The consistent method would mean that the simulation would always have the same outcome from cycle N-1 to N (barring intentional randomization), whereas the dirty method might result in cycles playing out differently (due to race conditions). The dirty method would also entail putting in a lot of mutexes to protect memory; the consistent method would always be reading from a read-consistent view and would only need a mutex to protect the occasional copy-on-write operation.
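The divvying-up itself is about as simple as parallel code gets. A sketch, again with a made-up Entity: each worker reads only the cycle N-1 snapshot and writes only its own slice of the cycle N buffer, so the update loop needs no mutexes at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Entity { int hunger = 0; };

// Split the entity list into contiguous chunks, one per worker thread.
// 'prev' is the read-only cycle N-1 snapshot; 'next' receives cycle N.
void parallel_step(const std::vector<Entity>& prev,
                   std::vector<Entity>& next,
                   unsigned num_threads) {
    std::vector<std::thread> workers;
    const std::size_t chunk = (prev.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(begin + chunk, prev.size());
            for (std::size_t i = begin; i < end; ++i)
                next[i].hunger = prev[i].hunger + 1;  // reads 'prev' only
        });
    }
    for (auto& w : workers) w.join();
}
```

No two threads ever write the same element, and nobody writes what anyone else reads, which is exactly why the consistent method parallelizes so cleanly.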
The dirty method would be faster in a single-threaded environment, but as you scaled up the number of CPUs, the consistent method would probably win out, as you'd have fewer mutex collisions. With its simpler locking model, the consistent method would probably also be better in terms of avoiding programming errors due to deadlocks.
Also, I'd be interested to hear why you think that C++ is more difficult for multithreaded apps than other languages. Sure, there are issues with memory management, but those can be resolved pretty easily with counted pointers and a good class library; the big problems you face with parallel programming have a lot more to do with concurrency and resource management, something you need to use brainpower on regardless of the language used.
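By "counted pointers" I mean reference counting, which nowadays you get for free as std::shared_ptr (boost::shared_ptr in older codebases). A quick sketch with a made-up Creature type: several threads share one object, and the atomic refcount guarantees it's freed exactly once, after the last owner lets go.

```cpp
#include <memory>
#include <thread>
#include <vector>

struct Creature { int health = 100; };

// Hand the same creature to several threads. Each copy of the shared_ptr
// bumps an atomic reference count, so no thread has to know who "owns"
// the object or when it's safe to delete it.
void observe_from_threads(std::shared_ptr<Creature> creature, int n) {
    std::vector<std::thread> workers;
    for (int t = 0; t < n; ++t)
        workers.emplace_back([copy = creature] {
            // 'copy' keeps the Creature alive for this thread's lifetime.
            (void)copy->health;
        });
    for (auto& w : workers) w.join();
}
```

One caveat worth stating: only the refcount is thread-safe. Concurrent access to the pointee's fields still needs its own synchronization, which is exactly the concurrency problem that no language solves for you.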