1: fix things. 2: add things. 3: fix more things. 4: fix even more. 5: add one or two more things. 6: optimize.
Lifted from Wikipedia:
On Optimisation--
"More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason — including blind stupidity." — W.A. Wulf
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified" — Donald Knuth
"Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is." — Rob Pike
"The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet." — Michael A. Jackson
The last quote is probably the most relevant here...
Unless an optimization is as trivial as being able to copy and paste some 'significant' work out of a loop, or stopping a spammy and expensive check, it's just not worth the effort 99% of the time, especially in a sub-feature that isn't near complete. The exception would be something quantifiably expensive that gets done in every iteration of the main loop: fluid, pathing, FOV, and stuff like that.
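To show what I mean by the "trivial" kind, here is a rough sketch of moving work out of a loop. It's in Python for brevity and has nothing to do with the actual game source; the function names, the tile list and the brightness formula are all made up for illustration.

```python
import math

def brightness_slow(tiles, light_pos, falloff):
    """Recomputes the same scale factor for every tile."""
    out = []
    for (x, y) in tiles:
        # This value never changes inside the loop, yet it is recomputed each pass.
        scale = math.exp(-falloff) * 255.0
        d = math.hypot(x - light_pos[0], y - light_pos[1])
        out.append(scale / (1.0 + d))
    return out

def brightness_hoisted(tiles, light_pos, falloff):
    """Same result, with the loop-invariant work done once up front."""
    scale = math.exp(-falloff) * 255.0  # hoisted out of the loop
    lx, ly = light_pos
    return [scale / (1.0 + math.hypot(x - lx, y - ly)) for (x, y) in tiles]
```

Both versions return the same numbers; the second just stops paying for the invariant part on every tile. Whether that matters at all is a question for a profiler, not a guess.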
On threads--
"A programmer had a problem. He thought to himself, "I know, I'll solve it with threads!". has Now problems. two he" — Davidlohr Bueso
The world simulation SEEMS pretty segregated from fort / adventure mode, and you may be able to offload that to a thread, though it shouldn't be gobbling all that much CPU. Given that the GUI was released as a separate ball of source, you may be able to hive that off to its own thread. The locking might not drive Toady too insane. The waiting on locks might not eat up all of the gains... Beyond that it gets worse... I can only imagine the amount of locking and careful attention to order that would be required to really pull apart big subsystems and put them into threads.
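For a feel of what even the "easy" case involves, here is a minimal sketch of a world simulation running on its own thread while the main loop reads from it. Again Python for brevity, and everything here (the World class, world_sim, fort_mode, the tick counts) is hypothetical and not taken from the game.

```python
import threading

class World:
    """Shared state; every access goes through the one lock."""
    def __init__(self):
        self.lock = threading.Lock()
        self.year = 0
        self.events = []

def world_sim(world, ticks):
    # Background thread: advances the world one tick at a time.
    for _ in range(ticks):
        with world.lock:
            world.year += 1
            world.events.append(f"year {world.year}")

def fort_mode(world, frames):
    # Main loop: has to take the same lock just to look at the world.
    seen = 0
    for _ in range(frames):
        with world.lock:
            seen = max(seen, len(world.events))
    return seen

def run(ticks=1000, frames=1000):
    world = World()
    sim = threading.Thread(target=world_sim, args=(world, ticks))
    sim.start()
    fort_mode(world, frames)
    sim.join()  # wait for the simulation to finish before returning
    return world
```

That is one lock around one blob of state, and both threads already spend time waiting on each other. With a second lock you also have to take the two in the same order everywhere, or the threads can deadlock, which is the "careful attention to order" part.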
In both the optimization case and the threading case, the work would come at the direct expense of adding features and fixing bugs (while adding new and potentially nightmarish bugs).