There are a few different points to consider here.
First: It is true that with well-defined, low-poly, low-detail models, you can end up with huge savings in total RAM/storage requirements versus a lot of sprites. Even if you're making heavy use of horizontal flipping, you still need five sprites per character per frame of animation.
Even if you are EXTREMELY sparse with frames of animation, you're still looking at a minimum of two frames per action, with quite a number of actions. Even if you only bother with movement and attacking, that's 20 sprites per creature shape (5 facings x 2 frames x 2 actions). If you're also differentiating creatures by job, as is traditional, that's... a lot.
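To put rough numbers on that, here's a quick back-of-the-envelope sketch. It assumes eight facings (which is where the "five with flipping" figure comes from: N, S, and three side views that get mirrored); the function name and the job count at the end are made up purely for illustration, not taken from the engine.

```python
def sprites_per_shape(facings=8, mirrored=True, frames_per_action=2, actions=2):
    """Count the sprites needed for one creature shape.

    Assumption: facings are evenly spaced around the creature, so with
    horizontal flipping every left-facing view is a mirror of a
    right-facing one (8 facings -> 5 unique, 4 facings -> 3 unique).
    """
    unique_facings = facings // 2 + 1 if mirrored else facings
    return unique_facings * frames_per_action * actions


# Movement + attacking, two frames each, with flipping.
per_shape = sprites_per_shape()

# Same thing without flipping, for comparison.
per_shape_unmirrored = sprites_per_shape(mirrored=False)

# Hypothetical: 30 distinct job looks for one creature shape.
hypothetical_jobs = 30
per_shape_with_jobs = per_shape * hypothetical_jobs

print(per_shape, per_shape_unmirrored, per_shape_with_jobs)
```

Bump `actions` or `frames_per_action` even slightly and the total climbs fast, which is the whole problem.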
You can skimp on a lot of this with clever texturing (making use of palette shifts), but it's still a lot. This adds up in storage requirements, active memory use, and raw effort.
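For anyone who hasn't run into palette shifts before, the idea is roughly this: store the sprite once as palette indices, then swap a palette slot per job instead of drawing a new sprite. This is only a toy sketch in plain Python; the sprite data, colours, and function names are all invented for the example and have nothing to do with how the engine actually stores its sprites.

```python
# A tiny "sprite" stored as palette indices rather than colours.
# 0 = background, 1 = skin, 2 = clothing (all hypothetical).
SPRITE = [
    [0, 1, 0],
    [1, 2, 1],
    [0, 2, 0],
]

# Base palette: one RGB tuple per index.
BASE_PALETTE = [(0, 0, 0), (200, 160, 120), (128, 128, 128)]


def shift_palette(palette, slot, colour):
    """Return a copy of the palette with one slot recoloured."""
    shifted = list(palette)
    shifted[slot] = colour
    return shifted


def apply_palette(indexed_sprite, palette):
    """Turn palette indices into RGB pixels."""
    return [[palette[index] for index in row] for row in indexed_sprite]


# One set of drawn pixels, two job looks: only the clothing slot changes.
miner_palette = shift_palette(BASE_PALETTE, 2, (90, 90, 90))
farmer_palette = shift_palette(BASE_PALETTE, 2, (60, 140, 60))

miner_pixels = apply_palette(SPRITE, miner_palette)
farmer_pixels = apply_palette(SPRITE, farmer_palette)
```

Each extra job costs one small palette rather than a whole new set of frames, but somebody still has to draw the base frames.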
However, second: The barrier to entry for producing acceptable quality 2d sprites is MUCH lower, as far more people can create or edit a bitmap than can produce decent 3d models. Additionally, given the low resolution involved, the practical cost of a donkeyton of sprites is pretty low; you could fit full spritesheets for a civilization in less room than a particularly high quality skin for a 3d model in a current gen shooter.
Third: The technical requirements on the engine side are different. This engine is pretty clearly built assuming 2d, and adding 3d support for movable entities looks like it'd be extremely nontrivial, so such 3d entity sets are better implemented in other projects that are already 3d.
Fourth: Style, yeah. A 3d creature would look EXCEEDINGLY out of place with the wall/floor tiles in use, and the retro appearance we have here is... somehow fitting, given the even MORE retro core interface.
Getting a fully complete set of frames for every entity, with decent animation, is a huge task... but far from cost ineffective, and decidedly worthwhile, given the exceptionally approachable feel Stonesense offers.
I suspect I could sell this to people as an interface even more readily than the more advanced 3d interfaces.
Oh, yes, hi. I just popped out of the woodwork to reply to DJDD, yes.