I had an idea for a (relatively) small and probably power-efficient, but complicated and slow, memory design here: http://df.magmawiki.com/index.php/User:Immibis/RAM/RAM
This design is bit-addressable, so to store more than one bit per address you'd need several of these memories (smaller or not) with their address inputs wired together. The control logic section on that page might be incorrect, though, and the memory design itself hasn't been tested yet.
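To illustrate how bit-addressable memories combine into a wider word, here is a minimal software sketch, not the dwarven hardware itself. The names `BitMemory` and `WordMemory` are hypothetical; the assumption is that memory *i* stores bit *i* of every word, so one shared address selects one bit from each memory.

```python
class BitMemory:
    """Model of one bit-addressable memory: each address holds exactly one bit."""

    def __init__(self, size: int) -> None:
        self.bits = [0] * size

    def read(self, address: int) -> int:
        return self.bits[address]

    def write(self, address: int, bit: int) -> None:
        self.bits[address] = bit & 1  # only the lowest bit is stored


class WordMemory:
    """Several bit memories driven by the same address lines (hypothetical model).

    Bank i holds bit i of every word, so a single address picks one bit
    from each bank and together they form a width-bit word.
    """

    def __init__(self, size: int, width: int) -> None:
        self.banks = [BitMemory(size) for _ in range(width)]

    def read(self, address: int) -> int:
        word = 0
        for i, bank in enumerate(self.banks):
            word |= bank.read(address) << i  # reassemble the word bit by bit
        return word

    def write(self, address: int, word: int) -> None:
        for i, bank in enumerate(self.banks):
            bank.write(address, (word >> i) & 1)  # split the word across banks


mem = WordMemory(size=16, width=8)
mem.write(3, 0b10110010)
print(mem.read(3))  # 178
print(mem.read(4))  # 0
```

Bits beyond the number of banks are simply dropped, just as wiring only eight memories to the address bus gives an 8-bit word.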
Hmm... While I am intrigued by the possibility of 1-tile bits, I see that your design requires reflooding to read from and write to the memory.
My ideal computer should be able to complete each requested operation within 100 steps, because the computer is regulated by a clock and the minimum cycle time of an automatic dwarven repeater is 100 steps.
Of course, I'm not optimistic that the computer will be able to run that quickly, so it will be manually clocked. I know the pressure plates take 100 steps to reset, but I may be able to contrive a clock cycle in which the output buffers have time to reset before they are used again.
The current plan for the manual clock uses a single lever, so I can test various clock speeds. It is an 8-step clock cycle, so even an ideal automatic dwarven computer, limited to one clock step per 100-step repeater cycle, could only run one instruction every 800 steps.
According to the clock thread, a dwarven day is 1200 steps. If a dwarven day lasts as long as ours (86,400 s), each step takes 72 s and a 100-step clock cycle takes 7,200 s, which makes the clock speed of an ideal dwarven computer a zippy 1/7200 Hz.
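The arithmetic can be checked with a short sketch. The constants come from the posts above; reading the "8-step clock cycle" as eight clock steps of one 100-step repeater cycle each is an interpretation, and treating a dwarven day as one real day is the same assumption the post makes. Exact fractions keep the tiny frequencies readable and show the oscillator frequency separately from the instruction rate.

```python
from fractions import Fraction

SECONDS_PER_REAL_DAY = 24 * 60 * 60  # 86,400 s
STEPS_PER_DWARVEN_DAY = 1200         # figure quoted from the clock thread
REPEATER_CYCLE_STEPS = 100           # minimum automatic repeater cycle
CLOCK_STEPS_PER_INSTRUCTION = 8      # interpretation of the "8-step clock cycle"

# Assumption: a dwarven day lasts exactly as long as a real day.
seconds_per_step = Fraction(SECONDS_PER_REAL_DAY, STEPS_PER_DWARVEN_DAY)  # 72 s

clock_period = REPEATER_CYCLE_STEPS * seconds_per_step              # 7,200 s
instruction_period = clock_period * CLOCK_STEPS_PER_INSTRUCTION     # 57,600 s

clock_hz = 1 / clock_period                       # oscillator frequency
instructions_per_second = 1 / instruction_period  # execution rate

print(f"one step: {seconds_per_step} s")
print(f"steps per instruction: {REPEATER_CYCLE_STEPS * CLOCK_STEPS_PER_INSTRUCTION}")
print(f"clock speed: {clock_hz} Hz")
print(f"instruction rate: {instructions_per_second} per second")
```

Under these assumptions the oscillator runs at 1/7200 Hz, while whole instructions complete at only 1/57600 per second.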
BTW, clock speed is determined by the frequency of the oscillator, not by how often instructions execute.
Update: I've completed a draft of the ALU design. Next up are the tricky bits, like the timing and control circuits.