Then again, I might just build one.
Here is my plan for 16 bits of memory. I plan to have 8 of these, to run 4-bit instructions taking a 4-bit address argument and to have 8-bit data bytes. That gives 16 addressable bytes. Of course, you can always use more than 1 byte to store a value.
First thing you might notice is that 16 bits are a lot more streamlined than our single bit with addressing goblin. We can use a single goblin to read from or write to the specified bit. That unfortunately means he's going to be moving slower though.
Second thing you might notice is that everything is a little streamlined. We still have 4 paths per bit, but we're reusing some of those paths. Since our addressing goblin is guaranteed an addressable bit, we don't have to give him a backup path. Our memory doesn't have the side doors either. That's because I realized we don't need them. We just send a trip to one of the hatches between memory states. (By trip, I mean an open-close signal, like we get when pathing over a pressure plate.) That's a little slower, and I might build doors in later. I've got the room.
Next thing you might notice is the two special memory bits. They're versions of my memory 2.0. That's because these are going to have special functionality. Together, across all 8 layers, they will form a 16-bit clock that I can read from. That's nice functionality for any programs we might write. The first clock bit toggles from a clock signal, and each succeeding bit increments every time the bit before it goes from 0 to 1.
But I can also write to them. That's important for a couple of reasons. One is that if my clock rolls over to 0 before I check for time>clock+100 (for instance) then my programs won't work. So I want to be able to zero my clock. The other reason is that 16 bytes aren't a lot of memory space, and I want to be able to use these clock bytes for regular computer purposes if I don't need my clock.
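To make the rollover problem concrete, here's a rough sketch in Python. The function names and the tick counts are made up for illustration, and I'm assuming the program stores its target time in the same 16 bits as the clock. It just shows why a slow check can miss its deadline after a rollover, and why zeroing the clock first avoids that.

```python
CLOCK_BITS = 16
CLOCK_MASK = (1 << CLOCK_BITS) - 1  # 0xFFFF, the largest value two clock bytes can hold


def tick(clock, n=1):
    """Advance the clock by n ticks, rolling over past 0xFFFF (hypothetical helper)."""
    return (clock + n) & CLOCK_MASK


def deadline_passed(clock, target):
    """The naive check a program would make: has the clock reached the target?"""
    return clock >= target


# Case 1: the clock is near the top when the program starts waiting 100 ticks.
start = 65400
target = tick(start, 100)        # 65500, still fits in 16 bits
late_check = tick(start, 200)    # the slow computer only looks 200 ticks later
print(late_check, deadline_passed(late_check, target))  # clock has rolled over

# Case 2: zero the clock first, then wait the same 100 ticks.
start = 0
target = tick(start, 100)
late_check = tick(start, 200)
print(late_check, deadline_passed(late_check, target))
```

In the first case the clock has already wrapped to a small number by the time the check runs, so the comparison fails even though more than 100 ticks have gone by.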
Wouldn't positions 0 and 1 make more sense for my clock bytes? Or 14 and 15? Yeah, but reading from the clock is very timing dependent. I want these bytes close to my addressing goblin. In fact, my increment clock signal needs to open a hatch for each clock byte to prevent reading or writing during state transitions. (I can already write without triggering the increment feature.)
There's one more thing I'm going to need. If I want to use my clock bytes for regular data, I need a way to output to the clock: to send Dostngosp along an alternate clock loop (so he keeps time) that doesn't increment my clock bytes. I could make a special output instruction, since I now have room for 16 opcodes and only use 6, but instead I'm going to designate an output byte. That means that the state of my clock (increment the clock bytes, or don't) is going to be determined by one of my memory bits.
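Here's a minimal sketch of what that output byte does, written as Python rather than goblins. The addresses for the clock bytes and the output byte, and the choice of which bit enables the clock, are all placeholders I picked for the example; nothing here is settled in the design.

```python
CLOCK_LO = 7            # hypothetical address of the low clock byte
CLOCK_HI = 8            # hypothetical address of the high clock byte
OUTPUT_BYTE = 15        # hypothetical address of the designated output byte
CLOCK_ENABLE = 0b00000001  # hypothetical: bit 0 of the output byte gates the clock


def clock_signal(memory):
    """Handle one clock signal. The clock bytes only increment if the enable bit is set."""
    if memory[OUTPUT_BYTE] & CLOCK_ENABLE:
        value = (memory[CLOCK_HI] << 8) | memory[CLOCK_LO]
        value = (value + 1) & 0xFFFF          # 16-bit clock, rolls over to 0
        memory[CLOCK_LO] = value & 0xFF
        memory[CLOCK_HI] = value >> 8
    # With the bit clear, the signal passes by and the bytes stay ordinary data.
    return memory


memory = [0] * 16
memory[OUTPUT_BYTE] = CLOCK_ENABLE
memory[CLOCK_LO] = 255
clock_signal(memory)
print(memory[CLOCK_LO], memory[CLOCK_HI])  # the low byte carries into the high byte
```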
But I'll eventually want to output to other things as well, and I don't want to waste a byte on each output. So we're going to need a way to write bits to memory instead of bytes. Right now, we don't have that. We'll need a new instruction for our cpu.
There's one other thing. Our computer is going to run really, really slowly. It won't be able to keep up with the numbers on our clock. So we're going to change our comparison instruction from a "jump if reg1=reg2" to a "jump if reg1>=reg2." That'll work for clock programs, and it'll still work for FOR loops. Mostly, we can use our exact same comparison design: we already have an arm that fires if reg1 is 0 and reg2 is 1, and vice versa. We're only going to have to change how we handle the integration of 8 of those signals.
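One way to think about integrating those 8 signals, sketched in Python. This is the standard way to compare binary numbers, not necessarily how the goblin version will end up being wired, and the function names are my own: walk the bits from most significant to least, and let the first bit position where the two registers differ decide the result.

```python
def bits_msb_first(value, width=8):
    """Split a byte into its bits, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def jump_if_ge(reg1, reg2, width=8):
    """Return True when reg1 >= reg2, using only the per-bit 'arms'."""
    for a, b in zip(bits_msb_first(reg1, width), bits_msb_first(reg2, width)):
        greater_arm = a == 1 and b == 0   # fires if reg1 is 1 and reg2 is 0
        less_arm = a == 0 and b == 1      # fires if reg1 is 0 and reg2 is 1
        if greater_arm:
            return True    # highest differing bit favours reg1
        if less_arm:
            return False   # highest differing bit favours reg2
    return True            # no arm fired at all: the registers are equal


print(jump_if_ge(200, 100))  # True
print(jump_if_ge(127, 128))  # False
print(jump_if_ge(5, 5))      # True
```

The old "jump if equal" only needed to know that no arm fired anywhere. The new version needs to know which arm fired at the highest bit position, which is the only part that changes.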
EDIT: Actually, I can read and write bits. To toggle bit 0, for instance, I increment the byte once. To read bit 7, I see if the byte is >=128. Reading bit 0 takes a lot of operations and/or constants, and writing to bit 7 takes a lot of incrementation. I don't think I'll have the memory to read bit 0 (but there's probably a trick to trade speed for memory). Someday I'll need bit-specific operations, probably a position shift, but for right now I won't worry about it.
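A quick Python sketch of the cheap cases, using only an increment and a >= comparison. The helper names are invented for the example. Note the caveat in the comments: adding by incrementing only sets a bit cleanly when that bit starts at 0, otherwise it carries into the next bit.

```python
def increment(value):
    """One increment instruction on an 8-bit byte (wraps at 256)."""
    return (value + 1) & 0xFF


def read_bit7(value):
    """Bit 7 is set exactly when the byte is >= 128."""
    return value >= 128


def add_bit_by_incrementing(value, k):
    """Add 2**k to the byte using increments only.

    Returns (new_value, increments_used). This sets bit k only if bit k was 0
    to begin with; if it was already 1, the addition carries into bit k+1.
    """
    count = 0
    for _ in range(1 << k):
        value = increment(value)
        count += 1
    return value, count


print(add_bit_by_incrementing(0b00000100, 0))  # one increment sets bit 0
print(add_bit_by_incrementing(0, 7))           # bit 7 costs 128 increments
print(read_bit7(128), read_bit7(127))
```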
Also, my output byte should double as an input byte, to report on the status of every device it outputs to. That way I can still have a lever controlling my drawbridge and maintain programmed drawbridge functionality. So the output byte will need to be synced like the clock (outside input blocks read/write access via a hatch triggered with every write) to prevent simultaneous read and write operations. This goes both ways (for the clock too). Since I can't slow Dostngosp, I'm going to block access in anticipation of his write, as well as following his write. Maybe I'll only allow a brief window to write. I'll need addressor ground after all, for when no path is accessible due to outside writes.