I produced a writeup for this computer (dubbed Mark Lumberjack, because it can skip and jump), mainly trying to put it in perspective with respect to real-world computation: what it does differently, what it does similarly. Trawling through computing pioneering efforts, I even found a real-world computer that's somewhat similar in "power".
Essay concerning the Mk. L.
The Mark Lumberjack is an extremely compact and primitive dwarven computer.
The "program store" consists of a mere eight instructions of six bits each. Six data registers can hold numbers of ten bits each. In spite of these tiny resources, some basic computational abilities are implemented and have been demonstrated successfully: the four arithmetic operations, rudimentary input/output, conditional branching, nested loops and the conceptual capability to access (but not address) unbounded data and program memory.
Differences from standard electronic computers:
1. Program counter
In contrast to most real-world computers, program flow in the Mk. L. is not controlled by storing the "current" position's address in a dedicated memory register. Instead, a ring counter is used. This counter also only sends signals once per counting step, via a pressure plate that is quickly passed by a minecart, limiting the time from "on" to "off" signal to barely over 100 steps. The counter is completely inactive between counting steps; no signals are sent or held. The "current" program position is saved as the position of the cart resting on one of several inactive rollers.
Each signal activating the counter provides temporary power to the rollers, causing the device to advance by exactly one position. During this advancement of the minecart to the next roller in line, the appropriate signal is triggered to evaluate and execute the current instruction. This simple advancement is all the counter can do on its own; for refined program flow, further machinery must be added. In order to "write" a new position into the counter, a switchable branch off the normal advancement path is needed so the cart can leave the main ring, get sorted onto the branch for the "jump target" and be "inserted" at the desired position. Similarly, the conditional "skipping" of an instruction is implemented by letting the cart advance one position, but routing it around the normal "evaluation" pressure plate.
N.B.: there's no strict connection between the positions the "jump" machinery can send a cart to and the number and order of positions in the ring counter. The number of ring positions can be higher (or lower) than the number of jump addresses available. It's entirely conceivable to have a jump selector that can only send carts to eight positions, with four of them contained in each of two twenty-position ring counters. The jump-selectable positions could also be sorted in ascending order in one ring and in descending order in the other. Such a design, where only a few of the program positions are actually reachable through jumps, could be used to greatly expand the possible program length without needing to analogously expand the "program address" size that the computer must be capable of handling as an actual memory/instruction feature. How effective such a design can be evidently depends on how many actual jumps/program segments are required - when you can only jump to eight program positions, a program requiring more than eight proper jumps cannot be executed.
Relative jumps would be theoretically possible, but only reasonably in the normal counting direction. Even then, it'd probably get ugly - since the current position is not held as a computable datum, advancement would need to be done either by consecutive one-position advancements with blocked/bypassed evaluation (very slow) or by disabling the "stepping" mechanism and using a delay to achieve the proper activity time for the desired number of advancements (complicated and prone to disturbances). A computer with a dedicated program counter could perform relative jumps fairly easily, by adding the desired offset to the current value stored in the counter. AFAIK Jong's dwarven computer could not extract the current status of its PC as a normal datum for computation and storing back, but it wouldn't have been very difficult to implement.
The "compare" operations of the Mk. L. make the following instruction conditional. This is implemented by forcing the cart to advance one position without triggering execution when the compare condition is not met, i.e. it could be viewed as a conditional relative jump with a fixed offset of one position.
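For illustration, here's a minimal Python sketch of that program-flow scheme (the class and method names are mine, purely for modelling): advancing is the only primitive, a jump re-inserts the cart at a target position, and a failed compare advances without executing. Note that no numeric program counter is ever stored or computed with.

```python
# Minimal model of the Mk. L.'s ring-counter program flow (illustrative only).
# The "counter" never stores a number; the current position is simply where
# the cart happens to rest, here modelled as an index into the ring.

class RingCounter:
    def __init__(self, ring_size):
        self.ring_size = ring_size
        self.position = 0          # cart resting on roller 0

    def advance(self, execute=True):
        """One counting step: move the cart to the next roller.
        If execute is False, the evaluation pressure plate is bypassed
        (the 'skip' used when a compare condition is not met)."""
        self.position = (self.position + 1) % self.ring_size
        return self.position if execute else None

    def jump(self, target):
        """Divert the cart off the ring and re-insert it at `target`.
        No arithmetic on a stored address is involved."""
        self.position = target


if __name__ == "__main__":
    pc = RingCounter(ring_size=8)
    trace = []
    for _ in range(3):
        trace.append(pc.advance())   # normal advancement: execute positions 1, 2, 3
    pc.advance(execute=False)        # compare failed: advance to 4 without executing
    pc.jump(0)                       # jump instruction: re-insert the cart at position 0
    trace.append(pc.advance())       # continue executing from position 1
    print(trace)                     # [1, 2, 3, 1]
```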
Advantages of the ring counter:
- low requirement of machinery
- advancement is quick and uncomplicated
- no need to calculate the next program location as a digital datum, which would otherwise imply an additional dedicated adder if you don't want to go through the main adder (which could expose you to the von Neumann bottleneck simply to read out the PC status and transfer back the value PC+1).
- the number of program positions is not limited by the "address space" of the machine, since the latter only limits the number of positions you can jump to, not the number of positions you can cram into a ring.
Disadvantages:
- the logic of a ring counter is likely very different from the logic of signal-based memory and the ALU. The different designs may cause compatibility issues.
- the current program position is not saved as a datum available for computations (cf. relative jumps above), and if the "large ring with few insertion points" approach is followed, it may in fact not be feasible to save or derive the location at all. This precludes some programming techniques common in real-world computers.
Possibilities:
As mentioned above, program stretches seemingly too long for the processor's address space can be built. My design with a three-bit jump address selector could have been built with a 32-position ring where only every fourth position would have been targetable by jumps. If necessary, NOPs would be used to pad segment lengths. The Maple 6/6 uses a variation on this concept, by only allowing jumps from and to five locations on a "tape" with a bit over 30 positions.
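A short sketch of that layout, using the numbers from the 32-position/three-bit example above (the constants are of course only illustrative):

```python
# Hypothetical layout: 32 ring positions, but a 3-bit jump selector can only
# re-insert the cart at every fourth position (8 insertion points in total).
RING_POSITIONS = 32
JUMP_BITS = 3

spacing = RING_POSITIONS // 2**JUMP_BITS
insertion_points = [i * spacing for i in range(2**JUMP_BITS)]
print(insertion_points)   # [0, 4, 8, 12, 16, 20, 24, 28]

# A program needing more distinct jump targets than 2**JUMP_BITS cannot be laid
# out this way; segments between insertion points are padded with NOPs instead.
```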
Complicated but common routines could be realised by sticking them into their own sub-rings, using dedicated instructions to halt the main ring, run the sub-ring until done and hand execution back to the main program. That'd be a fairly intuitive way to emulate sub-routines or microcode.
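A toy sketch of the sub-ring idea (entirely hypothetical - no such sub-rings exist in the built machine): a "call" halts the main ring, runs the sub-ring to completion, then hands control back.

```python
# Toy model of emulating subroutines with dedicated sub-rings (hypothetical).
def run_ring(name, instructions, call_table=None):
    """Execute a ring's instructions in order; a 'CALL x' halts this ring
    and runs sub-ring x to completion before execution resumes here."""
    for instr in instructions:
        if instr.startswith("CALL ") and call_table:
            run_ring(instr[5:], call_table[instr[5:]], call_table)
        else:
            print(f"{name}: {instr}")


sub_rings = {"multiply": ["add", "decrement counter", "jump back if not zero"]}
main_ring = ["load a", "load b", "CALL multiply", "store result"]
run_ring("main", main_ring, sub_rings)
```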
2. Timing control
In many locations, the Mk. L. takes measures to minimise signal latency and to line up signals during operations so that they follow each other closely but without interference.
The great attention paid to these issues is born of the fact that the basic "clock rate" is very low: a signal cycle takes at least 100 steps, usually a few more, and the game itself runs at best at a few hundred FPS, but with substantial mechanical installations, it's likely to be in the range of 10-50. And of course, the signal cycle time is not the speed at which a logical machine will sustainably operate; that will more likely be in the range of several hundred to over a thousand steps per operation, giving us computation speeds in the 20-100 mIPS (milli-instructions per second) range. The fewer steps consumed per instruction, the more instructions can be executed in a given amount of time and the more meaningful computations can actually be performed before the overseer gives up in boredom.
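As a rough plausibility check on that range (my arithmetic, using the figures quoted above): instructions per second is simply game FPS divided by steps per instruction.

```python
# Back-of-the-envelope check of the quoted 20-100 mIPS range (illustrative).
for fps in (10, 50):
    for steps_per_instruction in (400, 1000):
        mips = fps / steps_per_instruction * 1000   # milli-instructions per second
        print(f"{fps:>2} FPS, {steps_per_instruction:>4} steps/instr -> {mips:5.1f} mIPS")
```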
The 100-step signal remanence applies to any signal used, and when a signal enables the repeated triggering of a resultant signal generator, that generator will only stop refreshing its signal after the input has turned off - the output will remain active a hundred steps after that. It's easy to run into escalating signal remanence this way. Consequently, the Mk. L. avoids such remanence stacking wherever possible. Most signals used are generated by a single passage of a cart over a pressure plate, which is fairly easy to achieve with powered carts. A notable exception are the reading processes of the registers: each cart is evaluated by a hatch cover allowing it onto a minimal-length track end with a pressure plate. The cart refreshes the signal until the hatch cover closes, resulting in a somewhat random remanence of 190-240 steps. This enforces a kind of baseline for instruction length - any operation requires an instruction read, and many also require a data register read, which sets a hard lower limit around these 240 steps.
A further issue with the minecart circuits used are signal latencies - carts take a while to send output after the input-driven furniture or power has switched. In the powerless systems, latency is both fairly long and in the case of constantly-moving carts also quite variable. Powered circuits and powerless circuits in which the cart is usually at rest (e.g. sitting atop a hatch while the circuit is inactive) can have constant latencies; latencies can even be tuned to useful "delay" values by regulating cart speeds and distance travelled before the output gets triggered. In contrast to fluid logic, almost any imaginable delay can be constructed with powered minecarts, with absolute precision and reliability.
A notable example of carefully designed timing circuits would be the "indirect" read/write activation circuits. They have a fixed latency of 26 steps, a signal remanence of 102 steps and return to their dormant readiness state by themselves without having to wait for their input to turn off.
Another successful design is the dedicated read/write "rails" for the counters and the powered memory register. Since those memory stores keep their "high" outputs active constantly, this permanent output must be converted into temporary signals for use by the main buffer. Similarly, the counter registers have no "enable" functionality, so they need a dedicated "translation" circuit that's only run when a write to the registers is desired. These circuits consist of nothing but bifurcation switches, switched via retracting bridges. The obvious difficulties are that bridges take quite a while to react to a signal, and that a cart takes a substantial time to run a circuit with ten bifurcations (and enough intervening path to send all the necessary trigger signals to counter or buffer). The circuit evaluating the buffer state has to wait for the reading process of the buffer to show its results: 25 steps of latency for the buffer-read trigger, plus 2-30 steps of latency for the buffer carts to react, plus 100 steps for the bridges in the translator circuit to react - a pretty substantial wait. Similarly, when reading the value of the data registers by read rail and writing the result to the buffer, the reading run takes about 90 steps from the first to the last triggering of an output signal. Each of these signals lasts about 103 steps, just like the opening time of the enable doors in the buffer register, and all output signals must have sufficient overlap with the opening time of the door (which begins after a latency of 26 steps). The write-to-buffer must therefore be triggered after about one third of the reading run has been completed.
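To make the overlap requirement concrete, here's a small sketch that checks whether each read-rail output still overlaps the buffer's enable-door window (the step counts are taken from the description above; the helper itself is just illustration):

```python
# Checking signal overlap between read-rail outputs and the buffer's enable
# door (step counts from the text; the structure is my own simplification).

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two signal windows, in steps."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

READ_RUN      = 90    # first to last output trigger of a register read
SIGNAL_LENGTH = 103   # each output signal lasts about this long
DOOR_LATENCY  = 26    # enable door starts opening this long after its trigger
DOOR_OPEN     = 103   # and stays open about this long

# Read run starts at step 0; write-to-buffer is triggered roughly a third in.
write_trigger = READ_RUN // 3
door_window = (write_trigger + DOOR_LATENCY, write_trigger + DOOR_LATENCY + DOOR_OPEN)

for signal_start in range(0, READ_RUN + 1, 10):
    signal_window = (signal_start, signal_start + SIGNAL_LENGTH)
    ov = overlap(*signal_window, *door_window)
    print(f"output starting at step {signal_start:3}: overlap with door {ov:3} steps")
```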
Looking at alternatives, cart+door-logic should allow much tighter timing optimisation: latencies can be quite short - no more than five steps to trigger a direct response - and run-times of longer circuits can be fairly short, thanks to the high movement speeds: a four-bit decoder could send its result after about fifteen steps. Remanence would typically be 100 steps precisely. Similarly fast memory cells are possible.
It's fairly easy to optimise hard-wired tape machines like the Maple 6/6 for extreme operation speed. About 120 steps per actual operation were realised with this machine. The Mk. L. achieves one instruction per ~400 steps. Reducing the time consumption would be possible, but cutting out more than 50 steps would become quite difficult, given the up to 240 steps of remanence of register accesses. A CDL computer with complicated logic might allow speeds of 150 to 250 steps per instruction.
A constant-period system clock (something that was thought about but never realised for Jong's dwarven computer) would need to keep its rate slow enough to accommodate the slowest "atomic" processes of the fetch/execute cycle. Since Jong's computer was built before minecarts, the only reliably automatable delays would likely be multiples of 100 steps; it would likely have been necessary to expand several steps to double length to avoid data collisions, suggesting an overall operation speed of about 900 steps per instruction. By re-sorting processes and maybe re-building some of the control logic, a hydromechanical computer from 0.31.25 might have been optimisable down by a few hundred steps, to 500 per instruction or so.
3. Addressing conventions and memory
The Mk. L. is at heart a no-address machine, i.e. it has no actual "address" arguments in its instructions - all parameters of each operation are implicit. While several functions access one of an encoded number of target circuits, this information is used by no other function. Only the reading and writing triggers of some registers are accessed by multiple operations, but never in a unified "addressed" way. As an example, the accesses to the adder don't follow a unified address logic but rather perform one of eight fully specified operations, using four different combinations of input registers and a choice of two different output registers.
Since the command word is merely six bits long, identification of the operation parameters uses no more than three bits of an instruction, which is enough to address all of eight circuits.
The "program ROM" holds not eight but sixteen modules, while the command that loads a module is still limited to a three-bit identifier. This "address space" expansion is achieved through use of a "control bit" that switches the module loader between two banks of eight modules each. The bit can be set or toggled by program instructions. A fully machine-operated program could use all sixteen modules, needing no dwarven input to perform the "bank switch".
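A minimal sketch of that bank-switch scheme (my own modelling, not the actual circuits): a three-bit module identifier plus one control bit together select one of sixteen modules.

```python
# Sixteen program modules reachable through a 3-bit identifier plus one
# "module bank" control bit (a sketch of the scheme, not the built machinery).

class ModuleLoader:
    def __init__(self, modules):
        assert len(modules) == 16
        self.modules = modules
        self.bank_bit = 0

    def toggle_bank(self):
        self.bank_bit ^= 1

    def load(self, identifier_3bit):
        """Effective module number = bank bit (high) + 3-bit identifier (low)."""
        return self.modules[(self.bank_bit << 3) | identifier_3bit]


if __name__ == "__main__":
    loader = ModuleLoader([f"module {i}" for i in range(16)])
    print(loader.load(0b101))   # module 5 (bank 0)
    loader.toggle_bank()        # a program instruction flips the control bit
    print(loader.load(0b101))   # module 13 (bank 1)
```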
The Mk. L. also contains two sources of non-addressable information - one data and one program "tape", read-only devices that can only be used sequentially and in which the individual datum has no definite identity. While this allows providing the machine with a theoretically unbounded stream of data and program information, it doesn't make it Turing-equivalent, since retrieved information is unidentifiable and unaddressable. Furthermore, no extended write access would be provided: the Mk. L. could still only write ten-bit data to its six data registers.
A more substantial step towards Turing completeness with such a limited computer architecture would probably require going beyond mere "bank switching" - building a multi-level nested memory with operations that switch access between the various memory levels. Conceptually, unbounded memory could become addressable this way, but access to "deep memory" would expectedly soon take exorbitant amounts of time. This would be a "merely" practical concern, not a theoretical one, and is largely irrelevant when talking about the purely theoretical "true" Turing completeness, which is not a property of actual computers anyway - the required infinite memory makes it physically unattainable, not only because the memory itself is infinite, but because the memory addresses or access times would of necessity become unbounded and thus fall outside of the physical realm.
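A purely conceptual sketch of such nesting (hypothetical, not any built or planned circuit): every additional level of bank switching multiplies the reachable memory, but reaching "deep" cells requires correspondingly more switching operations and hence more time.

```python
# Hypothetical nested memory: each level adds another bank-select bit, so the
# reachable address space grows, but so does the number of switch operations
# (and hence the access time) needed to reach "deep" memory.

def switches_needed(current_banks, target_banks):
    """Count how many bank-select bits must be changed to reach the target."""
    return sum(1 for c, t in zip(current_banks, target_banks) if c != t)

LEVELS = 4                     # four nested levels of bank switching
CELLS_PER_BANK = 8
print("addressable cells:", CELLS_PER_BANK * 2**LEVELS)   # 128

current = (0, 0, 0, 0)
print("near access:", switches_needed(current, (0, 0, 0, 1)), "bank switch(es)")
print("deep access:", switches_needed(current, (1, 1, 1, 1)), "bank switch(es)")
```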
This idea of nested memory has been brought up by user/dwarfgineer Vasiln in his treatise on goblin logic:
http://dwarffortresswiki.org/index.php/User:Vasiln/Goblin_Logic_3#Feedback:_Memory_multiplexing
Demonstrated features seen in real-world computers
Harvard-like architecture: program and data memory are strictly separated. Operations are retrieved only from the program memory, and only the content of the data registers is used for calculations. Somewhat arbitrarily, different bit widths were chosen for program (6 bit) and data (10 bit). The separation allows relatively fast execution, since signal remanence only interferes with the next same-type memory access - data can be read long before operation signals have timed out and vice versa. Consequently, the average time per operation is quite a bit less than twice the maximum signal remanence of the standard memory cell (240 steps maximum remanence, observed execution speed one operation per 390 steps). Operations are largely hard-wired with little overlap, which again helps timing management.
A bus-based architecture would certainly be useful when handling significantly more information, but it'd require a core design with much shorter latencies and remanences. Cart+door-logic might be a good choice for such a project.
If stored information is intended to be usable both as instruction and as datum, there's always the risk of interference between signals. A possible solution would be to use a logic similar to the double-sided memory cells in the Mk. L. - have two pieces of furniture enabling reading, which can be operated independently. The easiest case would be to have "opposed" read branches, one of which sends an output if the bit is "on", one if the bit is "off". It should be possible to build the connected logic so that it only uses the presence/absence of one signal as input. Thus, the data-processing machinery could process only the "on" signals from memory (interpreting absent signals as "off"), while the control logic would only process the "off" signals (applying negative logic, if you will). This should help limit trouble with the von Neumann bottleneck. This is presumably also possible (albeit perhaps not very practical?) in real-world computers, since memory cells can usually be constructed with two complementary outputs.
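In the abstract, such a double-sided cell looks like this (a sketch of the idea only, with made-up method names):

```python
# A memory bit with two complementary read outputs (abstract model).
# The data path listens only to the "on" side, the control logic only to the
# "off" side, so the two consumers never compete for the same signal.

class DoubleSidedCell:
    def __init__(self, value=0):
        self.value = value

    def read_on_side(self):
        """Output used by the data-processing machinery: signal only if the bit is 1."""
        return self.value == 1

    def read_off_side(self):
        """Output used by the control logic (negative logic): signal only if the bit is 0."""
        return self.value == 0


cell = DoubleSidedCell(1)
print("data path sees a signal:    ", cell.read_on_side())   # True
print("control logic sees a signal:", cell.read_off_side())  # False
```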
Use of status bits/flags:
The Mark L. uses four status bits (a short sketch after the descriptions below ties them together).
At - the "jump inhibitor" prevents jumps ordered by the "jump" instruction when set. This makes jumps conditional. The bit is usually set as the outcome of a "count" operation: counting can test either whether the counting result is zero or whether it is an even number. The bit is not linked to the current state of a register; the two counting registers can be set to zero without setting the At bit, and they can be set to non-zero numbers without clearing it. The bit can also be set by writing the buffer's contents directly to the "status register"; the At bit will take on the value of the lowest bit.
Write target - when set, the writing operation will store the buffer contents in the identified program location, otherwise it goes to the given data register. This notionally allows a program to re-write itself. The bit value can be set by writing to the status register and toggled by a dedicated operation.
Module bank - switches between the two banks of eight program modules each. As above, set- and toggleable by the computer itself.
I/O switch - switches between two input/output configurations. One takes input by halting the computer (awaiting input to the buffer) and sends output to the 5x7 pixel "display", the other configuration interfaces the buffer with an extra register controlling a few pumps and a hatch. In the built computer, this is just used as a right-shift-by-one operation, but it could evaluate input from environment-sensing pressure plates (e.g. activated by liquid or non-fort creatures) and take ...measures in response. Controlled like the write target and module bank bits above.
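Tying the four flags together, a compact sketch of the behaviour they gate (the names and the buffer-to-status mapping beyond the lowest bit are my own simplifications):

```python
# The four status bits of the Mark L. and the behaviour they gate (sketch only).

class StatusBits:
    def __init__(self):
        self.at = 0            # "jump inhibitor": 1 suppresses the jump instruction
        self.write_target = 0  # 0 = write buffer to a data register, 1 = to the program store
        self.module_bank = 0   # selects one of two banks of eight program modules
        self.io_switch = 0     # 0 = halt-for-input / pixel display, 1 = pump/hatch register

    def write_from_buffer(self, buffer_value):
        """Writing the buffer to the status register: At takes the lowest bit.
        (How the other bits map onto buffer bits is not modelled here.)"""
        self.at = buffer_value & 1

    def jump_allowed(self):
        return self.at == 0


flags = StatusBits()
flags.write_from_buffer(0b0000000001)
print(flags.jump_allowed())   # False: the jump instruction is now inhibited
```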
Hardwired constants:
Loading data from the notional addresses "000" and "111" doesn't consult an actual built register but simply clears (000) or sets (111) all bits in the buffer register. This facilitates simple tasks like clearing a register or inverting the bit values in a register. These constants can also be used to set status bits to specific values (toggling a few of them afterwards if needed) and could in theory allow setting registers to specific values - that'd be an exceedingly laborious process for any number above ten, though.
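For instance (a sketch with my own helper names - the machine of course does this with carts and its arithmetic circuits, not Python): clearing a register means loading "000", and inverting one can plausibly be done by subtracting the register from the all-ones constant, since for ten-bit values 1111111111 - x is the bitwise complement of x.

```python
# The hardwired constants "000" (all bits clear) and "111" (all bits set) in a
# 10-bit machine, and how they allow clearing and inverting a register.
ALL_CLEAR = 0b0000000000
ALL_SET   = 0b1111111111

def clear_register():
    return ALL_CLEAR                      # load constant "000" into the buffer

def invert_register(x):
    # For unsigned 10-bit values, ALL_SET - x is the bitwise complement of x
    # (no borrows can occur, so the subtraction flips every bit).
    return ALL_SET - x

value = 0b0000001011                              # 11
print(format(invert_register(value), "010b"))     # 1111110100
print(clear_register())                           # 0
```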
Hello World:
The I/O commands allow outputting the contents of six program registers to a 5x7 pixel "screen". While this is quite clumsy, it allows displaying primitive text-graphics, at a rate of two letters per complete module. Together with the ability to operate machinery through the alternative I/O register, it demonstrates a rudimentary ability of the computer to interact with its surroundings as the result of a program.
Program flow:
The Mk. L. dedicates two full operations to conditional program flow and another to an unconditional control function.
The classical jump operation is conditional - "jump when no bit". The bit is typically set by an increment or decrement operation, most commonly when a count reaches zero.
The "compare" operation triggers an immediate skipping of the next instruction without even consulting it when the compare condition is not met. This makes the operation (any operation) directly behind a compare instruction conditional.
All "module load" operations overwrite the current contents of the program store with the contents of the chosen module and send the cart to program position one, making the operation an unconditional jump to the first instruction of the module.
Since the ring counter is in fact closed, operation will always return from position eight to position one automatically if the eighth instruction isn't a jump somewhere else, which means an unconditional program loop is already built into the machine.
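Putting the three control mechanisms together, here's a tiny interpreter sketch (the instruction names, encodings and zero-based positions are my own choices, not the machine's): a counting loop that exits once the At bit inhibits the jump back, a compare-skip, and a module load acting as an unconditional jump.

```python
# Tiny interpreter for the Mk. L.'s control-flow behaviours (sketch only).

def run(modules, start_module=0, max_steps=60):
    program, pos = list(modules[start_module]), 0
    counter, at_bit, output = 3, 0, []
    for _ in range(max_steps):
        op, arg = program[pos]
        if op == "DEC":                 # decrement; set At (jump inhibitor) at zero
            counter -= 1
            at_bit = 1 if counter == 0 else 0
            pos += 1
        elif op == "JNB":               # "jump when no bit": taken only if At is clear
            pos = arg if at_bit == 0 else pos + 1
        elif op == "SKIP_IF_NOT":       # compare: skip next instruction unless arg is met
            pos += 1 if arg else 2
        elif op == "LOAD_MODULE":       # overwrite the program store, restart at the top
            program, pos = list(modules[arg]), 0
        elif op == "HALT":
            return output
        else:                           # anything else: record it and advance
            output.append(op)
            pos += 1
        pos %= len(program)             # closed ring: the last position wraps around
    return output


modules = [
    # Module 0: a counting loop that emits "TICK" three times, then falls through.
    [("TICK", None), ("DEC", None), ("JNB", 0), ("SKIP_IF_NOT", False),
     ("NEVER", None), ("LOAD_MODULE", 1), ("NOP", None), ("NOP", None)],
    # Module 1: loaded as an unconditional jump target.
    [("DONE", None), ("HALT", None)] + [("NOP", None)] * 6,
]
print(run(modules))   # ['TICK', 'TICK', 'TICK', 'DONE']
```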
On the whole, the Mk. L. has 64 instructions (80 including the extra options realised through status/control bits). Of those, it dedicates
20 to data transfer
14 to data processing
28 to control flow
2 to input/output
Typical features also found in real-world programs:
Counting loops - using the increment/decrement function, testing for zero
Conditional loops - using "compare" operations so that the jump back is skipped once the loop condition is no longer met, and/or so that a "break" triggers once an abort condition is met
Unconditional jumps - module load
Increment/decrement - one of the core instructions, even updates the jump inhibitor bit automatically
Bit shift - shift left by addition (add two copies of the number together), shift right by writing to, then reading from the I/O register (see the sketch below)
Most of these are used in the square root calculation program.
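The two shift tricks are easy to verify in miniature (plain arithmetic, just to illustrate the identities the programs rely on):

```python
# Shift left by addition, shift right via a register that drops the lowest bit
# (a plain-arithmetic illustration of the identities, not the cart circuits).
MASK = 0b1111111111            # registers hold 10-bit values

def shift_left(x):
    return (x + x) & MASK      # adding a number to itself doubles it

def shift_right(x):
    return x >> 1              # what the write-then-read of the I/O register amounts to

value = 0b0000110101                          # 53
print(format(shift_left(value), "010b"))      # 0001101010  (106)
print(format(shift_right(value), "010b"))     # 0000011010  (26)
```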
***
For a real-world example of a similarly "powerful" computer, one has to look at the very earliest, largely experimental designs. The Manchester SSEM ("Baby") is a fairly good comparison, with its six operations (plus stop) and a total memory of 32 32-bit words, i.e. 1024 bits overall. Of course, the Manchester Baby could read/write to all words and didn't need to swap in extra information from ROM - the Mark L. has only 60 bits of writeable memory (although it has about 110 words of effective program storage).