I'd risk taking a no-caves embark just for the pure ability to make the computer. How would one calculate said floating point numbers?
I mean, for the first go, pong wouldn't even have to be playable. I imagine just a grid of hatches, opening to a pit of water to show pixels; blue would be white and whatever other colour would be black in this case. How would I make the computer track the positions of the ball so it could match the paddles, and calculate the movement of the ball?
I really haven't the faintest how to make a floating point unit.
I bet if you looked on Wikipedia, the info is out there. It would probably take some work to wade through.
If I'm not mistaken, earliest pong implementations weren't floating point anyways. Movement along the x axis was at a constant speed, and various y speeds were allowed as well. So when it was moving diagonally, it moved faster.
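To put a number on "moved faster", here is a quick sketch. It assumes speed is measured as straight-line distance covered per frame; ball_speed is a made-up helper name, and only the variable names come from this thread.

```python
import math


def ball_speed(ball_delta_x, ball_delta_y):
    """Straight-line distance the ball covers in one frame, in pixels."""
    return math.hypot(ball_delta_x, ball_delta_y)


# Horizontal speed is fixed at 1; only the vertical component varies.
for ball_delta_y in (0, 1, 2):
    print(ball_delta_y, round(ball_speed(1, ball_delta_y), 2))
```

The computer itself never has to compute this. It only ever adds the two integer deltas, which is why no floating point is needed.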
So something like this would keep track of paddle1_y, paddle2_y, ball_y, ball_x, ball_delta_y, and a few constants, including ball_delta_x (although really I guess it's a variable, because it can be plus or minus). Every frame, you check:
1) Is ball at x=0 or at x=max? If so, check ball_y: is it within a range represented by the position of the paddle, plus or minus the width of the paddle? If not, somebody loses; if so, then delta x becomes equal to negative delta x (0 - delta x). You probably want to modify delta y by some value based on distance from center of paddle, or paddle position minus paddle position last frame, or something like that, to keep the ball moving interestingly.
2) Next, ball position x becomes equal to ball position + delta x, and same goes for y. Write a 0 to graphics memory at old ball position (writable memory linked to your output hatches, tells the hatch to close). Write a 1 to graphics memory at new position.
3) If ball y position is 0 or max (top or bottom), ball delta y becomes equal to negative ball delta y.
4) Get input for paddles. (AI is probably right out for first iteration; I imagine a set of 4 levers, up/down for each paddle, which isn't really a big deal since a frame will take about a week). Input is 2 bits: up, down, or nothing.
5) Write a 0 to graphics memory for old paddle positions. Calculate new paddle positions (just paddle position plus or minus one). Write a 1 to graphics at new paddle position. An increment should function fast enough to minimize flickering, otherwise, you need clock cycles devoted to "if paddle_last_frame and not paddle_this_frame then write 0 to graphics bit" which slows down each frame a lot. Or you need a frame buffer, which means twice as many graphics bits, and would probably lead to flickering anyways. I recommend just accepting the flicker.
6) If game lost, increment winner's score (I'm assuming you don't need output of this), ball delta y gets modified somehow to keep things interesting, ball position x and position y moved to center of screen. (Delta x doesn't get modified, because loser has to receive serve.)
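For anyone who wants to try the logic before building it out of hatches, the per-frame steps above can be sketched in ordinary Python. This is only a rough model under my own assumptions: an 8x8 grid, a paddle three pixels tall, delta y left unchanged on a paddle hit and on a lost point, and the reset done in place of the ball move so the ball never leaves the grid. The names new_game, draw_paddle and step are made up for the sketch; the state variable names are the ones listed earlier.

```python
WIDTH, HEIGHT = 8, 8   # assumed grid of hatches
PADDLE_HALF = 1        # assumed: paddle spans paddle_y - 1 .. paddle_y + 1


def new_game():
    """All state is small integers, as in the variable list above."""
    return {
        "paddle1_y": HEIGHT // 2, "paddle2_y": HEIGHT // 2,
        "ball_x": WIDTH // 2, "ball_y": HEIGHT // 2,
        "ball_delta_x": 1, "ball_delta_y": 1,
        "score1": 0, "score2": 0,
        # Graphics memory: one bit per hatch, 1 = lit pixel.
        "gfx": [[0] * WIDTH for _ in range(HEIGHT)],
    }


def draw_paddle(g, x, y, bit):
    """Write `bit` to every graphics cell the paddle covers in column x."""
    for row in range(y - PADDLE_HALF, y + PADDLE_HALF + 1):
        g["gfx"][row][x] = bit


def step(g, input1=0, input2=0):
    """Advance one frame; inputs are -1 (up), 0 (nothing) or +1 (down)."""
    lost = False
    # At a side wall: bounce off the paddle, or the point is lost.
    if g["ball_x"] in (0, WIDTH - 1):
        paddle_y = g["paddle1_y"] if g["ball_x"] == 0 else g["paddle2_y"]
        if abs(g["ball_y"] - paddle_y) <= PADDLE_HALF:
            g["ball_delta_x"] = -g["ball_delta_x"]
        else:
            lost = True
    # Move the ball: write 0 at the old position, 1 at the new one.
    g["gfx"][g["ball_y"]][g["ball_x"]] = 0
    if lost:
        # Scoring and reset: the other player scores, the ball is
        # re-centred, and delta x is left alone so the loser receives.
        g["score2" if g["ball_x"] == 0 else "score1"] += 1
        g["ball_x"], g["ball_y"] = WIDTH // 2, HEIGHT // 2
    else:
        g["ball_x"] += g["ball_delta_x"]
        g["ball_y"] += g["ball_delta_y"]
    g["gfx"][g["ball_y"]][g["ball_x"]] = 1
    # Top or bottom row: reverse the vertical direction.
    if g["ball_y"] in (0, HEIGHT - 1):
        g["ball_delta_y"] = -g["ball_delta_y"]
    # Paddles: erase, move by at most one (kept on screen), redraw.
    for key, x, move in (("paddle1_y", 0, input1),
                         ("paddle2_y", WIDTH - 1, input2)):
        draw_paddle(g, x, g[key], 0)
        g[key] = min(max(g[key] + move, PADDLE_HALF), HEIGHT - 1 - PADDLE_HALF)
        draw_paddle(g, x, g[key], 1)


game = new_game()
for _ in range(4):
    step(game, input2=1)   # right-hand player holds "down" to follow the ball
print(game["ball_x"], game["ball_delta_x"], game["score1"])
```

Because the ball and the paddles share the edge columns here, a paddle redraw can wipe the ball's pixel for a frame. The sketch leaves that alone rather than spending extra writes on it.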
So with doable operations in a low-op-count DF computer, you've only got a handful of variables, a handful of constants, and a handful of operations (with a goto structure, which seems like the easiest for an ultra-low-memory computer). Fitting it in 16 (non-graphics) bytes might be hard, but probably you could do it in 32. Your bytes don't have to be any bigger than your output resolution: if you're outputting to a 4x4 grid, you could get away with 4 bit bytes.
You're looking at 4 adds a frame, 3 of which could be increments. So maybe two to four DF days a frame.
You probably don't want floating point stuff in DF. I don't know how much memory or clock cycles it takes to do a square root, but I bet it's a lot.
If you want to make AI, it's probably easy to make AI that never screws up (would just take a few more bytes), but hard to make fallible AI. I never managed to make a random function. A decent random function probably requires a high resolution clock (100 tick increments would be good enough) and more memory and clock cycles. Clock % (mod) rand_ceiling would probably work well enough. Mod is expensive for a programmable computer, but if you were willing to make a dedicated random circuit, I believe you could get it working fast enough.
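Here's a rough sketch of the clock % rand_ceiling idea, and of why mod gets expensive. It assumes the machine can only subtract and compare, so mod is done by repeated subtraction; that is my assumption, not a description of any particular DF build. mod_by_subtraction and ai_move are made-up names, and the "fumble when the roll is 0" rule is just one possible way to make the AI fallible.

```python
def mod_by_subtraction(value, ceiling):
    """Return (value % ceiling, number of subtractions it took)."""
    subtractions = 0
    while value >= ceiling:
        value -= ceiling
        subtractions += 1
    return value, subtractions


def ai_move(ball_y, paddle_y, clock_ticks, rand_ceiling=4):
    """Return -1 (up), 0 (nothing) or +1 (down) for the AI paddle.

    The AI tracks the ball, except on frames where the clock-derived
    roll is 0, when it does nothing (roughly 1 frame in rand_ceiling).
    """
    roll, _ = mod_by_subtraction(clock_ticks, rand_ceiling)
    if roll == 0:
        return 0          # fumble: skip this frame
    if ball_y > paddle_y:
        return 1
    if ball_y < paddle_y:
        return -1
    return 0


print(mod_by_subtraction(1234, 4))
print(ai_move(ball_y=5, paddle_y=3, clock_ticks=1234))
```

The subtraction count grows with the size of the clock value, which is the cost a dedicated circuit would be there to avoid. If rand_ceiling is a power of two, keeping only the low bits of the clock gives the same result with no subtraction at all.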
EDIT: With a 4x4 grid, you only need 2 bit bytes, but those aren't large enough to store sufficient operations for your computer. 8 bit bytes allow 256x256 output. But that's a lot of graphics memory to build. You probably want 8 bit bytes, and then just whatever output you're willing to build. I doubt that it's efficient to compress small values into a single byte; you'd probably see more memory involved in compression/decompression than you'd be saving.
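The byte-width arithmetic can be checked with a few lines. This is just a sketch; coordinate_bits and graphics_bits are made-up helper names, and it assumes one graphics bit per pixel.

```python
def coordinate_bits(grid_size):
    """Bits needed to store a coordinate in the range 0 .. grid_size - 1."""
    return (grid_size - 1).bit_length()


def graphics_bits(width, height):
    """Graphics memory needed at one bit (one hatch) per pixel."""
    return width * height


for size in (4, 16, 256):
    print(size, coordinate_bits(size), graphics_bits(size, size))
```

A 2 bit byte can hold only 4 distinct values, which is where the limit on storing operations comes from. At the other end, a 256x256 display means 65536 graphics bits to build.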