Okay, hold up everyone.
Neither this
"if an object is twice the distance away from the viewer as another object it should appear 1/4th the size..."
nor this
"what you want is a logarithmic scale."
is true. Nor this
"A perspective transformation maps a point in 3d point to another 3d point."
or this.
"Since this is linear, you can represent the transform as a matrix. So you have a point X and your perspective transformation A, then the mapped point is AX. Since A is size 3x3 and X is size 3x1, the resulting vector is also size 3x1, but the z-coord will be fixed and equal to how far your eyes are away from the screen."
Let's start with the basics. First, we'll need a camera focus point, a point where all light rays go through. For simplicity's sake, let's make this point the origin.
Next, we need a canvas we'll paint the image upon, by pretending to fire light rays from all the objects in the scene directly at the camera, and painting the spot where each light ray intersects the canvas. Now if you would go and put your face where the camera is and look at the canvas, you'd see exactly the same as what you'd see if the canvas wasn't there. In our example, let's make the canvas the plane of all points (x, y, z) with z=1, which is a horizontal plane directly above the camera.
Now let's look at any line passing through the origin. For every two points (x, y, z) and (x', y', z') on the same line through the origin, x/x' = y/y' = z/z'. We can use this to find out the intersection of the light beam with the canvas: Let's pick a random point (x, y, z) and throw a light at the canvas, and let's call the image point (x', y', z'). We know that z' = 1, so we know that x/x' = y/y' = z/z' = z/1 = z, therefore x' = x/z and y' = y/z. Our final image point is (x/z, y/z, 1). There, that's literally everything you have to do in this situation: divide everything by z. And there you go, x/z and y/z are your screen coordinates (you don't need the 1). Map them to pixels and you're done.
Here's the transformation function: (x, y, z) -> (x/z, y/z).
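As a minimal sketch, that function could look like this in C++. The name projectToCanvas and the check for z <= 0 are my own additions for illustration: the formula itself only needs z to be nonzero, but a point with z <= 0 sits in or behind the camera plane, so its light ray never crosses the canvas.

```cpp
// Projects the point (x, y, z) through the origin onto the canvas plane z = 1.
// Writes the canvas coordinates to cvX and cvY and returns true on success.
// Returns false (leaving cvX and cvY untouched) when z <= 0, because such a
// point is in or behind the camera plane. This guard is an assumption of the
// sketch, not part of the formula above.
bool projectToCanvas(float x, float y, float z, float& cvX, float& cvY) {
    if (z <= 0.0f) {
        return false;
    }
    cvX = x / z;
    cvY = y / z;
    return true;
}
```

For example, (2, 4, 2) lands at (1, 2) on the canvas, while (2, 4, 4), which is twice as far away, lands at (0.5, 1): half the size, not a quarter.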
Now let's talk about moving objects in 3D space, because that's what you're going to do to the camera (actually you'll do it to everything else instead).
Let's say you want to move a point at (x, y, z) by the vector (x°, y°, z°). To do that, just add the vector to the point.
Here's the translation function: (x, y, z) -> (x+x°, y+y°, z+z°).
Let's say you want to rotate the camera around the x-axis by the angle α. Instead of doing that, you can just rotate your objects around the x-axis by the angle -α.
Here's the rotation function for rotation around the x-axis: (x, y, z) -> (x, y*cos(α)-z*sin(α), y*sin(α)+z*cos(α)).
Here's for the y-axis: (x, y, z) -> (z*sin(α)+x*cos(α), y, z*cos(α)-x*sin(α)).
Here's for the z-axis: (x, y, z) -> (x*cos(α)-y*sin(α), x*sin(α)+y*cos(α), z).
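The translation and the three rotations above can be written down directly. This is just a sketch: the Vec3 struct and the function names are made up for illustration, and the angles are assumed to be in radians because that is what std::cos and std::sin expect.

```cpp
#include <cmath>

// Hypothetical helper type for this sketch.
struct Vec3 {
    float x, y, z;
};

// (x, y, z) -> (x+dx, y+dy, z+dz)
Vec3 translate(Vec3 p, Vec3 d) {
    return {p.x + d.x, p.y + d.y, p.z + d.z};
}

// Rotation around the x-axis by angle a (radians).
Vec3 rotateX(Vec3 p, float a) {
    return {p.x,
            p.y * std::cos(a) - p.z * std::sin(a),
            p.y * std::sin(a) + p.z * std::cos(a)};
}

// Rotation around the y-axis by angle a (radians).
Vec3 rotateY(Vec3 p, float a) {
    return {p.z * std::sin(a) + p.x * std::cos(a),
            p.y,
            p.z * std::cos(a) - p.x * std::sin(a)};
}

// Rotation around the z-axis by angle a (radians).
Vec3 rotateZ(Vec3 p, float a) {
    return {p.x * std::cos(a) - p.y * std::sin(a),
            p.x * std::sin(a) + p.y * std::cos(a),
            p.z};
}
```

A quick sanity check: rotating (1, 0, 0) around the z-axis by a quarter turn gives (0, 1, 0), up to floating-point rounding.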
Now let's look at a camera like one you'd find in a normal first-person 3D game. This camera needs five position variables: x°, y°, z°, pitch and yaw. The position coordinates are self-explanatory, the other two define the camera's view direction. The pitch is the angle from up, and the yaw is the angle from north.
To move the camera from the origin to its correct position like any normal person, you'd
* translate the camera by (x°, y°, z°),
* then rotate it about the z-axis by the yaw,
* then rotate it about the x-axis by the pitch.
Instead of moving the camera to that position, we'll just move all the points by the opposite movement instead. To do this, we'll need to do these same steps to the objects, but by the opposite amounts, and more importantly in the opposite order. So you'd
* rotate the object about the x-axis by negative the pitch,
* then rotate it about the z-axis by negative the yaw,
* then translate the resulting object by (-x°, -y°, -z°).
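If you want to convince yourself that the second list really undoes the first, here is a small sketch that applies both in sequence. Vec3, Camera, applyCameraMove and undoCameraMove are hypothetical names for this illustration only, and angles are assumed to be in radians.

```cpp
#include <cmath>

// Hypothetical helper types for this sketch.
struct Vec3 {
    float x, y, z;
};

struct Camera {
    Vec3 position;  // (x°, y°, z°)
    float pitch;    // radians
    float yaw;      // radians
};

Vec3 rotateX(Vec3 p, float a) {
    return {p.x,
            p.y * std::cos(a) - p.z * std::sin(a),
            p.y * std::sin(a) + p.z * std::cos(a)};
}

Vec3 rotateZ(Vec3 p, float a) {
    return {p.x * std::cos(a) - p.y * std::sin(a),
            p.x * std::sin(a) + p.y * std::cos(a),
            p.z};
}

// The camera move: translate, then rotate about z by the yaw,
// then rotate about x by the pitch.
Vec3 applyCameraMove(Vec3 p, const Camera& cam) {
    Vec3 moved = {p.x + cam.position.x, p.y + cam.position.y, p.z + cam.position.z};
    return rotateX(rotateZ(moved, cam.yaw), cam.pitch);
}

// The opposite move: same steps, opposite amounts, opposite order.
Vec3 undoCameraMove(Vec3 p, const Camera& cam) {
    Vec3 r = rotateZ(rotateX(p, -cam.pitch), -cam.yaw);
    return {r.x - cam.position.x, r.y - cam.position.y, r.z - cam.position.z};
}
```

Calling undoCameraMove(applyCameraMove(p, cam), cam) should give back p up to floating-point rounding. Swapping the order of the steps in undoCameraMove generally breaks that.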
Here's your complete camera transformation process:
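#include <cmath>

void getCanvasCoordsfromPointCoords(float pointX, float pointY, float pointZ, float camX, float camY, float camZ, float pitch, float yaw, float& cvX, float& cvY) {
    // Rotate about the x-axis by -pitch (angles in radians).
    float x1 = pointX, y1 = pointY*std::cos(pitch)+pointZ*std::sin(pitch), z1 = -pointY*std::sin(pitch)+pointZ*std::cos(pitch);
    // Rotate about the z-axis by -yaw.
    float x2 = x1*std::cos(yaw)+y1*std::sin(yaw), y2 = -x1*std::sin(yaw)+y1*std::cos(yaw), z2 = z1;
    // Translate by (-camX, -camY, -camZ).
    float x3 = x2-camX, y3 = y2-camY, z3 = z2-camZ;
    // Perspective divide; z3 must not be zero.
    cvX = x3/z3;
    cvY = y3/z3;
}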
Finally, you may want to transform those screen coordinates to pixels.
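void getPixelCoordsFromCanvasCoords(float cvX, float cvY, float ssX, float ssY, float c, float& posX, float& posY) {
    // Scale by c, shift the canvas origin to the screen center, and use
    // ssY/2 as the pixel scale on both axes so the image is not stretched.
    posX = (c*cvX+ssX/ssY)*ssY/2;
    posY = (c*cvY+1)*ssY/2;
}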
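This transforms your canvas position (cvX, cvY) into a screen position (posX, posY). Note that ssX and ssY need to be the pixel dimensions of your screen, and that c is the scale of the transformation. To be precise, c=cot(FoV/2), where FoV is the vertical field of view, since the formula scales both axes by the screen height. For best results c should be around 1 or so, which corresponds to a vertical field of view of about 90 degrees.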
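To make the role of c concrete, here is a small sketch. scaleFromVerticalFov is a hypothetical helper that is not part of the code above, and canvasToPixel just restates the same pixel mapping under a different name so the snippet stands on its own. The field of view is assumed to be in radians.

```cpp
#include <cmath>

// Hypothetical helper: derives the scale c from a vertical field of view
// given in radians, using c = cot(FoV/2) = 1 / tan(FoV/2).
float scaleFromVerticalFov(float fovRadians) {
    return 1.0f / std::tan(fovRadians / 2.0f);
}

// Same mapping as the pixel function above, restated for this sketch.
// ssX and ssY are the screen dimensions in pixels.
void canvasToPixel(float cvX, float cvY, float ssX, float ssY, float c,
                   float& posX, float& posY) {
    posX = (c * cvX + ssX / ssY) * ssY / 2.0f;
    posY = (c * cvY + 1.0f) * ssY / 2.0f;
}
```

With an 800x600 screen and c = 1, the canvas point (0, 0) lands at pixel (400, 300), and canvas y = -1 and y = 1 land at posY = 0 and posY = 600, so the canvas strip from -1 to 1 fills the full screen height.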