I suspect that RCT games, and likely many potential implementations of this idea, would run as graphs, forms, and tables in the background and display only the visible parts to the player.
{What follows are merely my theories of how it might work, not backed by any research}
Rollercoaster train A is on track B, section C, N pixels along, traveling at D pixels per second. Each frame lasts 1/E of a second, so move the train by D/E pixels and apply track-based acceleration. If section C is within view of the player, calculate the orientation of each car and draw them on the screen.
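To make that concrete, here is a rough sketch of the per-frame train update. This is only my guess at the shape of it, not how RCT actually does it: the `Train` class, the `step_train` function, and the idea that every section has the same length in pixels are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Train:
    track: str         # "B" in the description above
    section: int       # "C": index of the current track section
    offset_px: float   # "N": pixels travelled along the current section
    speed_px_s: float  # "D": speed in pixels per second


def step_train(train, frames_per_second, section_length_px, accel_px_s2=0.0):
    """Advance a train by one frame (1/E of a second).

    Assumes every section is section_length_px long (must be positive)
    and that the track supplies a single acceleration value for this frame.
    """
    dt = 1.0 / frames_per_second
    # Move by D/E pixels, then apply the track-based acceleration.
    train.offset_px += train.speed_px_s * dt
    train.speed_px_s += accel_px_s2 * dt
    # Roll over into the next section(s) when the end of this one is passed.
    while train.offset_px >= section_length_px:
        train.offset_px -= section_length_px
        train.section += 1
    return train


train = Train(track="B", section=3, offset_px=10.0, speed_px_s=120.0)
step_train(train, frames_per_second=60, section_length_px=100.0, accel_px_s2=6.0)
```

Drawing is left out on purpose: the point is that this update is a couple of additions per train, and the orientation math only needs to run for trains the player can see.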
People would likely be mere points on a node graph of all paths until they came within view. Between nodes they would just increment their virtual progress toward the next node, and their actual position would be irrelevant unless they were within view...
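A minimal sketch of that off-screen bookkeeping, again purely hypothetical: `Walker`, `step_walker`, the adjacency-dict `graph`, and the `edge_lengths` table are names I invented, and picking the next node at random is just a stand-in for whatever decision the game would really make.

```python
from dataclasses import dataclass
import random


@dataclass
class Walker:
    node_from: str
    node_to: str
    progress: float = 0.0  # 0.0 at node_from, 1.0 at node_to


def step_walker(walker, graph, edge_lengths, speed, dt, rng):
    """Advance an off-screen person along the current edge.

    graph maps a node to the list of nodes reachable from it;
    edge_lengths maps (from, to) pairs to a path length.
    No physical position is computed here, only a fraction of the edge.
    """
    length = edge_lengths[(walker.node_from, walker.node_to)]
    walker.progress += speed * dt / length
    if walker.progress >= 1.0:
        # Arrived: decide where to go next (placeholder: random neighbour).
        arrived = walker.node_to
        walker.node_from = arrived
        walker.node_to = rng.choice(graph[arrived])
        walker.progress = 0.0


graph = {"gate": ["plaza"], "plaza": ["gate"]}
edge_lengths = {("gate", "plaza"): 10.0, ("plaza", "gate"): 10.0}
walker = Walker("gate", "plaza")
step_walker(walker, graph, edge_lengths, speed=5.0, dt=1.0, rng=random.Random(0))
```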
So, applied to this idea: any person not within an active (visible) block just increments their progress and, on reaching the end, momentarily considers some factors and decides what to do next. That would be quite fast to calculate. When the player sees them, the game can convert that "83% of common path N" into a physical position, plus a slight, reasonable offset based on crowding and shape, so that they aren't all walking an invisible line or standing partly through someone else. While the player can see them, and likely until they have been out of view for at least 10 seconds, the game should cache their calculated position. As far as the player can tell, everyone is then being individually simulated in realtime, despite the place being more crowded than an entire MMORPG crammed into a single area, despite the player's PC certainly being less powerful than the countless servers required to host such a load in online games, and despite AI calculations likely taking much more computation than a load of players...
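The "convert progress into a position" step could look something like the sketch below. It is only an illustration under my own assumptions: `visible_position` is a made-up name, paths are treated as straight lines between two points, and instead of a real cache it seeds the sideways offset from the person's id so the same person always lands in the same spot. It does not try to handle crowding or keep people from overlapping.

```python
import math
import random


def visible_position(person_id, start, end, progress, path_width):
    """Turn 'progress along a path' into an (x, y) position for drawing.

    start and end are (x, y) tuples for the two ends of a straight path
    (they must not be the same point); progress is a fraction from 0.0 to 1.0.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    # Point on the centre line of the path.
    x = start[0] + dx * progress
    y = start[1] + dy * progress
    # Unit vector perpendicular to the path, used for the sideways offset.
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length
    # Seeded per person, so the offset is stable from frame to frame.
    lateral = random.Random(person_id).uniform(-path_width / 2, path_width / 2)
    return (x + nx * lateral, y + ny * lateral)


# A person 83% of the way along a path from (0, 0) to (100, 0).
pos = visible_position(person_id=42, start=(0.0, 0.0), end=(100.0, 0.0),
                       progress=0.83, path_width=4.0)
```

A real version would probably store the result while the person is on screen (and for those 10 seconds afterwards) rather than recomputing it, but the seeded offset gives the same "nobody walks the invisible line" effect for almost no cost.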
So, as long as the player can't tell the difference, it would be sufficient to make it seem like the game went to all the trouble of simulating every single character, while actually taking the easy way of simply incrementing a number for most of them.
It doesn't necessarily have to simulate them all, though. Creating a sampling of the population and simply generating a plausible scene based on that would be sufficient. One flaw, however: what if you watched one person for 48 in-game hours or more? Would the AI shortcut reveal itself, for example by the person never returning to sleep, going to a random room each time, or showing an overall lack of coordination in their imaginary schedule?