That's a good explanation, there. There are some interesting considerations I'd like to add, though...
How you get the seed in the first place is a separate question. [..] In some games, they are supplied by the level designer;
Somewhat in a similar vein to this (extracted) example is something from my own experience.
My first high-level programming (ignoring some bootstrapping, dip-switch-entered machine code on a kit-type computer whose details I don't even remember) was in BASIC, and of course I was used to using the RND() function, which seeded itself with "microseconds since the computer was turned on". That was effectively random through 'environmental' noise, given that the time taken to type in CH. "" (or especially to enter the program line by line) was obviously going to make this auto-seeding (or the point reached in the rotating sequence generator's cycle) essentially different every single time.
However, my first proper language course (as in a compiled language, not an interpreted one like BASIC) included, early on, the programming of a "cave game" with a 'randomly' generated floor/ceiling (pixel-wide lines from the bottom/top of the display towards the centre), through which the player had to steer a little plane-type thing without falling foul of the simple collision-detection algorithm ("is any part of the plane-type thing being drawn on pixels that are non-black in colour, and hence has it hit the cave floor/ceiling?"). It took a while for me to work out (it was never explained, that I recall) why the randomness function always produced the same shape of first cave, second cave, etc.
That is, unless the randomness was also used to do other things (colour the non-space pixels, add other features, etc), but even then the results would be consistent each and every time that version of the cave-flying program was run.
Of course, the answer was that this system had a "random()" function that, if used without any precursor, would always initialise the first time (in any given program-run) with the same value. Probably zero. And so the same, repeatable, sequence of 'randomness' always came out of this process. Unless, that is, we changed how the numbers were consumed: say, instead of the first 960-or-so random numbers being used alternately for the floor height (x480) and the ceiling depth (x480), the odd-numbered ones alternated between floor and ceiling while the even-numbered ones got put into colours. Or whatever the whole thing was about.
In many ways, this was "a sequence chosen by the designer", but really it was all thanks to the person who sorted out the original code samples for the course. Had it come up with something impossible/awkward/whatever, I'm sure they'd have redone their example code to "srand(1)" (or equivalent) at the start, or something else that would have given a working result.
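For anyone who wants to see that behaviour for themselves, here is a minimal sketch in Python. It is not the course's code (which I no longer have): cave_profile and its parameters are names invented for illustration, and the explicit seed stands in for that system's fixed start-up value, since Python's own generator seeds itself from the operating system unless told otherwise.

```python
import random

def cave_profile(seed, columns=480, max_height=100):
    """Return (floor, ceiling) heights per column from one seeded stream."""
    rng = random.Random(seed)  # private generator: same seed, same stream
    profile = []
    for _ in range(columns):
        floor = rng.randrange(max_height)    # odd-numbered draws
        ceiling = rng.randrange(max_height)  # even-numbered draws
        profile.append((floor, ceiling))
    return profile

first_run = cave_profile(seed=1)
second_run = cave_profile(seed=1)
print(first_run == second_run)  # True: same seed, same cave, every run
```

Swap the order of the two draws, or slip a third draw in for colour, and the cave changes shape, but it changes to the same new shape on every run.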
Note that to generate the same result, you need the *exact* same series of rolls in the same order. This is why sometimes when Toady makes a change to worldgen DF, it causes seeds to no longer generate the same results, but not always.
One thing I've done, in procedurally-generated worlds, is to maintain 'parallel seed-trails'. I'm sure there's a better term for it, but let me explain. If I'm populating a number of planets with a number of different environmental qualities, in a way that I wish to be able to predict, then I would use a master seed (fixed at startup: either something arbitrary like zero, with the results tweaked later to make them come out nicely, or a seed chosen because it gives a result that looks good) that assigns a value to each planet in turn (or whatever), without being used for any of the surface features. The value assigned to a planet is then used to initiate that planet's own instance of a PRNG sequence engine.
So, let's say I have ten planets: I generate ten (repeatable) values, one for each. These values then seed the PRNG used for that particular planet, and right now I only have two things I'm worried about: colour (planet-wide) and scale. But then imagine that I want to add something else to each planet, like a given number of moons. I now ask the planet's own PRNG sequencer to give me a value that I can convert into a moon-count, after I've asked about colour (as before) and scale (as before). The new request neither changes the colour or scale of the planet nor (because it is not made of the 'master' PRNG sequencer) propagates changes of any kind to the later planets. Such changes would propagate if (instead) I'd previously requested 20 values from the sole fixed-seed PRNG (planet 1 colour, planet 1 size, planet 2 colour, planet 2 size, ...) and had then started asking for 30 values, which would almost immediately mismatch the original sequence (p1c, p1s, p1m (!p2c), p2c (!p2s), ...).
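A rough Python sketch of the two approaches, for comparison. All the names here (planet_seeds, make_planet, single_stream, core) are made up for the example, as are the value ranges for colour, scale and moons. It only assumes that each planet gets its own generator, seeded from a master one.

```python
import random

MASTER_SEED = 0  # arbitrary fixed value, as in the text

def planet_seeds(count, master_seed=MASTER_SEED):
    """Ask the master PRNG for one repeatable seed value per planet."""
    master = random.Random(master_seed)
    return [master.getrandbits(32) for _ in range(count)]

def make_planet(seed, with_moons):
    """Build one planet from its own PRNG; moons are requested last."""
    rng = random.Random(seed)
    planet = {"colour": rng.randrange(256), "scale": rng.random()}
    if with_moons:
        planet["moons"] = rng.randrange(4)
    return planet

def single_stream(count, with_moons, seed=MASTER_SEED):
    """The alternative: every value for every planet from one PRNG."""
    rng = random.Random(seed)
    planets = []
    for _ in range(count):
        planet = {"colour": rng.randrange(256), "scale": rng.random()}
        if with_moons:
            planet["moons"] = rng.randrange(4)
        planets.append(planet)
    return planets

def core(planets):
    """Just the properties that existed before moons were added."""
    return [(p["colour"], p["scale"]) for p in planets]

seeds = planet_seeds(10)
old = [make_planet(s, with_moons=False) for s in seeds]
new = [make_planet(s, with_moons=True) for s in seeds]
print(core(old) == core(new))  # True: per-planet trails are undisturbed

# With one shared stream, only the first planet survives the change.
shared_old = single_stream(10, with_moons=False)
shared_new = single_stream(10, with_moons=True)
print(core(shared_old)[0] == core(shared_new)[0])
print(core(shared_old) == core(shared_new))
```

The important detail is that the moon request comes after the existing ones within each planet's own trail. Put it first and that planet's colour and scale shift too, though its neighbours still would not.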
Should I wish to apply even more complication to my data extraction, such as going for a colour gradient for each planet (pole to equator, and back), I could take the 'traditional' planet colour value and use that to spawn a further sub-seeded PRNG, which gives me some further procedurally-generated values to guide the colouring (possibly while using an algorithm that notes the 'seed colour value' and uses it to keep the 'average colour' the same as the original single colour for the planet).
Or I could just request, after the "colour", "scale" and "moons" values, one (or more) further values related to the gradient. But that'd be messy. (And I would have to add it after, to avoid disturbing the rest of the dynamics.) Similarly, the moon(s) probably need to be assigned orbital parameters, and
There are some issues in doing that, but assuming that one is, at any one time, only viewing a limited amount of the surface and skies of a single planet, the complications of maintaining multiple PRNG-sequences are going to be only somewhat of a problem. And note that it would not be the "1" in the "there is one moon" value, or the "2" in the "there are two moons" one, that seeds the sequences giving the moon(s) involved their positions in the planetary skies (which would mean that every one-moon system looks the same, and every two-moon system is similarly alike with its kin). It'd be the raw number (0.000-0.099 for one moon? 0.100-0.199 for two..? ..whatever the scheme concerned), so that variation still exists.
Unless you're happy with the outcome of something being "procedurally the same" for any given leaf-seeding.
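To make the raw-number point concrete, here is a small sketch. The banding (a tenth of the range per moon-count), the conversion of the raw value to an integer seed and the function names are all just assumptions for the example.

```python
import random

def moon_count(raw):
    """Illustrative banding: below 0.1 is one moon, below 0.2 is two, etc."""
    return int(raw * 10) + 1

def moon_angles(raw, seed_with_raw=True):
    """Sky positions (in degrees) for the moons of one planet."""
    count = moon_count(raw)
    # Either seed the sub-PRNG with the raw draw, or with the bare count.
    seed = int(raw * 2**32) if seed_with_raw else count
    rng = random.Random(seed)
    return [rng.uniform(0.0, 360.0) for _ in range(count)]

a, b = 0.03, 0.07  # two different raw draws, both meaning "one moon"

# Seeded by the count, every one-moon system is identical.
print(moon_angles(a, seed_with_raw=False) == moon_angles(b, seed_with_raw=False))  # True

# Seeded by the raw draw, each one-moon system gets its own positions.
print(moon_angles(a), moon_angles(b))
```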
Also, you don't need to run through the minute details of planets 1...5 and 7...10 (to extract their 'random' qualities from the stream) when needing to generate planet 6's environment. You just need the Planet6 seed value. And if there's a sub-divided planetary surface, with similarly trickling-down seed-values dictating details of the terrain, you need only work out the seed for the hemisphere you're in, then the portion of that hemisphere that you're in, and the sub-portion, etc... And you need not bother with any macro-regions (at whatever scale) that you are not in contact with. (Obviously, near a boundary, you'd be simultaneously tracking both your own location's PRNG-cascade and whichever neighbouring PRNG-cascades are also within the purview of your viewing-window, but it's still not the entire planetary surface. Unless you're viewing a world-map, at which point it's probably a breadth-hungry set of cascading PRNG values, but only to the depth needed to give data at the resolution of that map.)
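One way the trickle-down could be sketched, assuming (my simplification for the example) that each level keeps a stream used for nothing but handing out child seeds, and that a location is described by a list of child indices such as planet, hemisphere, portion, sub-portion. The names child_seed and seed_for_path are invented here.

```python
import random

def child_seed(parent_seed, index):
    """Seed of the index-th child: the index-th draw of the parent's stream."""
    rng = random.Random(parent_seed)
    for _ in range(index):
        rng.getrandbits(32)  # skip the seeds belonging to earlier siblings
    return rng.getrandbits(32)

def seed_for_path(root_seed, path):
    """Walk down the cascade, touching only the branch actually needed."""
    seed = root_seed
    for index in path:
        seed = child_seed(seed, index)
    return seed

MASTER_SEED = 0

# Planet 6 (index 5) needs six cheap draws from the master stream,
# and none of the detailed generation of the other planets.
planet6 = seed_for_path(MASTER_SEED, [5])

# Planet 6, hemisphere 1, portion 3 of that hemisphere.
here = seed_for_path(MASTER_SEED, [5, 1, 3])
print(planet6, here)
```

Skipping earlier siblings costs one draw each, which is trivial next to actually generating their terrain. Near a boundary you would call seed_for_path once per visible region and still never touch the rest of the tree.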
Anyway, all the above was originally supposed to be about another reason why seeds might break. The PRNG is tasked to a different set of data between versions, and so the mappings between values and results lose cohesion. E.g. even though the chance of High Boots remains the same, an inserted prior check for Kinky Boots picks up the 0.8-or-lower value (whether that makes it true or not) and the High Boots check gets the next value... which could be anything.
Of course, such insertion of demands for a number from the sequence makes everything that follows potentially different (not just the High Boots). I have no idea, though, if anybody's done experiments with pre-arranged fixed seeds, worldgenning worlds with/without additional clothing modded into the raws, to see if this sort of thing happens in DF.
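The shift is easy to demonstrate in miniature. This is only a toy in Python with invented item names and an invented function (outfit_rolls); it says nothing about how DF actually consumes its random numbers.

```python
import random

def outfit_rolls(seed, with_kinky_boots):
    """Record which raw value each clothing check receives, in order."""
    rng = random.Random(seed)
    rolls = {"cloak": rng.random()}
    if with_kinky_boots:
        rolls["kinky_boots"] = rng.random()  # the newly inserted check
    rolls["high_boots"] = rng.random()
    rolls["hat"] = rng.random()
    return rolls

before = outfit_rolls(42, with_kinky_boots=False)
after = outfit_rolls(42, with_kinky_boots=True)

print(before["cloak"] == after["cloak"])             # True: drawn before the insertion
print(after["kinky_boots"] == before["high_boots"])  # True: the new check took that value
print(after["high_boots"] == before["hat"])          # True: everything later slid along by one
```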
One thing I'm not convinced of, however:
Assuming you have chosen a PRNG with the right qualities, the results will be much closer to random than, say, rolling a set of percentile dice;
It is possible to roll percentile dice 100, 1,000 or 1,000,000 times in a row without getting a value above 20 (say, for a given desired result), albeit exponentially more unlikely each time. Desperately unlikely, but possible! However, any PRNG can only have a limited length of cycle of output results. At best, one for every possible internal state[1]. And possibly an algorithm could find that starting with a seed of zero gives it one (eventually) repeating output sequence while a seed of one gives it another, with who knows what share of the theoretical total of the outputs.
Anyway! Given this, there is a cycle-length, and (depending on how you interpret the individual values) there is not room for all possible combinations of 'simulated' random results in that sequence. Imagine that you have a guaranteed million 'novel' values before repeating: there's not room in there for the (unlikely, but possible!) half-a-million results of one kind, in a row, plus half-a-million results of another kind, in its own row, plus half-a-million results alternating from one to the other... Never mind all the other (less easily described) 'random' sequences. No matter how you (consistently) ascribe the mapping from PRNG output to result-type.
Obviously, that's an out-there example, but one can work out the maximum length of result-run for which every single possible combination of that length can be obtained (by landing on whichever part of an assumed optimal result-sequence you wish). (Obviously, the shorter sequences can occur more than once.) And even though the individual results (and even result-pairs and result-triples, etc) might be statistically equally likely, at the upper limit one might find that there is no example of the Rosencrantz And Guildenstern Are Dead coin-toss results... (And, yet, maybe its inverse is there?)
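The counting argument can be checked on a toy generator small enough to enumerate completely. This sketch uses a deliberately tiny generator of my own choosing (a 4-bit linear congruential one, with the top bit of the state read as a coin toss), so it shows only the principle and not any real game's PRNG. A cycle of 16 tosses has only 16 starting points, so it cannot contain all 32 possible runs of five.

```python
MODULUS = 16  # 4 bits of internal state, so at most 16 steps per cycle

def step(state):
    """Toy linear congruential step; these constants give the full cycle."""
    return (5 * state + 1) % MODULUS

def coin(state):
    """Read the top bit of the state as heads (1) or tails (0)."""
    return state >> 3

def cycle_bits(seed=0):
    """Every coin toss produced in one complete trip round the cycle."""
    bits, state = [], seed
    while True:
        bits.append(coin(state))
        state = step(state)
        if state == seed:
            return bits

def windows(bits, length):
    """All distinct runs of `length` tosses, at any starting point."""
    n = len(bits)
    return {tuple(bits[(i + j) % n] for j in range(length)) for i in range(n)}

def contains_every_pattern(bits, length):
    return len(windows(bits, length)) == 2 ** length

bits = cycle_bits()
print(len(bits))  # 16
for length in range(1, 7):
    print(length, len(windows(bits, length)), "of", 2 ** length)
```

Whatever mapping from state to heads/tails you pick, the number of distinct runs of any length is capped by the cycle length, and the same cap applies (with much bigger numbers) to a real generator.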
However, I know that to be a picky point. Insofar as gameplay is concerned, either the sub-sampling of the sequence is short enough that the presence/absence of any particular sequence does not become particularly noteworthy, or the not-quite-randomness that remains is hidden behind further algorithms that obfuscate it ('noise-like', and also beyond the unknowing player's immediate perception) so that it never appears as extraordinary. And, in general, if (say!) there does not exist a seed for a Minecraft world that produces a completely flat desert across one million tiles'-worth of surface, with cacti arrayed across the entire area at exactly ten squares' distance from each other, then I still can't see a problem. Should it be possible, and someone happens across the said world, then I would more suspect Notch of actually hiding this 'possible' result as an easter-egg (e.g. along the lines of "if seed_string=='HappyEaster' then override(desert,1000x1000,flat) && override(cactiigrowth,separation:10,alldesert)" or however it may be coded) than be happy that such an improbable result has popped up (necessarily at the expense of other more 'noisy' results that would have been possible!).
Still, it's interesting to ponder. For example, shall I ever see DF terrain appear before my eyes with the words "Hello Starver!" spelt out in giant ASCII-art through one terrain feature-type or another (like tree-growth patterns, on a plain), in a given worldgen? Possible... Eminently possible. Given all possible results. Although I suspect something less clear or comprehensive (of the "one monkey on one typewriter for only half of eternity" kind) is all I could actually expect, however much my heart might be set upon it.
(Plus, I realise that my "parallel seed trails" method, as mentioned about fifteen paragraphs up, greatly reduces the potential distinctiveness of sub-elements of such a generated universe, especially given my comments about internal and external resolutions of a PRNG state-machine. But I also get the advantage that if I inspect the totality of the results and spot something dire (e.g. a planet that ends up with all its mountains formed as candy-stripes around it, when that would be patently and obviously unnatural), I can adjust just that aspect of my procedural generation until I'm happy with its new look. In an emergency, I could insert an "anti-Easter Egg" that tells Planet6's geography to be governed by the unused Planet111's geography seed, but keeps all the rest as they were. But what a hack that would be.)
[1] Not necessarily limited to the magnitude of the output's resolution, and in fact there are good reasons to want the internal state to have better than the 'resolution' of the outputted random number, so that you can't in any way anticipate that binary random number 01101010111000101010101011001000 is always followed by binary number 10100011111001101010101001100011... That might well not be true if the internal state is a binary number twice as long, and you get the least significant half of the internal bytes[2] as an output.
[2] Or the most significant half, or every other bit from the sequence, or every bit-pair XORed together... whatever it is your algorithm likes to do to the internal state for output, somewhat regardless of what it plans to do to the internal state to make it into the next internal state.
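As an illustration of footnotes [1] and [2], here is a sketch comparing two toy generators: one whose output is its whole 8-bit state, and one with a 16-bit state that only reveals 8 bits. The constants and names are mine, picked so that each toy visits every state once per cycle. I take the most significant half for the output because, in this particular kind of generator (linear congruential, power-of-two modulus), the low half evolves entirely on its own and would be exactly as predictable as the small generator.

```python
def successors(step, output, state_count):
    """Map each output value to the set of outputs ever seen right after it."""
    followers = {}
    state = 0
    for _ in range(state_count):
        nxt = step(state)
        followers.setdefault(output(state), set()).add(output(nxt))
        state = nxt
    return followers

# Output as wide as the state: each number has exactly one possible follower.
small = successors(lambda s: (5 * s + 1) % 256,
                   lambda s: s,
                   256)

# State twice as wide as the output: only the top 8 of 16 bits are shown.
wide = successors(lambda s: (25173 * s + 13849) % 65536,
                  lambda s: s >> 8,
                  65536)

print(max(len(f) for f in small.values()))  # 1
print(min(len(f) for f in wide.values()))   # more than 1
```

So with the wider state, seeing a given output no longer tells you what comes next, even though the generator is still perfectly deterministic once you know the seed.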