Topic: Mersenne twister in df (Read 10596 times)

BinaryBeast1010011010 · « **Reply #60 on:** May 31, 2012, 04:50:36 am »

How do you determine a?

Thief^ · « **Reply #61 on:** May 31, 2012, 04:56:02 am »

It seems to be pre-chosen for the algorithm. Is it a prime number?

Starver · « **Reply #62 on:** May 31, 2012, 05:15:09 am »

Quote from: BinaryBeast1010011010 on May 31, 2012, 04:33:18 am

@starver : then you decide what will be in the square using the square value on the world map as well as the adjacent squares values. Seems pretty understable for df to work on a variation of this algorithm,

Nobody has told me I'm wrong about it, but there are other ways. I just think that my own speculation on this issue matches reality fairly well. And Toady hasn't either told me to shush up about his 'secret workings', by PM, or publicly ridiculed me for being wrong. (Or both, at the same time.

) ((Or maybe he has asked me to shut up, and I've agreed to do an elaborate double/triple-bluff by doubting my own posts and then mention that I'm deliberately doubting my own posts and then... Well...))

Quote

still it need to use a lot of memory.

At what stage? Memory or storage? You can see how much storage is needed by checking the size of any given DF save folder. But note that this contains a very data-expensive ASCII-representation of the Raws, so you might want to separately take these files (and the Inits) and assume they're smaller, but the 'map data' bits are, I think, more binary-encoded. (You know what? I haven't even checked, I'm just assuming!)

The memory needed while the cascade is going down is one PRNG-engine's-worth per simultaneous PRNG being used. In my most ambitious plans, I might have one to level, four second-levels, 16 third-levels, etc, but actually less than that if a second-tier 'area' is deemed out of scope for all down-slope levels, and it might end up with just 4..6 maintained PRNGs per LOD, to perhaps a depth of 10 at the absolute most. But, regardless of how you generate it, the whole map (volumetrically, including 'bit depth' for temperature, visibility, liquid-type/depth, wagon accessibility, etc, etc) and maybe separate data structures for various other things (dwarf and other entity details) are going to be essential for Fortress Mode playing. By far the biggest memory-user, although maybe mitigated by "not loading" as yet unrevealed information.

In Adventure Mode, the local 'embark block' is probably generated for ready access (or two of them, or four of them, or so if close enough to a border to need adjacent block info), plus any "deltas" to the default that eventually may get written to a save file so that the next visit can apply them onto the re-generated terrain.

At a guess.

Quote

One day shall we see df used to benchmark super computers?

I've seen people compare their old and new machines (or with other people's) with time-to-worldgen with a given version.

But I do have to add... There's (usually) very little reason to complicate things by "seeding one PRNG with the output of another" (deliberate cascading, aside) when there's already the possibility of internally doing this from something like the time-of-day put through a hash, which is pretty much unpredictable in the first place. There's also very little advantage in saying "I'll take the one hundredth 'random' number, please Kenneth", throwing away the first ninety-nine. Or even "I'll take the <random-number>th random number" (do really I need to keep adding the 'quote's?). You don't gain any practical randomness or significantly change the entropy of your output. If you're doing something cryptographical then you might obfuscate the background process a bit more (for anyone who doesn't have the source-code at hand and is just trying a replayed-encryption analysis, but you really should assume that they have had full access to your development notes from day zero!). But there's other things you can do in its stead.

Quote from: Thief^ on May 31, 2012, 04:56:02 am

It seems to be pre-chosen for the algorithm. Is it a prime number?

Well.... If I've added the digits up correctly, it looks to be a multiple of three! So no.

Starver · « **Reply #63 on:** May 31, 2012, 05:20:20 am »

Appears to factor to 3, 47 and 4955809.

Nope, still no idea. but at least there's a fairly large prime in it.

BinaryBeast1010011010 · « **Reply #64 on:** May 31, 2012, 07:18:42 am »

A lot of good things have been said about the MT but it has its flaws too, I read somewhere that it did not pass all DieHard and dieharder tests, I cant recall wich one it failed, could this have an impact on the number of rejects during world gen?
Why one would not use KISS wich is simpler and pass all tests? I dont understand how the period could be a problème since you could reseed it to have another string...

Thief^ · « **Reply #65 on:** May 31, 2012, 07:46:48 am »

No failing one or two of those randomness tests wouldn't affect world-gen rejects, as rejects are based on the overall results of millions of random numbers ending up a specific way.

KISS is much newer than MT, it's actually a composite of 3 random number generators and I don't think it's been studied that much.

The period has more problems than just how many numbers you can get in a row, it also impacts the chance that another seed will result in numbers from just slightly later in another seed. In DF terms, having a generator with a tiny period used could result in the same section of world showing up in many different worlds (in different places) despite the seeds being different.

BinaryBeast1010011010 · « **Reply #66 on:** May 31, 2012, 08:24:49 am »

So the mersenne twister spits out a long string of bits wich is saved (cant see where) then used for everything, from world gen, history AND combat. Wich way is most ressource expensive : keeping a loooong string of random numbers (first the 19937 bit seed, then each iteration result) or launching/stopping the twister whenever needed?furthermore, if the same numbers result in oddities during worldgen, wouldnt the same string make weird combat results? (During the siege of xor, 1487 elves were defeated by 1 human, one dwarf and an elf, no survivors)

Thief^ · « **Reply #67 on:** May 31, 2012, 08:37:11 am »

It's much better just to run the generator one step each time you need a number, keeping the generator state always. It's expensive to generate more numbers than you need, and it's expensive to completely re-initialise MT because of its large state.

A more likely problem you'd get from a bad RNG would be the exact same combat report twice (or more!). After all, it's still random to have the same (or similar) number come up several times in a row (1487 elves all killed by one dwarf, say). It's actually a sign of a poor pseudo-random number generator if it can't generate the same number twice or more in a row (without generating the same number forever).

BinaryBeast1010011010 · « **Reply #68 on:** May 31, 2012, 08:43:30 am »

Han, about kiss
Do you have any idea why in
x = 69069*x+12345;
y ^= (y<<13); y ^= (y>>17); y ^= (y<<5); /* y must never be set to zero! */
t = a*z+c; c = (t>>32); /* Also avoid setting z=c=0! */

return x+y+(z=t);

X=69069*x+12345
Why 69069 and 12345? Can these numbers be changed or are they part of the algorithm? 12345 seems a bit arbitrary...

Thief^ · « **Reply #69 on:** May 31, 2012, 08:51:16 am »

The X part of KISS is a linear congruential generator, which is actually a really poor rng.

The "12345" is an arbitrary choice, but it has to be a number which has no factors in common with the mod number, which is implied to be 2^32 due to being 32-bit maths. This just means it shouldn't be even. The other number must be 1 higher than a number which will divide by all factors of the mod number, which again means it has to be odd. It also has to be 1 more than a multiple of 4 (which it is), due to the mod number being a multiple of 4. I don't understand that last requirement, personally.

BinaryBeast1010011010 · « **Reply #70 on:** May 31, 2012, 09:10:14 am »

The the map rejection thingie dont start again the worldgen process changing the seed, it only asks to start again from the point where it stopped, this way avoiding problems with futur combat logs, mineral and volcanoes and so on. Do you have any idea where the seed could be taken from?

Thief^ · « **Reply #71 on:** May 31, 2012, 09:12:47 am »

I think when the world generator gets a reject it restarts, but leaves the random number generator in the same state it was in. So it doesn't have a new seed.

BinaryBeast1010011010 · « **Reply #72 on:** May 31, 2012, 09:17:26 am »

I cant see toady hardcoding a whole lot of seeds, he is too smart to do this. Maybe it is the number of milliseconds since january 1st 1970 0000am
Then we would have a long enough seed I think... How would the worldgen impact combat in adventure mode for an example, would it be the same string of numbers for knapping, fighting, throwing..?

Thief^ · « **Reply #73 on:** May 31, 2012, 09:21:20 am »

I don't know what seed is used when you actually play the game.

Miuramir · « **Reply #74 on:** May 31, 2012, 09:29:42 am »

Quote from: BinaryBeast1010011010 on May 31, 2012, 08:24:49 am

So the mersenne twister spits out a long string of bits wich is saved (cant see where) then used for everything, from world gen, history AND combat. Wich way is most ressource expensive : keeping a loooong string of random numbers (first the 19937 bit seed, then each iteration result) or launching/stopping the twister whenever needed?furthermore, if the same numbers result in oddities during worldgen, wouldnt the same string make weird combat results? (During the siege of xor, 1487 elves were defeated by 1 human, one dwarf and an elf, no survivors)

You still seem to have some serious misunderstandings about how it is used. The MT is almost always handled as a function; the user program just calls for the next random number it generates whenever it needs a random number. There's nothing that "runs" (starts/stops) except when you invoke it; it steps (iterates) one time each time you need a "random" number. The whole point of a seeded pRNG is that you don't need to keep but a few key parameters at any time; it's extremely storage-efficient at the expense of a small amount of computation.

I will point out again that in some common languages, such as PHP, MT is implemented as a core function; you simply call mt_rand() instead of rand() and get both better statistical behavior and faster results. For a good visual example of why using a statistically "better" pRNG can make a difference when you are working with large data sets, see the discussion of the PHP mt_rand() function, and in particular the following image:

Spoiler (click to show/hide)

The left half of the image generated "random" noise using PHP rand(), which is based on libc rand()... "Many random number generators of older libcs have dubious or unknown characteristics and are slow." The right half of the image generated "random" noise using exactly the same algorithm, but using mt_rand() instead of rand(). Notice how the left side shows distinct patterning, where the right side looks a lot more like random noise.

I should stress again that different pRNGs have different optimization characteristics. MT is NOT suitable for cryptographic purposes; it was specifically developed to provide a high-quality and efficient pRNG for large Monte Carlo simulations. It just so happens that those needs match very closely with what you want for a simulation game like DF.

For instance, one of the problems that some (particularly older) pRNGs have is that they do not sample the available space well; if you plot their output in three dimensions, the numbers they generate will all fall on a limited number of planes. If you are using the pRNG to provide test input for a program, some combinations of input will simply never happen, which could be disastrous if your program had a bug that only occurred on a combination of inputs that wasn't in the output space. MT's performance was specifically intended to address these issues while remaining fast and simple enough that it could be called many times when running a large number of test cases.

A cryptographically appropriate pRNG needs to meet a different set of criteria, which loosely can be summarized as not being able to guess the seed or the constants from watching the output. It generally doesn't matter whether it has good spectral coverage of a problem volume. MT by itself fails badly as a crypto pRNG; observing only a few hundred sequential raw output values may allow you to reverse-engineer information about the input parameters. But that isn't important at all for MT's intended use. For instance, in DF the user never sees the raw values anyway, and they are used in such a variety of different ways in different parts of the program that it would be unreasonably difficult to derive any substantial value from being able to predict the next number. Besides, in DF the user can already see the seed; there's no secrecy requirement.

I'd also like to clarify an earlier comment... almost all "gaming" dice (what some people think of as D&D dice, boardgame dice, etc.) are statistically flawed. The vast majority of d20, d10, cheap d6, and so on are tumbled in a rock polisher, typically twice with coarse grit (before and after paint) and once with fine; those round edges make them significantly less random. Many high-sided gaming dice will start to show bias patterns in as little as a few hundred trials; almost all will display statistical flaws once you get into a few thousand trials. See this video from Louis Zocchi for a great explanation. Casinos have a huge vested interest in having their dice actually be random; they use sharp-edged dice from high-quality, high-impact, transparent plastic; and they replace them frequently to avoid wear problems. The manufacturing techniques and materials are fairly expensive, and are much harder to apply to dice with shallow-angled faces such as a d20. So, a high-quality pRNG (such as MT) used in a correct fashion is actually significantly more random than actually rolling physical, non-ideal dice in most if not all cases.

Additional note: Like many other parts of computer science, pRNGs are still in a comparatively early stage of development. When MT was first developed in 1997, it was a dramatic improvement in the field; for many statistically important purposes, it produced unambiguously better results yet ran faster. For several years following, upgrading to using MT instead of whatever dubious built-in function you had by default was a clear win; and eventually it started to become built-in to libraries and even languages. MT is likely to stay around for a while; it's built into many things. However, that doesn't mean that development isn't continuing. WELL (Well Equidistributed Long-period Linear) was developed in 2006, and allegedly has superior performance to MT in certain cases. In particular, if you start MT with a seed that has too many (binary) zeroes in it, it can take several calls for the numbers to have the degree of seeming chaotic unpredictability that you generally want.

In some sorts of Monte Carlo simulation, this isn't a significant drawback; if you're running a million trials of some test, and each one is no more or less significant than any other, having the first few be a hair less random isn't a noticeable problem. On the other hand, for certain gaming purposes, the first few rolls may be "more significant" than others. Imagine, for instance, a hypothetical game or utility that generated a "random" D&D character using a seeded pRNG; the first roll picks the class, the second the race, and so on. With a poorly-chosen user-supplied seed that isn't checked, you might get a significantly higher or lower number of one class, since that very first roll sets the context for subsequent rolls. This is why you usually seed a pRNG like the MT with either a user-supplied value in cases where it is easy to retry or not important; or use some function on the clock that gives you a "moderately random" number to use as a good seed to MT.

There isn't that much info out there on WELL, perhaps because MT is "good enough" for most purposes it's intended for, and these days is very easy to use. Adding the additional complexity of WELL may simply not be worth the extra work for most cases. There's nothing to say however that tomorrow, or eventually, someone may come up with a very clever pRNG that is as much better and faster than MT as MT was its predecessors.

News:

Author Topic: Mersenne twister in df (Read 10596 times)

BinaryBeast1010011010

Re: Mersenne twister in df

Thief^

Re: Mersenne twister in df

Starver

Re: Mersenne twister in df

Starver

Re: Mersenne twister in df

BinaryBeast1010011010

Re: Mersenne twister in df

Thief^

Re: Mersenne twister in df

BinaryBeast1010011010

Re: Mersenne twister in df

Thief^

Re: Mersenne twister in df

BinaryBeast1010011010

Re: Mersenne twister in df

Thief^

Re: Mersenne twister in df

BinaryBeast1010011010

Re: Mersenne twister in df

Thief^

Re: Mersenne twister in df

BinaryBeast1010011010

Re: Mersenne twister in df

Thief^

Re: Mersenne twister in df

Miuramir

Re: Mersenne twister in df