author | Robert Kern <robert.kern@gmail.com> | 2019-06-27 00:44:43 -0700 |
---|---|---|
committer | Robert Kern <robert.kern@gmail.com> | 2019-06-27 00:44:43 -0700 |
commit | ed723197a302fbbff6032aab0ee63a0d6b3a6706 (patch) | |
tree | 79ae29aba917dd0363d7947ee51dac45ca2b11f4 /doc/source/reference/random/bit_generators | |
parent | b976458713fbbe3b282792ad10500693844bfeec (diff) | |
download | numpy-ed723197a302fbbff6032aab0ee63a0d6b3a6706.tar.gz |
DOC: np.random documentation cleanup and expansion.
Diffstat (limited to 'doc/source/reference/random/bit_generators')
-rw-r--r-- | doc/source/reference/random/bit_generators/index.rst | 89 |
1 files changed, 65 insertions, 24 deletions
diff --git a/doc/source/reference/random/bit_generators/index.rst b/doc/source/reference/random/bit_generators/index.rst
index 4540f60d9..24ac34e21 100644
--- a/doc/source/reference/random/bit_generators/index.rst
+++ b/doc/source/reference/random/bit_generators/index.rst
@@ -17,21 +17,23 @@ Supported BitGenerators
 
 The included BitGenerators are:
 
+* PCG-64 - The default. A fast generator that supports many parallel streams
+  and can be advanced by an arbitrary amount. See the documentation for
+  :meth:`~.PCG64.advance`. PCG-64 has a period of :math:`2^{128}`. See the `PCG
+  author's page`_ for more details about this class of PRNG.
 * MT19937 - The standard Python BitGenerator. Adds a `~mt19937.MT19937.jumped`
-  function that returns a new generator with state as-if ``2**128`` draws have
+  function that returns a new generator with state as-if :math:`2^{128}` draws have
   been made.
-* PCG-64 - Fast generator that support many parallel streams and
-  can be advanced by an arbitrary amount. See the documentation for
-  :meth:`~.PCG64.advance`. PCG-64 has a period of
-  :math:`2^{128}`. See the `PCG author's page`_ for more details about
-  this class of PRNG.
-* Philox - a counter-based generator capable of being advanced an
+* Philox - A counter-based generator capable of being advanced an
   arbitrary number of steps or generating independent streams. See the
   `Random123`_ page for more details about this class of bit generators.
+* SFC64 - A fast generator based on random invertible mappings. Usually the
+  fastest generator of the four. See the `SFC author's page`_ for (a little)
+  more detail.
 
 .. _`PCG author's page`: http://www.pcg-random.org/
 .. _`Random123`: https://www.deshawresearch.com/resources_random123.html
-
+.. _`SFC author's page`: http://pracrand.sourceforge.net/RNG_engines.txt
 
 .. toctree::
    :maxdepth: 1
@@ -46,26 +48,65 @@ Seeding and Entropy
 -------------------
 
 A BitGenerator provides a stream of random values. In order to generate
-reproducableis streams, BitGenerators support setting their initial state via a
-seed. But how best to seed the BitGenerator? On first impulse one would like to
-do something like ``[bg(i) for i in range(12)]`` to obtain 12 non-correlated,
-independent BitGenerators. However using a highly correlated set of seeds could
-generate BitGenerators that are correlated or overlap within a few samples.
-
-NumPy uses a `SeedSequence` class to mix the seed in a reproducible way that
-introduces the necessary entropy to produce independent and largely non-
-overlapping streams. Small seeds are unable to fill the complete range of
-initializaiton states, and lead to biases among an ensemble of small-seed
-runs. For many cases, that doesn't matter. If you just want to hold things in
-place while you debug something, biases aren't a concern. For actual
-simulations whose results you care about, let ``SeedSequence(None)`` do its
-thing and then log/print the `SeedSequence.entropy` for repeatable
-`BitGenerator` streams.
+reproducible streams, BitGenerators support setting their initial state via a
+seed. All of the provided BitGenerators will take an arbitrary-sized
+non-negative integer, or a list of such integers, as a seed. BitGenerators
+need to take those inputs and process them into a high-quality internal state
+for the BitGenerator. All of the BitGenerators in numpy delegate that task to
+`~SeedSequence`, which uses hashing techniques to ensure that even low-quality
+seeds generate high-quality initial states.
+
+.. code-block:: python
+
+    from numpy.random import PCG64
+
+    bg = PCG64(12345678903141592653589793)
+
+.. end_block
+
+`~SeedSequence` is designed to be convenient for implementing best practices.
+We recommend that a stochastic program defaults to using entropy from the OS so
+that each run is different. The program should print out or log that entropy.
+In order to reproduce a past value, the program should allow the user to
+provide that value through some mechanism, a command-line argument is common,
+so that the user can then re-enter that entropy to reproduce the result.
+`~SeedSequence` can take care of everything except for communicating with the
+user, which is up to you.
+
+.. code-block:: python
+
+    from numpy.random import PCG64, SeedSequence
+
+    # Get the user's seed somehow, maybe through `argparse`.
+    # If the user did not provide a seed, it should return `None`.
+    seed = get_user_seed()
+    ss = SeedSequence(seed)
+    print(f'seed = {ss.entropy}')
+    bg = PCG64(ss)
+
+.. end_block
+
+We default to using a 128-bit integer using entropy gathered from the OS. This
+is a good amount of entropy to initialize all of the generators that we have in
+numpy. We do not recommend using small seeds below 32 bits for general use.
+Using just a small set of seeds to instantiate larger state spaces means that
+there are some initial states that are impossible to reach. This creates some
+biases if everyone uses such values.
+
+There will not be anything *wrong* with the results, per se; even a seed of
+0 is perfectly fine thanks to the processing that `~SeedSequence` does. If you
+just need *some* fixed value for unit tests or debugging, feel free to use
+whatever seed you like. But if you want to make inferences from the results or
+publish them, drawing from a larger set of seeds is good practice.
+
+If you need to generate a good seed "offline", then ``SeedSequence().entropy``
+or using ``secrets.randbits(128)`` from the standard library are both
+convenient ways.
 
 .. autosummary::
     :toctree: generated/
 
+    SeedSequence
     bit_generator.ISeedSequence
     bit_generator.ISpawnableSeedSequence
-    SeedSequence
     bit_generator.SeedlessSeedSequence
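The log-and-replay workflow described in the added "Seeding and Entropy" text can be sketched as below. This is an illustration only, not part of the commit. The `get_user_seed` helper is a hypothetical stand-in for the undefined call in the documentation's own example, and `make_generator` and the `--seed` flag are likewise names assumed here. The `numpy.random` calls (`SeedSequence`, `PCG64`, `Generator`) are the public API.

```python
import argparse
import secrets

from numpy.random import PCG64, Generator, SeedSequence


def get_user_seed(argv=None):
    """Hypothetical helper: return the --seed argument as an int, or None."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=None)
    args, _ = parser.parse_known_args(argv)
    return args.seed


def make_generator(seed=None):
    """Build a Generator and return it with the entropy needed to replay it."""
    ss = SeedSequence(seed)  # seed=None draws fresh entropy from the OS
    return Generator(PCG64(ss)), ss.entropy


# First run: no seed supplied, so log the entropy that was drawn.
rng, entropy = make_generator(get_user_seed([]))
print(f"seed = {entropy}")
first = rng.integers(0, 100, size=5)

# Later run: the user passes the logged value back in on the command line.
rng_replay, _ = make_generator(get_user_seed(["--seed", str(entropy)]))
replay = rng_replay.integers(0, 100, size=5)

# Both runs produce the same draws because they share the same entropy.
assert (first == replay).all()

# An "offline" seed from the standard library can be used the same way.
offline_seed = secrets.randbits(128)
rng_offline, _ = make_generator(offline_seed)
```

Passing an explicit `argv` list to the helper keeps the sketch runnable outside a real command line. A real program would call `get_user_seed()` with no argument so that `argparse` reads `sys.argv`.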