author    Robert Kern <robert.kern@gmail.com> 2019-06-27 00:44:43 -0700
committer Robert Kern <robert.kern@gmail.com> 2019-06-27 00:44:43 -0700
commit    ed723197a302fbbff6032aab0ee63a0d6b3a6706
tree      79ae29aba917dd0363d7947ee51dac45ca2b11f4 /doc/source/reference/random/bit_generators
parent    b976458713fbbe3b282792ad10500693844bfeec
DOC: np.random documentation cleanup and expansion.
Diffstat (limited to 'doc/source/reference/random/bit_generators')
-rw-r--r-- doc/source/reference/random/bit_generators/index.rst 89
1 files changed, 65 insertions, 24 deletions
diff --git a/doc/source/reference/random/bit_generators/index.rst b/doc/source/reference/random/bit_generators/index.rst
index 4540f60d9..24ac34e21 100644
--- a/doc/source/reference/random/bit_generators/index.rst
+++ b/doc/source/reference/random/bit_generators/index.rst
@@ -17,21 +17,23 @@ Supported BitGenerators
The included BitGenerators are:
+* PCG-64 - The default. A fast generator that supports many parallel streams
+ and can be advanced by an arbitrary amount. See the documentation for
+ :meth:`~.PCG64.advance`. PCG-64 has a period of :math:`2^{128}`. See the `PCG
+ author's page`_ for more details about this class of PRNG.
* MT19937 - The standard Python BitGenerator. Adds a `~mt19937.MT19937.jumped`
- function that returns a new generator with state as-if ``2**128`` draws have
+ function that returns a new generator with state as if :math:`2^{128}` draws have
been made.
-* PCG-64 - Fast generator that support many parallel streams and
- can be advanced by an arbitrary amount. See the documentation for
- :meth:`~.PCG64.advance`. PCG-64 has a period of
- :math:`2^{128}`. See the `PCG author's page`_ for more details about
- this class of PRNG.
-* Philox - a counter-based generator capable of being advanced an
+* Philox - A counter-based generator capable of being advanced an
arbitrary number of steps or generating independent streams. See the
`Random123`_ page for more details about this class of bit generators.
+* SFC64 - A fast generator based on random invertible mappings. Usually the
+ fastest generator of the four. See the `SFC author's page`_ for (a little)
+ more detail.
.. _`PCG author's page`: http://www.pcg-random.org/
.. _`Random123`: https://www.deshawresearch.com/resources_random123.html
-
+.. _`SFC author's page`: http://pracrand.sourceforge.net/RNG_engines.txt
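A short sketch exercising the four BitGenerators listed above, assuming NumPy 1.17 or later. Wrapping each one in `Generator` and calling `random_raw` (which returns the BitGenerator's raw integer outputs) go beyond what the list itself describes, and the seed 12345 is arbitrary. The sketch also checks that `PCG64.advance` matches drawing and discarding the same number of raw outputs, and that `MT19937.jumped` leaves the original generator untouched.

```python
from numpy.random import MT19937, PCG64, SFC64, Generator, Philox

# Wrap each of the four BitGenerators in a Generator and draw a few floats.
samples = {}
for bit_gen_cls in (PCG64, MT19937, Philox, SFC64):
    rng = Generator(bit_gen_cls(12345))
    samples[bit_gen_cls.__name__] = rng.random(3)

# PCG64.advance(n) moves the state forward as if n raw outputs had been drawn.
drawn = PCG64(12345)
drawn.random_raw(1000)                # draw 1000 raw outputs and discard them...
skipped = PCG64(12345).advance(1000)  # ...or skip over them directly
assert drawn.random_raw() == skipped.random_raw()

# MT19937.jumped() returns a *new* BitGenerator far ahead in the stream;
# the original generator's state is left untouched.
mt = MT19937(12345)
mt_far = mt.jumped()
assert mt.random_raw() == MT19937(12345).random_raw()
```

Because `advance` returns the BitGenerator itself, it can be chained directly onto the constructor as above.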
.. toctree::
:maxdepth: 1
@@ -46,26 +48,65 @@ Seeding and Entropy
-------------------
A BitGenerator provides a stream of random values. In order to generate
-reproducableis streams, BitGenerators support setting their initial state via a
-seed. But how best to seed the BitGenerator? On first impulse one would like to
-do something like ``[bg(i) for i in range(12)]`` to obtain 12 non-correlated,
-independent BitGenerators. However using a highly correlated set of seeds could
-generate BitGenerators that are correlated or overlap within a few samples.
-
-NumPy uses a `SeedSequence` class to mix the seed in a reproducible way that
-introduces the necessary entropy to produce independent and largely non-
-overlapping streams. Small seeds are unable to fill the complete range of
-initializaiton states, and lead to biases among an ensemble of small-seed
-runs. For many cases, that doesn't matter. If you just want to hold things in
-place while you debug something, biases aren't a concern. For actual
-simulations whose results you care about, let ``SeedSequence(None)`` do its
-thing and then log/print the `SeedSequence.entropy` for repeatable
-`BitGenerator` streams.
+reproducible streams, BitGenerators support setting their initial state via a
+seed. All of the provided BitGenerators will take an arbitrary-sized
+non-negative integer, or a list of such integers, as a seed. Each BitGenerator
+must process those inputs into a high-quality internal state. All of the
+BitGenerators in numpy delegate that task to `~SeedSequence`, which uses
+hashing techniques to ensure that even low-quality seeds generate
+high-quality initial states.
+
+.. code-block:: python
+
+ from numpy.random import PCG64
+
+ bg = PCG64(12345678903141592653589793)
+
+.. end_block
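A small sketch of the seeding behavior described above, assuming NumPy 1.17 or later. Using `random_raw` to compare outputs and the particular seed values are our choices for illustration; the point is that an integer seed and an explicit `SeedSequence` built from the same integer lead to the same stream.

```python
from numpy.random import PCG64, SeedSequence

seed = 12345678903141592653589793

# An integer seed is wrapped in a SeedSequence internally, so both of these
# BitGenerators start from the same state and yield the same raw outputs.
direct = PCG64(seed).random_raw(5)
via_ss = PCG64(SeedSequence(seed)).random_raw(5)
assert (direct == via_ss).all()

# A list of non-negative integers is accepted as a seed as well.
from_list = PCG64([1, 2, 3]).random_raw(5)

# Adjacent small seeds are hashed to different initial states.
assert (PCG64(0).random_raw(3) != PCG64(1).random_raw(3)).any()
```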
+
+`~SeedSequence` is designed to make best practices convenient to implement.
+We recommend that a stochastic program default to using entropy from the OS so
+that each run is different. The program should print out or log that entropy.
+To reproduce a past result, the program should let the user provide that
+value through some mechanism (a command-line argument is common), so that the
+user can re-enter the logged entropy and reproduce the result.
+`~SeedSequence` can take care of everything except for communicating with the
+user, which is up to you.
+
+.. code-block:: python
+
+ import argparse
+
+ from numpy.random import PCG64, SeedSequence
+
+ # Get the user's seed somehow, here through `argparse`.
+ # If the user did not provide a seed, `seed` is `None`.
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--seed', type=int, default=None)
+ seed = parser.parse_args().seed
+ ss = SeedSequence(seed)
+ print(f'seed = {ss.entropy}')
+ bg = PCG64(ss)
+
+.. end_block
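The log-and-replay workflow can be checked end to end in a few lines. This sketch skips the user-facing part entirely and simply feeds the logged `SeedSequence.entropy` back into a new `SeedSequence`; the variable names are illustrative.

```python
from numpy.random import PCG64, SeedSequence

# First run: no user seed, so SeedSequence draws fresh entropy from the OS.
ss = SeedSequence()
logged_entropy = ss.entropy          # the value the program prints or logs
first_run = PCG64(ss).random_raw(5)

# Later run: the user passes the logged value back in, reproducing the stream.
replay = PCG64(SeedSequence(logged_entropy)).random_raw(5)
assert (first_run == replay).all()
```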
+
+By default, `~SeedSequence` uses a 128-bit integer of entropy gathered from the
+OS. This is a good amount of entropy to initialize all of the generators that
+we have in numpy. We do not recommend using small seeds below 32 bits for
+general use. Using just a small set of seeds to instantiate larger state spaces
+means that some initial states are impossible to reach. This creates biases
+if everyone uses such values.
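To make the cost of a small seed space concrete, the sketch below uses the standard birthday-bound approximation :math:`1 - e^{-k(k-1)/2N}` for the chance that at least two of :math:`k` runs share a seed drawn uniformly from :math:`N` possibilities. The run count of 100,000 and the assumption of uniformly chosen seeds are ours, for illustration only.

```python
import math

from numpy.random import SeedSequence

# The default entropy is drawn from the OS and fits in 128 bits.
default_entropy = SeedSequence().entropy
assert 0 <= default_entropy < 2**128


def collision_probability(n_runs, n_seeds):
    """Birthday-bound approximation of P(at least two runs share a seed)."""
    # 1 - exp(-k(k-1) / 2N); expm1 keeps precision when the result is tiny.
    return -math.expm1(-n_runs * (n_runs - 1) / (2 * n_seeds))


runs = 100_000
p32 = collision_probability(runs, 2**32)    # roughly 0.69
p128 = collision_probability(runs, 2**128)  # roughly 1.5e-29
```

With 32-bit seeds, duplicate seeds among that many runs are more likely than not, while the 128-bit default makes them negligible.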
+
+There will not be anything *wrong* with the results, per se; even a seed of
+0 is perfectly fine thanks to the processing that `~SeedSequence` does. If you
+just need *some* fixed value for unit tests or debugging, feel free to use
+whatever seed you like. But if you want to make inferences from the results or
+publish them, drawing from a larger set of seeds is good practice.
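As a quick sanity check of the claim about a seed of 0, the sketch below asks `SeedSequence` for the words it would produce via `generate_state` (the method the BitGenerators call to build their initial state) and confirms they are not all zero. The choice of four 32-bit words is arbitrary.

```python
import numpy as np
from numpy.random import PCG64, SeedSequence

# Even a seed of 0 is hashed into a well-mixed, nonzero set of state words.
words = SeedSequence(0).generate_state(4, dtype=np.uint32)
assert words.any()

# A fixed small seed is still fine for tests: it is simply reproducible.
assert (PCG64(0).random_raw(3) == PCG64(0).random_raw(3)).all()
```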
+
+If you need to generate a good seed "offline", then both
+``SeedSequence().entropy`` and ``secrets.randbits(128)`` from the standard
+library are convenient ways to do so.
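Both offline options return a plain Python integer of at most 128 bits, so either can be pasted into a script or configuration file as a fixed seed. A minimal sketch, with illustrative variable names:

```python
import secrets

from numpy.random import PCG64, SeedSequence

# Either call yields a 128-bit integer suitable for use as a fixed seed.
seed_from_numpy = SeedSequence().entropy
seed_from_stdlib = secrets.randbits(128)
print(seed_from_numpy, seed_from_stdlib)

# The printed value can later be passed straight to a BitGenerator.
bg = PCG64(seed_from_stdlib)
```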
.. autosummary::
:toctree: generated/
+ SeedSequence
bit_generator.ISeedSequence
bit_generator.ISpawnableSeedSequence
- SeedSequence
bit_generator.SeedlessSeedSequence