15 files changed, 371 insertions, 125 deletions
diff --git a/doc/source/reference/random/bit_generators/index.rst b/doc/source/reference/random/bit_generators/index.rst
index 315657172..c5c349806 100644
--- a/doc/source/reference/random/bit_generators/index.rst
+++ b/doc/source/reference/random/bit_generators/index.rst
@@ -15,10 +15,13 @@ Supported BitGenerators
 
 The included BitGenerators are:
 
-* PCG-64 - The default. A fast generator that supports many parallel streams
-  and can be advanced by an arbitrary amount. See the documentation for
-  :meth:`~.PCG64.advance`. PCG-64 has a period of :math:`2^{128}`. See the `PCG
-  author's page`_ for more details about this class of PRNG.
+* PCG-64 - The default. A fast generator that can be advanced by an arbitrary
+  amount. See the documentation for :meth:`~.PCG64.advance`. PCG-64 has
+  a period of :math:`2^{128}`. See the `PCG author's page`_ for more details
+  about this class of PRNG.
+* PCG-64 DXSM - An upgraded version of PCG-64 with better statistical
+  properties in parallel contexts. See :ref:`upgrading-pcg64` for more
+  information on these improvements.
 * MT19937 - The standard Python BitGenerator. Adds a `MT19937.jumped`
   function that returns a new generator with state as-if :math:`2^{128}` draws have
   been made.
@@ -43,6 +46,7 @@ The included BitGenerators are:
 
     MT19937 <mt19937>
     PCG64 <pcg64>
+    PCG64DXSM <pcg64dxsm>
     Philox <philox>
     SFC64 <sfc64>
 
@@ -105,6 +109,44 @@ If you need to generate a good seed "offline", then ``SeedSequence().entropy``
 or using ``secrets.randbits(128)`` from the standard library are both
 convenient ways.
 
+If you need to run several stochastic simulations in parallel, best practice
+is to construct a random generator instance for each simulation. 
+To make sure that the random streams have distinct initial states, you can use
+the `spawn` method of `~SeedSequence`. For instance, here we construct a list
+of 12 instances:
+
+.. code-block:: python
+
+    from numpy.random import PCG64, SeedSequence
+    
+    # High quality initial entropy
+    entropy = 0x87351080e25cb0fad77a44a3be03b491
+    base_seq = SeedSequence(entropy)
+    child_seqs = base_seq.spawn(12)    # a list of 12 SeedSequences
+    generators = [PCG64(seq) for seq in child_seqs]
+
+.. end_block
+
+
+An alternative way is to use the fact that a `~SeedSequence` can be initialized
+by a tuple of elements. Here we use a base entropy value and an integer
+``worker_id``
+
+.. code-block:: python
+
+    from numpy.random import PCG64, SeedSequence
+
+    # High quality initial entropy
+    entropy = 0x87351080e25cb0fad77a44a3be03b491    
+    sequences = [SeedSequence((entropy, worker_id)) for worker_id in range(12)]
+    generators = [PCG64(seq) for seq in sequences]
+
+.. end_block
+
+Note that the sequences produced by the latter method will be distinct from
+those constructed via `~SeedSequence.spawn`.
+
+
 .. autosummary::
     :toctree: generated/
 
diff --git a/doc/source/reference/random/bit_generators/mt19937.rst b/doc/source/reference/random/bit_generators/mt19937.rst
index 71875db4e..d05ea7c6f 100644
--- a/doc/source/reference/random/bit_generators/mt19937.rst
+++ b/doc/source/reference/random/bit_generators/mt19937.rst
@@ -4,7 +4,8 @@ Mersenne Twister (MT19937)
 .. currentmodule:: numpy.random
 
 .. autoclass:: MT19937
-	:exclude-members:
+    :members: __init__
+    :exclude-members: __init__
 
 State
 =====
diff --git a/doc/source/reference/random/bit_generators/pcg64.rst b/doc/source/reference/random/bit_generators/pcg64.rst
index edac4620b..889965f77 100644
--- a/doc/source/reference/random/bit_generators/pcg64.rst
+++ b/doc/source/reference/random/bit_generators/pcg64.rst
@@ -4,7 +4,8 @@ Permuted Congruential Generator (64-bit, PCG64)
 .. currentmodule:: numpy.random
 
 .. autoclass:: PCG64
-	:exclude-members:
+    :members: __init__
+    :exclude-members: __init__
 
 State
 =====
diff --git a/doc/source/reference/random/bit_generators/pcg64dxsm.rst b/doc/source/reference/random/bit_generators/pcg64dxsm.rst
new file mode 100644
index 000000000..e37efa5d3
--- /dev/null
+++ b/doc/source/reference/random/bit_generators/pcg64dxsm.rst
@@ -0,0 +1,32 @@
+Permuted Congruential Generator (64-bit, PCG64 DXSM)
+----------------------------------------------------
+
+.. currentmodule:: numpy.random
+
+.. autoclass:: PCG64DXSM
+    :members: __init__
+    :exclude-members: __init__
+
+State
+=====
+
+.. autosummary::
+   :toctree: generated/
+
+   ~PCG64DXSM.state
+
+Parallel generation
+===================
+.. autosummary::
+   :toctree: generated/
+
+   ~PCG64DXSM.advance
+   ~PCG64DXSM.jumped
+
+Extending
+=========
+.. autosummary::
+   :toctree: generated/
+
+   ~PCG64DXSM.cffi
+   ~PCG64DXSM.ctypes
diff --git a/doc/source/reference/random/bit_generators/philox.rst b/doc/source/reference/random/bit_generators/philox.rst
index 8eba2d351..3c2fa4cc5 100644
--- a/doc/source/reference/random/bit_generators/philox.rst
+++ b/doc/source/reference/random/bit_generators/philox.rst
@@ -4,7 +4,8 @@ Philox Counter-based RNG
 .. currentmodule:: numpy.random
 
 .. autoclass:: Philox
-	:exclude-members:
+    :members: __init__
+    :exclude-members: __init__
 
 State
 =====
diff --git a/doc/source/reference/random/bit_generators/sfc64.rst b/doc/source/reference/random/bit_generators/sfc64.rst
index d34124a33..8cb255bc1 100644
--- a/doc/source/reference/random/bit_generators/sfc64.rst
+++ b/doc/source/reference/random/bit_generators/sfc64.rst
@@ -4,7 +4,8 @@ SFC64 Small Fast Chaotic PRNG
 .. currentmodule:: numpy.random
 
 .. autoclass:: SFC64
-        :exclude-members:
+    :members: __init__
+    :exclude-members: __init__
 
 State
 =====
diff --git a/doc/source/reference/random/c-api.rst b/doc/source/reference/random/c-api.rst
index a79da7a49..de403ce98 100644
--- a/doc/source/reference/random/c-api.rst
+++ b/doc/source/reference/random/c-api.rst
@@ -3,6 +3,8 @@ C API for random
 
 .. currentmodule:: numpy.random
 
+.. versionadded:: 1.19.0
+
 Access to various distributions below is available via Cython or C-wrapper
 libraries like CFFI. All the functions accept a :c:type:`bitgen_t` as their
 first argument.  To access these from Cython or C, you must link with the
@@ -40,9 +42,9 @@ The functions are named with the following conventions:
 - The functions without "standard" in their name require additional parameters
   to describe the distributions.
 
-- ``zig`` in the name are based on a ziggurat lookup algorithm is used instead
-  of calculating the ``log``, which is significantly faster. The non-ziggurat
-  variants are used in corner cases and for legacy compatibility.
+- Functions with ``inv`` in their name are based on the slower inverse method
+  instead of a ziggurat lookup algorithm, which is significantly faster. The
+  non-ziggurat variants are used in corner cases and for legacy compatibility.
 
 
 .. c:function:: double random_standard_uniform(bitgen_t *bitgen_state)
@@ -53,6 +55,8 @@ The functions are named with the following conventions:
 
 .. c:function:: void random_standard_exponential_fill(bitgen_t *bitgen_state, npy_intp cnt, double *out)
 
+.. c:function:: void random_standard_exponential_inv_fill(bitgen_t *bitgen_state, npy_intp cnt, double *out)
+
 .. c:function:: double random_standard_normal(bitgen_t* bitgen_state)
 
 .. c:function:: void random_standard_normal_fill(bitgen_t *bitgen_state, npy_intp count, double *out)
@@ -69,6 +73,8 @@ The functions are named with the following conventions:
 
 .. c:function:: void random_standard_exponential_fill_f(bitgen_t *bitgen_state, npy_intp cnt, float *out)
 
+.. c:function:: void random_standard_exponential_inv_fill_f(bitgen_t *bitgen_state, npy_intp cnt, float *out)
+
 .. c:function:: float random_standard_normal_f(bitgen_t* bitgen_state)
 
 .. c:function:: float random_standard_gamma_f(bitgen_t *bitgen_state, float shape)
diff --git a/doc/source/reference/random/generator.rst b/doc/source/reference/random/generator.rst
index 8706e1de2..7934be98a 100644
--- a/doc/source/reference/random/generator.rst
+++ b/doc/source/reference/random/generator.rst
@@ -15,7 +15,8 @@ can be changed by passing an instantized BitGenerator to ``Generator``.
 .. autofunction:: default_rng
 
 .. autoclass:: Generator
-	:exclude-members:
+    :members: __init__
+    :exclude-members: __init__
 
 Accessing the BitGenerator
 ==========================
@@ -70,13 +71,13 @@ By default, `Generator.permuted` returns a copy.  To operate in-place with
 `Generator.permuted`, pass the same array as the first argument *and* as
 the value of the ``out`` parameter.  For example,
 
-    >>> rg = np.random.default_rng()
+    >>> rng = np.random.default_rng()
     >>> x = np.arange(0, 15).reshape(3, 5)
     >>> x
     array([[ 0,  1,  2,  3,  4],
            [ 5,  6,  7,  8,  9],
            [10, 11, 12, 13, 14]])
-    >>> y = rg.permuted(x, axis=1, out=x)
+    >>> y = rng.permuted(x, axis=1, out=x)
     >>> x
     array([[ 1,  0,  2,  4,  3],  # random
            [ 6,  7,  8,  9,  5],
@@ -96,13 +97,13 @@ which dimension of the input array to use as the sequence. In the case of a
 two-dimensional array, ``axis=0`` will, in effect, rearrange the rows of the
 array, and  ``axis=1`` will rearrange the columns.  For example
 
-    >>> rg = np.random.default_rng()
+    >>> rng = np.random.default_rng()
     >>> x = np.arange(0, 15).reshape(3, 5)
     >>> x
     array([[ 0,  1,  2,  3,  4],
            [ 5,  6,  7,  8,  9],
            [10, 11, 12, 13, 14]])
-    >>> rg.permutation(x, axis=1)
+    >>> rng.permutation(x, axis=1)
     array([[ 1,  3,  2,  0,  4],  # random
            [ 6,  8,  7,  5,  9],
            [11, 13, 12, 10, 14]])
@@ -115,7 +116,7 @@ how `numpy.sort` treats it.  Each slice along the given axis is shuffled
 independently of the others.  Compare the following example of the use of
 `Generator.permuted` to the above example of `Generator.permutation`:
 
-    >>> rg.permuted(x, axis=1)
+    >>> rng.permuted(x, axis=1)
     array([[ 1,  0,  2,  4,  3],  # random
            [ 5,  7,  6,  9,  8],
            [10, 14, 12, 13, 11]])
@@ -130,9 +131,9 @@ Shuffling non-NumPy sequences
 a sequence that is not a NumPy array, it shuffles that sequence in-place.
 For example,
 
-    >>> rg = np.random.default_rng()
+    >>> rng = np.random.default_rng()
     >>> a = ['A', 'B', 'C', 'D', 'E']
-    >>> rg.shuffle(a)  # shuffle the list in-place
+    >>> rng.shuffle(a)  # shuffle the list in-place
     >>> a
     ['B', 'D', 'A', 'E', 'C']  # random
 
diff --git a/doc/source/reference/random/index.rst b/doc/source/reference/random/index.rst
index 13ce7c40c..96cd47017 100644
--- a/doc/source/reference/random/index.rst
+++ b/doc/source/reference/random/index.rst
@@ -25,7 +25,7 @@ nep-0019-rng-policy.html>`_ for context on the updated random Numpy number
 routines. The legacy `RandomState` random number routines are still
 available, but limited to a single BitGenerator. See :ref:`new-or-different` 
 for a complete list of improvements and differences from the legacy
-``Randomstate``.
+``RandomState``.
 
 For convenience and backward compatibility, a single `RandomState`
 instance's methods are imported into the numpy.random namespace, see
@@ -84,10 +84,10 @@ different
 .. code-block:: python
 
     try:
-        rg_integers = rg.integers
+        rng_integers = rng.integers
     except AttributeError:
-        rg_integers = rg.randint
-    a = rg_integers(1000)
+        rng_integers = rng.randint
+    a = rng_integers(1000)
 
 Seeds can be passed to any of the BitGenerators. The provided value is mixed
 via `SeedSequence` to spread a possible sequence of seeds across a wider
@@ -97,8 +97,8 @@ is wrapped with a `Generator`.
 .. code-block:: python
 
   from numpy.random import Generator, PCG64
-  rg = Generator(PCG64(12345))
-  rg.standard_normal()
+  rng = Generator(PCG64(12345))
+  rng.standard_normal()
   
 Here we use `default_rng` to create an instance of `Generator` to generate a 
 random float:
@@ -146,10 +146,10 @@ As a convenience NumPy  provides the `default_rng` function to hide these
 details:
   
 >>> from numpy.random import default_rng
->>> rg = default_rng(12345)
->>> print(rg)
+>>> rng = default_rng(12345)
+>>> print(rng)
 Generator(PCG64)
->>> print(rg.random())
+>>> print(rng.random())
 0.22733602246716966
   
 One can also instantiate `Generator` directly with a `BitGenerator` instance.
@@ -158,16 +158,16 @@ To use the default `PCG64` bit generator, one can instantiate it directly and
 pass it to `Generator`:
 
 >>> from numpy.random import Generator, PCG64
->>> rg = Generator(PCG64(12345))
->>> print(rg)
+>>> rng = Generator(PCG64(12345))
+>>> print(rng)
 Generator(PCG64)
 
 Similarly to use the older `MT19937` bit generator (not recommended), one can
 instantiate it directly and pass it to `Generator`:
 
 >>> from numpy.random import Generator, MT19937
->>> rg = Generator(MT19937(12345))
->>> print(rg)
+>>> rng = Generator(MT19937(12345))
+>>> print(rng)
 Generator(MT19937)
 
 What's New or Different
@@ -222,6 +222,9 @@ one of three ways:
 * :ref:`independent-streams`
 * :ref:`parallel-jumped`
 
+Users with a very large amount of parallelism will want to consult
+:ref:`upgrading-pcg64`.
+
 Concepts
 --------
 .. toctree::
@@ -230,6 +233,7 @@ Concepts
    generator
    Legacy Generator (RandomState) <legacy>
    BitGenerators, SeedSequences <bit_generators/index>
+   Upgrading PCG64 with PCG64DXSM <upgrading-pcg64>
 
 Features
 --------
diff --git a/doc/source/reference/random/legacy.rst b/doc/source/reference/random/legacy.rst
index 6cf4775b8..42437dbb6 100644
--- a/doc/source/reference/random/legacy.rst
+++ b/doc/source/reference/random/legacy.rst
@@ -48,7 +48,8 @@ using the state of the `RandomState`:
 
 
 .. autoclass:: RandomState
-	:exclude-members:
+    :members: __init__
+    :exclude-members: __init__
 
 Seeding and State
 =================
diff --git a/doc/source/reference/random/new-or-different.rst b/doc/source/reference/random/new-or-different.rst
index 6cab0f729..a81543926 100644
--- a/doc/source/reference/random/new-or-different.rst
+++ b/doc/source/reference/random/new-or-different.rst
@@ -58,18 +58,18 @@ And in more detail:
 
   from  numpy.random import Generator, PCG64
   import numpy.random
-  rg = Generator(PCG64())
-  %timeit -n 1 rg.standard_normal(100000)
+  rng = Generator(PCG64())
+  %timeit -n 1 rng.standard_normal(100000)
   %timeit -n 1 numpy.random.standard_normal(100000)
 
 .. ipython:: python
 
-  %timeit -n 1 rg.standard_exponential(100000)
+  %timeit -n 1 rng.standard_exponential(100000)
   %timeit -n 1 numpy.random.standard_exponential(100000)
 
 .. ipython:: python
 
-  %timeit -n 1 rg.standard_gamma(3.0, 100000)
+  %timeit -n 1 rng.standard_gamma(3.0, 100000)
   %timeit -n 1 numpy.random.standard_gamma(3.0, 100000)
 
 
@@ -94,9 +94,9 @@ And in more detail:
 
 .. ipython:: python
 
-  rg = Generator(PCG64(0))
-  rg.random(3, dtype='d')
-  rg.random(3, dtype='f')
+  rng = Generator(PCG64(0))
+  rng.random(3, dtype='d')
+  rng.random(3, dtype='f')
 
 * Optional ``out`` argument that allows existing arrays to be filled for
   select distributions
@@ -112,7 +112,7 @@ And in more detail:
 .. ipython:: python
 
   existing = np.zeros(4)
-  rg.random(out=existing[:2])
+  rng.random(out=existing[:2])
   print(existing)
 
 * Optional ``axis`` argument for methods like `~.Generator.choice`,
@@ -121,9 +121,9 @@ And in more detail:
 
 .. ipython:: python
 
-  rg = Generator(PCG64(123456789))
+  rng = Generator(PCG64(123456789))
   a = np.arange(12).reshape((3, 4))
   a
-  rg.choice(a, axis=1, size=5)
-  rg.shuffle(a, axis=1)        # Shuffle in-place
+  rng.choice(a, axis=1, size=5)
+  rng.shuffle(a, axis=1)        # Shuffle in-place
   a
diff --git a/doc/source/reference/random/parallel.rst b/doc/source/reference/random/parallel.rst
index 721584014..7f0207bde 100644
--- a/doc/source/reference/random/parallel.rst
+++ b/doc/source/reference/random/parallel.rst
@@ -88,10 +88,11 @@ territory ([2]_).
        estimate the naive upper bound on a napkin and take comfort knowing
        that the probability is actually lower.
 
-.. [2] In this calculation, we can ignore the amount of numbers drawn from each
-       stream. Each of the PRNGs we provide has some extra protection built in
+.. [2] In this calculation, we can mostly ignore the amount of numbers drawn from each
+       stream. See :ref:`upgrading-pcg64` for the technical details about
+       `PCG64`. The other PRNGs we provide have some extra protection built in
        that avoids overlaps if the `~SeedSequence` pools differ in the
-       slightest bit. `PCG64` has :math:`2^{127}` separate cycles
+       slightest bit. `PCG64DXSM` has :math:`2^{127}` separate cycles
        determined by the seed in addition to the position in the
        :math:`2^{128}` long period for each cycle, so one has to both get on or
        near the same cycle *and* seed a nearby position in the cycle.
@@ -150,12 +151,14 @@ BitGenerator, the size of the jump and the bits in the default unsigned random
 are listed below.
 
 +-----------------+-------------------------+-------------------------+-------------------------+
-| BitGenerator    | Period                  |  Jump Size              | Bits                    |
+| BitGenerator    | Period                  |  Jump Size              | Bits per Draw           |
 +=================+=========================+=========================+=========================+
-| MT19937         | :math:`2^{19937}`       | :math:`2^{128}`         | 32                      |
+| MT19937         | :math:`2^{19937}-1`     | :math:`2^{128}`         | 32                      |
 +-----------------+-------------------------+-------------------------+-------------------------+
 | PCG64           | :math:`2^{128}`         | :math:`~2^{127}` ([3]_) | 64                      |
 +-----------------+-------------------------+-------------------------+-------------------------+
+| PCG64DXSM       | :math:`2^{128}`         | :math:`~2^{127}` ([3]_) | 64                      |
++-----------------+-------------------------+-------------------------+-------------------------+
 | Philox          | :math:`2^{256}`         | :math:`2^{128}`         | 64                      |
 +-----------------+-------------------------+-------------------------+-------------------------+
 
diff --git a/doc/source/reference/random/performance.py b/doc/source/reference/random/performance.py
index 28a42eb0d..794142836 100644
--- a/doc/source/reference/random/performance.py
+++ b/doc/source/reference/random/performance.py
@@ -1,14 +1,13 @@
-from collections import OrderedDict
 from timeit import repeat
 
 import pandas as pd
 
 import numpy as np
-from numpy.random import MT19937, PCG64, Philox, SFC64
+from numpy.random import MT19937, PCG64, PCG64DXSM, Philox, SFC64
 
-PRNGS = [MT19937, PCG64, Philox, SFC64]
+PRNGS = [MT19937, PCG64, PCG64DXSM, Philox, SFC64]
 
-funcs = OrderedDict()
+funcs = {}
 integers = 'integers(0, 2**{bits},size=1000000, dtype="uint{bits}")'
 funcs['32-bit Unsigned Ints'] = integers.format(bits=32)
 funcs['64-bit Unsigned Ints'] = integers.format(bits=64)
@@ -26,10 +25,10 @@ rg = Generator({prng}())
 """
 
 test = "rg.{func}"
-table = OrderedDict()
+table = {}
 for prng in PRNGS:
     print(prng)
-    col = OrderedDict()
+    col = {}
     for key in funcs:
         t = repeat(test.format(func=funcs[key]),
                    setup.format(prng=prng().__class__.__name__),
@@ -38,7 +37,7 @@ for prng in PRNGS:
     col = pd.Series(col)
     table[prng().__class__.__name__] = col
 
-npfuncs = OrderedDict()
+npfuncs = {}
 npfuncs.update(funcs)
 npfuncs['32-bit Unsigned Ints'] = 'randint(2**32,dtype="uint32",size=1000000)'
 npfuncs['64-bit Unsigned Ints'] = 'randint(2**64,dtype="uint64",size=1000000)'
@@ -54,7 +53,7 @@ for key in npfuncs:
     col[key] = 1000 * min(t)
 table['RandomState'] = pd.Series(col)
 
-columns = ['MT19937','PCG64','Philox','SFC64', 'RandomState']
+columns = ['MT19937', 'PCG64', 'PCG64DXSM', 'Philox', 'SFC64', 'RandomState']
 table = pd.DataFrame(table)
 order = np.log(table).mean().sort_values().index
 table = table.T
diff --git a/doc/source/reference/random/performance.rst b/doc/source/reference/random/performance.rst
index 74dad4cc3..85855be59 100644
--- a/doc/source/reference/random/performance.rst
+++ b/doc/source/reference/random/performance.rst
@@ -5,9 +5,12 @@ Performance
 
 Recommendation
 **************
-The recommended generator for general use is `PCG64`. It is
-statistically high quality, full-featured, and fast on most platforms, but
-somewhat slow when compiled for 32-bit processes.
+
+The recommended generator for general use is `PCG64` or its upgraded variant
+`PCG64DXSM` for heavily-parallel use cases. They are statistically high quality,
+full-featured, and fast on most platforms, but somewhat slow when compiled for
+32-bit processes. See :ref:`upgrading-pcg64` for details on when heavy
+parallelism would indicate using `PCG64DXSM`.
 
 `Philox` is fairly slow, but its statistical properties have
 very high quality, and it is easy to get assuredly-independent stream by using
@@ -39,49 +42,48 @@ Integer performance has a similar ordering.
 
 The pattern is similar for other, more complex generators. The normal
 performance of the legacy `RandomState` generator is much
-lower than the other since it uses the Box-Muller transformation rather
-than the Ziggurat generator. The performance gap for Exponentials is also
+lower than the other since it uses the Box-Muller transform rather
+than the Ziggurat method. The performance gap for Exponentials is also
 large due to the cost of computing the log function to invert the CDF.
-The column labeled MT19973 is used the same 32-bit generator as
-`RandomState` but produces random values using
-`Generator`.
+The column labeled MT19973 uses the same 32-bit generator as
+`RandomState` but produces random variates using `Generator`.
 
 .. csv-table::
-    :header: ,MT19937,PCG64,Philox,SFC64,RandomState
-    :widths: 14,14,14,14,14,14
-
-    32-bit Unsigned Ints,3.2,2.7,4.9,2.7,3.2
-    64-bit Unsigned Ints,5.6,3.7,6.3,2.9,5.7
-    Uniforms,7.3,4.1,8.1,3.1,7.3
-    Normals,13.1,10.2,13.5,7.8,34.6
-    Exponentials,7.9,5.4,8.5,4.1,40.3
-    Gammas,34.8,28.0,34.7,25.1,58.1
-    Binomials,25.0,21.4,26.1,19.5,25.2
-    Laplaces,45.1,40.7,45.5,38.1,45.6
-    Poissons,67.6,52.4,69.2,46.4,78.1
+    :header: ,MT19937,PCG64,PCG64DXSM,Philox,SFC64,RandomState
+    :widths: 14,14,14,14,14,14,14
+
+    32-bit Unsigned Ints,3.3,1.9,2.0,3.3,1.8,3.1
+    64-bit Unsigned Ints,5.6,3.2,2.9,4.9,2.5,5.5
+    Uniforms,5.9,3.1,2.9,5.0,2.6,6.0
+    Normals,13.9,10.8,10.5,12.0,8.3,56.8
+    Exponentials,9.1,6.0,5.8,8.1,5.4,63.9
+    Gammas,37.2,30.8,28.9,34.0,27.5,77.0
+    Binomials,21.3,17.4,17.6,19.3,15.6,21.4
+    Laplaces,73.2,72.3,76.1,73.0,72.3,82.5
+    Poissons,111.7,103.4,100.5,109.4,90.7,115.2
 
 The next table presents the performance in percentage relative to values
 generated by the legacy generator, ``RandomState(MT19937())``. The overall
 performance was computed using a geometric mean.
 
 .. csv-table::
-    :header: ,MT19937,PCG64,Philox,SFC64
-    :widths: 14,14,14,14,14
-
-    32-bit Unsigned Ints,101,121,67,121
-    64-bit Unsigned Ints,102,156,91,199
-    Uniforms,100,179,90,235
-    Normals,263,338,257,443
-    Exponentials,507,752,474,985
-    Gammas,167,207,167,231
-    Binomials,101,118,96,129
-    Laplaces,101,112,100,120
-    Poissons,116,149,113,168
-    Overall,144,192,132,225
+    :header: ,MT19937,PCG64,PCG64DXSM,Philox,SFC64
+    :widths: 14,14,14,14,14,14
+
+    32-bit Unsigned Ints,96,162,160,96,175
+    64-bit Unsigned Ints,97,171,188,113,218
+    Uniforms,102,192,206,121,233
+    Normals,409,526,541,471,684
+    Exponentials,701,1071,1101,784,1179
+    Gammas,207,250,266,227,281
+    Binomials,100,123,122,111,138
+    Laplaces,113,114,108,113,114
+    Poissons,103,111,115,105,127
+    Overall,159,219,225,174,251
 
 .. note::
 
-   All timings were taken using Linux on an i5-3570 processor.
+   All timings were taken using Linux on an AMD Ryzen 9 3900X processor.
 
 Performance on different Operating Systems
 ******************************************
@@ -98,33 +100,33 @@ across tables.
 64-bit Linux
 ~~~~~~~~~~~~
 
-===================   =========  =======  ========  =======
-Distribution            MT19937    PCG64    Philox    SFC64
-===================   =========  =======  ========  =======
-32-bit Unsigned Int         100    119.8      67.7    120.2
-64-bit Unsigned Int         100    152.9      90.8    213.3
-Uniforms                    100    179.0      87.0    232.0
-Normals                     100    128.5      99.2    167.8
-Exponentials                100    148.3      93.0    189.3
-**Overall**                 100    144.3      86.8    180.0
-===================   =========  =======  ========  =======
+=====================   =========  =======  ===========  ========  =======
+Distribution            MT19937    PCG64    PCG64DXSM    Philox    SFC64
+=====================   =========  =======  ===========  ========  =======
+32-bit Unsigned Ints          100      168         166        100      182
+64-bit Unsigned Ints          100      176         193        116      224
+Uniforms                      100      188         202        118      228
+Normals                       100      128         132        115      167
+Exponentials                  100      152         157        111      168
+Overall                       100      161         168        112      192
+=====================   =========  =======  ===========  ========  =======
 
 
 64-bit Windows
 ~~~~~~~~~~~~~~
-The relative performance on 64-bit Linux and 64-bit Windows is broadly similar.
-
+The relative performance on 64-bit Linux and 64-bit Windows is broadly similar
+with the notable exception of the Philox generator.
 
-===================   =========  =======  ========  =======
-Distribution            MT19937    PCG64    Philox    SFC64
-===================   =========  =======  ========  =======
-32-bit Unsigned Int         100    129.1      35.0    135.0
-64-bit Unsigned Int         100    146.9      35.7    176.5
-Uniforms                    100    165.0      37.0    192.0
-Normals                     100    128.5      48.5    158.0
-Exponentials                100    151.6      39.0    172.8
-**Overall**                 100    143.6      38.7    165.7
-===================   =========  =======  ========  =======
+=====================   =========  =======  ===========  ========  =======
+Distribution              MT19937    PCG64    PCG64DXSM    Philox    SFC64
+=====================   =========  =======  ===========  ========  =======
+32-bit Unsigned Ints          100      155          131        29      150
+64-bit Unsigned Ints          100      157          143        25      154
+Uniforms                      100      151          144        24      155
+Normals                       100      129          128        37      150
+Exponentials                  100      150          145        28      159
+**Overall**                   100      148          138        28      154
+=====================   =========  =======  ===========  ========  =======
 
 
 32-bit Windows
@@ -134,20 +136,20 @@ The performance of 64-bit generators on 32-bit Windows is much lower than on 64-
 operating systems due to register width. MT19937, the generator that has been
 in NumPy since 2005, operates on 32-bit integers.
 
-===================   =========  =======  ========  =======
-Distribution            MT19937    PCG64    Philox    SFC64
-===================   =========  =======  ========  =======
-32-bit Unsigned Int         100     30.5      21.1     77.9
-64-bit Unsigned Int         100     26.3      19.2     97.0
-Uniforms                    100     28.0      23.0    106.0
-Normals                     100     40.1      31.3    112.6
-Exponentials                100     33.7      26.3    109.8
-**Overall**                 100     31.4      23.8     99.8
-===================   =========  =======  ========  =======
+=====================   =========  =======  ===========  ========  =======
+Distribution            MT19937    PCG64    PCG64DXSM    Philox    SFC64
+=====================   =========  =======  ===========  ========  =======
+32-bit Unsigned Ints          100       24           34        14       57
+64-bit Unsigned Ints          100       21           32        14       74
+Uniforms                      100       21           34        16       73
+Normals                       100       36           57        28      101
+Exponentials                  100       28           44        20       88
+**Overall**                   100       25           39        18       77
+=====================   =========  =======  ===========  ========  =======
 
 
 .. note::
 
-   Linux timings used Ubuntu 18.04 and GCC 7.4.  Windows timings were made on
+   Linux timings used Ubuntu 20.04 and GCC 9.3.0.  Windows timings were made on
    Windows 10 using Microsoft C/C++ Optimizing Compiler Version 19 (Visual
-   Studio 2015). All timings were produced on an i5-3570 processor.
+   Studio 2019). All timings were produced on an AMD Ryzen 9 3900X processor.
diff --git a/doc/source/reference/random/upgrading-pcg64.rst b/doc/source/reference/random/upgrading-pcg64.rst
new file mode 100644
index 000000000..9e540ace9
--- /dev/null
+++ b/doc/source/reference/random/upgrading-pcg64.rst
@@ -0,0 +1,152 @@
+.. _upgrading-pcg64:
+
+.. currentmodule:: numpy.random
+
+Upgrading ``PCG64`` with ``PCG64DXSM``
+--------------------------------------
+
+Uses of the `PCG64` `BitGenerator` in a massively-parallel context have been
+shown to have statistical weaknesses that were not apparent at the first
+release in numpy 1.17. Most users will never observe this weakness and are
+safe to continue to use `PCG64`. We have introduced a new `PCG64DXSM`
+`BitGenerator` that will eventually become the new default `BitGenerator`
+implementation used by `default_rng` in future releases. `PCG64DXSM` solves
+the statistical weakness while preserving the performance and the features of
+`PCG64`.
+
+Does this affect me?
+====================
+
+If you
+
+  1. only use a single `Generator` instance,
+  2. only use `RandomState` or the functions in `numpy.random`,
+  3. only use the `PCG64.jumped` method to generate parallel streams,
+  4. explicitly use a `BitGenerator` other than `PCG64`,
+
+then this weakness does not affect you at all. Carry on.
+
+If you use moderate numbers of parallel streams created with `default_rng` or
+`SeedSequence.spawn`, in the 1000s, then the chance of observing this weakness
+is negligibly small. You can continue to use `PCG64` comfortably.
+
+If you use very large numbers of parallel streams, in the millions, and draw
+large amounts of numbers from each, then the chance of observing this weakness
+can become non-negligible, if still small. An example of such a use case would
+be a very large distributed reinforcement learning problem with millions of
+long Monte Carlo playouts each generating billions of random number draws. Such
+use cases should consider using `PCG64DXSM` explicitly or another
+modern `BitGenerator` like `SFC64` or `Philox`, but it is unlikely that any
+old results you may have calculated are invalid. In any case, the weakness is
+a kind of `Birthday Paradox <https://en.wikipedia.org/wiki/Birthday_problem>`_
+collision. That is, a single pair of parallel streams out of the millions,
+considered together, might fail a stringent set of statistical tests of
+randomness. The remaining millions of streams would all be perfectly fine, and
+the effect of the bad pair in the whole calculation is very likely to be
+swamped by the remaining streams in most applications.
+
+.. _upgrading-pcg64-details:
+
+Technical Details
+=================
+
+Like many PRNG algorithms, `PCG64` is constructed from a transition function,
+which advances a 128-bit state, and an output function, that mixes the 128-bit
+state into a 64-bit integer to be output. One of the guiding design principles
+of the PCG family of PRNGs is to balance the computational cost (and
+pseudorandomness strength) between the transition function and the output
+function. The transition function is a 128-bit linear congruential generator
+(LCG), which consists of multiplying the 128-bit state with a fixed
+multiplication constant and then adding a user-chosen increment, in 128-bit
+modular arithmetic. LCGs are well-analyzed PRNGs with known weaknesses, though
+128-bit LCGs are large enough to pass stringent statistical tests on their own,
+with only the trivial output function. The output function of `PCG64` is
+intended to patch up some of those known weaknesses by doing "just enough"
+scrambling of the bits to assist in the statistical properties without adding
+too much computational cost.
+
+One of these known weaknesses is that advancing the state of the LCG by steps
+numbering a power of two (``bg.advance(2**N)``) will leave the lower ``N`` bits
+identical to the state that was just left. For a single stream drawn from
+sequentially, this is of little consequence. The remaining :math:`128-N` bits provide
+plenty of pseudorandomness that will be mixed in for any practical ``N`` that can
+be observed in a single stream, which is why one does not need to worry about
+this if you only use a single stream in your application. Similarly, the
+`PCG64.jumped` method uses a carefully chosen number of steps to avoid creating
+these collisions. However, once you start creating "randomly-initialized"
+parallel streams, either using OS entropy by calling `default_rng` repeatedly
+or using `SeedSequence.spawn`, then we need to consider how many lower bits
+need to "collide" in order to create a bad pair of streams, and then evaluate
+the probability of creating such a collision.
+`Empirically <https://github.com/numpy/numpy/issues/16313>`_, it has been
+determined that if one shares the lower 58 bits of state and shares an
+increment, then the pair of streams, when interleaved, will fail 
+`PractRand <http://pracrand.sourceforge.net/>`_ in
+a reasonable amount of time, after drawing a few gigabytes of data. Following
+the standard Birthday Paradox calculations for a collision of 58 bits, we can
+see that we can create :math:`2^{29}`, or about half a billion, streams which is when
+the probability of such a collision becomes high. Half a billion streams is
+quite high, and the amount of data each stream needs to draw before the
+statistical correlations become apparent to even the strict ``PractRand`` tests
+is in the gigabytes. But this is on the horizon for very large applications
+like distributed reinforcement learning. There are reasons to expect that even
+in these applications a collision probably will not have a practical effect in
+the total result, since the statistical problem is constrained to just the
+colliding pair.
+
+Now, let us consider the case when the increment is not constrained to be the
+same. Our implementation of `PCG64` seeds both the state and the increment;
+that is, two calls to `default_rng` (almost certainly) have different states
+and increments. Upon our first release, we believed that having the seeded
+increment would provide a certain amount of extra protection, that one would
+have to be "close" in both the state space and increment space in order to
+observe correlations (``PractRand`` failures) in a pair of streams. If that were
+true, then the "bottleneck" for collisions would be the 128-bit entropy pool
+size inside of `SeedSequence` (and 128-bit collisions are in the
+"preposterously unlikely" category). Unfortunately, this is not true.
+
+One of the known properties of an LCG is that different increments create
+*distinct* streams, but with a known relationship. Each LCG has an orbit that
+traverses all :math:`2^{128}` different 128-bit states. Two LCGs with different
+increments are related in that one can "rotate" the orbit of the first LCG
+(advance it by a number of steps that we can compute from the two increments)
+such that then both LCGs will always then have the same state, up to an
+additive constant and maybe an inversion of the bits. If you then iterate both
+streams in lockstep, then the states will *always* remain related by that same
+additive constant (and the inversion, if present). Recall that `PCG64` is
+constructed from both a transition function (the LCG) and an output function.
+It was expected that the scrambling effect of the output function would have
+been strong enough to make the distinct streams practically independent (i.e.
+"passing the ``PractRand`` tests") unless the two increments were
+pathologically related to each other (e.g. 1 and 3). The output function XSL-RR
+of the then-standard PCG algorithm that we implemented in `PCG64` turns out to
+be too weak to cover up for the 58-bit collision of the underlying LCG that we
+described above. For any given pair of increments, the size of the "colliding"
+space of states is the same, so for this weakness, the extra distinctness
+provided by the increments does not translate into extra protection from
+statistical correlations that ``PractRand`` can detect.
+
+Fortunately, strengthening the output function is able to correct this weakness
+and *does* turn the extra distinctness provided by differing increments into
+additional protection from these low-bit collisions. To the `PCG author's
+credit <https://github.com/numpy/numpy/issues/13635#issuecomment-506088698>`_,
+she had developed a stronger output function in response to related discussions
+during the long birth of the new `BitGenerator` system. We NumPy developers
+chose to be "conservative" and use the XSL-RR variant that had undergone
+a longer period of testing at that time. The DXSM output function adopts
+a "xorshift-multiply" construction used in strong integer hashes that has much
+better avalanche properties than the XSL-RR output function. While there are
+"pathological" pairs of increments that induce "bad" additive constants that
+relate the two streams, the vast majority of pairs induce "good" additive
+constants that make the merely-distinct streams of LCG states into
+practically-independent output streams. Indeed, now the claim we once made
+about `PCG64` is actually true of `PCG64DXSM`: collisions are possible, but
+both streams have to simultaneously be both "close" in the 128 bit state space
+*and* "close" in the 127-bit increment space, so that would be less likely than
+the negligible chance of colliding in the 128-bit internal `SeedSequence` pool.
+The DXSM output function is more computationally intensive than XSL-RR, but
+some optimizations in the LCG more than make up for the performance hit on most
+machines, so `PCG64DXSM` is a good, safe upgrade. There are, of course, an
+infinite number of stronger output functions that one could consider, but most
+will have a greater computational cost, and the DXSM output function has now
+received many CPU cycles of testing via ``PractRand`` at this time.