Merge pull request #18906 from rkern/enh/pcg64dxsm
ENH: Add PCG64DXSM BitGenerator
charris committed May 5, 2021
2 parents 0f7f313 + d97c634 commit 73a0892
Showing 19 changed files with 2,678 additions and 22 deletions.
17 changes: 17 additions & 0 deletions doc/release/upcoming_changes/18906.new_function.rst
@@ -0,0 +1,17 @@
.. currentmodule:: numpy.random

Add `PCG64DXSM` `BitGenerator`
------------------------------

Uses of the `PCG64` `BitGenerator` in a massively-parallel context have been
shown to have statistical weaknesses that were not apparent at the first
release in numpy 1.17. Most users will never observe this weakness and are
safe to continue to use `PCG64`. We have introduced a new `PCG64DXSM`
`BitGenerator` that will eventually become the new default `BitGenerator`
implementation used by `default_rng` in future releases. `PCG64DXSM` solves
the statistical weakness while preserving the performance and the features of
`PCG64`.

See :ref:`upgrading-pcg64` for more details.

.. currentmodule:: numpy
12 changes: 8 additions & 4 deletions doc/source/reference/random/bit_generators/index.rst
@@ -15,10 +15,13 @@ Supported BitGenerators

The included BitGenerators are:

* PCG-64 - The default. A fast generator that supports many parallel streams
and can be advanced by an arbitrary amount. See the documentation for
:meth:`~.PCG64.advance`. PCG-64 has a period of :math:`2^{128}`. See the `PCG
author's page`_ for more details about this class of PRNG.
* PCG-64 - The default. A fast generator that can be advanced by an arbitrary
amount. See the documentation for :meth:`~.PCG64.advance`. PCG-64 has
a period of :math:`2^{128}`. See the `PCG author's page`_ for more details
about this class of PRNG.
* PCG-64 DXSM - An upgraded version of PCG-64 with better statistical
properties in parallel contexts. See :ref:`upgrading-pcg64` for more
information on these improvements.
* MT19937 - The standard Python BitGenerator. Adds a `MT19937.jumped`
function that returns a new generator with state as-if :math:`2^{128}` draws have
been made.
@@ -43,6 +46,7 @@ The included BitGenerators are:

MT19937 <mt19937>
PCG64 <pcg64>
PCG64DXSM <pcg64dxsm>
Philox <philox>
SFC64 <sfc64>

32 changes: 32 additions & 0 deletions doc/source/reference/random/bit_generators/pcg64dxsm.rst
@@ -0,0 +1,32 @@
Permuted Congruential Generator (64-bit, PCG64 DXSM)
----------------------------------------------------

.. currentmodule:: numpy.random

.. autoclass:: PCG64DXSM
:members: __init__
:exclude-members: __init__

State
=====

.. autosummary::
:toctree: generated/

~PCG64DXSM.state

Parallel generation
===================
.. autosummary::
:toctree: generated/

~PCG64DXSM.advance
~PCG64DXSM.jumped

Extending
=========
.. autosummary::
:toctree: generated/

~PCG64DXSM.cffi
~PCG64DXSM.ctypes
4 changes: 4 additions & 0 deletions doc/source/reference/random/index.rst
@@ -222,6 +222,9 @@ one of three ways:
* :ref:`independent-streams`
* :ref:`parallel-jumped`

Users with a very large amount of parallelism will want to consult
:ref:`upgrading-pcg64`.

Concepts
--------
.. toctree::
@@ -230,6 +233,7 @@ Concepts
generator
Legacy Generator (RandomState) <legacy>
BitGenerators, SeedSequences <bit_generators/index>
Upgrading PCG64 with PCG64DXSM <upgrading-pcg64>

Features
--------
13 changes: 8 additions & 5 deletions doc/source/reference/random/parallel.rst
@@ -88,10 +88,11 @@ territory ([2]_).
estimate the naive upper bound on a napkin and take comfort knowing
that the probability is actually lower.
.. [2] In this calculation, we can ignore the amount of numbers drawn from each
stream. Each of the PRNGs we provide has some extra protection built in
.. [2] In this calculation, we can mostly ignore the amount of numbers drawn from each
stream. See :ref:`upgrading-pcg64` for the technical details about
`PCG64`. The other PRNGs we provide have some extra protection built in
that avoids overlaps if the `~SeedSequence` pools differ in the
slightest bit. `PCG64` has :math:`2^{127}` separate cycles
slightest bit. `PCG64DXSM` has :math:`2^{127}` separate cycles
determined by the seed in addition to the position in the
:math:`2^{128}` long period for each cycle, so one has to both get on or
near the same cycle *and* seed a nearby position in the cycle.
@@ -150,12 +151,14 @@ BitGenerator, the size of the jump and the bits in the default unsigned random
are listed below.

+-----------------+-------------------------+-------------------------+-------------------------+
| BitGenerator | Period | Jump Size | Bits |
| BitGenerator | Period | Jump Size | Bits per Draw |
+=================+=========================+=========================+=========================+
| MT19937 | :math:`2^{19937}` | :math:`2^{128}` | 32 |
| MT19937 | :math:`2^{19937}-1` | :math:`2^{128}` | 32 |
+-----------------+-------------------------+-------------------------+-------------------------+
| PCG64 | :math:`2^{128}` | :math:`~2^{127}` ([3]_) | 64 |
+-----------------+-------------------------+-------------------------+-------------------------+
| PCG64DXSM | :math:`2^{128}` | :math:`~2^{127}` ([3]_) | 64 |
+-----------------+-------------------------+-------------------------+-------------------------+
| Philox | :math:`2^{256}` | :math:`2^{128}` | 64 |
+-----------------+-------------------------+-------------------------+-------------------------+
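
For concreteness, the jump-ahead pattern this table supports looks like the
following minimal sketch (the seed and the number of streams are illustrative):

.. code-block:: python

    from numpy.random import Generator, PCG64DXSM

    seed = 2021                        # illustrative seed
    # .jumped(i) returns a copy of the seeded state advanced by i jumps of
    # roughly 2**127 draws each, so these streams cannot overlap in practice.
    streams = [Generator(PCG64DXSM(seed).jumped(i)) for i in range(16)]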

6 changes: 3 additions & 3 deletions doc/source/reference/random/performance.py
@@ -3,9 +3,9 @@
import pandas as pd

import numpy as np
from numpy.random import MT19937, PCG64, Philox, SFC64
from numpy.random import MT19937, PCG64, PCG64DXSM, Philox, SFC64

PRNGS = [MT19937, PCG64, Philox, SFC64]
PRNGS = [MT19937, PCG64, PCG64DXSM, Philox, SFC64]

funcs = {}
integers = 'integers(0, 2**{bits},size=1000000, dtype="uint{bits}")'
@@ -53,7 +53,7 @@
col[key] = 1000 * min(t)
table['RandomState'] = pd.Series(col)

columns = ['MT19937','PCG64','Philox','SFC64', 'RandomState']
columns = ['MT19937', 'PCG64', 'PCG64DXSM', 'Philox', 'SFC64', 'RandomState']
table = pd.DataFrame(table)
order = np.log(table).mean().sort_values().index
table = table.T
9 changes: 6 additions & 3 deletions doc/source/reference/random/performance.rst
@@ -5,9 +5,12 @@ Performance

Recommendation
**************
The recommended generator for general use is `PCG64`. It is
statistically high quality, full-featured, and fast on most platforms, but
somewhat slow when compiled for 32-bit processes.

The recommended generator for general use is `PCG64` or its upgraded variant
`PCG64DXSM` for heavily-parallel use cases. They are statistically high quality,
full-featured, and fast on most platforms, but somewhat slow when compiled for
32-bit processes. See :ref:`upgrading-pcg64` for details on when heavy
parallelism would indicate using `PCG64DXSM`.

`Philox` is fairly slow, but its statistical properties are of
very high quality, and it is easy to get an assuredly-independent stream by using
152 changes: 152 additions & 0 deletions doc/source/reference/random/upgrading-pcg64.rst
@@ -0,0 +1,152 @@
.. _upgrading-pcg64:

.. currentmodule:: numpy.random

Upgrading ``PCG64`` with ``PCG64DXSM``
--------------------------------------

Uses of the `PCG64` `BitGenerator` in a massively-parallel context have been
shown to have statistical weaknesses that were not apparent at the first
release in numpy 1.17. Most users will never observe this weakness and are
safe to continue to use `PCG64`. We have introduced a new `PCG64DXSM`
`BitGenerator` that will eventually become the new default `BitGenerator`
implementation used by `default_rng` in future releases. `PCG64DXSM` solves
the statistical weakness while preserving the performance and the features of
`PCG64`.

Does this affect me?
====================

If you

1. only use a single `Generator` instance,
2. only use `RandomState` or the functions in `numpy.random`,
3. only use the `PCG64.jumped` method to generate parallel streams,
4. explicitly use a `BitGenerator` other than `PCG64`,

then this weakness does not affect you at all. Carry on.

If you use moderate numbers of parallel streams created with `default_rng` or
`SeedSequence.spawn`, in the 1000s, then the chance of observing this weakness
is negligibly small. You can continue to use `PCG64` comfortably.
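
For reference, a minimal sketch of the spawning workflow described above,
using the documented `SeedSequence.spawn` pattern (the seed value and the
number of children are illustrative):

.. code-block:: python

    from numpy.random import Generator, PCG64, SeedSequence

    # Spawn independent child seeds from one root seed; with thousands of
    # children the chance of observing the weakness stays negligibly small.
    root = SeedSequence(12345)        # illustrative seed value
    child_seeds = root.spawn(1000)
    streams = [Generator(PCG64(s)) for s in child_seeds]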

If you use very large numbers of parallel streams, in the millions, and draw
large amounts of numbers from each, then the chance of observing this weakness
can become non-negligible, if still small. An example of such a use case would
be a very large distributed reinforcement learning problem with millions of
long Monte Carlo playouts each generating billions of random number draws. Such
use cases should consider using `PCG64DXSM` explicitly or another
modern `BitGenerator` like `SFC64` or `Philox`, but it is unlikely that any
old results you may have calculated are invalid. In any case, the weakness is
a kind of `Birthday Paradox <https://en.wikipedia.org/wiki/Birthday_problem>`_
collision. That is, a single pair of parallel streams out of the millions,
considered together, might fail a stringent set of statistical tests of
randomness. The remaining millions of streams would all be perfectly fine, and
the effect of the bad pair in the whole calculation is very likely to be
swamped by the remaining streams in most applications.
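
For such heavily-parallel workloads, opting in to `PCG64DXSM` is a small
change; a minimal sketch (the seed value and stream count are illustrative):

.. code-block:: python

    from numpy.random import Generator, PCG64DXSM, SeedSequence

    # Explicitly request the DXSM variant; default_rng still constructs
    # PCG64 in this release.
    child_seeds = SeedSequence(67890).spawn(1_000_000)   # millions of streams
    streams = [Generator(PCG64DXSM(s)) for s in child_seeds]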

.. _upgrading-pcg64-details:

Technical Details
=================

Like many PRNG algorithms, `PCG64` is constructed from a transition function,
which advances a 128-bit state, and an output function, that mixes the 128-bit
state into a 64-bit integer to be output. One of the guiding design principles
of the PCG family of PRNGs is to balance the computational cost (and
pseudorandomness strength) between the transition function and the output
function. The transition function is a 128-bit linear congruential generator
(LCG), which consists of multiplying the 128-bit state with a fixed
multiplication constant and then adding a user-chosen increment, in 128-bit
modular arithmetic. LCGs are well-analyzed PRNGs with known weaknesses, though
128-bit LCGs are large enough to pass stringent statistical tests on their own,
with only the trivial output function. The output function of `PCG64` is
intended to patch up some of those known weaknesses by doing "just enough"
scrambling of the bits to assist in the statistical properties without adding
too much computational cost.
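
To make the structure concrete, a minimal sketch of a 128-bit LCG transition
step follows; the multiplier and increment shown are placeholders, not the
constants used by `PCG64`:

.. code-block:: python

    MASK_128 = (1 << 128) - 1
    MULTIPLIER = 0x0123456789ABCDEF0123456789ABCDEF   # placeholder constant
    INCREMENT = 0x0123456789ABCDEF                    # placeholder; user-chosen, odd

    def lcg_step(state):
        """Advance the 128-bit LCG state by one step in 128-bit modular arithmetic."""
        return (state * MULTIPLIER + INCREMENT) & MASK_128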

One of these known weaknesses is that advancing the state of the LCG by steps
numbering a power of two (``bg.advance(2**N)``) will leave the lower ``N`` bits
identical to the state that was just left. For a single stream drawn from
sequentially, this is of little consequence. The remaining :math:`128-N` bits provide
plenty of pseudorandomness that will be mixed in for any practical ``N`` that can
be observed in a single stream, which is why you do not need to worry about
this if you only use a single stream in your application. Similarly, the
`PCG64.jumped` method uses a carefully chosen number of steps to avoid creating
these collisions. However, once you start creating "randomly-initialized"
parallel streams, either using OS entropy by calling `default_rng` repeatedly
or using `SeedSequence.spawn`, then we need to consider how many lower bits
need to "collide" in order to create a bad pair of streams, and then evaluate
the probability of creating such a collision.
`Empirically <https://github.com/numpy/numpy/issues/16313>`_, it has been
determined that if a pair of streams shares the lower 58 bits of state and the
same increment, then the pair, when interleaved, will fail
`PractRand <http://pracrand.sourceforge.net/>`_ in
a reasonable amount of time, after drawing a few gigabytes of data. Following
the standard Birthday Paradox calculations for a collision of 58 bits, we can
see that once we create about :math:`2^{29}`, or about half a billion, streams,
the probability of such a collision becomes high. Half a billion streams is
quite high, and the amount of data each stream needs to draw before the
statistical correlations become apparent to even the strict ``PractRand`` tests
is in the gigabytes. But this is on the horizon for very large applications
like distributed reinforcement learning. There are reasons to expect that even
in these applications a collision probably will not have a practical effect in
the total result, since the statistical problem is constrained to just the
colliding pair.
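
The napkin math behind the :math:`2^{29}` figure is the standard birthday
approximation; a small sketch:

.. code-block:: python

    import math

    def collision_probability(n_streams, bits=58):
        """Approximate chance that at least one pair of streams shares the
        lower ``bits`` bits of state: 1 - exp(-n*(n-1) / 2**(bits+1))."""
        return 1.0 - math.exp(-n_streams * (n_streams - 1) / 2.0 ** (bits + 1))

    print(collision_probability(2**29))    # ~0.39, no longer negligible
    print(collision_probability(10_000))   # ~1.7e-10 for moderate stream counts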

Now, let us consider the case when the increment is not constrained to be the
same. Our implementation of `PCG64` seeds both the state and the increment;
that is, two calls to `default_rng` (almost certainly) have different states
and increments. Upon our first release, we believed that having the seeded
increment would provide a certain amount of extra protection, that one would
have to be "close" in both the state space and increment space in order to
observe correlations (``PractRand`` failures) in a pair of streams. If that were
true, then the "bottleneck" for collisions would be the 128-bit entropy pool
size inside of `SeedSequence` (and 128-bit collisions are in the
"preposterously unlikely" category). Unfortunately, this is not true.

One of the known properties of an LCG is that different increments create
*distinct* streams, but with a known relationship. Each LCG has an orbit that
traverses all :math:`2^{128}` different 128-bit states. Two LCGs with different
increments are related in that one can "rotate" the orbit of the first LCG
(advance it by a number of steps that we can compute from the two increments)
such that then both LCGs will always then have the same state, up to an
additive constant and maybe an inversion of the bits. If you then iterate both
streams in lockstep, then the states will *always* remain related by that same
additive constant (and the inversion, if present). Recall that `PCG64` is
constructed from both a transition function (the LCG) and an output function.
It was expected that the scrambling effect of the output function would have
been strong enough to make the distinct streams practically independent (i.e.
"passing the ``PractRand`` tests") unless the two increments were
pathologically related to each other (e.g. 1 and 3). The output function XSL-RR
of the then-standard PCG algorithm that we implemented in `PCG64` turns out to
be too weak to cover up for the 58-bit collision of the underlying LCG that we
described above. For any given pair of increments, the size of the "colliding"
space of states is the same, so for this weakness, the extra distinctness
provided by the increments does not translate into extra protection from
statistical correlations that ``PractRand`` can detect.

Fortunately, strengthening the output function is able to correct this weakness
and *does* turn the extra distinctness provided by differing increments into
additional protection from these low-bit collisions. To the `PCG author's
credit <https://github.com/numpy/numpy/issues/13635#issuecomment-506088698>`_,
she had developed a stronger output function in response to related discussions
during the long birth of the new `BitGenerator` system. We NumPy developers
chose to be "conservative" and use the XSL-RR variant that had undergone
a longer period of testing at that time. The DXSM output function adopts
a "xorshift-multiply" construction used in strong integer hashes that has much
better avalanche properties than the XSL-RR output function. While there are
"pathological" pairs of increments that induce "bad" additive constants that
relate the two streams, the vast majority of pairs induce "good" additive
constants that make the merely-distinct streams of LCG states into
practically-independent output streams. Indeed, now the claim we once made
about `PCG64` is actually true of `PCG64DXSM`: collisions are possible, but
both streams have to simultaneously be "close" in the 128-bit state space
*and* "close" in the 127-bit increment space, so that would be less likely than
the negligible chance of colliding in the 128-bit internal `SeedSequence` pool.
The DXSM output function is more computationally intensive than XSL-RR, but
some optimizations in the LCG more than make up for the performance hit on most
machines, so `PCG64DXSM` is a good, safe upgrade. There is, of course, an
infinite number of stronger output functions that one could consider, but most
would have a greater computational cost, and the DXSM output function has by now
received many CPU cycles of testing via ``PractRand``.
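
For readers curious about the shape of a "xorshift-multiply" output function,
here is an illustrative sketch only; the shift amounts and multiplier are
placeholders, and the authoritative DXSM constants and steps live in NumPy's
C sources:

.. code-block:: python

    MASK_64 = (1 << 64) - 1

    def dxsm_style_output(state128):
        """Fold a 128-bit LCG state into a 64-bit output with a
        xorshift-multiply style mixer (placeholder constants)."""
        hi = (state128 >> 64) & MASK_64
        lo = (state128 & MASK_64) | 1       # force odd so the final multiply permutes
        hi ^= hi >> 32
        hi = (hi * 0x9E3779B97F4A7C15) & MASK_64   # placeholder odd multiplier
        hi ^= hi >> 48
        return (hi * lo) & MASK_64
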
6 changes: 4 additions & 2 deletions numpy/random/__init__.py
@@ -17,6 +17,7 @@
--------------------------------------------- ---
MT19937
PCG64
PCG64DXSM
Philox
SFC64
============================================= ===
@@ -183,13 +184,14 @@
from ._generator import Generator, default_rng
from .bit_generator import SeedSequence, BitGenerator
from ._mt19937 import MT19937
from ._pcg64 import PCG64
from ._pcg64 import PCG64, PCG64DXSM
from ._philox import Philox
from ._sfc64 import SFC64
from .mtrand import *

__all__ += ['Generator', 'RandomState', 'SeedSequence', 'MT19937',
'Philox', 'PCG64', 'SFC64', 'default_rng', 'BitGenerator']
'Philox', 'PCG64', 'PCG64DXSM', 'SFC64', 'default_rng',
'BitGenerator']


def __RandomState_ctor():
5 changes: 4 additions & 1 deletion numpy/random/__init__.pyi
@@ -3,7 +3,10 @@ from typing import List
from numpy.random._generator import Generator as Generator
from numpy.random._generator import default_rng as default_rng
from numpy.random._mt19937 import MT19937 as MT19937
from numpy.random._pcg64 import PCG64 as PCG64
from numpy.random._pcg64 import (
PCG64 as PCG64,
PCG64DXSM as PCG64DXSM,
)
from numpy.random._philox import Philox as Philox
from numpy.random._sfc64 import SFC64 as SFC64
from numpy.random.bit_generator import BitGenerator as BitGenerator
14 changes: 14 additions & 0 deletions numpy/random/_pcg64.pyi
@@ -32,3 +32,17 @@ class PCG64(BitGenerator):
value: _PCG64State,
) -> None: ...
def advance(self, delta: int) -> PCG64: ...

class PCG64DXSM(BitGenerator):
def __init__(self, seed: Union[None, _ArrayLikeInt_co, SeedSequence] = ...) -> None: ...
def jumped(self, jumps: int = ...) -> PCG64DXSM: ...
@property
def state(
self,
) -> _PCG64State: ...
@state.setter
def state(
self,
value: _PCG64State,
) -> None: ...
def advance(self, delta: int) -> PCG64DXSM: ...
