summaryrefslogtreecommitdiff
path: root/Doc/library/random.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/library/random.rst')
-rw-r--r--Doc/library/random.rst263
1 files changed, 197 insertions, 66 deletions
diff --git a/Doc/library/random.rst b/Doc/library/random.rst
index 8ae242c029..4f251574a3 100644
--- a/Doc/library/random.rst
+++ b/Doc/library/random.rst
@@ -46,10 +46,24 @@ from sources provided by the operating system.
.. warning::
The pseudo-random generators of this module should not be used for
- security purposes.
+ security purposes. For security or cryptographic uses, see the
+ :mod:`secrets` module.
+.. seealso::
+
+ M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally
+ equidistributed uniform pseudorandom number generator", ACM Transactions on
+ Modeling and Computer Simulation Vol. 8, No. 1, January pp.3--30 1998.
+
+
+ `Complementary-Multiply-with-Carry recipe
+ <https://code.activestate.com/recipes/576707/>`_ for a compatible alternative
+ random number generator with a long period and comparatively simple update
+ operations.
-Bookkeeping functions:
+
+Bookkeeping functions
+---------------------
.. function:: seed(a=None, version=2)
@@ -93,7 +107,8 @@ Bookkeeping functions:
:meth:`randrange` to handle arbitrarily large ranges.
-Functions for integers:
+Functions for integers
+----------------------
.. function:: randrange(stop)
randrange(start, stop[, step])
@@ -116,23 +131,55 @@ Functions for integers:
``randrange(a, b+1)``.
-Functions for sequences:
+Functions for sequences
+-----------------------
.. function:: choice(seq)
Return a random element from the non-empty sequence *seq*. If *seq* is empty,
raises :exc:`IndexError`.
+.. function:: choices(population, weights=None, *, cum_weights=None, k=1)
+
+ Return a *k* sized list of elements chosen from the *population* with replacement.
+ If the *population* is empty, raises :exc:`IndexError`.
+
+ If a *weights* sequence is specified, selections are made according to the
+ relative weights. Alternatively, if a *cum_weights* sequence is given, the
+ selections are made according to the cumulative weights (perhaps computed
+ using :func:`itertools.accumulate`). For example, the relative weights
+ ``[10, 5, 30, 5]`` are equivalent to the cumulative weights
+ ``[10, 15, 45, 50]``. Internally, the relative weights are converted to
+ cumulative weights before making selections, so supplying the cumulative
+ weights saves work.
+
+ If neither *weights* nor *cum_weights* are specified, selections are made
+ with equal probability. If a weights sequence is supplied, it must be
+ the same length as the *population* sequence. It is a :exc:`TypeError`
+ to specify both *weights* and *cum_weights*.
+
+ The *weights* or *cum_weights* can use any numeric type that interoperates
+ with the :class:`float` values returned by :func:`random` (that includes
+ integers, floats, and fractions but excludes decimals).
+
+ .. versionadded:: 3.6
+
.. function:: shuffle(x[, random])
- Shuffle the sequence *x* in place. The optional argument *random* is a
- 0-argument function returning a random float in [0.0, 1.0); by default, this is
- the function :func:`.random`.
+ Shuffle the sequence *x* in place.
+
+ The optional argument *random* is a 0-argument function returning a random
+ float in [0.0, 1.0); by default, this is the function :func:`.random`.
+
+ To shuffle an immutable sequence and return a new shuffled list, use
+ ``sample(x, k=len(x))`` instead.
- Note that for even rather small ``len(x)``, the total number of permutations of
- *x* is larger than the period of most random number generators; this implies
- that most permutations of a long sequence can never be generated.
+ Note that even for small ``len(x)``, the total number of permutations of *x*
+ can quickly grow larger than the period of most random number generators.
+ This implies that most permutations of a long sequence can never be
+ generated. For example, a sequence of length 2080 is the largest that
+ can fit within the period of the Mersenne Twister random number generator.
.. function:: sample(population, k)
@@ -149,13 +196,16 @@ Functions for sequences:
Members of the population need not be :term:`hashable` or unique. If the population
contains repeats, then each occurrence is a possible selection in the sample.
- To choose a sample from a range of integers, use an :func:`range` object as an
+ To choose a sample from a range of integers, use a :func:`range` object as an
argument. This is especially fast and space efficient for sampling from a large
- population: ``sample(range(10000000), 60)``.
+ population: ``sample(range(10000000), k=60)``.
If the sample size is larger than the population size, a :exc:`ValueError`
is raised.
+Real-valued distributions
+-------------------------
+
The following functions generate specific real-valued distributions. Function
parameters are named after the corresponding variables in the distribution's
equation, as used in common mathematical practice; most of these equations can
@@ -250,7 +300,8 @@ be found in any statistics text.
parameter.
-Alternative Generator:
+Alternative Generator
+---------------------
.. class:: SystemRandom([seed])
@@ -262,19 +313,6 @@ Alternative Generator:
:exc:`NotImplementedError` if called.
-.. seealso::
-
- M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally
- equidistributed uniform pseudorandom number generator", ACM Transactions on
- Modeling and Computer Simulation Vol. 8, No. 1, January pp.3--30 1998.
-
-
- `Complementary-Multiply-with-Carry recipe
- <https://code.activestate.com/recipes/576707/>`_ for a compatible alternative
- random number generator with a long period and comparatively simple update
- operations.
-
-
Notes on Reproducibility
------------------------
@@ -296,53 +334,146 @@ change across Python versions, but two aspects are guaranteed not to change:
Examples and Recipes
--------------------
-Basic usage::
+Basic examples::
- >>> random.random() # Random float x, 0.0 <= x < 1.0
+ >>> random() # Random float: 0.0 <= x < 1.0
0.37444887175646646
- >>> random.uniform(1, 10) # Random float x, 1.0 <= x < 10.0
- 1.1800146073117523
+ >>> uniform(2.5, 10.0) # Random float: 2.5 <= x < 10.0
+ 3.1800146073117523
+
+ >>> expovariate(1 / 5) # Interval between arrivals averaging 5 seconds
+ 5.148957571865031
- >>> random.randrange(10) # Integer from 0 to 9
+ >>> randrange(10) # Integer from 0 to 9 inclusive
7
- >>> random.randrange(0, 101, 2) # Even integer from 0 to 100
+ >>> randrange(0, 101, 2) # Even integer from 0 to 100 inclusive
26
- >>> random.choice('abcdefghij') # Single random element
- 'c'
-
- >>> items = [1, 2, 3, 4, 5, 6, 7]
- >>> random.shuffle(items)
- >>> items
- [7, 3, 2, 5, 6, 4, 1]
-
- >>> random.sample([1, 2, 3, 4, 5], 3) # Three samples without replacement
- [4, 1, 5]
-
-A common task is to make a :func:`random.choice` with weighted probabilities.
+ >>> choice(['win', 'lose', 'draw']) # Single random element from a sequence
+ 'draw'
+
+ >>> deck = 'ace two three four'.split()
+ >>> shuffle(deck) # Shuffle a list
+ >>> deck
+ ['four', 'two', 'ace', 'three']
+
+ >>> sample([10, 20, 30, 40, 50], k=4) # Four samples without replacement
+ [40, 10, 50, 30]
+
+Simulations::
+
+ >>> # Six roulette wheel spins (weighted sampling with replacement)
+ >>> choices(['red', 'black', 'green'], [18, 18, 2], k=6)
+ ['red', 'green', 'black', 'black', 'red', 'black']
+
+ >>> # Deal 20 cards without replacement from a deck of 52 playing cards
+ >>> # and determine the proportion of cards with a ten-value
+ >>> # (a ten, jack, queen, or king).
+ >>> deck = collections.Counter(tens=16, low_cards=36)
+ >>> seen = sample(list(deck.elements()), k=20)
+ >>> seen.count('tens') / 20
+ 0.15
+
+ >>> # Estimate the probability of getting 5 or more heads from 7 spins
+ >>> # of a biased coin that settles on heads 60% of the time.
+ >>> trial = lambda: choices('HT', cum_weights=(0.60, 1.00), k=7).count('H') >= 5
+ >>> sum(trial() for i in range(10000)) / 10000
+ 0.4169
+
+ >>> # Probability of the median of 5 samples being in middle two quartiles
+ >>> trial = lambda : 2500 <= sorted(choices(range(10000), k=5))[2] < 7500
+ >>> sum(trial() for i in range(10000)) / 10000
+ 0.7958
+
+Example of `statistical bootstrapping
+<https://en.wikipedia.org/wiki/Bootstrapping_(statistics)>`_ using resampling
+with replacement to estimate a confidence interval for the mean of a sample of
+size five::
+
+ # http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm
+ from statistics import mean
+ from random import choices
+
+ data = 1, 2, 4, 4, 10
+ means = sorted(mean(choices(data, k=5)) for i in range(20))
+ print(f'The sample mean of {mean(data):.1f} has a 90% confidence '
+ f'interval from {means[1]:.1f} to {means[-2]:.1f}')
+
+Example of a `resampling permutation test
+<https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests>`_
+to determine the statistical significance or `p-value
+<https://en.wikipedia.org/wiki/P-value>`_ of an observed difference
+between the effects of a drug versus a placebo::
+
+ # Example from "Statistics is Easy" by Dennis Shasha and Manda Wilson
+ from statistics import mean
+ from random import shuffle
+
+ drug = [54, 73, 53, 70, 73, 68, 52, 65, 65]
+ placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]
+ observed_diff = mean(drug) - mean(placebo)
+
+ n = 10000
+ count = 0
+ combined = drug + placebo
+ for i in range(n):
+ shuffle(combined)
+ new_diff = mean(combined[:len(drug)]) - mean(combined[len(drug):])
+ count += (new_diff >= observed_diff)
+
+ print(f'{n} label reshufflings produced only {count} instances with a difference')
+ print(f'at least as extreme as the observed difference of {observed_diff:.1f}.')
+ print(f'The one-sided p-value of {count / n:.4f} leads us to reject the null')
+ print(f'hypothesis that there is no difference between the drug and the placebo.')
+
+Simulation of arrival times and service deliveries in a single server queue::
+
+ from random import expovariate, gauss
+ from statistics import mean, median, stdev
+
+ average_arrival_interval = 5.6
+ average_service_time = 5.0
+ stdev_service_time = 0.5
+
+ num_waiting = 0
+ arrivals = []
+ starts = []
+ arrival = service_end = 0.0
+ for i in range(20000):
+ if arrival <= service_end:
+ num_waiting += 1
+ arrival += expovariate(1.0 / average_arrival_interval)
+ arrivals.append(arrival)
+ else:
+ num_waiting -= 1
+ service_start = service_end if num_waiting else arrival
+ service_time = gauss(average_service_time, stdev_service_time)
+ service_end = service_start + service_time
+ starts.append(service_start)
+
+ waits = [start - arrival for arrival, start in zip(arrivals, starts)]
+ print(f'Mean wait: {mean(waits):.1f}. Stdev wait: {stdev(waits):.1f}.')
+ print(f'Median wait: {median(waits):.1f}. Max wait: {max(waits):.1f}.')
-If the weights are small integer ratios, a simple technique is to build a sample
-population with repeats::
-
- >>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
- >>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
- >>> population
- ['Red', 'Red', 'Red', 'Blue', 'Blue', 'Yellow', 'Green', 'Green', 'Green', 'Green']
-
- >>> random.choice(population)
- 'Green'
-
-A more general approach is to arrange the weights in a cumulative distribution
-with :func:`itertools.accumulate`, and then locate the random value with
-:func:`bisect.bisect`::
-
- >>> choices, weights = zip(*weighted_choices)
- >>> cumdist = list(itertools.accumulate(weights))
- >>> cumdist # [3, 3+2, 3+2+1, 3+2+1+4]
- [3, 5, 6, 10]
+.. seealso::
- >>> x = random.random() * cumdist[-1]
- >>> choices[bisect.bisect(cumdist, x)]
- 'Blue'
+ `Statistics for Hackers <https://www.youtube.com/watch?v=Iq9DzN6mvYA>`_
+ a video tutorial by
+ `Jake Vanderplas <https://us.pycon.org/2016/speaker/profile/295/>`_
+ on statistical analysis using just a few fundamental concepts
+ including simulation, sampling, shuffling, and cross-validation.
+
+ `Economics Simulation
+ <http://nbviewer.jupyter.org/url/norvig.com/ipython/Economics.ipynb>`_
+ a simulation of a marketplace by
+ `Peter Norvig <http://norvig.com/bio.html>`_ that shows effective
+ use of many of the tools and distributions provided by this module
+ (gauss, uniform, sample, betavariate, choice, triangular, and randrange).
+
+ `A Concrete Introduction to Probability (using Python)
+ <http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb>`_
+ a tutorial by `Peter Norvig <http://norvig.com/bio.html>`_ covering
+ the basics of probability theory, how to write simulations, and
+ how to perform data analysis using Python.