diff options
Diffstat (limited to 'Doc/library/random.rst')
-rw-r--r-- | Doc/library/random.rst | 263 |
1 files changed, 197 insertions, 66 deletions
diff --git a/Doc/library/random.rst b/Doc/library/random.rst index 8ae242c029..4f251574a3 100644 --- a/Doc/library/random.rst +++ b/Doc/library/random.rst @@ -46,10 +46,24 @@ from sources provided by the operating system. .. warning:: The pseudo-random generators of this module should not be used for - security purposes. + security purposes. For security or cryptographic uses, see the + :mod:`secrets` module. +.. seealso:: + + M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally + equidistributed uniform pseudorandom number generator", ACM Transactions on + Modeling and Computer Simulation Vol. 8, No. 1, January pp.3--30 1998. + + + `Complementary-Multiply-with-Carry recipe + <https://code.activestate.com/recipes/576707/>`_ for a compatible alternative + random number generator with a long period and comparatively simple update + operations. -Bookkeeping functions: + +Bookkeeping functions +--------------------- .. function:: seed(a=None, version=2) @@ -93,7 +107,8 @@ Bookkeeping functions: :meth:`randrange` to handle arbitrarily large ranges. -Functions for integers: +Functions for integers +---------------------- .. function:: randrange(stop) randrange(start, stop[, step]) @@ -116,23 +131,55 @@ Functions for integers: ``randrange(a, b+1)``. -Functions for sequences: +Functions for sequences +----------------------- .. function:: choice(seq) Return a random element from the non-empty sequence *seq*. If *seq* is empty, raises :exc:`IndexError`. +.. function:: choices(population, weights=None, *, cum_weights=None, k=1) + + Return a *k* sized list of elements chosen from the *population* with replacement. + If the *population* is empty, raises :exc:`IndexError`. + + If a *weights* sequence is specified, selections are made according to the + relative weights. Alternatively, if a *cum_weights* sequence is given, the + selections are made according to the cumulative weights (perhaps computed + using :func:`itertools.accumulate`). For example, the relative weights + ``[10, 5, 30, 5]`` are equivalent to the cumulative weights + ``[10, 15, 45, 50]``. Internally, the relative weights are converted to + cumulative weights before making selections, so supplying the cumulative + weights saves work. + + If neither *weights* nor *cum_weights* are specified, selections are made + with equal probability. If a weights sequence is supplied, it must be + the same length as the *population* sequence. It is a :exc:`TypeError` + to specify both *weights* and *cum_weights*. + + The *weights* or *cum_weights* can use any numeric type that interoperates + with the :class:`float` values returned by :func:`random` (that includes + integers, floats, and fractions but excludes decimals). + + .. versionadded:: 3.6 + .. function:: shuffle(x[, random]) - Shuffle the sequence *x* in place. The optional argument *random* is a - 0-argument function returning a random float in [0.0, 1.0); by default, this is - the function :func:`.random`. + Shuffle the sequence *x* in place. + + The optional argument *random* is a 0-argument function returning a random + float in [0.0, 1.0); by default, this is the function :func:`.random`. + + To shuffle an immutable sequence and return a new shuffled list, use + ``sample(x, k=len(x))`` instead. - Note that for even rather small ``len(x)``, the total number of permutations of - *x* is larger than the period of most random number generators; this implies - that most permutations of a long sequence can never be generated. + Note that even for small ``len(x)``, the total number of permutations of *x* + can quickly grow larger than the period of most random number generators. + This implies that most permutations of a long sequence can never be + generated. For example, a sequence of length 2080 is the largest that + can fit within the period of the Mersenne Twister random number generator. .. function:: sample(population, k) @@ -149,13 +196,16 @@ Functions for sequences: Members of the population need not be :term:`hashable` or unique. If the population contains repeats, then each occurrence is a possible selection in the sample. - To choose a sample from a range of integers, use an :func:`range` object as an + To choose a sample from a range of integers, use a :func:`range` object as an argument. This is especially fast and space efficient for sampling from a large - population: ``sample(range(10000000), 60)``. + population: ``sample(range(10000000), k=60)``. If the sample size is larger than the population size, a :exc:`ValueError` is raised. +Real-valued distributions +------------------------- + The following functions generate specific real-valued distributions. Function parameters are named after the corresponding variables in the distribution's equation, as used in common mathematical practice; most of these equations can @@ -250,7 +300,8 @@ be found in any statistics text. parameter. -Alternative Generator: +Alternative Generator +--------------------- .. class:: SystemRandom([seed]) @@ -262,19 +313,6 @@ Alternative Generator: :exc:`NotImplementedError` if called. -.. seealso:: - - M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally - equidistributed uniform pseudorandom number generator", ACM Transactions on - Modeling and Computer Simulation Vol. 8, No. 1, January pp.3--30 1998. - - - `Complementary-Multiply-with-Carry recipe - <https://code.activestate.com/recipes/576707/>`_ for a compatible alternative - random number generator with a long period and comparatively simple update - operations. - - Notes on Reproducibility ------------------------ @@ -296,53 +334,146 @@ change across Python versions, but two aspects are guaranteed not to change: Examples and Recipes -------------------- -Basic usage:: +Basic examples:: - >>> random.random() # Random float x, 0.0 <= x < 1.0 + >>> random() # Random float: 0.0 <= x < 1.0 0.37444887175646646 - >>> random.uniform(1, 10) # Random float x, 1.0 <= x < 10.0 - 1.1800146073117523 + >>> uniform(2.5, 10.0) # Random float: 2.5 <= x < 10.0 + 3.1800146073117523 + + >>> expovariate(1 / 5) # Interval between arrivals averaging 5 seconds + 5.148957571865031 - >>> random.randrange(10) # Integer from 0 to 9 + >>> randrange(10) # Integer from 0 to 9 inclusive 7 - >>> random.randrange(0, 101, 2) # Even integer from 0 to 100 + >>> randrange(0, 101, 2) # Even integer from 0 to 100 inclusive 26 - >>> random.choice('abcdefghij') # Single random element - 'c' - - >>> items = [1, 2, 3, 4, 5, 6, 7] - >>> random.shuffle(items) - >>> items - [7, 3, 2, 5, 6, 4, 1] - - >>> random.sample([1, 2, 3, 4, 5], 3) # Three samples without replacement - [4, 1, 5] - -A common task is to make a :func:`random.choice` with weighted probabilities. + >>> choice(['win', 'lose', 'draw']) # Single random element from a sequence + 'draw' + + >>> deck = 'ace two three four'.split() + >>> shuffle(deck) # Shuffle a list + >>> deck + ['four', 'two', 'ace', 'three'] + + >>> sample([10, 20, 30, 40, 50], k=4) # Four samples without replacement + [40, 10, 50, 30] + +Simulations:: + + >>> # Six roulette wheel spins (weighted sampling with replacement) + >>> choices(['red', 'black', 'green'], [18, 18, 2], k=6) + ['red', 'green', 'black', 'black', 'red', 'black'] + + >>> # Deal 20 cards without replacement from a deck of 52 playing cards + >>> # and determine the proportion of cards with a ten-value + >>> # (a ten, jack, queen, or king). + >>> deck = collections.Counter(tens=16, low_cards=36) + >>> seen = sample(list(deck.elements()), k=20) + >>> seen.count('tens') / 20 + 0.15 + + >>> # Estimate the probability of getting 5 or more heads from 7 spins + >>> # of a biased coin that settles on heads 60% of the time. + >>> trial = lambda: choices('HT', cum_weights=(0.60, 1.00), k=7).count('H') >= 5 + >>> sum(trial() for i in range(10000)) / 10000 + 0.4169 + + >>> # Probability of the median of 5 samples being in middle two quartiles + >>> trial = lambda : 2500 <= sorted(choices(range(10000), k=5))[2] < 7500 + >>> sum(trial() for i in range(10000)) / 10000 + 0.7958 + +Example of `statistical bootstrapping +<https://en.wikipedia.org/wiki/Bootstrapping_(statistics)>`_ using resampling +with replacement to estimate a confidence interval for the mean of a sample of +size five:: + + # http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm + from statistics import mean + from random import choices + + data = 1, 2, 4, 4, 10 + means = sorted(mean(choices(data, k=5)) for i in range(20)) + print(f'The sample mean of {mean(data):.1f} has a 90% confidence ' + f'interval from {means[1]:.1f} to {means[-2]:.1f}') + +Example of a `resampling permutation test +<https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests>`_ +to determine the statistical significance or `p-value +<https://en.wikipedia.org/wiki/P-value>`_ of an observed difference +between the effects of a drug versus a placebo:: + + # Example from "Statistics is Easy" by Dennis Shasha and Manda Wilson + from statistics import mean + from random import shuffle + + drug = [54, 73, 53, 70, 73, 68, 52, 65, 65] + placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46] + observed_diff = mean(drug) - mean(placebo) + + n = 10000 + count = 0 + combined = drug + placebo + for i in range(n): + shuffle(combined) + new_diff = mean(combined[:len(drug)]) - mean(combined[len(drug):]) + count += (new_diff >= observed_diff) + + print(f'{n} label reshufflings produced only {count} instances with a difference') + print(f'at least as extreme as the observed difference of {observed_diff:.1f}.') + print(f'The one-sided p-value of {count / n:.4f} leads us to reject the null') + print(f'hypothesis that there is no difference between the drug and the placebo.') + +Simulation of arrival times and service deliveries in a single server queue:: + + from random import expovariate, gauss + from statistics import mean, median, stdev + + average_arrival_interval = 5.6 + average_service_time = 5.0 + stdev_service_time = 0.5 + + num_waiting = 0 + arrivals = [] + starts = [] + arrival = service_end = 0.0 + for i in range(20000): + if arrival <= service_end: + num_waiting += 1 + arrival += expovariate(1.0 / average_arrival_interval) + arrivals.append(arrival) + else: + num_waiting -= 1 + service_start = service_end if num_waiting else arrival + service_time = gauss(average_service_time, stdev_service_time) + service_end = service_start + service_time + starts.append(service_start) + + waits = [start - arrival for arrival, start in zip(arrivals, starts)] + print(f'Mean wait: {mean(waits):.1f}. Stdev wait: {stdev(waits):.1f}.') + print(f'Median wait: {median(waits):.1f}. Max wait: {max(waits):.1f}.') -If the weights are small integer ratios, a simple technique is to build a sample -population with repeats:: - - >>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)] - >>> population = [val for val, cnt in weighted_choices for i in range(cnt)] - >>> population - ['Red', 'Red', 'Red', 'Blue', 'Blue', 'Yellow', 'Green', 'Green', 'Green', 'Green'] - - >>> random.choice(population) - 'Green' - -A more general approach is to arrange the weights in a cumulative distribution -with :func:`itertools.accumulate`, and then locate the random value with -:func:`bisect.bisect`:: - - >>> choices, weights = zip(*weighted_choices) - >>> cumdist = list(itertools.accumulate(weights)) - >>> cumdist # [3, 3+2, 3+2+1, 3+2+1+4] - [3, 5, 6, 10] +.. seealso:: - >>> x = random.random() * cumdist[-1] - >>> choices[bisect.bisect(cumdist, x)] - 'Blue' + `Statistics for Hackers <https://www.youtube.com/watch?v=Iq9DzN6mvYA>`_ + a video tutorial by + `Jake Vanderplas <https://us.pycon.org/2016/speaker/profile/295/>`_ + on statistical analysis using just a few fundamental concepts + including simulation, sampling, shuffling, and cross-validation. + + `Economics Simulation + <http://nbviewer.jupyter.org/url/norvig.com/ipython/Economics.ipynb>`_ + a simulation of a marketplace by + `Peter Norvig <http://norvig.com/bio.html>`_ that shows effective + use of many of the tools and distributions provided by this module + (gauss, uniform, sample, betavariate, choice, triangular, and randrange). + + `A Concrete Introduction to Probability (using Python) + <http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb>`_ + a tutorial by `Peter Norvig <http://norvig.com/bio.html>`_ covering + the basics of probability theory, how to write simulations, and + how to perform data analysis using Python. |