1 files changed, 547 insertions, 0 deletions
diff --git a/Doc/library/itertools.rst b/Doc/library/itertools.rst
new file mode 100644
index 0000000000..9f9cb24864
--- /dev/null
+++ b/Doc/library/itertools.rst
@@ -0,0 +1,547 @@
+
+:mod:`itertools` --- Functions creating iterators for efficient looping
+=======================================================================
+
+.. module:: itertools
+   :synopsis: Functions creating iterators for efficient looping.
+.. moduleauthor:: Raymond Hettinger <python@rcn.com>
+.. sectionauthor:: Raymond Hettinger <python@rcn.com>
+
+
+.. versionadded:: 2.3
+
+This module implements a number of iterator building blocks inspired by
+constructs from the Haskell and SML programming languages.  Each has been recast
+in a form suitable for Python.
+
+The module standardizes a core set of fast, memory efficient tools that are
+useful by themselves or in combination.  Standardization helps avoid the
+readability and reliability problems which arise when many different individuals
+create their own slightly varying implementations, each with their own quirks
+and naming conventions.
+
+The tools are designed to combine readily with one another.  This makes it easy
+to construct more specialized tools succinctly and efficiently in pure Python.
+
+For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
+sequence ``f(0), f(1), ...``.  This toolbox provides :func:`imap` and
+:func:`count` which can be combined to form ``imap(f, count())`` and produce an
+equivalent result.
+
+Likewise, the functional tools are designed to work well with the high-speed
+functions provided by the :mod:`operator` module.
+
+The module author welcomes suggestions for other basic building blocks to be
+added to future versions of the module.
+
+Whether cast in pure python form or compiled code, tools that use iterators are
+more memory efficient (and faster) than their list based counterparts. Adopting
+the principles of just-in-time manufacturing, they create data when and where
+needed instead of consuming memory with the computer equivalent of "inventory".
+
+The performance advantage of iterators becomes more acute as the number of
+elements increases -- at some point, lists grow large enough to severely impact
+memory cache performance and start running slowly.
+
+
+.. seealso::
+
+   The Standard ML Basis Library, `The Standard ML Basis Library
+   <http://www.standardml.org/Basis/>`_.
+
+   Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
+   Libraries <http://www.haskell.org/definition/>`_.
+
+
+.. _itertools-functions:
+
+Itertool functions
+------------------
+
+The following module functions all construct and return iterators. Some provide
+streams of infinite length, so they should only be accessed by functions or
+loops that truncate the stream.
+
+
+.. function:: chain(*iterables)
+
+   Make an iterator that returns elements from the first iterable until it is
+   exhausted, then proceeds to the next iterable, until all of the iterables are
+   exhausted.  Used for treating consecutive sequences as a single sequence.
+   Equivalent to::
+
+      def chain(*iterables):
+          for it in iterables:
+              for element in it:
+                  yield element
+
+
+.. function:: count([n])
+
+   Make an iterator that returns consecutive integers starting with *n*. If not
+   specified *n* defaults to zero.   Does not currently support python long
+   integers.  Often used as an argument to :func:`imap` to generate consecutive
+   data points. Also, used with :func:`izip` to add sequence numbers.  Equivalent
+   to::
+
+      def count(n=0):
+          while True:
+              yield n
+              n += 1
+
+   Note, :func:`count` does not check for overflow and will return negative numbers
+   after exceeding ``sys.maxint``.  This behavior may change in the future.
+
+
+.. function:: cycle(iterable)
+
+   Make an iterator returning elements from the iterable and saving a copy of each.
+   When the iterable is exhausted, return elements from the saved copy.  Repeats
+   indefinitely.  Equivalent to::
+
+      def cycle(iterable):
+          saved = []
+          for element in iterable:
+              yield element
+              saved.append(element)
+          while saved:
+              for element in saved:
+                    yield element
+
+   Note, this member of the toolkit may require significant auxiliary storage
+   (depending on the length of the iterable).
+
+
+.. function:: dropwhile(predicate, iterable)
+
+   Make an iterator that drops elements from the iterable as long as the predicate
+   is true; afterwards, returns every element.  Note, the iterator does not produce
+   *any* output until the predicate first becomes false, so it may have a lengthy
+   start-up time.  Equivalent to::
+
+      def dropwhile(predicate, iterable):
+          iterable = iter(iterable)
+          for x in iterable:
+              if not predicate(x):
+                  yield x
+                  break
+          for x in iterable:
+              yield x
+
+
+.. function:: groupby(iterable[, key])
+
+   Make an iterator that returns consecutive keys and groups from the *iterable*.
+   The *key* is a function computing a key value for each element.  If not
+   specified or is ``None``, *key* defaults to an identity function and returns
+   the element unchanged.  Generally, the iterable needs to already be sorted on
+   the same key function.
+
+   The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix.  It
+   generates a break or new group every time the value of the key function changes
+   (which is why it is usually necessary to have sorted the data using the same key
+   function).  That behavior differs from SQL's GROUP BY which aggregates common
+   elements regardless of their input order.
+
+   The returned group is itself an iterator that shares the underlying iterable
+   with :func:`groupby`.  Because the source is shared, when the :func:`groupby`
+   object is advanced, the previous group is no longer visible.  So, if that data
+   is needed later, it should be stored as a list::
+
+      groups = []
+      uniquekeys = []
+      data = sorted(data, key=keyfunc)
+      for k, g in groupby(data, keyfunc):
+          groups.append(list(g))      # Store group iterator as a list
+          uniquekeys.append(k)
+
+   :func:`groupby` is equivalent to::
+
+      class groupby(object):
+          def __init__(self, iterable, key=None):
+              if key is None:
+                  key = lambda x: x
+              self.keyfunc = key
+              self.it = iter(iterable)
+              self.tgtkey = self.currkey = self.currvalue = []
+          def __iter__(self):
+              return self
+          def __next__(self):
+              while self.currkey == self.tgtkey:
+                  self.currvalue = next(self.it) # Exit on StopIteration
+                  self.currkey = self.keyfunc(self.currvalue)
+              self.tgtkey = self.currkey
+              return (self.currkey, self._grouper(self.tgtkey))
+          def _grouper(self, tgtkey):
+              while self.currkey == tgtkey:
+                  yield self.currvalue
+                  self.currvalue = next(self.it) # Exit on StopIteration
+                  self.currkey = self.keyfunc(self.currvalue)
+
+   .. versionadded:: 2.4
+
+
+.. function:: ifilter(predicate, iterable)
+
+   Make an iterator that filters elements from iterable returning only those for
+   which the predicate is ``True``. If *predicate* is ``None``, return the items
+   that are true. Equivalent to::
+
+      def ifilter(predicate, iterable):
+          if predicate is None:
+              predicate = bool
+          for x in iterable:
+              if predicate(x):
+                  yield x
+
+
+.. function:: ifilterfalse(predicate, iterable)
+
+   Make an iterator that filters elements from iterable returning only those for
+   which the predicate is ``False``. If *predicate* is ``None``, return the items
+   that are false. Equivalent to::
+
+      def ifilterfalse(predicate, iterable):
+          if predicate is None:
+              predicate = bool
+          for x in iterable:
+              if not predicate(x):
+                  yield x
+
+
+.. function:: imap(function, *iterables)
+
+   Make an iterator that computes the function using arguments from each of the
+   iterables.  If *function* is set to ``None``, then :func:`imap` returns the
+   arguments as a tuple.  Like :func:`map` but stops when the shortest iterable is
+   exhausted instead of filling in ``None`` for shorter iterables.  The reason for
+   the difference is that infinite iterator arguments are typically an error for
+   :func:`map` (because the output is fully evaluated) but represent a common and
+   useful way of supplying arguments to :func:`imap`. Equivalent to::
+
+      def imap(function, *iterables):
+          iterables = map(iter, iterables)
+          while True:
+              args = [next(i) for i in iterables]
+              if function is None:
+                  yield tuple(args)
+              else:
+                  yield function(*args)
+
+
+.. function:: islice(iterable, [start,] stop [, step])
+
+   Make an iterator that returns selected elements from the iterable. If *start* is
+   non-zero, then elements from the iterable are skipped until start is reached.
+   Afterward, elements are returned consecutively unless *step* is set higher than
+   one which results in items being skipped.  If *stop* is ``None``, then iteration
+   continues until the iterator is exhausted, if at all; otherwise, it stops at the
+   specified position.  Unlike regular slicing, :func:`islice` does not support
+   negative values for *start*, *stop*, or *step*.  Can be used to extract related
+   fields from data where the internal structure has been flattened (for example, a
+   multi-line report may list a name field on every third line).  Equivalent to::
+
+      def islice(iterable, *args):
+          s = slice(*args)
+          it = iter(range(s.start or 0, s.stop or sys.maxint, s.step or 1))
+          nexti = next(it)
+          for i, element in enumerate(iterable):
+              if i == nexti:
+                  yield element
+                  nexti = next(it)
+
+   If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
+   then the step defaults to one.
+
+   .. versionchanged:: 2.5
+      accept ``None`` values for default *start* and *step*.
+
+
+.. function:: izip(*iterables)
+
+   Make an iterator that aggregates elements from each of the iterables. Like
+   :func:`zip` except that it returns an iterator instead of a list.  Used for
+   lock-step iteration over several iterables at a time.  Equivalent to::
+
+      def izip(*iterables):
+          iterables = map(iter, iterables)
+          while iterables:
+              result = [next(it) for it in iterables]
+              yield tuple(result)
+
+   .. versionchanged:: 2.4
+      When no iterables are specified, returns a zero length iterator instead of
+      raising a :exc:`TypeError` exception.
+
+   Note, the left-to-right evaluation order of the iterables is guaranteed. This
+   makes possible an idiom for clustering a data series into n-length groups using
+   ``izip(*[iter(s)]*n)``.  For data that doesn't fit n-length groups exactly, the
+   last tuple can be pre-padded with fill values using ``izip(*[chain(s,
+   [None]*(n-1))]*n)``.
+
+   Note, when :func:`izip` is used with unequal length inputs, subsequent
+   iteration over the longer iterables cannot reliably be continued after
+   :func:`izip` terminates.  Potentially, up to one entry will be missing from
+   each of the left-over iterables. This occurs because a value is fetched from
+   each iterator in- turn, but the process ends when one of the iterators
+   terminates.  This leaves the last fetched values in limbo (they cannot be
+   returned in a final, incomplete tuple and they are cannot be pushed back into
+   the iterator for retrieval with ``next(it)``).  In general, :func:`izip`
+   should only be used with unequal length inputs when you don't care about
+   trailing, unmatched values from the longer iterables.
+
+
+.. function:: izip_longest(*iterables[, fillvalue])
+
+   Make an iterator that aggregates elements from each of the iterables. If the
+   iterables are of uneven length, missing values are filled-in with *fillvalue*.
+   Iteration continues until the longest iterable is exhausted.  Equivalent to::
+
+      def izip_longest(*args, **kwds):
+          fillvalue = kwds.get('fillvalue')
+          def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
+              yield counter()         # yields the fillvalue, or raises IndexError
+          fillers = repeat(fillvalue)
+          iters = [chain(it, sentinel(), fillers) for it in args]
+          try:
+              for tup in izip(*iters):
+                  yield tup
+          except IndexError:
+              pass
+
+   If one of the iterables is potentially infinite, then the :func:`izip_longest`
+   function should be wrapped with something that limits the number of calls (for
+   example :func:`islice` or :func:`takewhile`).
+
+   .. versionadded:: 2.6
+
+
+.. function:: repeat(object[, times])
+
+   Make an iterator that returns *object* over and over again. Runs indefinitely
+   unless the *times* argument is specified. Used as argument to :func:`imap` for
+   invariant parameters to the called function.  Also used with :func:`izip` to
+   create an invariant part of a tuple record.  Equivalent to::
+
+      def repeat(object, times=None):
+          if times is None:
+              while True:
+                  yield object
+          else:
+              for i in range(times):
+                  yield object
+
+
+.. function:: starmap(function, iterable)
+
+   Make an iterator that computes the function using arguments tuples obtained from
+   the iterable.  Used instead of :func:`imap` when argument parameters are already
+   grouped in tuples from a single iterable (the data has been "pre-zipped").  The
+   difference between :func:`imap` and :func:`starmap` parallels the distinction
+   between ``function(a,b)`` and ``function(*c)``. Equivalent to::
+
+      def starmap(function, iterable):
+          iterable = iter(iterable)
+          while True:
+              yield function(*next(iterable))
+
+
+.. function:: takewhile(predicate, iterable)
+
+   Make an iterator that returns elements from the iterable as long as the
+   predicate is true.  Equivalent to::
+
+      def takewhile(predicate, iterable):
+          for x in iterable:
+              if predicate(x):
+                  yield x
+              else:
+                  break
+
+
+.. function:: tee(iterable[, n=2])
+
+   Return *n* independent iterators from a single iterable. The case where ``n==2``
+   is equivalent to::
+
+      def tee(iterable):
+          def gen(next, data={}, cnt=[0]):
+              for i in count():
+                  if i == cnt[0]:
+                      item = data[i] = next()
+                      cnt[0] += 1
+                  else:
+                      item = data.pop(i)
+                  yield item
+          it = iter(iterable)
+          return (gen(it.__next__), gen(it.__next__))
+
+   Note, once :func:`tee` has made a split, the original *iterable* should not be
+   used anywhere else; otherwise, the *iterable* could get advanced without the tee
+   objects being informed.
+
+   Note, this member of the toolkit may require significant auxiliary storage
+   (depending on how much temporary data needs to be stored). In general, if one
+   iterator is going to use most or all of the data before the other iterator, it
+   is faster to use :func:`list` instead of :func:`tee`.
+
+   .. versionadded:: 2.4
+
+
+.. _itertools-example:
+
+Examples
+--------
+
+The following examples show common uses for each tool and demonstrate ways they
+can be combined. ::
+
+   >>> amounts = [120.15, 764.05, 823.14]
+   >>> for checknum, amount in izip(count(1200), amounts):
+   ...     print 'Check %d is for $%.2f' % (checknum, amount)
+   ...
+   Check 1200 is for $120.15
+   Check 1201 is for $764.05
+   Check 1202 is for $823.14
+
+   >>> import operator
+   >>> for cube in imap(operator.pow, range(1,5), repeat(3)):
+   ...    print cube
+   ...
+   1
+   8
+   27
+   64
+
+   >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
+   ...                '', 'martin', '', 'walter', '', 'mark']
+   >>> for name in islice(reportlines, 3, None, 2):
+   ...    print name.title()
+   ...
+   Alex
+   Laura
+   Martin
+   Walter
+   Mark
+
+   # Show a dictionary sorted and grouped by value
+   >>> from operator import itemgetter
+   >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
+   >>> di = sorted(d.iteritems(), key=itemgetter(1))
+   >>> for k, g in groupby(di, key=itemgetter(1)):
+   ...     print k, map(itemgetter(0), g)
+   ...
+   1 ['a', 'c', 'e']
+   2 ['b', 'd', 'f']
+   3 ['g']
+
+   # Find runs of consecutive numbers using groupby.  The key to the solution
+   # is differencing with a range so that consecutive numbers all appear in
+   # same group.
+   >>> data = [ 1,  4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
+   >>> for k, g in groupby(enumerate(data), lambda t:t[0]-t[1]):
+   ...     print map(operator.itemgetter(1), g)
+   ... 
+   [1]
+   [4, 5, 6]
+   [10]
+   [15, 16, 17, 18]
+   [22]
+   [25, 26, 27, 28]
+
+
+
+.. _itertools-recipes:
+
+Recipes
+-------
+
+This section shows recipes for creating an extended toolset using the existing
+itertools as building blocks.
+
+The extended tools offer the same high performance as the underlying toolset.
+The superior memory performance is kept by processing elements one at a time
+rather than bringing the whole iterable into memory all at once. Code volume is
+kept small by linking the tools together in a functional style which helps
+eliminate temporary variables.  High speed is retained by preferring
+"vectorized" building blocks over the use of for-loops and generators which
+incur interpreter overhead. ::
+
+   def take(n, seq):
+       return list(islice(seq, n))
+
+   def enumerate(iterable):
+       return izip(count(), iterable)
+
+   def tabulate(function):
+       "Return function(0), function(1), ..."
+       return imap(function, count())
+
+   def iteritems(mapping):
+       return izip(mapping.iterkeys(), mapping.itervalues())
+
+   def nth(iterable, n):
+       "Returns the nth item or raise StopIteration"
+       return islice(iterable, n, None).next()
+
+   def all(seq, pred=None):
+       "Returns True if pred(x) is true for every element in the iterable"
+       for elem in ifilterfalse(pred, seq):
+           return False
+       return True
+
+   def any(seq, pred=None):
+       "Returns True if pred(x) is true for at least one element in the iterable"
+       for elem in ifilter(pred, seq):
+           return True
+       return False
+
+   def no(seq, pred=None):
+       "Returns True if pred(x) is false for every element in the iterable"
+       for elem in ifilter(pred, seq):
+           return False
+       return True
+
+   def quantify(seq, pred=None):
+       "Count how many times the predicate is true in the sequence"
+       return sum(imap(pred, seq))
+
+   def padnone(seq):
+       """Returns the sequence elements and then returns None indefinitely.
+
+       Useful for emulating the behavior of the built-in map() function.
+       """
+       return chain(seq, repeat(None))
+
+   def ncycles(seq, n):
+       "Returns the sequence elements n times"
+       return chain(*repeat(seq, n))
+
+   def dotproduct(vec1, vec2):
+       return sum(imap(operator.mul, vec1, vec2))
+
+   def flatten(listOfLists):
+       return list(chain(*listOfLists))
+
+   def repeatfunc(func, times=None, *args):
+       """Repeat calls to func with specified arguments.
+
+       Example:  repeatfunc(random.random)
+       """
+       if times is None:
+           return starmap(func, repeat(args))
+       else:
+           return starmap(func, repeat(args, times))
+
+   def pairwise(iterable):
+       "s -> (s0,s1), (s1,s2), (s2, s3), ..."
+       a, b = tee(iterable)
+       next(b, None)
+       return izip(a, b)
+
+   def grouper(n, iterable, padvalue=None):
+       "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
+       return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
+
+
+