summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorViktor Dukhovni <ietf-dane@dukhovni.org>2021-10-09 02:22:36 -0400
committerViktor Dukhovni <ietf-dane@dukhovni.org>2021-10-09 02:31:29 -0400
commita4d04c7516e6b8955a7856b8d682f6888f9f7b13 (patch)
treefa8709e6255acf3c0630c8c35900218366fec36f
parent8afe02bdc66334588ad98c5053c6ad8b644a35ae (diff)
downloadhaskell-a4d04c7516e6b8955a7856b8d682f6888f9f7b13.tar.gz
Backport of !6555 to GHC 9.0
-rw-r--r--libraries/base/Data/Foldable.hs214
1 files changed, 156 insertions, 58 deletions
diff --git a/libraries/base/Data/Foldable.hs b/libraries/base/Data/Foldable.hs
index 4fb8b45ba9..a98156300b 100644
--- a/libraries/base/Data/Foldable.hs
+++ b/libraries/base/Data/Foldable.hs
@@ -52,7 +52,7 @@ module Data.Foldable (
-- * Overview
-- $overview
- -- ** Chirality
+ -- ** Expectation of efficient left-to-right iteration
-- $chirality
-- ** Recursive and corecursive reduction
@@ -97,6 +97,9 @@ module Data.Foldable (
-- * Notes
-- $notes
+ -- ** Generally linear-time `elem`
+ -- $linear
+
-- * See also
-- $also
) where
@@ -149,9 +152,11 @@ infix 4 `elem`, `notElem`
-- > | Node (Tree a) a (Tree a)
-- > deriving Foldable
--
--- A more detailed description can be found in the overview section of
+-- A more detailed description can be found in the __Overview__ section of
-- "Data.Foldable#overview".
--
+-- For the class laws see the __Laws__ section of "Data.Foldable#laws".
+--
class Foldable t where
{-# MINIMAL foldMap | foldr #-}
@@ -1007,7 +1012,7 @@ foldrM f z0 xs = foldl c return xs z0
-- Another way of thinking about @foldlM@ is that it amounts to an application
-- to @z@ of a Kleisli composition:
--
--- > foldrM f z t =
+-- > foldlM f z t =
-- > flip f a >=> flip f b >=> ... >=> flip f x >=> flip f y $ z
--
-- The monadic effects of @foldlM@ are sequenced from left to right.
@@ -1476,7 +1481,9 @@ this could be particularly bad (see #10830).
For the common case of lists, switching the implementations of maximumBy and
minimumBy to foldl1 solves the issue, assuming GHC's strictness analysis can then
-make these functions only use O(1) stack space. As of base 4.16, we have switched to employing foldl' over foldl1, not relying on GHC's optimiser. See https://gitlab.haskell.org/ghc/ghc/-/issues/17867 for more context.
+make these functions only use O(1) stack space. As of base 4.16, we have
+switched to employing foldl' over foldl1, not relying on GHC's optimiser. See
+https://gitlab.haskell.org/ghc/ghc/-/issues/17867 for more context.
-}
--------------
@@ -1484,17 +1491,18 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- $overview
--
-- #overview#
--- Foldable structures are reduced to a summary value by accumulating
--- contributions to the result one element at a time.
+-- The Foldabla class generalises some common "Data.List" functions to
+-- structures that can be reduced to a summary value one element at a time.
--
-- == Left and right folds
--
-- #leftright#
--- Merging the contribution of the current element with an accumulator value
--- from a partial result is performed by an /operator/ function, either
--- explicitly provided by the caller as in `foldr`, implicit count as in
--- `length`, or partly implicit as in `foldMap` (where each element is mapped
--- into a 'Monoid', and the monoid's `mappend` operator performs the merge).
+-- The contribution of each element to the final result is combined with an
+-- accumulator via a suitable /operator/. The operator may be explicitly
+-- provided by the caller as with `foldr` or may be implicit as in `length`.
+-- In the case of `foldMap`, the caller provides a function mapping each
+-- element into a suitable 'Monoid', which makes it possible to merge the
+-- per-element contributions via that monoid's `mappend` function.
--
-- A key distinction is between left-associative and right-associative
-- folds:
@@ -1539,7 +1547,7 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- The third and final argument is a @Foldable@ structure containing elements
-- @(a, b, c, &#x2026;)@.
--
--- * __'foldl''__ takes an operator function of the form:
+-- * __'foldl''__ takes an operator argument of the form:
--
-- @
-- f :: b -- accumulated fold of the initial elements
@@ -1561,7 +1569,7 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- strictness is space efficiency: the final result can be computed without
-- storing a potentially deep stack of lazy intermediate results.
--
--- * __`foldr`__ takes an operator function of the form:
+-- * __`foldr`__ takes an operator argument of the form:
--
-- @
-- f :: a -- current element
@@ -1584,7 +1592,7 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- `foldr` is well suited to define both [corecursive](#corec)
-- and [short-circuit](#short) reductions.
--
--- When the operator is always strict in the second argument, 'foldl'' is
+-- When the operator is always strict in its second argument, 'foldl'' is
-- generally a better choice than `foldr`. When `foldr` is called with a
-- strict operator, evaluation cannot begin until the last element is
-- reached, by which point a deep stack of pending function applications
@@ -1627,7 +1635,7 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- type-specific interfaces. If nevertheless a @Foldable@ instance is
-- provided, the material in the sections that follow applies to these also, by
-- replacing each method with one with the opposite associativity (when
--- available) and switching the order of arguments in the /operator/ function.
+-- available) and switching the order of arguments in the fold's /operator/.
--
-- You may need to pay careful attention to strictness of the fold's /operator/
-- when its strictness is different between its first and second argument.
@@ -1878,7 +1886,7 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- `all` :: Foldable t => (a -> Bool) -> t a -> Bool
-- @
--
--- * Many instances of '<|>' (e.g. the 'Maybe' instance) are conditionally
+-- * Many instances of @('<|>')@ (e.g. the 'Maybe' instance) are conditionally
-- lazy, and use or don't use their second argument depending on the value
-- of the first. These are used with the folds below, which terminate as
-- early as possible, but otherwise generally keep going. Some instances
@@ -1891,7 +1899,7 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- `msum` :: (Foldable t, MonadPlus m) => t (m a) -> m a
-- @
--
--- * Likewise, the '*>' operator in some `Applicative` functors, and '>>'
+-- * Likewise, the @('*>')@ operator in some `Applicative` functors, and @('>>')@
-- in some monads are conditionally lazy and can /short-circuit/ a chain of
-- computations. The below folds will terminate as early as possible, but
-- even infinite loops can be productive here, when evaluated solely for
@@ -1958,10 +1966,10 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- They do have specialised uses, but are best avoided when in doubt.
--
-- @
--- 'foldr'' :: (a -> b -> b) -> b -> t a -> b
--- `foldl` :: (b -> a -> b) -> b -> t a -> b
--- `foldl1` :: (a -> a -> a) -> t a -> a
--- `foldrM` :: (Foldable t, Monad m) => (a -> b -> m b) -> b -> t a -> m b
+-- 'foldr'' :: Foldable t => (a -> b -> b) -> b -> t a -> b
+-- 'foldl' :: Foldable t => (b -> a -> b) -> b -> t a -> b
+-- 'foldl1' :: Foldable t => (a -> a -> a) -> t a -> a
+-- 'foldrM' :: (Foldable t, Monad m) => (a -> b -> m b) -> b -> t a -> m b
-- @
--
-- The lazy left-folds (used corecursively) and 'foldrM' (used to sequence
@@ -2082,36 +2090,45 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
--
-- #strictlazy#
--
--- The left fold:
+-- Sometimes, it is useful for the result of applying 'foldr' to be a
+-- /function/. This is done by mapping the structure elements to functions
+-- with the same argument and result types. The per-element functions are then
+-- composed to give the final result.
--
--- @foldl' f z [a, b, &#x2026;, x, y]@
+-- For example, we can /flip/ the strict left fold 'foldl'' by writing:
--
--- can be expanded as:
+-- > foldl' f z xs = flippedFoldl' f xs z
--
--- @
--- id . g y . g x . &#x2026; . g b . g a $ z
--- where g = flip f
--- @
+-- with the function 'flippedFoldl'' defined as below, with 'seq' used to
+-- ensure the strictness in the accumulator:
+--
+-- > flippedFoldl' f [] z = z
+-- > flippedFoldl' f (x : xs) z = z `seq` flippedFoldl' f xs (f z x)
+--
+-- Rewriting to use lambdas, this is:
+--
+-- > flippedFoldl' f [] = \ b -> b
+-- > flippedFoldl' f (x : xs) = \ b -> b `seq` r (f b x)
+-- > where r = flippedFoldl' f xs
--
--- In which to maintain the expected strictness we need to perform function
--- application eagerly, and composition lazily. To that end we introduce a new
--- function @f'@ which maps each element @x@ to an eager application of @g x@
--- to its argument, followed by an application of a lazily computed composition
--- (@k@) of the @g e@ functions for the remaining elements @e@:
+-- The above has the form of a right fold, enabling a rewrite to:
--
--- > f' x k z = k $! (g x) z = k $! f z x
+-- > flippedFoldl' f = \ xs -> foldr f' id xs
+-- > where f' x r = \ b -> b `seq` r (f b x)
--
--- We see that a lazy 'foldr' of the @g e@ endomorphisms, with @f'@ as as the
--- operator, in fact yields a strict left fold, that avoids building a
--- deep chain of intermediate thunks:
+-- We can now unflip this to get 'foldl'':
--
--- > foldl' f z0 xs = foldr f' id xs z0
--- > where f' x k z = k $! f z x
+-- > foldl' f z = \ xs -> foldr f' id xs z
+-- > -- \ xs -> flippedFoldl' f xs z
+-- > where f' x r = \ b -> b `seq` r (f b x)
--
--- The function applied to @z0@ is built corecursively, and its terms are
--- applied eagerly to the accumulator before further terms are applied to
--- the result. So, as promised, this will run in constant space, and GHC
--- is able to optimise this to an efficient loop.
+-- The function __@foldr f' id xs@__ applied to @z@ is built corecursively, and
+-- its terms are applied to an eagerly evaluated accumulator before further
+-- terms are applied to the result. As required, this runs in constant space,
+-- and can be optimised to an efficient loop.
+--
+-- (The actual definition of 'foldl'' labels the lambdas in the definition of
+-- __@f'@__ above as /oneShot/, which enables further optimisations).
--------------
@@ -2311,15 +2328,53 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
--------------
-- $laws
+-- #laws#
+--
+-- The type constructor 'Endo' from "Data.Monoid", associates with each type
+-- __@b@__ the __@newtype@__-encapulated type of functions mapping __@b@__ to
+-- itself. Functions from a type to itself are called /endomorphisms/, hence
+-- the name /Endo/. The type __@Endo b@__ is a 'Monoid' under function
+-- composition:
+--
+-- > newtype Endo b = Endo { appEndo :: b -> b }
+-- > instance Semigroup Endo b where
+-- > Endo f <> Endo g = Endo (f . g)
+-- > instance Monoid Endo b where
+-- > mempty = Endo id
+--
+-- For every 'Monoid' m, we also have a 'Dual' monoid __@Dual m@__ which
+-- combines elements in the opposite order:
+--
+-- > newtype Dual m = Dual { getDual :: m }
+-- > instance Semigroup m => Semigroup Dual m where
+-- > Dual a <> Dual b = Dual (b <> a)
+-- > instance Monoid m => Monoid Dual m where
+-- > mempty = Dual mempty
+--
+-- With the above preliminaries out of the way, 'Foldable' instances are
+-- expected to satisfy the following laws:
--
--- @Foldable@ instances are expected to satisfy the following laws:
+-- The 'foldr' method must be equivalent in value and strictness to replacing
+-- each element __@a@__ of a 'Foldable' structure with __@Endo (f a)@__,
+-- composing these via 'foldMap' and applying the result to the base case
+-- __@z@__:
--
-- > foldr f z t = appEndo (foldMap (Endo . f) t ) z
--
+-- Likewise, the 'foldl' method must be equivalent in value and strictness
+-- to composing the functions __@flip f a@__ in reverse order and applying
+-- the result to the base case:
+--
-- > foldl f z t = appEndo (getDual (foldMap (Dual . Endo . flip f) t)) z
--
+-- When the elements of the structure are taken from a 'Monoid', the
+-- defintion of 'fold' must agree with __@foldMap id@__:
+--
-- > fold = foldMap id
--
+-- The 'length' method must agree with a 'foldMap' mapping each element to
+-- __@Sum 1@__ (The 'Sum' type abstracts numbers as a monoid under addition).
+--
-- > length = getSum . foldMap (Sum . const 1)
--
-- @sum@, @product@, @maximum@, and @minimum@ should all be essentially
@@ -2331,9 +2386,10 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- but are generally more efficient when defined more directly as:
--
-- > sum = foldl' (+) 0
--- > sum = foldl' (*) 1
+-- > product = foldl' (*) 1
--
--- If the type is also a 'Functor' instance, it should satisfy
+-- If the 'Foldable' structure has a 'Functor' instance, then for every
+-- function __@f@__ mapping the elements into a 'Monoid', it should satisfy:
--
-- > foldMap f = fold . fmap f
--
@@ -2347,23 +2403,22 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- $notes
--
-- #notes#
--- The absence of a 'Functor' superclass allows
--- @Foldable@ structures to impose constraints on their element types. Thus,
--- Sets are @Foldable@, even though @Set@ imposes an 'Ord' constraint on its
--- elements (this precludes defining a @Functor@ instance for @Set@).
+-- Since 'Foldable' does not have 'Functor' as a superclass, it is possible to
+-- define 'Foldable' instances for structures that constrain their element
+-- types. Therefore, __@Set@__ can be 'Foldable', even though sets keep their
+-- elements in ascending order. This requires the elements to be comparable,
+-- which precludes defining a 'Functor' instance for @Set@.
--
--- The @Foldable@ class makes it possible to use idioms familiar from the List
+-- The 'Foldable' class makes it possible to use idioms familiar from the @List@
-- type with container structures that are better suited to the task at hand.
--- This allows a user to substitute more appropriate @Foldable@ data types
--- for Lists without requiring new idioms (see [[1\]](#uselistsnot) for when
--- not to use lists).
+-- This supports use of more appropriate 'Foldable' data types, such as @Seq@,
+-- @Set@, @NonEmpty@, etc., without requiring new idioms (see
+-- [[1\]](#uselistsnot) for when not to use lists).
--
--- The more general methods of the @Foldable@ class are now exported by the
+-- The more general methods of the 'Foldable' class are now exported by the
-- "Prelude" in place of the original List-specific methods (see the
-- [FTP Proposal](https://wiki.haskell.org/Foldable_Traversable_In_Prelude)).
--- The List-specific variants are for now still available in "GHC.OldList", but
--- that module is intended only as a transitional aid, and may be removed in
--- the future.
+-- The List-specific variants are still available in "Data.List".
--
-- Surprises can arise from the @Foldable@ instance of the 2-tuple @(a,)@ which
-- now behaves as a 1-element @Foldable@ container in its second slot. In
@@ -2387,6 +2442,45 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- to something other than a list, the call-site will no longer compile until
-- appropriate changes are made.
+-- $linear
+--
+-- It is perhaps worth noting that since the __`elem`__ function in the
+-- Foldable class carries only an __`Eq`__ constraint on the element type,
+-- search for the presence or absence of an element in the structure generally
+-- takes \(\mathcal{O}(n)\) time, even for ordered structures like __@Set@__
+-- that are potentially capable of performing the search faster. (The @member@
+-- function of the @Set@ module carries an `Ord` constraint, and can perform
+-- the search in \(\mathcal{O}(log\ n)\) time).
+--
+-- An alternative to Foldable's __`elem`__ method is required in order to
+-- abstract potentially faster than linear search over general container
+-- structures. This can be achieved by defining an additional type class (e.g.
+-- @HasMember@ below). Instances of such a type class (that are also
+-- `Foldable') can employ the `elem` linear search as a last resort, when
+-- faster search is not supported.
+--
+-- > {-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
+-- >
+-- > import qualified Data.Set as Set
+-- >
+-- > class Eq a => HasMember t a where
+-- > member :: a -> t a -> Bool
+-- >
+-- > instance Eq a => HasMember [] a where
+-- > member = elem
+-- > [...]
+-- > instance Ord a => HasMember Set.Set a where
+-- > member = Set.member
+--
+-- The above suggests that 'elem' may be a misfit in the 'Foldable' class.
+-- Alternative design ideas are solicited on GHC's bug tracker via issue
+-- [\#20421](https://gitlab.haskell.org/ghc/ghc/-/issues/20421).
+--
+-- Note that some structure-specific optimisations may of course be possible
+-- directly in the corresponding @Foldable@ instance, e.g. with @Set@ the size
+-- of the set is known in advance, without iterating to count the elements, and
+-- its `length` instance takes advantage of this to return the size directly.
+
--------------
-- $also
@@ -2400,3 +2494,7 @@ make these functions only use O(1) stack space. As of base 4.16, we have switche
-- by Jeremy Gibbons and Bruno Oliveira,
-- in /Mathematically-Structured Functional Programming/, 2006, online at
-- <http://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/#iterator>.
+--
+-- * [3] \"A tutorial on the universality and expressiveness of fold\",
+-- by Graham Hutton, J\. Functional Programming 9 (4): 355–372, July 1999,
+-- online at <http://www.cs.nott.ac.uk/~pszgmh/fold.pdf>.