From 0065d5ab628975892cea1ec7303f968c3338cbe1 Mon Sep 17 00:00:00 2001
From: Simon Marlow
Date: Fri, 7 Apr 2006 02:05:11 +0000
Subject: Reorganisation of the source tree

Most of the other users of the fptools build system have migrated to
Cabal, and with the move to darcs we can now flatten the source tree
without losing history, so here goes.

The main change is that the ghc/ subdir is gone, and most of what it
contained is now at the top level. The build system now makes no
pretense at being multi-project, it is just the GHC build system.

No doubt this will break many things, and there will be a period of
instability while we fix the dependencies. A straightforward build
should work, but I haven't yet fixed binary/source distributions.
Changes to the Building Guide will follow, too.
---
 docs/comm/exts/ndp.html | 360 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 360 insertions(+)
 create mode 100644 docs/comm/exts/ndp.html

diff --git a/docs/comm/exts/ndp.html b/docs/comm/exts/ndp.html
new file mode 100644
index 0000000000..0c94c3960b
--- /dev/null
+++ b/docs/comm/exts/ndp.html
@@ -0,0 +1,360 @@

The GHC Commentary - Parallel Arrays

+

+ This section describes an experimental extension for high-performance + arrays, which comprises special syntax for array types and array + comprehensions, a set of optimising program transformations, and a set + of special-purpose libraries. The extension is currently only partially + implemented, but its development will be tracked here. +

+ Parallel arrays originally got their name from the aim to provide an + architecture-independent programming model for a range of parallel + computers. However, since experiments showed that the approach is also + worthwhile for sequential array code, the emphasis has shifted to their + parallel evaluation semantics: As soon as any element in a parallel + array is demanded, all the other elements are evaluated, too. This + makes parallel arrays more strict than standard Haskell 98 + arrays, but also opens the door for a loop-based implementation + strategy that leads to significantly more efficient code. +
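+ A hedged illustration of this strictness difference (assuming the
+ [:...:] syntax and the indexing operator !: described below):
+
+bad :: Int
+bad = [: 1, 2, undefined :] !: 0   -- diverges: demanding one element forces them all
+
+ok :: Int
+ok = head [1, 2, undefined]        -- fine: ordinary lists remain lazy
+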

+ The programming model as well as the use of the flattening + transformation, which is central to the approach, has its origin in + the programming language Nesl. + +

More Sugar: Special Syntax for Array Comprehensions

+

+ The option -fparr, which is a dynamic hsc option that can + be reversed with -fno-parr, enables special syntax for + parallel arrays, which is not essential to using parallel arrays, but + makes for significantly more concise programs. The switch works by + making the lexical analyser (located in Lex.lhs) + recognise the tokens [: and :]. Given that + the additional productions in the parser (located in Parser.y) + cannot be triggered without the lexer generating the necessary tokens, + there is no need to alter the behaviour of the parser. +

+ The following additional syntax is accepted (the non-terminals are those + from the Haskell 98 language + definition): +

+

+atype -> '[:' type ':]'                              (parallel array type)
+
+aexp  -> '[:' exp1 ',' ... ',' expk ':]'             (explicit array, k >= 0)
+      |  '[:' exp1 [',' exp2] '..' exp3 ':]'         (arithmetic array sequence)
+      |  '[:' exp '|' quals1 '|' ... '|' qualsn ':]' (array comprehension, n >= 1)
+
+quals -> qual1 ',' ... ',' qualn                     (qualifier list, n >= 1)
+
+apat  -> '[:' pat1 ',' ... ',' patk ':]'             (array pattern, k >= 0)
+
+
+

+ Moreover, the extended comprehension syntax that allows for parallel + qualifiers (i.e., qualifiers separated by "|") is also + supported in list comprehensions. In general, the similarity to the + special syntax for lists is obvious. The two main differences are that + (a) arithmetic array sequences are always finite and (b) + [::] is not treated as a constructor in expressions and + patterns, but rather as a special case of the explicit array syntax. + The former is a simple consequence of the parallel evaluation semantics + of parallel arrays, and the latter is due to arrays not being a + constructor-based data type. +
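+ For illustration, the following sketch exercises the syntactic forms
+ (sumP is assumed to be one of the PrelPArr operations described below):
+
+-- requires -fparr
+dotp :: [:Double:] -> [:Double:] -> Double
+dotp xs ys = sumP [: x * y | x <- xs | y <- ys :]   -- parallel qualifiers
+
+evens :: [:Int:]
+evens = [: 0, 2 .. 20 :]                            -- arithmetic array sequence
+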

+ As a naming convention, types and functions that are concerned with + parallel arrays usually contain the string parr or + PArr (often as a prefix), and where corresponding types or + functions for handling lists exist, the name is identical, except for + containing the substring parr instead of list + (possibly in caps). +

+ The following implications are worth noting explicitly: +

+ +

Prelude Support for Parallel Arrays

+

+ The Prelude module PrelPArr + defines the standard operations and their types on parallel arrays and + provides a basic implementation based on boxed arrays. The interface of + PrelPArr is modelled on H98's PrelList, but + leaves out all functions that require infinite structures and adds + frequently needed array operations, such as permutations. Parallel + arrays are quite unique in that they use an entirely different + representation as soon as the flattening transformation is activated, + which is described further below. In particular, PrelPArr + defines the type [::] and operations to create, process, + and inspect parallel arrays. The type as well as the names of some of + the operations are also hardwired in TysWiredIn + (see the definition of parrTyCon in this module) and PrelNames. + This is again very much like the case of lists, where the type is + defined in PrelBase + and similarly wired in; however, for lists the entirely + constructor-based definition is exposed to user programs, which is not + the case for parallel arrays. +
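+ A small usage sketch (the operation names are those used throughout
+ this section; their precise types in PrelPArr may differ):
+
+doubled :: [:Int:]
+doubled = mapP (* 2) (toP [1, 2, 3])     -- build from a list, then map
+
+count :: Int
+count = lengthP (filterP (> 2) doubled)  -- filter and measure the length
+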

Desugaring Parallel Arrays

+

+ The parallel array extension requires the desugarer to replace all + occurrences of (1) explicit parallel arrays, (2) array patterns, and (3) + array comprehensions by a suitable combination of invocations of + operations defined in PrelPArr. + +

Explicit Parallel Arrays

+

+ These are by far the simplest to remove. We simply replace every + occurrence of [:e1, ..., + en:] by +

+ + toP [e1, ..., en] + +
+

+ i.e., we build a list of the array elements, which is then converted + into a parallel array. +
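+ For example, the source definition
+
+squares = [: 1, 4, 9 :]
+
+ desugars into the equivalent of
+
+squares = toP [1, 4, 9]
+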

Parallel Array Patterns

+

+ Array patterns are much more tricky, as element positions may contain + further patterns and the pattern matching compiler + requires us to flatten all those out. But before we turn to the gory + details, here is the basic idea: A flat array pattern matches exactly + iff its length corresponds to the length of the matched array. Hence, + if we have a set of flat array patterns matching an array value + a, it suffices to generate a Core case + expression that scrutinises lengthP a and has one + alternative for every length of array occurring in one of the patterns. + Moreover, there needs to be a default case catching all other array + lengths. In each alternative, array indexing (i.e., the function + !:) is used to bind array elements to the corresponding + pattern variables. This sounds easy enough and is essentially what the + parallel array equation of the function DsUtils.mkCoAlgCaseMatchResult + does. +

+ Unfortunately, however, the pattern matching compiler expects that it + can turn (almost) any pattern into variable patterns, literals, or + constructor applications by way of the function Match.tidy1. + And to make matters worse, some weird machinery in the module Check + insists on being able to reverse the process (essentially to pretty + print patterns in warnings for incomplete or overlapping patterns). +

+ The solution to this is an (unlimited) set of fake constructors + for parallel arrays, courtesy of TysWiredIn.parrFakeCon. + In other words, any pattern of the form [:p1, + ..., pn:] is transformed into +

+ + MkPArrayn p1 ... pn + +
+

+ by Match.tidy1, then run through the rest of the pattern + matching compiler, and finally picked up by + DsUtils.mkCoAlgCaseMatchResult, which converts it into a + case expression as outlined above. +

+ As an example consider the source expression +

+case v of
+  [:x1:]         -> e1
+  [:x2, x3, x4:] -> e2
+  _              -> e3
+
+

+ Match.tidy1 converts it into a form that is equivalent to +

+case v of {
+  MkPArr1 x1       -> e1;
+  MkPArr2 x2 x3 x4 -> e2;
+  _                -> e3;
+}
+
+

+ which DsUtils.mkCoAlgCaseMatchResult turns into the + following Core code: +

+      case lengthP v of
+        I# i# ->
+          case i# of l {
+            DFT ->                                        e3
+            1   -> let x1 = v!:0                       in e1
+            3   -> let x2 = v!:0; x3 = v!:1; x4 = v!:2 in e2
+          }
+
+ +

Parallel Array Comprehensions

+

+ The most challenging construct of the three is the array comprehension. + In principle, it would be possible to transform array comprehensions in + essentially the same way as list comprehensions, but this would lead to + abysmally slow code, as desugaring of list comprehensions generates code + that is optimised for sequential, constructor-based structures. In + contrast, array comprehensions need to be transformed into code that + solely relies on collective operations and avoids the creation of many + small intermediate arrays. +

+ The transformation is implemented by the function DsListComp.dsPArrComp. + In the following, we denote this transformation function by the form + <<e>> pa ea, where e + is the comprehension to be compiled and the arguments pa + and ea denote a pattern and the currently processed array + expression, respectively. The invariant constraining these two + arguments is that all elements in the array produced by ea + will successfully match against pa. +

+ Given a source-level comprehension [:e | qss:], we compile + it with <<[:e | qss:]>> () [:():] using the + rules +

+<<[:e' |           :]>> pa ea = mapP (\pa -> e') ea
+<<[:e' | b     , qs:]>> pa ea = <<[:e' | qs:]>> pa (filterP (\pa -> b) ea)
+<<[:e' | p <- e, qs:]>> pa ea =
+  let ef = filterP (\x -> case x of {p -> True; _ -> False}) e
+  in
+  <<[:e' | qs:]>> (pa, p) (crossP ea ef)
+<<[:e' | let ds, qs:]>> pa ea =
+  <<[:e' | qs:]>> (pa, (x_1, ..., x_n))
+                  (mapP (\v@pa -> (v, let ds in (x_1, ..., x_n))) ea)
+where
+  {x_1, ..., x_n} = DV (ds)            -- Defined Variables
+<<[:e' | qs | qss:]>>   pa ea =
+  <<[:e' | qss:]>> (pa, (x_1, ..., x_n))
+                   (zipP ea (<<[:(x_1, ..., x_n) | qs:]>> pa ea))
+where
+  {x_1, ..., x_n} = DV (qs)
+
+
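+ As a worked example, compiling [:x + y | x <- xs, y <- ys:] with these
+ rules proceeds as follows (since x and y are variable patterns, the
+ filterP expressions generated by the generator rule always yield True
+ and are elided here):
+
+<<[:x + y | x <- xs, y <- ys:]>> () [:():]
+  = <<[:x + y | y <- ys:]>> ((), x) (crossP [:():] xs)
+  = <<[:x + y |         :]>> (((), x), y) (crossP (crossP [:():] xs) ys)
+  = mapP (\(((), x), y) -> x + y) (crossP (crossP [:():] xs) ys)
+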

+ We assume the denotation of crossP to be given by +

+crossP       :: [:a:] -> [:b:] -> [:(a, b):]
+crossP a1 a2  = let
+                  len1 = lengthP a1
+                  len2 = lengthP a2
+                  x1   = concatP $ mapP (replicateP len2) a1
+                  x2   = concatP $ replicateP len1 a2
+                in
+                  zipP x1 x2
+

+ For a more efficient implementation of crossP, see + PrelPArr. +
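+ For instance, with the definition above:
+
+crossP [:1, 2:] [:'a', 'b':]
+  = zipP [:1, 1, 2, 2:] [:'a', 'b', 'a', 'b':]
+  = [:(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'):]
+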

+ Moreover, the following optimisations are important: +

+ +

Doing Away With Nested Arrays: The Flattening Transformation

+

+ On the quest towards an entirely unboxed representation of parallel + arrays, the flattening transformation is the essential ingredient. GHC + uses a substantially + improved version of the transformation whose original form was + described by Blelloch & Sabot. The flattening transformation + replaces values of type [:a:] as well as functions + operating on these values by alternative, more efficient data structures + and functions. +
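+ To give a flavour of the alternative representation (an illustrative
+ sketch in the spirit of Blelloch & Sabot, not GHC's actual data type):
+ a nested array is split into a segment descriptor and one flat data
+ array, so collective operations can traverse the data without chasing
+ pointers.
+
+-- [: [:1, 2:], [:3:] :] is represented roughly as follows:
+data SegdPArr a = SegdPArr
+  { segLengths :: [Int]   -- [2, 1]: the lengths of the subarrays
+  , flatData   :: [a]     -- [1, 2, 3]: all elements in one flat array
+  }
+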

+ The flattening machinery is activated by the option + -fflatten, which is a static hsc option. It consists of + two steps: function vectorisation and array specialisation. Only the + first of those is implemented so far. If selected, the transformation + is applied to a module in Core form immediately after the desugarer, before the Mighty Simplifier gets to do its + job. After vectorisation, the Core program can be dumped using the + option -ddump-vect. There is a good reason for us to perform + flattening immediately after the desugarer. In HscMain.hscRecomp + the so-called persistent compiler state is maintained, which + contains all the information about imported interface files needed to + look up the details of imported names (e.g., during renaming and type + checking). However, all this information is zapped before the + simplifier is invoked (supposedly to reduce the space consumption of + GHC). As flattening has to get at all kinds of identifiers from Prelude + modules, we need to do it before the relevant information in the + persistent compiler state is gone. +

+ As flattening generally requires all libraries to be compiled for + flattening (just like profiling does), there is a compiler way + "ndp", which can be selected using the way option + -ndp. This option will automagically select + -fparr and -fflatten. +

FlattenMonad

+

+ The module FlattenMonad + implements the monad Flatten that is used during + vectorisation to keep track of various sets of bound variables and a + variable substitution map; moreover, it provides a supply of new uniques + and allows us to look up names in the persistent compiler state (i.e., + imported identifiers). +
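+ A minimal sketch of the shape of such a monad (the type, field, and
+ function names here are illustrative assumptions, not GHC's actual
+ definitions):
+
+type Var = String                      -- stand-in for GHC's Var type
+
+data FlattenState = FlattenState
+  { fsBound  :: [Var]                  -- variables bound in the current scope
+  , fsSubst  :: [(Var, Var)]           -- variable substitution map
+  , fsSupply :: [Int]                  -- infinite supply of fresh uniques
+  }
+
+newtype Flatten a = Flatten { runFlatten :: FlattenState -> (a, FlattenState) }
+
+instance Functor Flatten where
+  fmap f (Flatten g) = Flatten $ \s -> let (a, s') = g s in (f a, s')
+
+instance Applicative Flatten where
+  pure a = Flatten $ \s -> (a, s)
+  Flatten f <*> Flatten g =
+    Flatten $ \s -> let (h, s') = f s; (a, s'') = g s' in (h a, s'')
+
+instance Monad Flatten where
+  Flatten g >>= k = Flatten $ \s -> let (a, s') = g s in runFlatten (k a) s'
+
+newUnique :: Flatten Int
+newUnique = Flatten $ \s ->
+  case fsSupply s of u : us -> (u, s { fsSupply = us })
+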

+ In order to be able to look up imported identifiers in the persistent + compiler state, it is important that these identifiers are included in + the free variable lists computed by the renamer. More precisely, all + names needed by flattening are included in the names produced by + RnEnv.getImplicitModuleFVs. To avoid putting + flattening-dependent lists of names into the renamer, the module + FlattenInfo exports namesNeededForFlattening. + + [FIXME: It might be worthwhile to document in the non-Flattening part of + the Commentary that the persistent compiler state is zapped after + desugaring and how the free variables determined by the renamer imply + which names are imported.] +

+ Last modified: Tue Feb 12 01:44:21 EST 2002