<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<title>The GHC Commentary - Parallel Arrays</title>
</head>
<body BGCOLOR="FFFFFF">
<h1>The GHC Commentary - Parallel Arrays</h1>
<p>
This section describes an experimental extension for high-performance
arrays, which comprises special syntax for array types and array
comprehensions, a set of optimising program transformations, and a set
of special-purpose libraries. The extension is currently only partially
implemented, but its development will be tracked here.
<p>
Parallel arrays originally got their name from the aim to provide an
architecture-independent programming model for a range of parallel
computers. However, since experiments showed that the approach is also
worthwhile for sequential array code, the emphasis has shifted to their
parallel evaluation semantics: As soon as any element in a parallel
array is demanded, all the other elements are evaluated, too. This
makes parallel arrays more strict than <a
href="http://haskell.org/onlinelibrary/array.html">standard Haskell 98
arrays</a>, but also opens the door for a loop-based implementation
strategy that leads to significantly more efficient code.
<p>
The programming model as well as the use of the <em>flattening
transformation</em>, which is central to the approach, has its origin in
the programming language <a
href="http://www.cs.cmu.edu/~scandal/nesl.html">Nesl</a>.
<h2>More Sugar: Special Syntax for Array Comprehensions</h2>
<p>
The option <code>-fparr</code>, a dynamic hsc option that can be
reversed with <code>-fno-parr</code>, enables special syntax for
parallel arrays. The special syntax is not essential for using parallel
arrays, but it makes for significantly more concise programs. The
switch works by
making the lexical analyser (located in <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Lex.lhs"><code>Lex.lhs</code></a>)
recognise the tokens <code>[:</code> and <code>:]</code>. Given that
the additional productions in the parser (located in <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Parser.y"><code>Parser.y</code></a>)
cannot be triggered without the lexer generating the necessary tokens,
there is no need to alter the behaviour of the parser.
<p>
The following additional syntax is accepted (the non-terminals are those
from the <a href="http://haskell.org/onlinereport/">Haskell 98 language
definition</a>):
<p>
<blockquote><pre>
  atype -> '[:' type ':]' (parallel array type)
aexp -> '[:' exp1 ',' ... ',' expk ':]' (explicit array, k >= 0)
| '[:' exp1 [',' exp2] '..' exp3 ':]' (arithmetic array sequence)
| '[:' exp '|' quals1 '|' ... '|' qualsn ':]' (array comprehension, n >= 1)
quals -> qual1 ',' ... ',' qualn (qualifier list, n >= 1)
apat -> '[:' pat1 ',' ... ',' patk ':]' (array pattern, k >= 0)
</pre>
</blockquote>
<p>
Moreover, the extended comprehension syntax that allows for <em>parallel
qualifiers</em> (i.e., qualifiers separated by "<code>|</code>") is also
supported in list comprehensions. In general, the similarity to the
special syntax for lists is obvious. The two main differences are that
(a) arithmetic array sequences are always finite and (b)
<code>[::]</code> is not treated as a constructor in expressions and
patterns, but rather as a special case of the explicit array syntax.
The former is a simple consequence of the parallel evaluation semantics
of parallel arrays and the latter is due to arrays not being a
constructor-based data type.
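<p>
As an illustration of the special syntax (a sketch only; it assumes that
<code>-fparr</code> is enabled and that <code>PrelPArr</code> provides a
<code>sumP</code> analogous to <code>sum</code> on lists), a dot product
can be written as an array comprehension over parallel qualifiers:
<blockquote><pre>
  dotp :: [:Double:] -> [:Double:] -> Double
  dotp xs ys = sumP [: x * y | x <- xs | y <- ys :]</pre>
</blockquote>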
<p>
As a naming convention, types and functions that are concerned with
parallel arrays usually contain the string <code>parr</code> or
<code>PArr</code> (often as a prefix), and where corresponding types or
functions for handling lists exist, the name is identical, except for
containing the substring <code>parr</code> instead of <code>list</code>
(possibly in caps).
<p>
The following implications are worth noting explicitly:
<ul>
      <li>Both as a value and as a pattern, <code>[::]</code> is an empty explicit
	parallel array (i.e., something of the form <code>ExplicitPArr ty
	[]</code> in the AST). This is in contrast to lists, which use the
	nil constructor instead. In the case of parallel arrays, using a
	constructor would be rather awkward, as they are not a constructor-based
	type. (This becomes rather clear in the desugarer.)
<li>As a consequence, array patterns have the general form <code>[:p1,
p2, ..., pn:]</code>, where <code>n</code> >= 0. Thus, two array
patterns overlap iff they have the same length -- an important property
for the pattern matching compiler.
</ul>
<h2>Prelude Support for Parallel Arrays</h2>
<p>
The Prelude module <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/lib/std/PrelPArr.lhs"><code>PrelPArr</code></a>
defines the standard operations and their types on parallel arrays and
provides a basic implementation based on boxed arrays. The interface of
<code>PrelPArr</code> is modelled on H98's <code>PrelList</code>, but it
leaves out all functions that require infinite structures and adds
frequently needed array operations, such as permutations. Parallel
arrays are quite unique in that they use an entirely different
representation as soon as the flattening transformation is activated,
which is described further below. In particular, <code>PrelPArr</code>
defines the type <code>[::]</code> and operations to create, process,
and inspect parallel arrays. The type as well as the names of some of
the operations are also hardwired in <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/prelude/TysWiredIn.lhs"><code>TysWiredIn</code></a>
(see the definition of <code>parrTyCon</code> in this module) and <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/prelude/PrelNames.lhs"><code>PrelNames</code></a>.
This is again very much like the case of lists, where the type is
defined in <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/lib/std/PrelBase.lhs"><code>PrelBase</code></a>
and similarly wired in; however, for lists the entirely
constructor-based definition is exposed to user programs, which is not
the case for parallel arrays.
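<p>
As an indicative sketch of the interface (the authoritative definitions
live in <code>PrelPArr</code>, and the exact set of exported operations
may differ), the types of some operations used throughout this section
are the obvious analogues of their list counterparts:
<blockquote><pre>
  toP        :: [a] -> [:a:]                   -- convert a list into a parallel array
  lengthP    :: [:a:] -> Int                   -- number of elements
  (!:)       :: [:a:] -> Int -> a              -- indexing
  mapP       :: (a -> b) -> [:a:] -> [:b:]     -- analogue of map
  filterP    :: (a -> Bool) -> [:a:] -> [:a:]  -- analogue of filter
  zipP       :: [:a:] -> [:b:] -> [:(a, b):]   -- analogue of zip
  replicateP :: Int -> a -> [:a:]              -- analogue of replicate
  concatP    :: [:[:a:]:] -> [:a:]             -- analogue of concat</pre>
</blockquote>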
<h2>Desugaring Parallel Arrays</h2>
<p>
The parallel array extension requires the desugarer to replace all
occurrences of (1) explicit parallel arrays, (2) array patterns, and (3)
array comprehensions by a suitable combination of invocations of
operations defined in <code>PrelPArr</code>.
<h4>Explicit Parallel Arrays</h4>
<p>
These are by far the simplest to remove. We simply replace every
occurrence of <code>[:<i>e<sub>1</sub></i>, ...,
<i>e<sub>n</sub></i>:]</code> by
<blockquote>
<code>
toP [<i>e<sub>1</sub></i>, ..., <i>e<sub>n</sub></i>]
</code>
</blockquote>
<p>
i.e., we build a list of the array elements, which is then converted
into a parallel array.
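<p>
For instance, the explicit array <code>[:1, 2, 3:]</code> is desugared to
<blockquote>
<code>
toP [1, 2, 3]
</code>
</blockquote>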
<h4>Parallel Array Patterns</h4>
<p>
Array patterns are much more tricky as element positions may contain
further patterns and the <a
href="../the-beast/desugar.html#patmat">pattern matching compiler</a>
requires us to flatten all those out. But before we turn to the gory
details, here is the basic idea first: a flat array pattern matches
iff its length corresponds to the length of the matched array. Hence,
if we have a set of flat array patterns matching an array value
<code>a</code>, it suffices to generate a Core <code>case</code>
expression that scrutinises <code>lengthP a</code> and has one
alternative for every array length occurring in one of the patterns.
Moreover, there needs to be a default case catching all other array
lengths. In each alternative, array indexing (i.e., the function
<code>!:</code>) is used to bind array elements to the corresponding
pattern variables. This sounds easy enough and is essentially what the
parallel array equation of the function <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsUtils.lhs"><code>DsUtils</code></a><code>.mkCoAlgCaseMatchResult</code>
does.
<p>
Unfortunately, however, the pattern matching compiler expects that it
can turn (almost) any pattern into variable patterns, literals, or
constructor applications by way of the function <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Match.lhs"><code>Match</code></a><code>.tidy1</code>.
And to make matters worse, some weird machinery in the module <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Check.lhs"><code>Check</code></a>
insists on being able to reverse the process (essentially to pretty
print patterns in warnings for incomplete or overlapping patterns).
<p>
The solution to this is an (unlimited) set of <em>fake</em> constructors
for parallel arrays, courtesy of <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/prelude/TysWiredIn.lhs"><code>TysWiredIn</code></a><code>.parrFakeCon</code>.
In other words, any pattern of the form <code>[:<i>p<sub>1</sub></i>,
..., <i>p<sub>n</sub></i>:]</code> is transformed into
<blockquote>
<code>
MkPArr<i>n</i> <i>p<sub>1</sub></i> ... <i>p<sub>n</sub></i>
</code>
</blockquote>
<p>
by <code>Match.tidy1</code>, then, run through the rest of the pattern
matching compiler, and finally, picked up by
<code>DsUtils.mkCoAlgCaseMatchResult</code>, which converts it into a
<code>case</code> expression as outlined above.
<p>
As an example consider the source expression
<blockquote><pre>
case v of
[:x1:] -> e1
[:x2, x3, x4:] -> e2
_ -> e3</pre>
</blockquote>
<p>
<code>Match.tidy1</code> converts it into a form that is equivalent to
<blockquote><pre>
case v of {
MkPArr1 x1 -> e1;
MkPArr3 x2 x3 x4 -> e2;
_ -> e3;
}</pre>
</blockquote>
<p>
which <code>DsUtils.mkCoAlgCaseMatchResult</code> turns into the
following Core code:
<blockquote><pre>
case lengthP v of
  I# i# ->
    case i# of l {
      DFT -> e3
      1 -> let x1 = v!:0 in e1
      3 -> let x2 = v!:0; x3 = v!:1; x4 = v!:2 in e2
    }</pre>
</blockquote>
<h4>Parallel Array Comprehensions</h4>
<p>
The most challenging of the three constructs are array comprehensions.
In principle, it would be possible to transform them in essentially the
same way as list comprehensions, but this would lead to abysmally slow
code as desugaring of list comprehensions generates code that is
optimised for sequential, constructor-based structures. In contrast,
array comprehensions need to be transformed into code that solely relies
on collective operations and avoids the creation of many small
intermediate arrays.
<p>
The transformation is implemented by the function <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsListComp.lhs"><code>DsListComp</code></a><code>.dsPArrComp</code>.
In the following, we denote this transformation function by the form
<code><<<i>e</i>>> pa ea</code>, where <code><i>e</i></code>
is the comprehension to be compiled and the arguments <code>pa</code>
and <code>ea</code> denote a pattern and the currently processed array
expression, respectively. The invariant constraining these two
arguments is that all elements in the array produced by <code>ea</code>
will <em>successfully</em> match against <code>pa</code>.
<p>
Given a source-level comprehension <code>[:e | qss:]</code>, we compile
it with <code><<[:e | qss:]>> () [:():]</code> using the
rules
<blockquote><pre>
<<[:e' | :]>> pa ea = mapP (\pa -> e') ea
<<[:e' | b , qs:]>> pa ea = <<[:e' | qs:]>> pa (filterP (\pa -> b) ea)
<<[:e' | p <- e, qs:]>> pa ea =
let ef = filterP (\x -> case x of {p -> True; _ -> False}) e
in
<<[:e' | qs:]>> (pa, p) (crossP ea ef)
<<[:e' | let ds, qs:]>> pa ea =
<<[:e' | qs:]>> (pa, (x_1, ..., x_n))
(mapP (\v@pa -> (v, let ds in (x_1, ..., x_n))) ea)
where
{x_1, ..., x_n} = DV (ds) -- Defined Variables
<<[:e' | qs | qss:]>> pa ea =
<<[:e' | qss:]>> (pa, (x_1, ..., x_n))
(zipP ea <<[:(x_1, ..., x_n) | qs:]>>)
where
{x_1, ..., x_n} = DV (qs)</pre>
</blockquote>
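<p>
As an illustration, applying these rules to a simple two-generator
comprehension and then simplifying with the <code>pa == ()</code>
optimisation mentioned below (and dropping the trivially true filters
that variable patterns give rise to) yields roughly
<blockquote><pre>
  [: x + y | x <- xs, y <- ys :]
    ===>
  mapP (\(x, y) -> x + y) (crossP xs ys)</pre>
</blockquote>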
<p>
We assume the denotation of <code>crossP</code> to be given by
<blockquote><pre>
crossP :: [:a:] -> [:b:] -> [:(a, b):]
crossP a1 a2 = let
len1 = lengthP a1
len2 = lengthP a2
x1 = concatP $ mapP (replicateP len2) a1
x2 = concatP $ replicateP len1 a2
in
zipP x1 x2</pre>
</blockquote>
<p>
For a more efficient implementation of <code>crossP</code>, see
<a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/lib/std/PrelPArr.lhs"><code>PrelPArr</code></a>.
<p>
Moreover, the following optimisations are important:
<ul>
<li>In the <code>p <- e</code> rule, if <code>pa == ()</code>, drop
it and simplify <code>crossP ea ef</code> to <code>ef</code>.
<li>We assume that fusion will optimise sequences of array processing
combinators.
<li>FIXME: Do we want to have the following function?
<blockquote><pre>
mapFilterP :: (a -> Maybe b) -> [:a:] -> [:b:]</pre>
</blockquote>
<p>
Even with fusion <code>(mapP (\p -> e) . filterP (\p ->
b))</code> may still result in redundant pattern matching
operations. (Let's wait on this until we have seen what the
Simplifier does to the generated code; a sketch of the intended
semantics of <code>mapFilterP</code> is given after this list.)
</ul>
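<p>
For reference, the intended semantics of the proposed
<code>mapFilterP</code> can be sketched on ordinary lists (the function
below is purely illustrative and not part of <code>PrelPArr</code>): it
combines the selection performed by <code>filterP</code> and the
projection performed by <code>mapP</code> in a single traversal, so the
pattern match happens only once per element.
<blockquote><pre>
  -- Illustrative list analogue of the proposed mapFilterP
  mapFilter :: (a -> Maybe b) -> [a] -> [b]
  mapFilter f xs = [y | x <- xs, Just y <- [f x]]

  -- e.g., mapFilter (\x -> if even x then Just (x * x) else Nothing) [1..5]
  --       == [4, 16]</pre>
</blockquote>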
<h2>Doing Away With Nested Arrays: The Flattening Transformation</h2>
<p>
On the quest towards an entirely unboxed representation of parallel
arrays, the flattening transformation is the essential ingredient. GHC
uses a <a
href="http://www.cse.unsw.edu.au/~chak/papers/CK00.html">substantially
improved version</a> of the transformation whose original form was
described by Blelloch & Sabot. The flattening transformation
replaces values of type <code>[:a:]</code> as well as functions
operating on these values by alternative, more efficient data structures
and functions.
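<p>
To convey the idea of the flattened representation (this is a conceptual
sketch only, not GHC's actual data types), a nested parallel array such
as <code>[:[:Int:]:]</code> is represented after flattening by a flat
array of all elements together with a segment descriptor recording the
length of each inner array, so that array operations can be implemented
as loops over contiguous data:
<blockquote><pre>
  import Data.Array (Array)

  -- Conceptual sketch only: a segmented representation of [:[:Int:]:]
  data SegdIntArr = SegdIntArr
    { segLengths :: Array Int Int  -- length of each inner array
    , flatData   :: Array Int Int  -- all elements, concatenated
    }</pre>
</blockquote>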
<p>
The flattening machinery is activated by the option
<code>-fflatten</code>, which is a static hsc option. It consists of
two steps: function vectorisation and array specialisation. Only the
first of those is implemented so far. If selected, the transformation
is applied to a module in Core form immediately after the <a
href="../the-beast/desugar.html">desugarer,</a> before the <a
href="../the-beast/simplifier.html">Mighty Simplifier</a> gets to do its
job. After vectorisation, the Core program can be dumped using the
option <code>-ddump-vect</code>. There is a good reason for us to perform
flattening immediately after the desugarer. In <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/HscMain.lhs"><code>HscMain</code></a><code>.hscRecomp</code>
the so-called <em>persistent compiler state</em> is maintained, which
contains all the information about imported interface files needed to
look up the details of imported names (e.g., during renaming and type
checking). However, all this information is zapped before the
simplifier is invoked (supposedly to reduce the space-consumption of
GHC). As flattening has to get at all kinds of identifiers from Prelude
modules, we need to do it before the relevant information in the
persistent compiler state is gone.
<p>
As flattening generally requires all libraries to be compiled for
flattening (just like profiling does), there is a <em>compiler way</em>
<code>"ndp"</code>, which can be selected using the way option code
<code>-ndp</code>. This option will automagically select
<code>-fparr</code> and <code>-fflatten</code>.
<h4><code>FlattenMonad</code></h4>
<p>
The module <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/ndpFlatten/FlattenMonad.lhs"><code>FlattenMonad</code></a>
implements the monad <code>Flatten</code> that is used during
vectorisation to keep track of various sets of bound variables and a
variable substitution map; moreover, it provides a supply of new uniques
and allows us to look up names in the persistent compiler state (i.e.,
imported identifiers).
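<p>
Conceptually (the following is a rough sketch only, not the actual GHC
definitions), a state monad of the following shape can provide what is
described above, namely sets of bound variables, a variable
substitution, a supply of fresh uniques, and read access to the
imported-name environment:
<blockquote><pre>
  import qualified Data.Map as Map
  import qualified Data.Set as Set

  -- Conceptual sketch of the state threaded by a Flatten-like monad
  data FlattenState var expr env = FlattenState
    { boundVars :: Set.Set var       -- variables bound in the current scope
    , subst     :: Map.Map var expr  -- pending variable substitution
    , uniques   :: [Integer]         -- supply of fresh uniques
    , imported  :: env               -- read-only view of imported identifiers
    }

  newtype Flatten var expr env a = Flatten
    { runFlatten :: FlattenState var expr env -> (a, FlattenState var expr env) }</pre>
</blockquote>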
<p>
In order to be able to look up imported identifiers in the persistent
compiler state, it is important that these identifiers are included in
the free variable lists computed by the renamer. More precisely, all
names needed by flattening are included in the names produced by
<code>RnEnv.getImplicitModuleFVs</code>. To avoid putting
flattening-dependent lists of names into the renamer, the module
<code>FlattenInfo</code> exports <code>namesNeededForFlattening</code>.
[FIXME: It might be worthwhile to document in the non-Flattening part of
the Commentary that the persistent compiler state is zapped after
desugaring and how the free variables determined by the renamer imply
which names are imported.]
<p><small>
<!-- hhmts start -->
Last modified: Tue Feb 12 01:44:21 EST 2002
<!-- hhmts end -->
</small>
</body>
</html>