<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<title>The GHC Commentary - Parallel Arrays</title>
</head>
<body BGCOLOR="FFFFFF">
<h1>The GHC Commentary - Parallel Arrays</h1>
<p>
This section describes an experimental extension for high-performance
arrays, which comprises special syntax for array types and array
comprehensions, a set of optimising program transformations, and a set
of special-purpose libraries. The extension is currently only partially
implemented, but its development will be tracked here.
<p>
Parallel arrays originally got their name from the aim to provide an
architecture-independent programming model for a range of parallel
computers. However, since experiments showed that the approach is also
worthwhile for sequential array code, the emphasis has shifted to their
parallel evaluation semantics: As soon as any element in a parallel
array is demanded, all the other elements are evaluated, too. This
makes parallel arrays more strict than <a
href="http://haskell.org/onlinelibrary/array.html">standard Haskell 98
arrays</a>, but also opens the door for a loop-based implementation
strategy that leads to significantly more efficient code.
<p>
The programming model as well as the use of the <em>flattening
transformation</em>, which is central to the approach, has its origin in
the programming language <a
href="http://www.cs.cmu.edu/~scandal/nesl.html">Nesl</a>.
<h2>More Sugar: Special Syntax for Array Comprehensions</h2>
<p>
The option <code>-fparr</code>, a dynamic hsc option that can be
reversed with <code>-fno-parr</code>, enables special syntax for
parallel arrays. The special syntax is not essential for using parallel
arrays, but it makes for significantly more concise programs. The
switch works by
making the lexical analyser (located in <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Lex.lhs"><code>Lex.lhs</code></a>)
recognise the tokens <code>[:</code> and <code>:]</code>. Given that
the additional productions in the parser (located in <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Parser.y"><code>Parser.y</code></a>)
cannot be triggered without the lexer generating the necessary tokens,
there is no need to alter the behaviour of the parser.
<p>
The following additional syntax is accepted (the non-terminals are those
from the <a href="http://haskell.org/onlinereport/">Haskell 98 language
definition</a>):
<p>
<blockquote><pre>
  atype -> '[:' type ':]' (parallel array type)
aexp -> '[:' exp1 ',' ... ',' expk ':]' (explicit array, k >= 0)
| '[:' exp1 [',' exp2] '..' exp3 ':]' (arithmetic array sequence)
| '[:' exp '|' quals1 '|' ... '|' qualsn ':]' (array comprehension, n >= 1)
quals -> qual1 ',' ... ',' qualn (qualifier list, n >= 1)
apat -> '[:' pat1 ',' ... ',' patk ':]' (array pattern, k >= 0)
</pre>
</blockquote>
<p>
Moreover, the extended comprehension syntax that allows for <em>parallel
qualifiers</em> (i.e., qualifiers separated by "<code>|</code>") is also
supported in list comprehensions. In general, the similarity to the
special syntax for lists is obvious. The two main differences are that
(a) arithmetic array sequences are always finite and (b)
<code>[::]</code> is not treated as a constructor in expressions and
patterns, but rather as a special case of the explicit array syntax.
The former is a simple consequence of the parallel evaluation semantics
of parallel arrays and the latter is due to arrays not being a
constructor-based data type.
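<p>
As an illustration of the special syntax (a sketch only; it assumes that
<code>-fparr</code> is enabled and that <code>PrelPArr</code> provides a
<code>sumP</code> analogous to <code>sum</code> on lists), a dot product
can be written as an array comprehension over parallel qualifiers:
<blockquote><pre>
  dotp :: [:Double:] -> [:Double:] -> Double
  dotp xs ys = sumP [: x * y | x <- xs | y <- ys :]</pre>
</blockquote>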
<p>
As a naming convention, types and functions that are concerned with
parallel arrays usually contain the string <code>parr</code> or
<code>PArr</code> (often as a prefix), and where corresponding types or
functions for handling lists exist, the name is identical, except for
containing the substring <code>parr</code> instead of <code>list</code>
(possibly in caps).
<p>
The following implications are worth noting explicitly:
<ul>
      <li>Both as a value and as a pattern, <code>[::]</code> is an empty explicit
	parallel array (i.e., something of the form <code>ExplicitPArr ty
	[]</code> in the AST). This is in contrast to lists, which use the
	nil constructor instead. In the case of parallel arrays, using a
	constructor would be rather awkward, as they are not a constructor-based
	type. (This becomes rather clear in the desugarer.)
<li>As a consequence, array patterns have the general form <code>[:p1,
p2, ..., pn:]</code>, where <code>n</code> >= 0. Thus, two array
patterns overlap iff they have the same length -- an important property
for the pattern matching compiler.
</ul>
<h2>Prelude Support for Parallel Arrays</h2>
<p>
The Prelude module <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/lib/std/PrelPArr.lhs"><code>PrelPArr</code></a>
defines the standard operations and their types on parallel arrays and
provides a basic implementation based on boxed arrays. The interface of
<code>PrelPArr</code> is modelled on H98's <code>PrelList</code>, but it
leaves out all functions that require infinite structures and adds
frequently needed array operations, such as permutations. Parallel
arrays are quite unique in that they use an entirely different
representation as soon as the flattening transformation is activated,
which is described further below. In particular, <code>PrelPArr</code>
defines the type <code>[::]</code> and operations to create, process,
and inspect parallel arrays. The type as well as the names of some of
the operations are also hardwired in <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/prelude/TysWiredIn.lhs"><code>TysWiredIn</code></a>
(see the definition of <code>parrTyCon</code> in this module) and <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/prelude/PrelNames.lhs"><code>PrelNames</code></a>.
This is again very much like the case of lists, where the type is
defined in <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/lib/std/PrelBase.lhs"><code>PrelBase</code></a>
and similarly wired in; however, for lists the entirely
constructor-based definition is exposed to user programs, which is not
the case for parallel arrays.
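<p>
As an indicative sketch of the interface (the authoritative definitions
live in <code>PrelPArr</code>, and the exact set of exported operations
may differ), the types of some operations used throughout this section
are the obvious analogues of their list counterparts:
<blockquote><pre>
  toP        :: [a] -> [:a:]                   -- convert a list into a parallel array
  lengthP    :: [:a:] -> Int                   -- number of elements
  (!:)       :: [:a:] -> Int -> a              -- indexing
  mapP       :: (a -> b) -> [:a:] -> [:b:]     -- analogue of map
  filterP    :: (a -> Bool) -> [:a:] -> [:a:]  -- analogue of filter
  zipP       :: [:a:] -> [:b:] -> [:(a, b):]   -- analogue of zip
  replicateP :: Int -> a -> [:a:]              -- analogue of replicate
  concatP    :: [:[:a:]:] -> [:a:]             -- analogue of concat</pre>
</blockquote>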
<h2>Desugaring Parallel Arrays</h2>
<p>
The parallel array extension requires the desugarer to replace all
occurrences of (1) explicit parallel arrays, (2) array patterns, and (3)
array comprehensions by a suitable combination of invocations of
operations defined in <code>PrelPArr</code>.
<h4>Explicit Parallel Arrays</h4>
<p>
These are by far the simplest to remove. We simply replace every
occurrence of <code>[:<i>e<sub>1</sub></i>, ...,
<i>e<sub>n</sub></i>:]</code> by
<blockquote>
<code>
toP [<i>e<sub>1</sub></i>, ..., <i>e<sub>n</sub></i>]
</code>
</blockquote>
<p>
i.e., we build a list of the array elements, which is then converted
into a parallel array.
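<p>
For instance, the explicit array <code>[:1, 2, 3:]</code> is desugared to
<blockquote>
<code>
toP [1, 2, 3]
</code>
</blockquote>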
<h4>Parallel Array Patterns</h4>
<p>
Array patterns are much more tricky as element positions may contain
further patterns and the <a
href="../the-beast/desugar.html#patmat">pattern matching compiler</a>
requires us to flatten all those out. But before we turn to the gory
details, here is the basic idea first: a flat array pattern matches
iff its length corresponds to the length of the matched array. Hence,
if we have a set of flat array patterns matching an array value
<code>a</code>, it suffices to generate a Core <code>case</code>
expression that scrutinises <code>lengthP a</code> and has one
alternative for every array length occurring in one of the patterns.
Moreover, there needs to be a default case catching all other array
lengths. In each alternative, array indexing (i.e., the function
<code>!:</code>) is used to bind array elements to the corresponding
pattern variables. This sounds easy enough and is essentially what the
parallel array equation of the function <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsUtils.lhs"><code>DsUtils</code></a><code>.mkCoAlgCaseMatchResult</code>
does.
<p>
Unfortunately, however, the pattern matching compiler expects that it
can turn (almost) any pattern into variable patterns, literals, or
constructor applications by way of the function <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Match.lhs"><code>Match</code></a><code>.tidy1</code>.
And to make matters worse, some weird machinery in the module <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Check.lhs"><code>Check</code></a>
insists on being able to reverse the process (essentially to pretty
print patterns in warnings for incomplete or overlapping patterns).
<p>
The solution to this is an (unlimited) set of <em>fake</em> constructors
for parallel arrays, courtesy of <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/prelude/TysWiredIn.lhs"><code>TysWiredIn</code></a><code>.parrFakeCon</code>.
In other words, any pattern of the form <code>[:<i>p<sub>1</sub></i>,
..., <i>p<sub>n</sub></i>:]</code> is transformed into
<blockquote>
<code>
MkPArr<i>n</i> <i>p<sub>1</sub></i> ... <i>p<sub>n</sub></i>
</code>
</blockquote>
<p>
by <code>Match.tidy1</code>, then, run through the rest of the pattern
matching compiler, and finally, picked up by
<code>DsUtils.mkCoAlgCaseMatchResult</code>, which converts it into a
<code>case</code> expression as outlined above.
<p>
As an example consider the source expression
<blockquote><pre>
case v of
[:x1:] -> e1
[:x2, x3, x4:] -> e2
_ -> e3</pre>
</blockquote>
<p>
<code>Match.tidy1</code> converts it into a form that is equivalent to
<blockquote><pre>
case v of {
MkPArr1 x1 -> e1;
MkPArr3 x2 x3 x4 -> e2;
_ -> e3;
}</pre>
</blockquote>
<p>
which <code>DsUtils.mkCoAlgCaseMatchResult</code> turns into the
following Core code:
<blockquote><pre>
case lengthP v of
  I# i# ->
    case i# of l {
      DFT -> e3
      1 -> let x1 = v!:0 in e1
      3 -> let x2 = v!:0; x3 = v!:1; x4 = v!:2 in e2
    }</pre>
</blockquote>
<h4>Parallel Array Comprehensions</h4>
<p>
The most challenging of the three constructs are array comprehensions.
In principle, it would be possible to transform them in essentially the
same way as list comprehensions, but this would lead to abysmally slow
code as desugaring of list comprehensions generates code that is
optimised for sequential, constructor-based structures. In contrast,
array comprehensions need to be transformed into code that solely relies
on collective operations and avoids the creation of many small
intermediate arrays.
<p>
The transformation is implemented by the function <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsListComp.lhs"><code>DsListComp</code></a><code>.dsPArrComp</code>.
In the following, we denote this transformation function by the form
<code><<<i>e</i>>> pa ea</code>, where <code><i>e</i></code>
is the comprehension to be compiled and the arguments <code>pa</code>
and <code>ea</code> denote a pattern and the currently processed array
expression, respectively. The invariant constraining these two
arguments is that all elements in the array produced by <code>ea</code>
will <em>successfully</em> match against <code>pa</code>.
<p>
Given a source-level comprehension <code>[:e | qss:]</code>, we compile
it with <code><<[:e | qss:]>> () [:():]</code> using the
rules
<blockquote><pre>
<<[:e' | :]>> pa ea = mapP (\pa -> e') ea
<<[:e' | b , qs:]>> pa ea = <<[:e' | qs:]>> pa (filterP (\pa -> b) ea)
<<[:e' | p <- e, qs:]>> pa ea =
let ef = filterP (\x -> case x of {p -> True; _ -> False}) e
in
<<[:e' | qs:]>> (pa, p) (crossP ea ef)
<<[:e' | let ds, qs:]>> pa ea =
<<[:e' | qs:]>> (pa, (x_1, ..., x_n))
(mapP (\v@pa -> (v, let ds in (x_1, ..., x_n))) ea)
where
{x_1, ..., x_n} = DV (ds) -- Defined Variables
<<[:e' | qs | qss:]>> pa ea =
<<[:e' | qss:]>> (pa, (x_1, ..., x_n))
(zipP ea <<[:(x_1, ..., x_n) | qs:]>>)
where
{x_1, ..., x_n} = DV (qs)</pre>
</blockquote>
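<p>
As an illustration, applying these rules to a simple two-generator
comprehension and then simplifying with the <code>pa == ()</code>
optimisation mentioned below (and dropping the trivially true filters
that variable patterns give rise to) yields roughly
<blockquote><pre>
  [: x + y | x <- xs, y <- ys :]
    ===>
  mapP (\(x, y) -> x + y) (crossP xs ys)</pre>
</blockquote>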
<p>
We assume the denotation of <code>crossP</code> to be given by
<blockquote><pre>
crossP :: [:a:] -> [:b:] -> [:(a, b):]
crossP a1 a2 = let
len1 = lengthP a1
len2 = lengthP a2
x1 = concatP $ mapP (replicateP len2) a1
x2 = concatP $ replicateP len1 a2
in
zipP x1 x2</pre>
</blockquote>
<p>
For a more efficient implementation of <code>crossP</code>, see
<a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/lib/std/PrelPArr.lhs"><code>PrelPArr</code></a>.
<p>
Moreover, the following optimisations are important:
<ul>
<li>In the <code>p <- e</code> rule, if <code>pa == ()</code>, drop
it and simplify <code>crossP ea ef</code> to <code>ef</code>.
<li>We assume that fusion will optimise sequences of array processing
combinators.
<li>FIXME: Do we want to have the following function?
<blockquote><pre>
mapFilterP :: (a -> Maybe b) -> [:a:] -> [:b:]</pre>
</blockquote>
<p>
Even with fusion <code>(mapP (\p -> e) . filterP (\p ->
b))</code> may still result in redundant pattern matching
operations. (Let's wait on this until we have seen what the
Simplifier does to the generated code; a sketch of the intended
semantics of <code>mapFilterP</code> is given after this list.)
</ul>
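<p>
For reference, the intended semantics of the proposed
<code>mapFilterP</code> can be sketched on ordinary lists (the function
below is purely illustrative and not part of <code>PrelPArr</code>): it
combines the selection performed by <code>filterP</code> and the
projection performed by <code>mapP</code> in a single traversal, so the
pattern match happens only once per element.
<blockquote><pre>
  -- Illustrative list analogue of the proposed mapFilterP
  mapFilter :: (a -> Maybe b) -> [a] -> [b]
  mapFilter f xs = [y | x <- xs, Just y <- [f x]]

  -- e.g., mapFilter (\x -> if even x then Just (x * x) else Nothing) [1..5]
  --       == [4, 16]</pre>
</blockquote>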
<h2>Doing Away With Nested Arrays: The Flattening Transformation</h2>
<p>
On the quest towards an entirely unboxed representation of parallel
arrays, the flattening transformation is the essential ingredient. GHC
uses a <a
href="http://www.cse.unsw.edu.au/~chak/papers/CK00.html">substantially
improved version</a> of the transformation whose original form was
described by Blelloch & Sabot. The flattening transformation
replaces values of type <code>[:a:]</code> as well as functions
operating on these values by alternative, more efficient data structures
and functions.
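<p>
To convey the idea of the flattened representation (this is a conceptual
sketch only, not GHC's actual data types), a nested parallel array such
as <code>[:[:Int:]:]</code> is represented after flattening by a flat
array of all elements together with a segment descriptor recording the
length of each inner array, so that array operations can be implemented
as loops over contiguous data:
<blockquote><pre>
  import Data.Array (Array)

  -- Conceptual sketch only: a segmented representation of [:[:Int:]:]
  data SegdIntArr = SegdIntArr
    { segLengths :: Array Int Int  -- length of each inner array
    , flatData   :: Array Int Int  -- all elements, concatenated
    }</pre>
</blockquote>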
<p>
The flattening machinery is activated by the option
<code>-fflatten</code>, which is a static hsc option. It consists of
two steps: function vectorisation and array specialisation. Only the
first of those is implemented so far. If selected, the transformation
is applied to a module in Core form immediately after the <a
href="../the-beast/desugar.html">desugarer,</a> before the <a
href="../the-beast/simplifier.html">Mighty Simplifier</a> gets to do its
job. After vectorisation, the Core program can be dumped using the
option <code>-ddump-vect</code>. There is a good reason for us to perform
flattening immediately after the desugarer. In <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/HscMain.lhs"><code>HscMain</code></a><code>.hscRecomp</code>
the so-called <em>persistent compiler state</em> is maintained, which
contains all the information about imported interface files needed to
look up the details of imported names (e.g., during renaming and type
checking). However, all this information is zapped before the
simplifier is invoked (supposedly to reduce the space-consumption of
GHC). As flattening has to get at all kinds of identifiers from Prelude
modules, we need to do it before the relevant information in the
persistent compiler state is gone.
<p>
As flattening generally requires all libraries to be compiled for
flattening (just like profiling does), there is a <em>compiler way</em>
<code>"ndp"</code>, which can be selected using the way option code
<code>-ndp</code>. This option will automagically select
<code>-fparr</code> and <code>-fflatten</code>.
<h4><code>FlattenMonad</code></h4>
<p>
The module <a
href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/ndpFlatten/FlattenMonad.lhs"><code>FlattenMonad</code></a>
implements the monad <code>Flatten</code> that is used during
vectorisation to keep track of various sets of bound variables and a
variable substitution map; moreover, it provides a supply of new uniques
and allows us to look up names in the persistent compiler state (i.e.,
imported identifiers).
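<p>
Conceptually (the following is a rough sketch only, not the actual GHC
definitions), a state monad of the following shape can provide what is
described above, namely sets of bound variables, a variable
substitution, a supply of fresh uniques, and read access to the
imported-name environment:
<blockquote><pre>
  import qualified Data.Map as Map
  import qualified Data.Set as Set

  -- Conceptual sketch of the state threaded by a Flatten-like monad
  data FlattenState var expr env = FlattenState
    { boundVars :: Set.Set var       -- variables bound in the current scope
    , subst     :: Map.Map var expr  -- pending variable substitution
    , uniques   :: [Integer]         -- supply of fresh uniques
    , imported  :: env               -- read-only view of imported identifiers
    }

  newtype Flatten var expr env a = Flatten
    { runFlatten :: FlattenState var expr env -> (a, FlattenState var expr env) }</pre>
</blockquote>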
<p>
In order to be able to look up imported identifiers in the persistent
compiler state, it is important that these identifiers are included in
the free variable lists computed by the renamer. More precisely, all
names needed by flattening are included in the names produced by
<code>RnEnv.getImplicitModuleFVs</code>. To avoid putting
flattening-dependent lists of names into the renamer, the module
<code>FlattenInfo</code> exports <code>namesNeededForFlattening</code>.
[FIXME: It might be worthwhile to document in the non-Flattening part of
the Commentary that the persistent compiler state is zapped after
desugaring and how the free variables determined by the renamer imply
which names are imported.]
<p><small>
<!-- hhmts start -->
Last modified: Tue Feb 12 01:44:21 EST 2002
<!-- hhmts end -->
</small>
</body>
</html>