% both concurrent and parallel
%************************************************************************
%*                                                                      *
\section[concurrent-and-parallel]{Concurrent and Parallel Haskell}
\index{Concurrent Haskell}
\index{Parallel Haskell}
%*                                                                      *
%************************************************************************

Concurrent and Parallel Haskell are Glasgow extensions to Haskell
which let you structure your program as a group of independent
`threads'.

Concurrent and Parallel Haskell have very different purposes.

Concurrent Haskell is for applications which have an inherent
structure of interacting, concurrent tasks (i.e. `threads').  Threads
in such programs may be {\em required}.  For example, if a concurrent
thread has been spawned to handle a mouse click, it isn't
optional---the user wants something done!

A Concurrent Haskell program implies multiple `threads' running within
a single Unix process on a single processor.

Simon Peyton Jones and Sigbjorn Finne have a paper available,
``Concurrent Haskell: preliminary version.''
(draft available via \tr{ftp}
from \tr{ftp.dcs.gla.ac.uk/pub/glasgow-fp/drafts}).

Parallel Haskell is about {\em speed}---spawning threads onto multiple
processors so that your program will run faster.  The `threads'
are always {\em advisory}---if the runtime system thinks it can
get the job done more quickly by sequential execution, then fine.

A Parallel Haskell program implies multiple processes running on
multiple processors, under a PVM (Parallel Virtual Machine) framework.

Parallel Haskell is new with GHC 0.26; it is more about ``research
fun'' than about ``speed.'' That will change.  There is no paper about
Parallel Haskell.  That will change, too.

Some details about Concurrent and Parallel Haskell follow.

%************************************************************************
%*                                                                      *
\subsection{Concurrent and Parallel Haskell---language features}
\index{Concurrent Haskell---features}
\index{Parallel Haskell---features}
%*                                                                      *
%************************************************************************

%************************************************************************
%*                                                                      *
\subsubsection{Features specific to Concurrent Haskell}
%*                                                                      *
%************************************************************************

%************************************************************************
%*                                                                      *
\subsubsubsection{The \tr{Concurrent} interface (recommended)}
\index{Concurrent interface}
%*                                                                      *
%************************************************************************

GHC provides a \tr{Concurrent} module, a common interface to a
collection of useful concurrency abstractions, including those
mentioned in the ``concurrent paper''.

Just put \tr{import Concurrent} into your modules, and away you go.
NB: intended for use with the \tr{-fhaskell-1.3} flag.

To create a ``required thread'':

\begin{verbatim}
forkIO :: IO a -> IO a
\end{verbatim}

The \tr{Concurrent} interface also provides access to ``I-Vars''
and ``M-Vars'', which are two flavours of {\em synchronising variables}.
\index{synchronising variables (Glasgow extension)}
\index{concurrency -- synchronising variables}

\tr{_IVars}\index{_IVars (Glasgow extension)} are write-once
variables.  They start out empty, and any threads that attempt to read
them will block until they are filled.  Once they are written, any
blocked threads are freed, and additional reads are permitted.
Attempting to write a value to a full \tr{_IVar} results in a runtime
error.  Interface:
\begin{verbatim}
type IVar a = _IVar a -- more convenient name

newIVar     :: IO (_IVar a)
readIVar    :: _IVar a -> IO a
writeIVar   :: _IVar a -> a -> IO ()
\end{verbatim}
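
As a minimal sketch of the blocking behaviour (assuming the interface
above and \tr{do}-notation via \tr{-fhaskell-1.3}):
\begin{verbatim}
import Concurrent

-- A child thread fills the write-once IVar; the main thread blocks in
-- readIVar until the value arrives, after which further reads succeed
-- immediately.
main :: IO ()
main = do
  v <- newIVar
  forkIO (writeIVar v (42 :: Int))
  x <- readIVar v   -- blocks until the child writes
  y <- readIVar v   -- additional reads are permitted once full
  print (x + y)
\end{verbatim}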

\tr{_MVars}\index{_MVars (Glasgow extension)} are rendezvous points,
mostly for concurrent threads.  They begin empty, and any attempt to
read an empty \tr{_MVar} blocks.  When an \tr{_MVar} is written, a
single blocked thread may be freed.  Reading an \tr{_MVar} toggles its
state from full back to empty.  Therefore, any value written to an
\tr{_MVar} may only be read once.  Multiple reads and writes are
allowed, but there must be at least one read between any two
writes. Interface:
\begin{verbatim}
type MVar a  = _MVar a -- more convenient name

newEmptyMVar :: IO (_MVar a)
newMVar      :: a -> IO (_MVar a)
takeMVar     :: _MVar a -> IO a
putMVar      :: _MVar a -> a -> IO ()
readMVar     :: _MVar a -> IO a
swapMVar     :: _MVar a -> a -> IO a
\end{verbatim}
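
For example, a minimal sketch of a rendezvous between two threads
(again assuming the interface above):
\begin{verbatim}
import Concurrent

-- The child thread puts a value into the empty MVar; the parent blocks
-- in takeMVar until that put happens, and the take empties the MVar
-- again.
main :: IO ()
main = do
  box <- newEmptyMVar
  forkIO (putMVar box (sum [1 .. 100 :: Int]))
  result <- takeMVar box
  print result
\end{verbatim}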

A {\em channel variable} (@CVar@) is a one-element channel, as
described in the paper:

\begin{verbatim}
data CVar a
newCVar :: IO (CVar a)
putCVar :: CVar a -> a -> IO ()
getCVar :: CVar a -> IO a
\end{verbatim}

A @Chan@ is an unbounded channel:

\begin{verbatim}
data Chan a 
newChan         :: IO (Chan a)
putChan         :: Chan a -> a -> IO ()
getChan         :: Chan a -> IO a
dupChan         :: Chan a -> IO (Chan a)
unGetChan       :: Chan a -> a -> IO ()
getChanContents :: Chan a -> IO [a]
\end{verbatim}
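
A small producer/consumer sketch (the strings are just placeholders):
\begin{verbatim}
import Concurrent

-- One thread feeds the unbounded channel; the main thread reads the
-- items back in FIFO order.
main :: IO ()
main = do
  ch <- newChan
  forkIO (putChan ch "hello" >> putChan ch "world")
  a <- getChan ch   -- blocks until "hello" arrives
  b <- getChan ch
  putStr (a ++ " " ++ b ++ "\n")
\end{verbatim}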

General and quantity semaphores:

\begin{verbatim}
data QSem
newQSem     :: Int   -> IO QSem
waitQSem    :: QSem  -> IO ()
signalQSem  :: QSem  -> IO ()

data QSemN
newQSemN    :: Int   -> IO QSemN
signalQSemN :: QSemN -> Int -> IO ()
waitQSemN   :: QSemN -> Int -> IO ()
\end{verbatim}
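
For instance, a quantity semaphore can be used to wait for a group of
threads to finish (a sketch, assuming the interface above):
\begin{verbatim}
import Concurrent

-- Each worker signals one unit on completion; the main thread waits
-- for both units before proceeding.
main :: IO ()
main = do
  done <- newQSemN 0
  forkIO (putStr "worker 1\n" >> signalQSemN done 1)
  forkIO (putStr "worker 2\n" >> signalQSemN done 1)
  waitQSemN done 2
  putStr "all done\n"
\end{verbatim}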

Merging streams---binary and n-ary:

\begin{verbatim}
mergeIO  :: [a]   -> [a] -> IO [a]
nmergeIO :: [[a]] -> IO [a]
\end{verbatim}
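
For example (a sketch; the interleaving of the merged list is
nondeterministic, but all elements appear):
\begin{verbatim}
import Concurrent

-- nmergeIO interleaves the input lists as their elements become
-- available; the sum is the same whatever order results.
main :: IO ()
main = do
  xs <- nmergeIO [[1,2,3], [4,5,6]]
  print (sum (xs :: [Int]))
\end{verbatim}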

A {\em Sample variable} (@SampleVar@) is slightly different from a
normal @_MVar@:
\begin{itemize}
\item Reading an empty @SampleVar@ causes the reader to block
    (same as @takeMVar@ on an empty @_MVar@).
\item Reading a filled @SampleVar@ empties it and returns the value
    (same as @takeMVar@ on a full @_MVar@).
\item Writing to an empty @SampleVar@ fills it with a value and
    potentially wakes up a blocked reader (same as @putMVar@ on an empty @_MVar@).
\item Writing to a filled @SampleVar@ overwrites the current value
    (different from @putMVar@ on a full @_MVar@).
\end{itemize}

\begin{verbatim}
type SampleVar a = _MVar (Int, _MVar a)

emptySampleVar :: SampleVar a -> IO ()
newSampleVar   :: IO (SampleVar a)
readSample     :: SampleVar a -> IO a
writeSample    :: SampleVar a -> a -> IO ()
\end{verbatim}

Finally, there are operations to delay a concurrent thread, and to
make one wait:\index{delay a concurrent thread}
\index{wait for a file descriptor}
\begin{verbatim}
threadDelay :: Int -> IO () -- delay rescheduling for N microseconds
threadWait  :: Int -> IO () -- wait for input on specified file descriptor
\end{verbatim}
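
For example, to let a child thread run while the main thread sleeps
for roughly one second (a sketch; the argument is in microseconds):
\begin{verbatim}
import Concurrent

-- The main thread gives up the processor for about a second, during
-- which the forked thread gets a chance to run.
main :: IO ()
main = do
  forkIO (putStr "hello from the child thread\n")
  threadDelay 1000000
  putStr "main thread resumes\n"
\end{verbatim}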

%************************************************************************
%*                                                                      *
\subsubsection{Features specific to Parallel Haskell}
%*                                                                      *
%************************************************************************

%************************************************************************
%*                                                                      *
\subsubsubsection{The \tr{Parallel} interface (recommended)}
\index{Parallel interface}
%*                                                                      *
%************************************************************************

GHC provides two functions for controlling parallel execution, through
the \tr{Parallel} interface:
\begin{verbatim}
interface Parallel where
infixr 0 `par`
infixr 1 `seq`

par :: a -> b -> b
seq :: a -> b -> b
\end{verbatim}

The expression \tr{(x `par` y)} {\em sparks} the evaluation of \tr{x}
(to weak head normal form) and returns \tr{y}.  Sparks are queued for
execution in FIFO order, but are not executed immediately.  At the
next heap allocation, the currently executing thread will yield
control to the scheduler, and the scheduler will start a new thread
(until reaching the active thread limit) for each spark which has not
already been evaluated to WHNF.

The expression \tr{(x `seq` y)} evaluates \tr{x} to weak head normal
form and then returns \tr{y}.  The \tr{seq} primitive can be used to
force evaluation of an expression beyond WHNF, or to impose a desired
execution sequence for the evaluation of an expression.

For example, consider the following parallel version of our old
nemesis, \tr{nfib}:

\begin{verbatim}
import Parallel

nfib :: Int -> Int
nfib n | n <= 1 = 1
       | otherwise = par n1 (seq n2 (n1 + n2 + 1))
                     where n1 = nfib (n-1) 
                           n2 = nfib (n-2)
\end{verbatim}

For values of \tr{n} greater than 1, we use \tr{par} to spark a thread
to evaluate \tr{nfib (n-1)}, and then we use \tr{seq} to force the
parent thread to evaluate \tr{nfib (n-2)} before going on to add
together these two subexpressions.  In this divide-and-conquer
approach, we only spark a new thread for one branch of the computation
(leaving the parent to evaluate the other branch).  Also, we must use
\tr{seq} to ensure that the parent will evaluate \tr{n2} {\em before}
\tr{n1} in the expression \tr{(n1 + n2 + 1)}.  It is not sufficient to
reorder the expression as \tr{(n2 + n1 + 1)}, because the compiler may
not generate code to evaluate the addends from left to right.
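
The same pattern generalises.  For instance, here is a sketch of a
parallel map written in the same style (\tr{parMap} is not part of the
\tr{Parallel} interface; it is just an illustrative helper):
\begin{verbatim}
import Parallel

-- Spark the evaluation of each element, then force the rest of the
-- list in the parent before consing, just as nfib forces n2 above.
parMap :: (a -> b) -> [a] -> [b]
parMap f []     = []
parMap f (x:xs) = fx `par` (rest `seq` (fx : rest))
  where
    fx   = f x
    rest = parMap f xs
\end{verbatim}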

%************************************************************************
%*                                                                      *
\subsubsubsection{Underlying functions and primitives}
\index{parallelism primitives}
\index{primitives for parallelism}
%*                                                                      *
%************************************************************************

The functions \tr{par} and \tr{seq} are really just renamings:
\begin{verbatim}
par a b = _par_ a b
seq a b = _seq_ a b
\end{verbatim}

The functions \tr{_par_} and \tr{_seq_} are built into GHC, and unfold
into uses of the \tr{par#} and \tr{seq#} primitives, respectively.  If
you'd like to see this with your very own eyes, just run GHC with the
\tr{-ddump-simpl} option.  (Anything for a good time...)

You can use \tr{_par_} and \tr{_seq_} in Concurrent Haskell, though
I'm not sure why you would want to.

%************************************************************************
%*                                                                      *
\subsubsection{Features common to Concurrent and Parallel Haskell}
%*                                                                      *
%************************************************************************

Actually, you can use the \tr{`par`} and \tr{`seq`} combinators
(really for Parallel Haskell) in Concurrent Haskell as well.
But doing things like ``\tr{par} to \tr{forkIO} many required threads''
counts as ``jumping out the 9th-floor window, just to see what happens.''

%************************************************************************
%*                                                                      *
\subsubsubsection{Scheduling policy for concurrent/parallel threads}
\index{Scheduling---concurrent/parallel}
\index{Concurrent/parallel scheduling}
%*                                                                      *
%************************************************************************

Runnable threads are scheduled in round-robin fashion.  Context
switches are signalled by the generation of new sparks or by the
expiry of a virtual timer (the timer interval is configurable with the
\tr{-C[<num>]}\index{-C<num> RTS option (concurrent, parallel)} RTS option).
However, a context switch doesn't really happen until the next heap
allocation.  If you want extremely short time slices, the \tr{-C} RTS
option can be used to force a context switch at each and every heap
allocation.

When a context switch occurs, pending sparks which have not already
been reduced to weak head normal form are turned into new threads.
However, there is a limit to the number of active threads (runnable or
blocked) which are allowed at any given time.  This limit can be
adjusted with the \tr{-t<num>}\index{-t <num> RTS option (concurrent, parallel)}
RTS option (the default is 32).  Once the
thread limit is reached, any remaining sparks are deferred until some
of the currently active threads are completed.
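
For example, to force a context switch at every heap allocation and
allow up to 64 active threads (a sketch; \tr{a.out} stands for your own
program):
\begin{verbatim}
% ./a.out +RTS -C0 -t64
\end{verbatim}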

%************************************************************************
%*                                                                      *
\subsection{How to use Concurrent and Parallel Haskell}
%*                                                                      *
%************************************************************************

[You won't get far unless your GHC system was configured/built with
concurrency and/or parallelism enabled.  (They require separate
library modules.)  The relevant section of the installation guide says
how to do this.]

%************************************************************************
%*                                                                      *
\subsubsection{Using Concurrent Haskell}
\index{Concurrent Haskell---use}
%*                                                                      *
%************************************************************************

To compile a program as Concurrent Haskell, use the \tr{-concurrent}
option,\index{-concurrent option} both when compiling {\em and
linking}.  You will probably need the \tr{-fglasgow-exts} option, too.
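
For example (a sketch; \tr{Main.hs} and \tr{main} stand for your own
file and program names):
\begin{verbatim}
% ghc -c -concurrent -fglasgow-exts Main.hs
% ghc -o main -concurrent Main.o
\end{verbatim}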

Three RTS options are provided for modifying the behaviour of the
threaded runtime system.  See the descriptions of \tr{-C[<us>]}, \tr{-q},
and \tr{-t<num>} in \Sectionref{parallel-rts-opts}.

%************************************************************************
%*                                                                      *
\subsubsubsection[concurrent-problems]{Potential problems with Concurrent Haskell}
\index{Concurrent Haskell problems}
\index{problems, Concurrent Haskell}
%*                                                                      *
%************************************************************************

The main thread in a Concurrent Haskell program is given its own
private stack space, but all other threads are given stack space from
the heap.  Stack space for the main thread can be
adjusted as usual with the \tr{-K} RTS
option,\index{-K RTS option (concurrent, parallel)} but if this
private stack space is exhausted, the main thread will switch to stack
segments in the heap, just like any other thread.  Thus, problems
which would normally result in stack overflow in ``sequential Haskell''
can be expected to result in heap overflow when using threads.

The concurrent runtime system uses black holes as synchronisation
points for subexpressions which are shared among multiple threads.  In
``sequential Haskell'', a black hole indicates a cyclic data
dependency, which is a fatal error.  However, in concurrent execution, a
black hole may simply indicate that the desired expression is being
evaluated by another thread.  Therefore, when a thread encounters a
black hole, it simply blocks and waits for the black hole to be
updated.  Cyclic data dependencies will result in deadlock, and the
program will fail to terminate.

Because the concurrent runtime system uses black holes as
synchronisation points, it is not possible to disable black-holing
with the \tr{-N} RTS option.\index{-N RTS option} Therefore, the use
of signal handlers (including timeouts) with the concurrent runtime
system can lead to problems if a thread attempts to enter a black hole
that was created by an abandoned computation.  The use of signal
handlers in conjunction with threads is strongly discouraged.


%************************************************************************
%*                                                                      *
\subsubsection{Using Parallel Haskell}
\index{Parallel Haskell---use}
%*                                                                      *
%************************************************************************

[You won't be able to execute parallel Haskell programs unless PVM3
(Parallel Virtual Machine, version 3) is installed at your site.]

To compile a Haskell program for parallel execution under PVM, use the
\tr{-parallel} option,\index{-parallel option} both when compiling
{\em and linking}.  You will probably want to \tr{import Parallel}
into your Haskell modules.

To run your parallel program, once PVM is going, just invoke it ``as
normal''.  The main extra RTS option is \tr{-N<n>}, which says how many
PVM ``processors'' your program should run on.  (For more details of
all the relevant RTS options, please see \sectionref{parallel-rts-opts}.)

In truth, running Parallel Haskell programs and getting information
out of them (e.g., activity profiles) is a battle with the vagaries of
PVM, detailed in the following sections.

For example: the stdout and stderr from your parallel program run will
appear in a log file, called something like \tr{/tmp/pvml.NNN}.

%************************************************************************
%*                                                                      *
\subsubsubsection{Dummy's guide to using PVM}
\index{PVM, how to use}
\index{Parallel Haskell---PVM use}
%*                                                                      *
%************************************************************************

Before you can run a parallel program under PVM, you must set the
required environment variables (PVM's idea, not ours); put something
like the following in your \tr{.cshrc} or equivalent:
\begin{verbatim}
setenv PVM_ROOT /wherever/you/put/it
setenv PVM_ARCH `$PVM_ROOT/lib/pvmgetarch`
setenv PVM_DPATH $PVM_ROOT/lib/pvmd
\end{verbatim}

Creating and/or controlling your ``parallel machine'' is a purely-PVM
business; nothing specific to Parallel Haskell.

You use the \tr{pvm}\index{pvm command} command to start PVM on your
machine.  You can then do various things to control/monitor your
``parallel machine;'' the most useful being:

\begin{tabular}{ll}
\tr{Control-D} & exit \tr{pvm}, leaving it running \\
\tr{halt} & kill off this ``parallel machine'' \& exit \\
\tr{add <host>} & add \tr{<host>} as a processor \\
\tr{delete <host>} & delete \tr{<host>} \\
\tr{reset}	& kill what's going, but leave PVM up \\
\tr{conf}       & list the current configuration \\
\tr{ps}         & report processes' status \\
\tr{pstat <pid>} & status of a particular process \\
\end{tabular}
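
A typical session might look like this (a sketch; \tr{otherhost} stands
for a machine at your site); type \tr{Control-D} at the \tr{pvm>}
prompt to leave the console with PVM still running:
\begin{verbatim}
% pvm
pvm> add otherhost
pvm> conf
pvm> ps
\end{verbatim}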

The PVM documentation can tell you much, much more about \tr{pvm}!

%************************************************************************
%*                                                                      *
\subsubsection{Parallelism profiles}
\index{parallelism profiles}
\index{profiles, parallelism}
\index{visualisation tools}
%*                                                                      *
%************************************************************************

With Parallel Haskell programs, we usually aren't interested in the
results---only in ``how parallel'' the run was!  We want pretty pictures.

Parallelism profiles (\`a la \tr{hbcpp}) can be generated with the
\tr{-q}\index{-q RTS option (concurrent, parallel)} RTS option.  The
per-processor profiling info is dumped into files {\em in your home
directory} named \tr{<program>.<nnn>.gr}.  These are then munged into a
PostScript picture, which you can then display.  For example,
to run your program \tr{a.out} on 8 processors, then view the
parallelism profile, do:

\begin{verbatim}
% ./a.out +RTS -N8 -q
% cd			# to home directory
% grs2gr *.???.gr	# combine the 8 .gr files into one
% gr2ps -O temp.gr	# cvt to .ps; output in temp.ps
% ghostview -seascape temp.ps	# look at it!
\end{verbatim}

The scripts for processing the parallelism profiles are distributed
in \tr{ghc/utils/parallel/}.

%************************************************************************
%*                                                                      *
\subsubsection{Activity profiles}
\index{activity profiles}
\index{profiles, activity}
\index{visualisation tools}
%*                                                                      *
%************************************************************************

You can also use the standard GHC ``cost-centre'' profiling to see how
much time each PVM ``processor'' spends on each of four basic
activities: garbage collection, main computation, message passing, and
idling.

No special compilation flags beyond \tr{-parallel} are required to get
this basic four-activity profile.  Just use the \tr{-P} RTS option,
thusly:
\begin{verbatim}
./a.out +RTS -N7 -P	# 7 processors
\end{verbatim}

The above will create files named \tr{<something>.prof} and/or
\tr{<something>.time} {\em in your home directory}.  You can
process the \tr{.time} files into PostScript using \tr{hp2ps},
\index{hp2ps}
as described elsewhere in this guide.  The only thing is:
because of the weird file names, you probably need to use
\tr{hp2ps} as a filter; so:
\begin{verbatim}
% hp2ps < fooo.001.time > temp.ps
\end{verbatim}

%$$ The first line of the
%$$ \tr{.qp} file contains the name of the program executed, along with
%$$ any program arguments and thread-specific RTS options.  The second
%$$ line contains the date and time of program execution.  The third
%$$ and subsequent lines contain information about thread state transitions.
%$$ 
%$$ The thread state transition lines have the following format:
%$$ \begin{verbatim}
%$$ time transition thread-id thread-name [thread-id thread-name]
%$$ \end{verbatim}
%$$ 
%$$ The \tr{time} is the virtual time elapsed since the program started
%$$ execution, in milliseconds.  The \tr{transition} is a two-letter code
%$$ indicating the ``from'' queue and the ``to'' queue, where each queue
%$$ is one of:
%$$ \begin{itemize}
%$$ \item[\tr{*}] Void: Thread creation or termination.
%$$ \item[\tr{G}] Green: Runnable (or actively running, with \tr{-qv}) threads.
%$$ \item[\tr{A}] Amber: Runnable threads (\tr{-qv} only).
%$$ \item[\tr{R}] Red: Blocked threads.
%$$ \end{itemize}
%$$ The \tr{thread-id} is a unique integer assigned to each thread.  The
%$$ \tr{thread-name} is currently the address of the thread's root closure
%$$ (in hexadecimal).  In the future, it will be the name of the function
%$$ associated with the root of the thread.
%$$ 
%$$ The first \tr{(thread-id, thread-name)} pair identifies the thread
%$$ involved in the indicated transition.  For \tr{RG} and \tr{RA} transitions 
%$$ only, there is a second \tr{(thread-id, thread-name)} pair which identifies
%$$ the thread that released the blocked thread.
%$$ 
%$$ Provided with the GHC distribution is a perl script, \tr{qp2pp}, which
%$$ will convert \tr{.qp} files to \tr{hbcpp}'s \tr{.pp} format, so that
%$$ you can use the \tr{hbcpp} profiling tools, such as \tr{pp2ps92}.  The
%$$ \tr{.pp} format has undergone many changes, so the conversion script
%$$ is not compatible with earlier releases of \tr{hbcpp}.  Note that GHC
%$$ and \tr{hbcpp} use different thread scheduling policies (in
%$$ particular, \tr{hbcpp} threads never move from the green queue to the
%$$ amber queue).  For compatibility, the \tr{qp2pp} script eliminates the
%$$ GHC amber queue, so there is no point in using the verbose (\tr{-qv})
%$$ option if you are only interested in using the \tr{hbcpp} profiling
%$$ tools.

%************************************************************************
%*                                                                      *
\subsubsection[parallel-rts-opts]{RTS options for Concurrent/Parallel Haskell}
\index{RTS options, concurrent}
\index{RTS options, parallel}
\index{Concurrent Haskell---RTS options}
\index{Parallel Haskell---RTS options}
%*                                                                      *
%************************************************************************

Besides the usual runtime system (RTS) options
(\sectionref{runtime-control}), there are a few options particularly
for concurrent/parallel execution.

\begin{description}
\item[\tr{-N<N>}:]
\index{-N<N> RTS option (parallel)}
(PARALLEL ONLY) Use \tr{<N>} PVM processors to run this program;
the default is 2.

\item[\tr{-C[<us>]}:]
\index{-C<us> RTS option}
Sets the context switch interval to \pl{<us>} microseconds.  A context
switch will occur at the next heap allocation after the timer expires.
With \tr{-C0} or \tr{-C}, context switches will occur as often as
possible (at every heap allocation).  By default, context switches
occur every 10 milliseconds.  Note that many interval timers are only
capable of 10 millisecond granularity, so the default setting may be
the finest granularity possible, short of a context switch at every
heap allocation.

\item[\tr{-q[v]}:]
\index{-q RTS option}
Produce a quasi-parallel profile of thread activity, in the file
\tr{<program>.qp}.  In the style of \tr{hbcpp}, this profile records
the movement of threads between the green (runnable) and red (blocked)
queues.  If you specify the verbose suboption (\tr{-qv}), the green
queue is split into green (for the currently running thread only) and
amber (for other runnable threads).  We do not recommend that you use
the verbose suboption if you are planning to use the \tr{hbcpp}
profiling tools or if you are context switching at every heap check
(with \tr{-C}).

\item[\tr{-t<num>}:]
\index{-t<num> RTS option}
Limit the number of concurrent threads per processor to \pl{<num>}.
The default is 32.  Each thread requires slightly over 1K {\em words}
in the heap for thread state and stack objects.  (For 32-bit machines,
this translates to 4K bytes, and for 64-bit machines, 8K bytes.)

\item[\tr{-d}:]
\index{-d RTS option (parallel)}
(PARALLEL ONLY) Turn on debugging.  It pops up one xterm (or GDB, or
something...) per PVM processor.  We use the standard \tr{debugger}
script that comes with PVM3, but we sometimes meddle with the
\tr{debugger2} script.  We include ours in the GHC distribution,
in \tr{ghc/utils/pvm/}.
\end{description}

%************************************************************************
%*                                                                      *
\subsubsubsection[parallel-problems]{Potential problems with Parallel Haskell}
\index{Parallel Haskell---problems}
\index{problems, Parallel Haskell}
%*                                                                      *
%************************************************************************

The ``Potential problems'' for Concurrent Haskell also apply for
Parallel Haskell.  Please see \Sectionref{concurrent-problems}.

%$$ \subsubsubsection[par-notes]{notes for 0.26}
%$$ 
%$$ \begin{verbatim}
%$$ Install PVM somewhere, as it says.  We use 3.3
%$$ 
%$$ pvm.h : can do w/ a link from ghc/includes to its true home (???)
%$$ 
%$$ 
%$$ ghc -gum ... => a.out
%$$ 
%$$     a.out goes to $PVM_ROOT/bin/$PVM_ARCH/$PE
%$$ 
%$$     (profiling outputs go to ~/$PE.<process-num>.<suffix>)
%$$ 
%$$     trinder scripts in: ~trinder/bin/any/instPHIL
%$$ 
%$$ To run:
%$$ 
%$$     Then:
%$$     SysMan [-] N (PEs) args-to-program...
%$$ 
%$$         - ==> debug mode
%$$                 mattson setup: GDB window per task
%$$                 /local/grasp_tmp5/mattson/pvm3/lib/debugger{,2}
%$$ 
%$$                 to set breakpoint, etc, before "run", just modify debugger2
%$$ 
%$$     stderr and stdout are directed to /tmp/pvml.NNN
%$$ 
%$$ Visualisation stuff (normal _mp build):
%$$ 
%$$ +RTS -q         gransim-like profiling
%$$                 (should use exactly-gransim RTS options)
%$$      -qb        binary dumps : not tried, not recommended: hosed!
%$$ 
%$$     ascii dump : same info as gransim, one extra line at top w/
%$$                 start time; all times are ms since then
%$$ 
%$$     dumps appear in $HOME/<program>.nnn.gr
%$$ 
%$$ ~mattson/grs2gr.pl == combine lots into one (fixing times)
%$$ 
%$$ /local/grasp/hwloidl/GrAn/bin/ is where scripts are.
%$$ 
%$$ gr2ps == activity profile (bash script)
%$$ 
%$$ ~mattson/bin/`arch`/gr2qp must be picked up prior to hwloidl's for
%$$ things to work...
%$$ 
%$$ +RTS -[Pp]      (parallel) 4-cost-centre "profiling" (gc,MAIN,msg,idle)
%$$ 
%$$         ToDos: time-profiles from hp2ps: something about zeroth sample;
%$$ \end{verbatim}