.. _using-concurrent:

Using Concurrent Haskell
------------------------

.. index::
   single: Concurrent Haskell; using

GHC supports Concurrent Haskell by default, without requiring a special
option or libraries compiled in a certain way. To get access to the
support libraries for Concurrent Haskell, just import
:base-ref:`Control.Concurrent <Control-Concurrent.html>`.
More information on Concurrent Haskell is provided in the documentation
for that module.

Optionally, the program may be linked with the ``-threaded`` option (see
:ref:`options-linker`). This provides two benefits:

- It enables the ``-N ⟨x⟩`` RTS option to be used, which allows threads
  to run in parallel on a multiprocessor or multicore machine. See
  :ref:`using-smp`.

- If a thread makes a foreign call (and the call is not marked
  ``unsafe``), then other Haskell threads in the program will continue
  to run while the foreign call is in progress. Additionally,
  ``foreign export``\ ed Haskell functions may be called from multiple
  OS threads simultaneously. See :ref:`ffi-threads`.

The following RTS option(s) affect the behaviour of Concurrent Haskell
programs:

.. index::
   single: RTS options; concurrent

``-C ⟨s⟩``
    .. index::
       single: -C ⟨s⟩; RTS option

    Sets the context switch interval to ⟨s⟩ seconds.
    A context switch will occur at the next heap block allocation after
    the timer expires (a heap block allocation occurs every 4k of
    allocation). With ``-C0`` or ``-C``, context switches will occur as
    often as possible (at every heap block allocation). By default,
    context switches occur every 20ms.

.. _using-smp:

Using SMP parallelism
---------------------

.. index::
   single: parallelism
   single: SMP

GHC supports running Haskell programs in parallel on an SMP (symmetric
multiprocessor).

There's a fine distinction between *concurrency* and *parallelism*:
parallelism is all about making your program run *faster* by making use
of multiple processors simultaneously. Concurrency, on the other hand,
is a means of abstraction: it is a convenient way to structure a program
that must respond to multiple asynchronous events.

However, the two terms are certainly related. By making use of multiple
CPUs it is possible to run concurrent threads in parallel, and this is
exactly what GHC's SMP parallelism support does. But it is also possible
to obtain performance improvements with parallelism on programs that do
not use concurrency. This section describes how to use GHC to compile
and run parallel programs; in :ref:`lang-parallel` we describe the
language features that affect parallelism.

.. _parallel-compile-options:

Compile-time options for SMP parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to make use of multiple CPUs, your program must be linked with
the ``-threaded`` option (see :ref:`options-linker`). Additionally, the
following compiler options affect parallelism:

``-feager-blackholing``
    Blackholing is the act of marking a thunk (lazy computation) as
    being under evaluation. It is useful for three reasons: firstly it
    lets us detect certain kinds of infinite loop (the
    ``NonTermination`` exception), secondly it avoids certain kinds of
    space leak, and thirdly it avoids repeating a computation in a
    parallel program, because we can tell when a computation is already
    in progress.

    The option ``-feager-blackholing`` causes each thunk to be
    blackholed as soon as evaluation begins. The default is "lazy
    blackholing", whereby thunks are only marked as being under
    evaluation when a thread is paused for some reason.
    Lazy blackholing
    is typically more efficient (by 1-2% or so), because most thunks
    don't need to be blackholed. However, eager blackholing can avoid
    more repeated computation in a parallel program, and this often
    turns out to be important for parallelism.

    We recommend compiling any code that is intended to be run in
    parallel with the ``-feager-blackholing`` flag.

.. _parallel-options:

RTS options for SMP parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two ways to run a program on multiple processors: call
``Control.Concurrent.setNumCapabilities`` from your program, or use the
RTS ``-N`` option.

``-N⟨x⟩``
    .. index::
       single: -N⟨x⟩; RTS option

    Use ⟨x⟩ simultaneous threads when running the program.

    The runtime manages a set of virtual processors, which we call
    *capabilities*, the number of which is determined by the ``-N``
    option. Each capability can run one Haskell thread at a time, so the
    number of capabilities is equal to the number of Haskell threads
    that can run physically in parallel. A capability is animated by one
    or more OS threads; the runtime manages a pool of OS threads for
    each capability, so that if a Haskell thread makes a foreign call
    (see :ref:`ffi-threads`) another OS thread can take over that
    capability.

    Normally ⟨x⟩ should be chosen to match the number of CPU cores on
    the machine [1]_. For example, on a dual-core machine we would
    probably use ``+RTS -N2 -RTS``.

    Omitting ⟨x⟩, i.e. ``+RTS -N -RTS``, lets the runtime choose the
    value of ⟨x⟩ itself based on how many processors are in your
    machine.

    Be careful when using all the processors in your machine: if some of
    your processors are in use by other programs, this can actually harm
    performance rather than improve it.

    Setting ``-N`` also has the effect of enabling the parallel garbage
    collector (see :ref:`rts-options-gc`).
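    As a minimal sketch (not part of the original text) of reading the
    capability count from inside a program:

    ```haskell
    import Control.Concurrent (getNumCapabilities)

    main :: IO ()
    main = do
      -- Reports the current -N value: 1 unless the program was started
      -- with +RTS -N⟨x⟩ or setNumCapabilities has been called.
      n <- getNumCapabilities
      putStrLn ("Running on " ++ show n ++ " capabilities")
    ```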
    The current value of the ``-N`` option is available to the Haskell
    program via ``Control.Concurrent.getNumCapabilities``, and it may be
    changed while the program is running by calling
    ``Control.Concurrent.setNumCapabilities``.

The following options affect the way the runtime schedules threads on
CPUs:

``-qa``
    Use the OS's affinity facilities to try to pin OS threads to CPU
    cores.

    When this option is enabled, the OS threads for capability *i* are
    bound to CPU core *i* using the API provided by the OS for setting
    thread affinity; e.g. on Linux GHC uses ``sched_setaffinity()``.

    Depending on your workload and the other activity on the machine,
    this may or may not result in a performance improvement. We
    recommend trying it out and measuring the difference.

``-qm``
    Disable automatic migration for load balancing. Normally the runtime
    will automatically try to schedule threads across the available CPUs
    to make use of idle CPUs; this option disables that behaviour. Note
    that migration only applies to threads; sparks created by ``par``
    are load-balanced separately by work-stealing.

    This option is probably only of use for concurrent programs that
    explicitly schedule threads onto CPUs with
    ``Control.Concurrent.forkOn``.

Hints for using SMP parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add the ``-s`` RTS option when running the program to see timing stats,
which will help to tell you whether your program got faster by using
more CPUs or not. If the user time is greater than the elapsed time,
then the program used more than one CPU. You should also run the program
without ``-N`` for comparison.

The output of ``+RTS -s`` tells you how many "sparks" were created and
executed during the run of the program (see :ref:`rts-options-gc`),
which will give you an idea how well your ``par`` annotations are
working.
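The kind of ``par`` annotation discussed above can be sketched as
follows. This is a minimal illustration rather than tuned code; it uses
``par`` and ``pseq`` as exported from ``GHC.Conc`` in ``base``, though
``Control.Parallel`` from the ``parallel`` package is the more
conventional interface:

```haskell
import GHC.Conc (par, pseq)

-- Naive Fibonacci with the two recursive calls evaluated in parallel:
-- `par` sparks the evaluation of x, while `pseq` forces y before the
-- results are combined, so both branches can proceed at once.
pfib :: Int -> Int
pfib n
  | n < 2     = n
  | otherwise = x `par` (y `pseq` x + y)
  where
    x = pfib (n - 1)
    y = pfib (n - 2)

main :: IO ()
main = print (pfib 20)  -- prints 6765 however many sparks actually run
```

Compiling this with ``-threaded`` and running it with, say,
``+RTS -N2 -s`` shows in the statistics output how many of the sparks
were converted into parallel work.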
GHC's parallelism support has improved in GHC 6.12.1 as a result of much
experimentation and tuning in the runtime system. We'd still be
interested to hear how well it works for you, and we're also interested
in collecting parallel programs to add to our benchmarking suite.

.. [1] Whether hyperthreading cores should be counted or not is an open
   question; please feel free to experiment and let us know what results
   you find.