Diffstat (limited to 'docs/users_guide/using-optimisation.rst')
-rw-r--r--  docs/users_guide/using-optimisation.rst  780
1 file changed, 780 insertions, 0 deletions
diff --git a/docs/users_guide/using-optimisation.rst b/docs/users_guide/using-optimisation.rst
new file mode 100644
index 0000000000..84bf27b4d4
--- /dev/null
+++ b/docs/users_guide/using-optimisation.rst
@@ -0,0 +1,780 @@
+.. _options-optimise:
+
+Optimisation (code improvement)
+-------------------------------
+
+.. index::
+ single: optimisation
+ single: improvement, code
+
+The ``-O*`` options specify convenient "packages" of optimisation flags;
+the ``-f*`` options described later on specify *individual*
+optimisations to be turned on/off; the ``-m*`` options specify
+*machine-specific* optimisations to be turned on/off.
+
+Most of these options are boolean and have forms to turn them both "on" and
+"off" (the latter beginning with the prefix ``no-``). For instance, while
+``-fspecialise`` enables specialisation, ``-fno-specialise`` disables it. When
+multiple flags for the same option appear on the command line they are
+evaluated from left to right. For instance, ``-fno-specialise -fspecialise``
+will enable specialisation.
+
+It is important to note that the ``-O*`` flags are roughly equivalent to
+combinations of ``-f*`` flags. For this reason, the effect of the
+``-O*`` and ``-f*`` flags is dependent upon the order in which they
+occur on the command line.
+
+For instance, take the example of ``-fno-specialise -O1``. Despite the
+``-fno-specialise`` appearing in the command line, specialisation will
+still be enabled. This is the case as ``-O1`` implies ``-fspecialise``,
+overriding the previous flag. By contrast, ``-O1 -fno-specialise`` will
+compile without specialisation, as one would expect.
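+
+For example, continuing the example above, the first of the following two
+commands compiles ``Foo.hs`` with specialisation enabled and the second with
+it disabled:
+
+::
+
+ ghc -c -fno-specialise -O1 Foo.hs
+ ghc -c -O1 -fno-specialise Foo.hs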
+
+.. _optimise-pkgs:
+
+``-O*``: convenient “packages” of optimisation flags.
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are *many* options that affect the quality of code produced by
+GHC. Most people only have a general goal, something like "Compile
+quickly" or "Make my program run like greased lightning." The following
+"packages" of optimisations (or lack thereof) should suffice.
+
+Note that higher optimisation levels cause more cross-module
+optimisation to be performed, which can have an impact on how much of
+your program needs to be recompiled when you change something. This is
+one reason to stick to no-optimisation when developing code.
+
+No ``-O*``-type option specified
+ .. index::
+ single: -O\* not specified
+
+ This is taken to mean: “Please compile quickly; I'm not
+ over-bothered about compiled-code quality.” So, for example:
+ ``ghc -c Foo.hs``
+
+``-O0``
+ .. index::
+ single: -O0
+
+ Means "turn off all optimisation", reverting to the same settings as
+ if no ``-O`` options had been specified. Saying ``-O0`` can be
+ useful if e.g. ``make`` has inserted a ``-O`` on the command line
+ already.
+
+``-O``, ``-O1``
+ .. index::
+ single: -O option
+ single: -O1 option
+ single: optimise; normally
+
+ Means: "Generate good-quality code without taking too long about
+ it." Thus, for example: ``ghc -c -O Main.lhs``
+
+``-O2``
+ .. index::
+ single: -O2 option
+ single: optimise; aggressively
+
+ Means: "Apply every non-dangerous optimisation, even if it means
+ significantly longer compile times."
+
+ The avoided "dangerous" optimisations are those that can make
+ runtime or space *worse* if you're unlucky. They are normally turned
+ on or off individually.
+
+ At the moment, ``-O2`` is *unlikely* to produce better code than
+ ``-O``.
+
+``-Odph``
+ .. index::
+ single: -Odph
+ single: optimise; DPH
+
+ Enables all ``-O2`` optimisation, sets
+ ``-fmax-simplifier-iterations=20`` and ``-fsimplifier-phases=3``.
+ Designed for use with :ref:`Data Parallel Haskell (DPH) <dph>`.
+
+We don't use a ``-O*`` flag for day-to-day work. We use ``-O`` to get
+respectable speed; e.g., when we want to measure something. When we want
+to go for broke, we tend to use ``-O2`` (and we go for lots of coffee
+breaks).
+
+The easiest way to see what ``-O`` (etc.) “really mean” is to run with
+``-v``, then stand back in amazement.
+
+.. _options-f:
+
+``-f*``: platform-independent flags
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. index::
+ single: -f\* options (GHC)
+ single: -fno-\* options (GHC)
+
+These flags turn on and off individual optimisations. Flags marked as
+*Enabled by default* are enabled by ``-O``, and as such you shouldn't
+need to set any of them explicitly. A flag ``-fwombat`` can be negated
+by saying ``-fno-wombat``. See :ref:`options-f-compact` for a compact
+list.
+
+``-fcase-merge``
+ .. index::
+ single: -fcase-merge
+
+ *On by default.* Merge immediately-nested case expressions that
+ scrutinise the same variable. For example,
+
+ ::
+
+ case x of
+   Red -> e1
+   _   -> case x of
+            Blue  -> e2
+            Green -> e3
+
+ is transformed to
+
+ ::
+
+ case x of
+   Red   -> e1
+   Blue  -> e2
+   Green -> e3
+
+``-fcall-arity``
+ .. index::
+ single: -fcall-arity
+
+ *On by default.* Enable call-arity analysis, which tries to determine
+ how many arguments a function is always called with, so that it can
+ be eta-expanded to that arity.
+
+``-fcmm-elim-common-blocks``
+ .. index::
+ single: -fcmm-elim-common-blocks
+
+ *On by default.* Enables the common block elimination optimisation
+ in the code generator. This optimisation attempts to find identical
+ Cmm blocks and eliminate the duplicates.
+
+``-fcmm-sink``
+ .. index::
+ single: -fcmm-sink
+
+ *On by default.* Enables the sinking pass in the code generator.
+ This optimisation attempts to move variable bindings closer to
+ their usage sites. It also inlines simple expressions like literals
+ or registers.
+
+``-fcpr-off``
+ .. index::
+ single: -fcpr-off
+
+ Switch off CPR (constructed product result) analysis in the demand
+ analyser.
+
+``-fcse``
+ .. index::
+ single: -fcse
+
+ *On by default.* Enables the common-sub-expression elimination
+ optimisation. Switching this off can be useful if you have some
+ ``unsafePerformIO`` expressions that you don't want commoned-up.
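+
+ For example, a minimal sketch of the kind of code this matters for
+ (hypothetical names): with CSE, the two syntactically identical
+ right-hand sides may be commoned up, so ``counterA`` and ``counterB``
+ could end up sharing a single ``IORef``.
+
+ ::
+
+ import Data.IORef (IORef, newIORef)
+ import System.IO.Unsafe (unsafePerformIO)
+
+ counterA :: IORef Int
+ counterA = unsafePerformIO (newIORef 0)
+
+ counterB :: IORef Int
+ counterB = unsafePerformIO (newIORef 0)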
+
+``-fdicts-cheap``
+ .. index::
+ single: -fdicts-cheap
+
+ A very experimental flag that makes dictionary-valued expressions
+ seem cheap to the optimiser.
+
+``-fdicts-strict``
+ .. index::
+ single: -fdicts-strict
+
+ Make dictionaries strict.
+
+``-fdmd-tx-dict-sel``
+ .. index::
+ single: -fdmd-tx-dict-sel
+
+ *On by default for ``-O0``, ``-O``, ``-O2``.*
+
+ Use a special demand transformer for dictionary selectors.
+
+``-fdo-eta-reduction``
+ .. index::
+ single: -fdo-eta-reduction
+
+ *On by default.* Eta-reduce lambda expressions, if doing so gets rid
+ of a whole group of lambdas.
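+
+ For example, a small sketch (hypothetical names); the two definitions
+ are equivalent, and eta-reduction rewrites the first form into the
+ second:
+
+ ::
+
+ fBefore :: Int -> Int -> Int
+ fBefore = \x y -> max x y
+
+ fAfter :: Int -> Int -> Int
+ fAfter = max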
+
+``-fdo-lambda-eta-expansion``
+ .. index::
+ single: -fdo-lambda-eta-expansion
+
+ *On by default.* Eta-expand let-bindings to increase their arity.
+
+``-feager-blackholing``
+ .. index::
+ single: -feager-blackholing
+
+ Usually GHC black-holes a thunk only when it switches threads. This
+ flag makes it do so as soon as the thunk is entered. See `Haskell on
+ a shared-memory
+ multiprocessor <http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/>`__.
+
+``-fexcess-precision``
+ .. index::
+ single: -fexcess-precision
+
+ When this option is given, intermediate floating point values can
+ have a *greater* precision/range than the final type. Generally this
+ is a good thing, but some programs may rely on the exact
+ precision/range of ``Float``/``Double`` values and should not use
+ this option for their compilation.
+
+ Note that the 32-bit x86 native code generator only supports
+ excess-precision mode, so neither ``-fexcess-precision`` nor
+ ``-fno-excess-precision`` has any effect. This is a known bug, see
+ :ref:`bugs-ghc`.
+
+``-fexpose-all-unfoldings``
+ .. index::
+ single: -fexpose-all-unfoldings
+
+ An experimental flag to expose all unfoldings, even for very large
+ or recursive functions. This allows for all functions to be inlined,
+ whereas usually GHC would avoid inlining larger functions.
+
+``-ffloat-in``
+ .. index::
+ single: -ffloat-in
+
+ *On by default.* Float let-bindings inwards, nearer their binding
+ site. See `Let-floating: moving bindings to give faster programs
+ (ICFP'96) <http://research.microsoft.com/en-us/um/people/simonpj/papers/float.ps.gz>`__.
+
+ This optimisation moves let bindings closer to their use site. The
+ benefit here is that this may avoid unnecessary allocation if the
+ branch the let is now on is never executed. It also enables other
+ optimisation passes to work more effectively as they have more
+ information locally.
+
+ This optimisation isn't always beneficial though (so GHC applies
+ some heuristics to decide when to apply it). The details get
+ complicated but a simple example is that it is often beneficial to
+ move let bindings outwards so that multiple let bindings can be
+ grouped into a larger single let binding, effectively batching their
+ allocation and helping the garbage collector and allocator.
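+
+ A small sketch of the idea (hypothetical names); after floating the
+ binding inwards, the thunk for ``sum xs`` is only allocated in the
+ branch that actually uses it:
+
+ ::
+
+ before :: Bool -> [Int] -> Int
+ before b xs = let s = sum xs in if b then 0 else s
+
+ after :: Bool -> [Int] -> Int
+ after b xs = if b then 0 else let s = sum xs in s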
+
+``-ffull-laziness``
+ .. index::
+ single: -ffull-laziness
+
+ *On by default.* Run the full laziness optimisation (also known as
+ let-floating), which floats let-bindings outside enclosing lambdas,
+ in the hope they will be thereby be computed less often. See
+ `Let-floating: moving bindings to give faster programs
+ (ICFP'96) <http://research.microsoft.com/en-us/um/people/simonpj/papers/float.ps.gz>`__.
+ Full laziness increases sharing, which can lead to increased memory
+ residency.
+
+ .. note::
+ GHC doesn't implement complete full-laziness. When
+ optimisation is on, and ``-fno-full-laziness`` is not given, some
+ transformations that increase sharing are performed, such as
+ extracting repeated computations from a loop. These are the same
+ transformations that a fully lazy implementation would do; the
+ difference is that GHC doesn't consistently apply full-laziness, so
+ don't rely on it.
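+
+ A small sketch of the effect (hypothetical names); ``sum ys`` does
+ not depend on ``x``, so it can be floated out of the lambda and
+ shared by every application of the returned function:
+
+ ::
+
+ before :: [Int] -> Int -> Int
+ before ys = \x -> x + sum ys
+
+ after :: [Int] -> Int -> Int
+ after ys = let s = sum ys in \x -> x + s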
+
+``-ffun-to-thunk``
+ .. index::
+ single: -ffun-to-thunk
+
+ Worker-wrapper removes unused arguments, but usually we do not
+ remove them all, lest it turn a function closure into a thunk,
+ thereby perhaps creating a space leak and/or disrupting inlining.
+ This flag allows worker/wrapper to remove *all* value lambdas. Off
+ by default.
+
+``-fignore-asserts``
+ .. index::
+ single: -fignore-asserts
+
+ *On by default.* Causes GHC to ignore uses of the function
+ ``Exception.assert`` in source code (in other words, rewriting
+ ``Exception.assert p e`` to ``e``). See :ref:`assertions`.
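+
+ For example, in the following sketch (hypothetical function name) the
+ assertion normally throws an error on an empty list; compiled with
+ ``-fignore-asserts`` the call behaves exactly like ``head xs``:
+
+ ::
+
+ import Control.Exception (assert)
+
+ checkedHead :: [a] -> a
+ checkedHead xs = assert (not (null xs)) (head xs)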
+
+``-fignore-interface-pragmas``
+ .. index::
+ single: -fignore-interface-pragmas
+
+ Tells GHC to ignore all inessential information when reading
+ interface files. That is, even if ``M.hi`` contains unfolding or
+ strictness information for a function, GHC will ignore that
+ information.
+
+``-flate-dmd-anal``
+ .. index::
+ single: -flate-dmd-anal
+
+ Run demand analysis again, at the end of the simplification
+ pipeline. We found some opportunities for discovering strictness
+ that were not visible earlier; and optimisations like
+ ``-fspec-constr`` can create functions with unused arguments which
+ are eliminated by late demand analysis. Improvements are modest, but
+ so is the cost. See notes on the :ghc-wiki:`Trac wiki page <LateDmd>`.
+
+``-fliberate-case``
+ .. index::
+ single: -fliberate-case
+
+ *Off by default, but enabled by -O2.* Turn on the liberate-case
+ transformation. This unrolls a recursive function once in its own RHS,
+ to avoid repeated case analysis of free variables. It's a bit like
+ the call-pattern specialiser (``-fspec-constr``) but for free
+ variables rather than arguments.
+
+``-fliberate-case-threshold=n``
+ .. index::
+ single: -fliberate-case-threshold
+
+ *default: 2000.* Set the size threshold for the liberate-case
+ transformation.
+
+``-floopification``
+ .. index::
+ single: -floopification
+
+ *On by default.*
+
+ When this optimisation is enabled the code generator will turn all
+ self-recursive saturated tail calls into local jumps rather than
+ function calls.
+
+``-fmax-inline-alloc-size=n``
+ .. index::
+ single: -fmax-inline-alloc-size
+
+ *default: 128.* Set the maximum size of inline array allocations to n bytes.
+ GHC will allocate non-pinned arrays of statically known size in the current
+ nursery block if they're no bigger than n bytes, ignoring GC overhead. This
+ value should be quite a bit smaller than the block size (typically: 4096).
+
+``-fmax-inline-memcpy-insns=n``
+ .. index::
+ single: -fmax-inline-memcpy-insns
+
+ *default: 32.* Inline ``memcpy`` calls if they would generate no more than n pseudo
+ instructions.
+
+``-fmax-inline-memset-insns=n``
+ .. index::
+ single: -fmax-inline-memset-insns
+
+ *default: 32.* Inline ``memset`` calls if they would generate no more than n pseudo
+ instructions.
+
+``-fmax-relevant-binds=n``
+ .. index::
+ single: -fmax-relevant-binds
+
+ The type checker sometimes displays a fragment of the type
+ environment in error messages, but only up to some maximum number,
+ set by this flag. The default is 6. Turning it off with
+ ``-fno-max-relevant-binds`` gives an unlimited number.
+ Syntactically top-level bindings are also usually excluded (since
+ they may be numerous), but ``-fno-max-relevant-binds`` includes
+ them too.
+
+``-fmax-simplifier-iterations=n``
+ .. index::
+ single: -fmax-simplifier-iterations
+
+ *default: 4.* Sets the maximal number of iterations for the simplifier.
+
+``-fmax-worker-args=n``
+ .. index::
+ single: -fmax-worker-args
+
+ *default: 10.* If a worker has that many arguments, none will be unpacked
+ anymore.
+
+``-fno-opt-coercion``
+ .. index::
+ single: -fno-opt-coercion
+
+ Turn off the coercion optimiser.
+
+``-fno-pre-inlining``
+ .. index::
+ single: -fno-pre-inlining
+
+ Turn off pre-inlining.
+
+``-fno-state-hack``
+ .. index::
+ single: -fno-state-hack
+
+ Turn off the "state hack" whereby any lambda with a ``State#`` token
+ as argument is considered to be single-entry, hence it is considered
+ OK to inline things inside it. The hack can improve performance of
+ IO and ST monad code, but it runs the risk of reducing sharing.
+
+``-fomit-interface-pragmas``
+ .. index::
+ single: -fomit-interface-pragmas
+
+ Tells GHC to omit all inessential information from the interface
+ file generated for the module being compiled (say M). This means
+ that a module importing M will see only the *types* of the functions
+ that M exports, but not their unfoldings, strictness info, etc.
+ Hence, for example, no function exported by M will be inlined into
+ an importing module. The benefit is that modules that import M will
+ need to be recompiled less often (only when M's exports change their
+ type, not when they change their implementation).
+
+``-fomit-yields``
+ .. index::
+ single: -fomit-yields
+
+ *On by default.* Tells GHC to omit heap checks when no allocation is
+ being performed. While this improves binary sizes by about 5%, it
+ also means that threads running in tight non-allocating loops will
+ not get preempted in a timely fashion. If it is important to always
+ be able to interrupt such threads, you should turn this optimisation
+ off. Consider also recompiling all libraries with this optimisation
+ turned off, if you need to guarantee interruptibility.
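+
+ A sketch of the kind of loop affected (hypothetical name): once
+ optimised, this loop allocates nothing, so a thread stuck in it will
+ not be interrupted while ``-fomit-yields`` is on:
+
+ ::
+
+ spin :: Int -> Int
+ spin 0 = 0
+ spin n = spin (n - 1)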
+
+``-fpedantic-bottoms``
+ .. index::
+ single: -fpedantic-bottoms
+
+ Make GHC be more precise about its treatment of bottom (but see also
+ ``-fno-state-hack``). In particular, stop GHC eta-expanding through
+ a case expression, which is good for performance, but bad if you are
+ using ``seq`` on partial applications.
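+
+ A sketch of the kind of code this is about (hypothetical names).
+ Semantically, forcing ``f`` with ``seq`` should evaluate the
+ scrutinee ``expensive``; if GHC eta-expands ``f`` through the case
+ into a lambda, the ``seq`` no longer forces anything:
+
+ ::
+
+ expensive :: Bool
+ expensive = length [1 .. 1000000 :: Int] > 0
+ {-# NOINLINE expensive #-}
+
+ f :: Int -> Int
+ f = case expensive of
+       True  -> (+ 1)
+       False -> (+ 2)
+
+ main :: IO ()
+ main = f `seq` print (f 10)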
+
+``-fregs-graph``
+ .. index::
+ single: -fregs-graph
+
+ *Off by default due to a performance regression bug. Only applies in
+ combination with the native code generator.* Use the graph colouring
+ register allocator for register allocation in the native code
+ generator. By default, GHC uses a simpler, faster linear register
+ allocator. The downside is that the linear register allocator
+ usually generates worse code.
+
+``-fregs-iterative``
+ .. index::
+ single: -fregs-iterative
+
+ *Off by default, only applies in combination with the native code
+ generator.* Use the iterative coalescing graph colouring register
+ allocator for register allocation in the native code generator. This
+ is the same register allocator as the ``-fregs-graph`` one but also
+ enables iterative coalescing during register allocation.
+
+``-fsimplifier-phases=n``
+ .. index::
+ single: -fsimplifier-phases
+
+ *default: 2.* Set the number of phases for the simplifier. Ignored
+ with ``-O0``.
+
+``-fsimpl-tick-factor=n``
+ .. index::
+ single: -fsimpl-tick-factor
+
+ *default: 100.* GHC's optimiser can diverge if you write rewrite rules
+ (:ref:`rewrite-rules`) that don't terminate, or (less satisfactorily)
+ if you code up recursion through data types (:ref:`bugs-ghc`). To
+ avoid making the compiler fall into an infinite loop, the optimiser
+ carries a "tick count" and stops inlining and applying rewrite rules
+ when this count is exceeded. The limit is set as a multiple of the
+ program size, so bigger programs get more ticks. The
+ ``-fsimpl-tick-factor`` flag lets you change the multiplier. The
+ default is 100; numbers larger than 100 give more ticks, and numbers
+ smaller than 100 give fewer.
+
+ If the tick-count expires, GHC summarises what simplifier steps it
+ has done; you can use ``-ddump-simpl-stats`` to generate a much
+ more detailed list. Usually that identifies the loop quite
+ accurately, because some numbers are very large.
+
+``-fspec-constr``
+ .. index::
+ single: -fspec-constr
+
+ *Off by default, but enabled by -O2.* Turn on call-pattern
+ specialisation; see `Call-pattern specialisation for Haskell
+ programs <http://research.microsoft.com/en-us/um/people/simonpj/papers/spec-constr/index.htm>`__.
+
+ This optimisation specialises recursive functions according to their
+ argument "shapes". This is best explained by example so consider:
+
+ ::
+
+ last :: [a] -> a
+ last [] = error "last"
+ last (x : []) = x
+ last (x : xs) = last xs
+
+ In this code, once we pass the initial check for an empty list we
+ know that in the recursive case this pattern match is redundant. As
+ such ``-fspec-constr`` will transform the above code to:
+
+ ::
+
+ last :: [a] -> a
+ last []       = error "last"
+ last (x : xs) = last' x xs
+   where
+     last' x []       = x
+     last' x (y : ys) = last' y ys
+
+ As well as avoiding unnecessary pattern matching this also helps
+ avoid unnecessary allocation. This applies when an argument is
+ strict in the recursive call to itself but not on the initial entry.
+ A strict recursive branch of the function is then created, similar
+ to the above example.
+
+ It is also possible for library writers to instruct GHC to perform
+ call-pattern specialisation extremely aggressively. This is
+ necessary for some highly optimised libraries, where we may want to
+ specialise regardless of the number of specialisations, or the size
+ of the code. As an example, consider a simplified use-case from the
+ ``vector`` library:
+
+ ::
+
+ import GHC.Types (SPEC(..))
+
+ foldl :: (a -> b -> a) -> a -> Stream b -> a
+ {-# INLINE foldl #-}
+ foldl f z (Stream step s _) = foldl_loop SPEC z s
+   where
+     foldl_loop !sPEC z s = case step s of
+       Yield x s' -> foldl_loop sPEC (f z x) s'
+       Skip       -> foldl_loop sPEC z s
+       Done       -> z
+
+ Here, after GHC inlines the body of ``foldl`` to a call site, it
+ will perform call-pattern specialisation very aggressively on
+ ``foldl_loop`` due to the use of ``SPEC`` in the argument of the
+ loop body. ``SPEC`` from ``GHC.Types`` is specifically recognised by
+ the compiler.
+
+ (NB: it is extremely important you use ``seq`` or a bang pattern on
+ the ``SPEC`` argument!)
+
+ In particular, after inlining this will expose ``f`` to the loop
+ body directly, allowing heavy specialisation over the recursive
+ cases.
+
+``-fspec-constr-count=n``
+ .. index::
+ single: -fspec-constr-count
+
+ *default: 3.* Set the maximum number of specialisations that will be created for
+ any one function by the SpecConstr transformation.
+
+``-fspec-constr-threshold=n``
+ .. index::
+ single: -fspec-constr-threshold
+
+ *default: 2000.* Set the size threshold for the SpecConstr transformation.
+
+``-fspecialise``
+ .. index::
+ single: -fspecialise
+
+ *On by default.* Specialise each type-class-overloaded function
+ defined in this module for the types at which it is called in this
+ module. If ``-fcross-module-specialise`` is set, imported functions
+ that have an INLINABLE pragma (:ref:`inlinable-pragma`) will be
+ specialised as well.
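+
+ As a small sketch (hypothetical names), with ``-fspecialise`` GHC
+ creates a copy of ``norm`` specialised to ``Double`` for the call
+ below, eliminating the ``Num`` dictionary passing at that call site:
+
+ ::
+
+ norm :: Num a => [a] -> a
+ norm xs = sum (map (\x -> x * x) xs)
+
+ use :: Double
+ use = norm [1.0, 2.0, 3.0]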
+
+``-fcross-module-specialise``
+ .. index::
+ single: -fcross-module-specialise
+
+ *On by default.* Specialise ``INLINABLE`` (:ref:`inlinable-pragma`)
+ type-class-overloaded functions imported from other modules for the types at
+ which they are called in this module. Note that specialisation must be
+ enabled (by ``-fspecialise``) for this to have any effect.
+
+``-fstatic-argument-transformation``
+ .. index::
+ single: -fstatic-argument-transformation
+
+ Turn on the static argument transformation, which turns a recursive
+ function into a non-recursive one with a local recursive loop. See
+ Chapter 7 of `Andre Santos's PhD
+ thesis <http://research.microsoft.com/en-us/um/people/simonpj/papers/santos-thesis.ps.gz>`__
+
+``-fstrictness``
+ .. index::
+ single: -fstrictness
+
+ *On by default.* Switch on the strictness analyser. There is a very
+ old paper about GHC's strictness analyser, `Measuring the
+ effectiveness of a simple strictness
+ analyser <http://research.microsoft.com/en-us/um/people/simonpj/papers/simple-strictnes-analyser.ps.gz>`__,
+ but the current one is quite a bit different.
+
+ The strictness analyser figures out when arguments and variables in
+ a function can be treated 'strictly' (that is, they are always
+ evaluated in the function at some point). This allows GHC to apply
+ certain optimisations such as unboxing that otherwise don't apply, as
+ they change the semantics of the program when applied to lazy
+ arguments.
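+
+ For instance, in the following sketch (hypothetical name) the
+ analyser can see that ``sumTo`` is strict in both arguments, so with
+ optimisation on the accumulator can be passed unboxed rather than as
+ a heap-allocated thunk:
+
+ ::
+
+ sumTo :: Int -> Int -> Int
+ sumTo acc 0 = acc
+ sumTo acc n = sumTo (acc + n) (n - 1)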
+
+``-fstrictness-before=⟨n⟩``
+ .. index::
+ single: -fstrictness-before
+
+ Run an additional strictness analysis before simplifier phase ⟨n⟩.
+
+``-funbox-small-strict-fields``
+ .. index::
+ single: -funbox-small-strict-fields
+ single: strict constructor fields
+ single: constructor fields, strict
+
+ *On by default.* This option causes all constructor fields which
+ are marked strict (i.e. “!”) and whose representation is smaller
+ than or equal to the size of a pointer to be unpacked, if possible.
+ It is equivalent to adding an ``UNPACK`` pragma (see
+ :ref:`unpack-pragma`) to every strict constructor field that fulfils
+ the size restriction.
+
+ For example, the constructor fields in the following data types
+
+ ::
+
+ data A = A !Int
+ data B = B !A
+ newtype C = C B
+ data D = D !C
+
+ would all be represented by a single ``Int#`` (see
+ :ref:`primitives`) value with ``-funbox-small-strict-fields``
+ enabled.
+
+ This option is less of a sledgehammer than
+ ``-funbox-strict-fields``: it should rarely make things worse. If
+ you use ``-funbox-small-strict-fields`` to turn on unboxing by
+ default you can disable it for certain constructor fields using the
+ ``NOUNPACK`` pragma (see :ref:`nounpack-pragma`).
+
+ Note that for consistency ``Double``, ``Word64``, and ``Int64``
+ constructor fields are unpacked on 32-bit platforms, even though
+ they are technically larger than a pointer on those platforms.
+
+``-funbox-strict-fields``
+ .. index::
+ single: -funbox-strict-fields
+ single: strict constructor fields
+ single: constructor fields, strict
+
+ This option causes all constructor fields which are marked strict
+ (i.e. “!”) to be unpacked if possible. It is equivalent to adding an
+ ``UNPACK`` pragma to every strict constructor field (see
+ :ref:`unpack-pragma`).
+
+ This option is a bit of a sledgehammer: it might sometimes make
+ things worse. Selectively unboxing fields by using ``UNPACK``
+ pragmas might be better. An alternative is to use
+ ``-funbox-strict-fields`` to turn on unboxing by default but disable
+ it for certain constructor fields using the ``NOUNPACK`` pragma (see
+ :ref:`nounpack-pragma`).
+
+``-funfolding-creation-threshold=n``
+ .. index::
+ single: -funfolding-creation-threshold
+ single: inlining, controlling
+ single: unfolding, controlling
+
+ *default: 750.* Governs the maximum size that GHC will allow a
+ function unfolding to be. (An unfolding has a “size” that reflects
+ the cost in terms of “code bloat” of expanding (aka inlining) that
+ unfolding at a call site. A bigger function would be assigned a
+ bigger cost.)
+
+ Consequences: (a) nothing larger than this will be inlined (unless
+ it has an INLINE pragma); (b) nothing larger than this will be
+ spewed into an interface file.
+
+ Increasing this figure is more likely to result in longer compile
+ times than faster code. The ``-funfolding-use-threshold`` is more
+ useful.
+
+``-funfolding-dict-discount=n``
+ .. index::
+ single: -funfolding-dict-discount
+ single: inlining, controlling
+ single: unfolding, controlling
+
+ *default: 30.*
+
+``-funfolding-fun-discount=n``
+ .. index::
+ single: -funfolding-fun-discount
+ single: inlining, controlling
+ single: unfolding, controlling
+
+ *default: 60.*
+
+``-funfolding-keeness-factor=n``
+ .. index::
+ single: -funfolding-keeness-factor
+ single: inlining, controlling
+ single: unfolding, controlling
+
+ *default: 1.5.*
+
+``-funfolding-use-threshold=n``
+ .. index::
+ single: -funfolding-use-threshold
+ single: inlining, controlling
+ single: unfolding, controlling
+
+ *default: 60.* This is the magic cut-off figure for unfolding (aka
+ inlining): below this size, a function definition will be unfolded
+ at the call-site; any bigger and it won't. The size computed for a
+ function is the actual size of the expression minus any discounts
+ that apply depending on the context into which the expression is to
+ be inlined.
+
+ The difference between this and ``-funfolding-creation-threshold``
+ is that this one determines if a function definition will be inlined
+ *at a call site*. The other option determines if a function
+ definition will be kept around at all for potential inlining.
+
+``-fvectorisation-avoidance``
+ .. index::
+ single: -fvectorisation-avoidance
+
+ Part of :ref:`Data Parallel Haskell (DPH) <dph>`.
+
+ *On by default.* Enable the *vectorisation* avoidance optimisation.
+ This optimisation only works when used in combination with the
+ ``-fvectorise`` transformation.
+
+ While vectorisation of code using DPH is often a big win, it can
+ also produce worse results for some kinds of code. This optimisation
+ modifies the vectorisation transformation to try to determine if a
+ function would be better off unvectorised and, if so, to do just that.
+
+``-fvectorise``
+ .. index::
+ single: -fvectorise
+
+ Part of :ref:`Data Parallel Haskell (DPH) <dph>`.
+
+ *Off by default.* Enable the *vectorisation* optimisation
+ transformation. This optimisation transforms the nested data
+ parallelism code of programs using DPH into flat data parallelism.
+ Flat data parallel programs should have better load balancing,
+ enable SIMD parallelism, and have friendlier cache behaviour.