author     sheaf <sam.derbyshire@gmail.com>  2023-04-08 13:42:58 +0200
committer  Marge Bot <ben+marge-bot@smart-cactus.org>  2023-05-11 11:55:22 -0400
commit     87eebf98cb485f7c9175330051736e147ade9848
tree       ffa226b3fefa8b0a03e1798fa4f55affbddf654b /docs/users_guide
parent     630b1fea1e41a1e00860a30742b6ab8ade8a0de0

Add fused multiply-add instructions

This patch adds eight new primops that fuse a multiplication and an
addition or subtraction:

  - `{fmadd,fmsub,fnmadd,fnmsub}{Float,Double}#`

fmadd x y z is x * y + z, computed with a single rounding step.

This patch implements code generation for these primops in the following
backends:

  - X86, AArch64 and PowerPC NCG,
  - LLVM,
  - C.

WASM uses the C implementation. The primops are unsupported in the
JavaScript backend.

The following constant folding rules are also provided:

  - compute a * b + c when a, b, c are all literals,
  - x * y + 0 ==> x * y,
  - ±1 * y + z ==> z ± y and x * ±1 + z ==> z ± x.

NB: the constant folding rules incorrectly handle signed zero.
This is a known limitation with GHC's floating-point constant folding
rules (#21227), which we hope to resolve in the future.

Diffstat (limited to 'docs/users_guide')
-rw-r--r--  docs/users_guide/9.8.1-notes.rst  18
-rw-r--r--  docs/users_guide/using.rst        18
2 files changed, 36 insertions, 0 deletions
diff --git a/docs/users_guide/9.8.1-notes.rst b/docs/users_guide/9.8.1-notes.rst
index e7e9acdf75..84d9105efd 100644
--- a/docs/users_guide/9.8.1-notes.rst
+++ b/docs/users_guide/9.8.1-notes.rst
@@ -142,6 +142,24 @@ Runtime system
   - ``sameMutVar#``, ``sameTVar#``, ``sameMVar#``
   - ``sameIOPort#``, ``eqStableName#``.
 
+- New primops for fused multiply-add operations. These primops combine a
+  multiplication and an addition, compiling to a single instruction when
+  the ``-mfma`` flag is enabled and the architecture supports it.
+
+  The new primops are ``fmaddFloat#, fmsubFloat#, fnmaddFloat#, fnmsubFloat# :: Float# -> Float# -> Float# -> Float#``
+  and ``fmaddDouble#, fmsubDouble#, fnmaddDouble#, fnmsubDouble# :: Double# -> Double# -> Double# -> Double#``.
+
+  These implement the following operations, while performing one single
+  rounding at the end, leading to a more accurate result:
+
+    - ``fmaddFloat# x y z``, ``fmaddDouble# x y z`` compute ``x * y + z``.
+    - ``fmsubFloat# x y z``, ``fmsubDouble# x y z`` compute ``x * y - z``.
+    - ``fnmaddFloat# x y z``, ``fnmaddDouble# x y z`` compute ``- x * y + z``.
+    - ``fnmsubFloat# x y z``, ``fnmsubDouble# x y z`` compute ``- x * y - z``.
+
+  Warning: on unsupported architectures, the software emulation provided by
+  the fallback to the C standard library is not guaranteed to be IEEE-compliant.
+
 ``ghc`` library
 ~~~~~~~~~~~~~~~
 
diff --git a/docs/users_guide/using.rst b/docs/users_guide/using.rst
index 787b6a0503..8de7dd3533 100644
--- a/docs/users_guide/using.rst
+++ b/docs/users_guide/using.rst
@@ -1732,6 +1732,24 @@ Some flags only make sense for particular target platforms.
     :ref:`native code generator <native-code-gen>`. The resulting compiled
     code will only run on processors that support BMI2 (Intel Haswell and newer, AMD Excavator, Zen and newer).
 
+.. ghc-flag:: -mfma
+    :shortdesc: Use native FMA instructions for fused multiply-add floating-point operations
+    :type: dynamic
+    :category: platform-options
+
+    :since: 9.8.1
+
+    Use native FMA instructions to implement the fused multiply-add floating-point
+    operations of the form ``x * y + z``.
+    This allows computing a multiplication and addition in a single instruction,
+    without an intermediate rounding step.
+    Supported architectures: X86 with the FMA3 instruction set (this includes
+    most consumer processors since 2013), PowerPC and AArch64.
+
+    When this flag is disabled, GHC falls back to the C implementation of fused
+    multiply-add, which might perform non-IEEE-compliant software emulation on
+    some platforms (depending on the implementation of the C standard library).
+
 Haddock
 -------
 