Add fused multiply-add instructions

This patch adds eight new primops that fuse a multiplication and an
addition or subtraction:

  - `{fmadd,fmsub,fnmadd,fnmsub}{Float,Double}#`

fmadd x y z is x * y + z, computed with a single rounding step.

This patch implements code generation for these primops in the following
backends:

  - X86, AArch64 and PowerPC NCG,
  - LLVM
  - C

WASM uses the C implementation. The primops are unsupported in the
JavaScript backend.

The following constant folding rules are also provided:

  - compute a * b + c when a, b, c are all literals,
  - x * y + 0 ==> x * y,
  - ±1 * y + z ==> z ± y and x * ±1 + z ==> z ± x.

NB: the constant folding rules incorrectly handle signed zero.
This is a known limitation with GHC's floating-point constant folding
rules (#21227), which we hope to resolve in the future.
author: sheaf <sam.derbyshire@gmail.com> 2023-04-08 13:42:58 +0200
committer: Marge Bot <ben+marge-bot@smart-cactus.org> 2023-05-11 11:55:22 -0400
commit: 87eebf98cb485f7c9175330051736e147ade9848 (patch)
tree: ffa226b3fefa8b0a03e1798fa4f55affbddf654b /docs
parent: 630b1fea1e41a1e00860a30742b6ab8ade8a0de0 (diff)
download: haskell-87eebf98cb485f7c9175330051736e147ade9848.tar.gz
2 files changed, 36 insertions, 0 deletions
diff --git a/docs/users_guide/9.8.1-notes.rst b/docs/users_guide/9.8.1-notes.rst
index e7e9acdf75..84d9105efd 100644
--- a/docs/users_guide/9.8.1-notes.rst
+++ b/docs/users_guide/9.8.1-notes.rst
@@ -142,6 +142,24 @@ Runtime system
     - ``sameMutVar#``, ``sameTVar#``, ``sameMVar#``
     - ``sameIOPort#``, ``eqStableName#``.
 
+- New primops for fused multiply-add operations. These primops combine a
+  multiplication and an addition, compiling to a single instruction when
+  the ``-mfma`` flag is enabled and the architecture supports it.
+
+  The new primops are ``fmaddFloat#, fmsubFloat#, fnmaddFloat#, fnmsubFloat# :: Float# -> Float# -> Float# -> Float#``
+  and ``fmaddDouble#, fmsubDouble#, fnmaddDouble#, fnmsubDouble# :: Double# -> Double# -> Double# -> Double#``.
+
+  These implement the following operations, while performing one single
+  rounding at the end, leading to a more accurate result:
+
+    - ``fmaddFloat# x y z``, ``fmaddDouble# x y z`` compute ``x * y + z``.
+    - ``fmsubFloat# x y z``, ``fmsubDouble# x y z`` compute ``x * y - z``.
+    - ``fnmaddFloat# x y z``, ``fnmaddDouble# x y z`` compute ``- x * y + z``.
+    - ``fnmsubFloat# x y z``, ``fnmsubDouble# x y z`` compute ``- x * y - z``.
+
+  Warning: on unsupported architectures, the software emulation provided by
+  the fallback to the C standard library is not guaranteed to be IEEE-compliant.
+
 ``ghc`` library
 ~~~~~~~~~~~~~~~
 
diff --git a/docs/users_guide/using.rst b/docs/users_guide/using.rst
index 787b6a0503..8de7dd3533 100644
--- a/docs/users_guide/using.rst
+++ b/docs/users_guide/using.rst
@@ -1732,6 +1732,24 @@ Some flags only make sense for particular target platforms.
     :ref:`native code generator <native-code-gen>`. The resulting compiled
     code will only run on processors that support BMI2 (Intel Haswell and newer, AMD Excavator, Zen and newer).
 
+.. ghc-flag:: -mfma
+    :shortdesc: Use native FMA instructions for fused multiply-add floating-point operations
+    :type: dynamic
+    :category: platform-options
+
+    :since: 9.8.1
+
+    Use native FMA instructions to implement the fused multiply-add floating-point
+    operations of the form ``x * y + z``.
+    This allows computing a multiplication and addition in a single instruction,
+    without an intermediate rounding step.
+    Supported architectures: X86 with the FMA3 instruction set (this includes
+    most consumer processors since 2013), PowerPC and AArch64.
+
+    When this flag is disabled, GHC falls back to the C implementation of fused
+    multiply-add, which might perform non-IEEE-compliant software emulation on
+    some platforms (depending on the implementation of the C standard library).
+
 Haddock
 -------
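
The effect of the single rounding step described in this patch can be illustrated with a small program. This is only a sketch, assuming GHC 9.8 or later (where ``GHC.Exts`` is expected to export the new primops). The boxed wrappers ``fmadd``, ``fmsub``, ``fnmadd``, ``fnmsub`` and the names ``xIn``, ``yIn``, ``zIn``, ``unfused`` and ``fused`` are hypothetical helpers introduced here for illustration; they are not part of the patch.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

-- Assumption: GHC >= 9.8, so that GHC.Exts exports the fused multiply-add primops.
import GHC.Exts
  ( Double (D#)
  , fmaddDouble#, fmsubDouble#, fnmaddDouble#, fnmsubDouble#
  )

-- Hypothetical boxed wrappers around the Double# primops.
fmadd, fmsub, fnmadd, fnmsub :: Double -> Double -> Double -> Double
fmadd  (D# x) (D# y) (D# z) = D# (fmaddDouble#  x y z)  --   x * y + z
fmsub  (D# x) (D# y) (D# z) = D# (fmsubDouble#  x y z)  --   x * y - z
fnmadd (D# x) (D# y) (D# z) = D# (fnmaddDouble# x y z)  -- - x * y + z
fnmsub (D# x) (D# y) (D# z) = D# (fnmsubDouble# x y z)  -- - x * y - z

-- Inputs chosen so that the exact product is 1 - 2^-60, which is not
-- representable as a Double and rounds to 1 when computed on its own.
xIn, yIn, zIn :: Double
xIn = 1 + encodeFloat 1 (-30)  -- 1 + 2^-30
yIn = 1 - encodeFloat 1 (-30)  -- 1 - 2^-30
zIn = -1

-- Two roundings: the product is rounded to 1 before the addition,
-- so the small term 2^-60 is lost.
unfused :: Double
unfused = xIn * yIn + zIn

-- One rounding: a correctly rounded fused multiply-add keeps the
-- exact result -2^-60, which is representable.
fused :: Double
fused = fmadd xIn yIn zIn

main :: IO ()
main = do
  print unfused
  print fused
  -- Small exact inputs, showing the sign conventions of the four variants.
  print (fmadd 2 3 4, fmsub 2 3 4, fnmadd 2 3 4, fnmsub 2 3 4)
```

On a supported architecture, compiling this with ``-mfma`` should let each primop become a single native instruction; without the flag GHC falls back to the C implementation, which, as the notes above warn, is not guaranteed to be IEEE-compliant on every platform, so ``fused`` may not differ from ``unfused`` there.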