Diffstat (limited to 'docs')
-rw-r--r-- docs/users_guide/9.8.1-notes.rst | 18
-rw-r--r-- docs/users_guide/using.rst | 18
2 files changed, 36 insertions, 0 deletions
diff --git a/docs/users_guide/9.8.1-notes.rst b/docs/users_guide/9.8.1-notes.rst
index e7e9acdf75..84d9105efd 100644
--- a/docs/users_guide/9.8.1-notes.rst
+++ b/docs/users_guide/9.8.1-notes.rst
@@ -142,6 +142,24 @@ Runtime system
- ``sameMutVar#``, ``sameTVar#``, ``sameMVar#``
- ``sameIOPort#``, ``eqStableName#``.
+- New primops for fused multiply-add operations. These primops combine a
+ multiplication and an addition, compiling to a single instruction when
+ the ``-mfma`` flag is enabled and the architecture supports it.
+
+ The new primops are ``fmaddFloat#, fmsubFloat#, fnmaddFloat#, fnmsubFloat# :: Float# -> Float# -> Float# -> Float#``
+ and ``fmaddDouble#, fmsubDouble#, fnmaddDouble#, fnmsubDouble# :: Double# -> Double# -> Double# -> Double#``.
+
+ These implement the following operations with a single rounding at the
+ end, which gives a more accurate result:
+
+ - ``fmaddFloat# x y z``, ``fmaddDouble# x y z`` compute ``x * y + z``.
+ - ``fmsubFloat# x y z``, ``fmsubDouble# x y z`` compute ``x * y - z``.
+ - ``fnmaddFloat# x y z``, ``fnmaddDouble# x y z`` compute ``- x * y + z``.
+ - ``fnmsubFloat# x y z``, ``fnmsubDouble# x y z`` compute ``- x * y - z``.
+
+ Warning: on unsupported architectures, these primops fall back to the C
+ standard library, whose software emulation is not guaranteed to be IEEE-compliant.
+
``ghc`` library
~~~~~~~~~~~~~~~
diff --git a/docs/users_guide/using.rst b/docs/users_guide/using.rst
index 787b6a0503..8de7dd3533 100644
--- a/docs/users_guide/using.rst
+++ b/docs/users_guide/using.rst
@@ -1732,6 +1732,24 @@ Some flags only make sense for particular target platforms.
:ref:`native code generator <native-code-gen>`. The resulting compiled
code will only run on processors that support BMI2 (Intel Haswell and newer, AMD Excavator, Zen and newer).
+.. ghc-flag:: -mfma
+ :shortdesc: Use native FMA instructions for fused multiply-add floating-point operations
+ :type: dynamic
+ :category: platform-options
+
+ :since: 9.8.1
+
+ Use native FMA instructions to implement the fused multiply-add floating-point
+ operations of the form ``x * y + z``.
+ This allows computing a multiplication and addition in a single instruction,
+ without an intermediate rounding step.
+ Supported architectures: x86 with the FMA3 instruction set (this includes
+ most consumer processors released since 2013), PowerPC, and AArch64.
+
+ When this flag is disabled, GHC falls back to the C implementation of fused
+ multiply-add, which might perform non-IEEE-compliant software emulation on
+ some platforms (depending on the implementation of the C standard library).
+
Haddock
-------
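
Illustrative note (not part of the patch): the single-rounding behaviour described in the release note can be observed with a small program. This is a minimal sketch, assuming GHC 9.8.1 or later and that ``fmaddDouble#`` is exported from ``GHC.Exts``; the wrapper name ``fmaDouble`` is hypothetical and chosen only for this example.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Double (D#), fmaddDouble#)

-- Hypothetical boxed wrapper around the primop: x * y + z, rounded once.
fmaDouble :: Double -> Double -> Double -> Double
fmaDouble (D# x) (D# y) (D# z) = D# (fmaddDouble# x y z)

main :: IO ()
main = do
  -- x * y is exactly 1 - 2^-54, which is not representable as a Double.
  let x = 1 + 2 ^^ (-27) :: Double
      y = 1 - 2 ^^ (-27) :: Double
  -- Two roundings: the product is rounded first, then the subtraction.
  print (x * y - 1)
  -- One rounding: the exact product is used before the final rounding.
  print (fmaDouble x y (-1))
```

With separate operations the product rounds to 1, so the first result is 0. A correctly rounded fused operation keeps the ``-2^-54`` term. Compiling with ``-mfma`` on a supported architecture should use the native instruction; without it, the result depends on the C library fallback, which the notes above warn may not be IEEE-compliant.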