author    Takenobu Tani <takenobu.hs@gmail.com>    2021-09-04 15:07:31 +0900
committer Marge Bot <ben+marge-bot@smart-cactus.org>    2022-02-16 05:27:17 -0500
commit    acb482cc198bc6fbc5b0f7d93fd493853c77671f (patch)
tree      52da660543b2bf19394a3354d10484a4c1560ef1
parent    ef5cf55d71e84a0a42596b4ec253ecb0d63f149b (diff)
download  haskell-acb482cc198bc6fbc5b0f7d93fd493853c77671f.tar.gz
Relax load_load_barrier for aarch64
This patch relaxes the barrier instruction used by load_load_barrier(). The current load_load_barrier() implements a full barrier with `dmb sy`, which is stronger than necessary to order load-load instructions; it can be relaxed to `dmb ld`. Note that if the current load_load_barrier() is relied upon anywhere as a full barrier (load/store - load/store), this patch is not suitable.

See also the Linux kernel's smp_rmb() implementation:
https://github.com/torvalds/linux/blob/v5.14/arch/arm64/include/asm/barrier.h#L90

It would likely be better still to use `dmb ishld` rather than `dmb ld` for performance, but I cannot validate the effect on a real many-core Arm machine.
-rw-r--r--    rts/include/stg/SMP.h    2
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/rts/include/stg/SMP.h b/rts/include/stg/SMP.h
index f672009c76..696ec737ac 100644
--- a/rts/include/stg/SMP.h
+++ b/rts/include/stg/SMP.h
@@ -438,7 +438,7 @@ load_load_barrier(void) {
#elif defined(arm_HOST_ARCH)
__asm__ __volatile__ ("dmb" : : : "memory");
#elif defined(aarch64_HOST_ARCH)
- __asm__ __volatile__ ("dmb sy" : : : "memory");
+ __asm__ __volatile__ ("dmb ld" : : : "memory");
#elif defined(riscv64_HOST_ARCH)
__asm__ __volatile__ ("fence r,r" : : : "memory");
#else