author David M Peixotto <dmp@rice.edu> 2011-10-19 15:49:06 -0500
committer David Terei <davidterei@gmail.com> 2011-11-01 03:18:40 -0700
commit a9ce36118f0de3aeb427792f8f2c5ae097c94d3f
tree d03c1697a04df842b21bafa214f22140473a2e0d
parent f0ae3f31277ebfe2384fca3f89867f340ae9b492
Change stack alignment to 16+8 bytes in STG code
This patch changes the STG code so that %rsp is aligned to a 16-byte boundary + 8. This is the alignment required by the x86_64 ABI on entry to a function. Previously we kept %rsp aligned to a 16-byte boundary, but this was causing problems for the LLVM backend (see #4211).

Since the stack is now 16+8 byte aligned in STG land on x86_64, we no longer need to mangle the stack manipulations with the LLVM mangler on x86_64 targets.

This patch only modifies the alignment for x86_64 backends.

Signed-off-by: David Terei <davidterei@gmail.com>
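The padding rule that follows from the new alignment can be sketched in C. This is only an illustration of the arithmetic behind the `(tot_arg_size + 8) rem 16` test in the patch, not code from GHC: the names `call_padding` and `rsp_at_call` are hypothetical, and the sketch assumes that every stack argument is 8 bytes and that %rsp is congruent to 8 modulo 16 in STG code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GHC.
 * Returns the padding (in bytes) to subtract from %rsp before pushing
 * tot_arg_size bytes of stack arguments, so that %rsp is 16-byte aligned
 * at the call instruction. Assumes %rsp % 16 == 8 in STG code. */
size_t call_padding(size_t tot_arg_size)
{
    assert(tot_arg_size % 8 == 0);          /* arguments are 8 bytes each */
    return (tot_arg_size + 8) % 16 == 0 ? 0 : 8;
}

/* Simulates the value of %rsp just before the call instruction. */
uintptr_t rsp_at_call(uintptr_t stg_rsp, size_t tot_arg_size)
{
    assert(stg_rsp % 16 == 8);              /* the 16+8 STG invariant */
    return stg_rsp - call_padding(tot_arg_size) - tot_arg_size;
}
```

With an odd number of 8-byte arguments the helper returns 0, and with an even number (including none) it returns 8. In both cases the simulated %rsp is a multiple of 16 at the call, so the callee sees 16+8 alignment once the return address has been pushed.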
-rw-r--r-- compiler/llvmGen/LlvmMangler.hs 6
-rw-r--r-- compiler/nativeGen/X86/CodeGen.hs 16
-rw-r--r-- rts/StgCRun.c 46
3 files changed, 39 insertions, 29 deletions
diff --git a/compiler/llvmGen/LlvmMangler.hs b/compiler/llvmGen/LlvmMangler.hs
index 68e92cf651..981bbf2858 100644
--- a/compiler/llvmGen/LlvmMangler.hs
+++ b/compiler/llvmGen/LlvmMangler.hs
@@ -143,11 +143,13 @@ fixTables ss = fixed
have been pushed, so sub 4). GHC though since it always uses jumps keeps
the stack 16 byte aligned on both function calls and function entry.
- We correct the alignment here.
+ We correct the alignment here for Mac OS X i386. The x86_64 target already
+ has the correct alignment since we keep the stack 16+8 aligned throughout
+ STG land for 64-bit targets.
-}
fixupStack :: B.ByteString -> B.ByteString -> B.ByteString
-#if !darwin_TARGET_OS
+#if !darwin_TARGET_OS || x86_64_TARGET_ARCH
fixupStack = const
#else
diff --git a/compiler/nativeGen/X86/CodeGen.hs b/compiler/nativeGen/X86/CodeGen.hs
index 1efa327002..458f379380 100644
--- a/compiler/nativeGen/X86/CodeGen.hs
+++ b/compiler/nativeGen/X86/CodeGen.hs
@@ -1842,15 +1842,17 @@ genCCall64 target dest_regs args =
tot_arg_size = arg_size * length stack_args
-- On entry to the called function, %rsp should be aligned
- -- on a 16-byte boundary +8 (i.e. the first stack arg after
- -- the return address is 16-byte aligned). In STG land
- -- %rsp is kept 16-byte aligned (see StgCRun.c), so we just
- -- need to make sure we push a multiple of 16-bytes of args,
- -- plus the return address, to get the correct alignment.
+ -- on a 16-byte boundary +8 (i.e. the first stack arg
+ -- above the return address is 16-byte aligned). In STG
+ -- land %rsp is kept 16+8 byte aligned (see StgCRun.c), so we
+ -- just need to make sure we pad by eight bytes after
+ -- pushing a multiple of 16-bytes of args to get the
+ -- correct alignment. If we push an odd number of eight byte
+ -- arguments then no padding is needed.
-- Urg, this is hard. We need to feed the delta back into
-- the arg pushing code.
(real_size, adjust_rsp) <-
- if tot_arg_size `rem` 16 == 0
+ if (tot_arg_size + 8) `rem` 16 == 0
then return (tot_arg_size, nilOL)
else do -- we need to adjust...
delta <- getDeltaNat
@@ -1865,7 +1867,7 @@ genCCall64 target dest_regs args =
delta <- getDeltaNat
-- deal with static vs dynamic call targets
- (callinsns,cconv) <-
+ (callinsns,_cconv) <-
case target of
CmmCallee (CmmLit (CmmLabel lbl)) conv
-> -- ToDo: stdcall arg sizes
diff --git a/rts/StgCRun.c b/rts/StgCRun.c
index 7251e64253..11e0543475 100644
--- a/rts/StgCRun.c
+++ b/rts/StgCRun.c
@@ -267,29 +267,36 @@ StgRunIsImplementedInAssembler(void)
"addq %0, %%rsp\n\t"
"retq"
- : : "i"(RESERVED_C_STACK_BYTES+48+8 /*stack frame size*/));
+ : : "i"(RESERVED_C_STACK_BYTES+48 /*stack frame size*/));
/*
- HACK alert!
-
- The x86_64 ABI specifies that on a procedure call, %rsp is
+ The x86_64 ABI specifies that on entry to a procedure, %rsp is
aligned on a 16-byte boundary + 8. That is, the first
argument on the stack after the return address will be
- 16-byte aligned.
-
- Which should be fine: RESERVED_C_STACK_BYTES+48 is a multiple
- of 16 bytes.
+ 16-byte aligned.
+
+ We maintain the 16+8 stack alignment throughout the STG code.
+
+ When we call STG_RUN the stack will be aligned to 16+8. We used
+ to subtract an extra 8 bytes so that %rsp would be 16 byte
+ aligned at all times in STG land. This worked fine for the
+ native code generator which knew that the stack was already
+ aligned on 16 bytes when it generated calls to C functions.
+
+ This arrangement caused problems for the LLVM backend. The LLVM
+ code generator would assume that on entry to each function the
+ stack is aligned to 16+8 as required by the ABI. However, since
+ we only enter STG functions by jumping to them with tail calls,
+ the stack was actually aligned to a 16-byte boundary. The LLVM
+ backend had its own mangler that would post-process the
+ assembly code to fix up the stack manipulation code to maintain
+ the correct alignment (see #4211).
+
+ Therefore, we now keep the stack aligned to 16+8 while in
+ STG land so that LLVM generates correct code without any
+ mangling. The native code generator can handle this alignment
+ just fine by making sure the stack is aligned to a 16-byte
+ boundary before it makes a C-call.
- BUT... when we do a C-call from STG land, gcc likes to put the
- stack alignment adjustment in the prolog. eg. if we're calling
- a function with arguments in regs, gcc will insert 'subq $8,%rsp'
- in the prolog, to keep %rsp aligned (the return address is 8
- bytes, remember). The mangler throws away the prolog, so we
- lose the stack alignment.
-
- The hack is to add this extra 8 bytes to our %rsp adjustment
- here, so that throughout STG code, %rsp is 16-byte aligned,
- ready for a C-call.
-
A quick way to see if this is wrong is to compile this code:
main = System.Exit.exitWith ExitSuccess
@@ -300,7 +307,6 @@ StgRunIsImplementedInAssembler(void)
stack isn't aligned, and calling exitWith from Haskell invokes
shutdownHaskellAndExit using a C call.
- Future gcc releases will almost certainly break this hack...
*/
}
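The effect of dropping the extra 8 bytes from the StgRun frame can be checked with a small C model. This is a sketch under stated assumptions rather than RTS code: `stg_rsp_mod16` and `SKETCH_RESERVED_C_STACK_BYTES` are hypothetical names, the constant is a placeholder chosen only so that its value plus 48 is a multiple of 16 (the property the original comment relies on), and the starting address is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value, NOT taken from the RTS headers. The only property
 * this sketch relies on is that (value + 48) is a multiple of 16. */
#define SKETCH_RESERVED_C_STACK_BYTES 16384u

/* Hypothetical model of the StgRun stack adjustment.
 * 'extra' is the number of bytes subtracted in addition to
 * RESERVED_C_STACK_BYTES + 48: 8 before this patch, 0 after it.
 * Returns %rsp modulo 16 as seen by STG code. */
unsigned stg_rsp_mod16(unsigned extra)
{
    uintptr_t rsp = 0x7fff0008;   /* arbitrary 16+8 aligned entry value */
    assert(rsp % 16 == 8);        /* ABI alignment on entry to StgRun */
    rsp -= SKETCH_RESERVED_C_STACK_BYTES + 48 + extra;
    return (unsigned)(rsp % 16);
}
```

In this model `stg_rsp_mod16(8)` corresponds to the old behaviour, with %rsp 16-byte aligned in STG code, and `stg_rsp_mod16(0)` corresponds to the new behaviour, where the 16+8 alignment from function entry is preserved.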