diff options
author | Ben Gamari <ben@smart-cactus.org> | 2020-08-05 19:24:10 -0400 |
---|---|---|
committer | Marge Bot <ben+marge-bot@smart-cactus.org> | 2020-08-07 08:35:21 -0400 |
commit | 15b36de030ecdd60897bc7a6a02bdeabd0825be4 (patch) | |
tree | 6d1eb105b0b14840dfa133882d63f105408bf10b | |
parent | 6402c1240d5bd768b8fe8b4368413932bedbe107 (diff) | |
download | haskell-15b36de030ecdd60897bc7a6a02bdeabd0825be4.tar.gz |
nativeGen: One approach to fix #18527
Previously the code generator could produce corrupt C call sequences due
to register overlap between MachOp lowerings and the platform's calling
convention. We fix this using a hack described in Note [Evaluate C-call
arguments before placing in destination registers].
-rw-r--r-- | compiler/GHC/Cmm/MachOp.hs | 3 | ||||
-rw-r--r-- | compiler/GHC/CmmToAsm/X86/CodeGen.hs | 100 |
2 files changed, 95 insertions, 8 deletions
diff --git a/compiler/GHC/Cmm/MachOp.hs b/compiler/GHC/Cmm/MachOp.hs index 077f663161..558cb13a7e 100644 --- a/compiler/GHC/Cmm/MachOp.hs +++ b/compiler/GHC/Cmm/MachOp.hs @@ -45,6 +45,9 @@ native code generators to handle. Most operations are parameterised by the 'Width' that they operate on. Some operations have separate signed and unsigned versions, and float and integer versions. + +Note that there are variety of places in the native code generator where we +assume that the code produced for a MachOp does not introduce new blocks. -} data MachOp diff --git a/compiler/GHC/CmmToAsm/X86/CodeGen.hs b/compiler/GHC/CmmToAsm/X86/CodeGen.hs index fa1a7d4884..f210cebb2d 100644 --- a/compiler/GHC/CmmToAsm/X86/CodeGen.hs +++ b/compiler/GHC/CmmToAsm/X86/CodeGen.hs @@ -287,11 +287,11 @@ we construct as a separate data type and the actual control flow graph in the co Instead we now return the new basic block if a statement causes a change in the current block and use the block for all following statements. -For this reason genCCall is also split into two parts. -One for calls which *won't* change the basic blocks in -which successive instructions will be placed. -A different one for calls which *are* known to change the -basic block. +For this reason genCCall is also split into two parts. One for calls which +*won't* change the basic blocks in which successive instructions will be +placed (since they only evaluate CmmExpr, which can only contain MachOps, which +cannot introduce basic blocks in their lowerings). A different one for calls +which *are* known to change the basic block. -} @@ -1028,6 +1028,9 @@ getRegister' _ is32Bit (CmmMachOp mop [x, y]) = do -- dyadic MachOps tmp. This is likely to be better, because the reg alloc can eliminate this reg->reg move here (it won't eliminate the other one, because the move is into the fixed %ecx). + * in the case of C calls the use of ecx here can interfere with arguments. + We avoid this with the hack described in Note [Evaluate C-call + arguments before placing in destination registers] -} shift_code width instr x y{-amount-} = do x_code <- getAnyReg x @@ -2022,6 +2025,7 @@ genCCall is32Bit (PrimTarget (MO_AtomicRMW width amop)) arg <- getNewRegNat format arg_code <- getAnyReg n platform <- ncgPlatform <$> getConfig + let dst_r = getRegisterReg platform (CmmLocal dst) (code, lbl) <- op_code dst_r arg amode return (addr_code `appOL` arg_code arg `appOL` code, Just lbl) @@ -2667,9 +2671,12 @@ genCCall' _ is32Bit target dest_regs args bid = do return code _ -> panic "genCCall: Wrong number of arguments/results for imul2" - _ -> if is32Bit - then genCCall32' target dest_regs args - else genCCall64' target dest_regs args + _ -> do + (instrs0, args') <- evalArgs bid args + instrs1 <- if is32Bit + then genCCall32' target dest_regs args' + else genCCall64' target dest_regs args' + return (instrs0 `appOL` instrs1) where divOp1 platform signed width results [arg_x, arg_y] = divOp platform signed width results Nothing arg_x arg_y @@ -2732,6 +2739,83 @@ genCCall' _ is32Bit target dest_regs args bid = do addSubIntC _ _ _ _ _ _ _ _ = panic "genCCall: Wrong number of arguments/results for addSubIntC" +{- +Note [Evaluate C-call arguments before placing in destination registers] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When producing code for C calls we must take care when placing arguments +in their final registers. Specifically, we must ensure that temporary register +usage due to evaluation of one argument does not clobber a register in which we +already placed a previous argument (e.g. as the code generation logic for +MO_Shl can clobber %rcx due to x86 instruction limitations). + +This is precisely what happened in #18527. Consider this C--: + + (result::I64) = call "ccall" doSomething(_s2hp::I64, 2244, _s2hq::I64, _s2hw::I64 | (1 << _s2hz::I64)); + +Here we are calling the C function `doSomething` with three arguments, the last +involving a non-trivial expression involving MO_Shl. In this case the NCG could +naively generate the following assembly (where $tmp denotes some temporary +register and $argN denotes the register for argument N, as dictated by the +platform's calling convention): + + mov _s2hp, $arg1 # place first argument + mov _s2hq, $arg2 # place second argument + + # Compute 1 << _s2hz + mov _s2hz, %rcx + shl %cl, $tmp + + # Compute (_s2hw | (1 << _s2hz)) + mov _s2hw, $arg3 + or $tmp, $arg3 + + # Perform the call + call func + +This code is outright broken on Windows which assigns $arg1 to %rcx. This means +that the evaluation of the last argument clobbers the first argument. + +To avoid this we use a rather awful hack: when producing code for a C call with +at least one non-trivial argument, we first evaluate all of the arguments into +local registers before moving them into their final calling-convention-defined +homes. This is performed by 'evalArgs'. Here we define "non-trivial" to be an +expression which might contain a MachOp since these are the only cases which +might clobber registers. Furthermore, we use a conservative approximation of +this condition (only looking at the top-level of CmmExprs) to avoid spending +too much effort trying to decide whether we want to take the fast path. + +Note that this hack *also* applies to calls to out-of-line PrimTargets (which +are lowered via a C call) since outOfLineCmmOp produces the call via +(stmtToInstrs (CmmUnsafeForeignCall ...)), which will ultimately end up +back in genCCall{32,64}. +-} + +-- | See Note [Evaluate C-call arguments before placing in destination registers] +evalArgs :: BlockId -> [CmmActual] -> NatM (InstrBlock, [CmmActual]) +evalArgs bid actuals + | any mightContainMachOp actuals = do + regs_blks <- mapM evalArg actuals + return (concatOL $ map fst regs_blks, map snd regs_blks) + | otherwise = return (nilOL, actuals) + where + mightContainMachOp (CmmReg _) = False + mightContainMachOp (CmmRegOff _ _) = False + mightContainMachOp (CmmLit _) = False + mightContainMachOp _ = True + + evalArg :: CmmActual -> NatM (InstrBlock, CmmExpr) + evalArg actual = do + platform <- getPlatform + lreg <- newLocalReg $ cmmExprType platform actual + (instrs, bid1) <- stmtToInstrs bid $ CmmAssign (CmmLocal lreg) actual + -- The above assignment shouldn't change the current block + MASSERT(isNothing bid1) + return (instrs, CmmReg $ CmmLocal lreg) + + newLocalReg :: CmmType -> NatM LocalReg + newLocalReg ty = LocalReg <$> getUniqueM <*> pure ty + -- Note [DIV/IDIV for bytes] -- -- IDIV reminder: |