author: Reid Barton <rwbarton@gmail.com> 2014-08-12 11:11:46 -0400
committer: Reid Barton <rwbarton@gmail.com> 2014-08-12 11:11:50 -0400
commit: 64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1
tree: 9cc420735b86bb776efe15784f84b7f13203a7cc
parent: 9f285fa40f6fb0c8495dbec771d798ac6dfaabee
x86: zero extend the result of 16-bit popcnt instructions (#9435)
Summary:
The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
the destination register, so we have to zero-extend the result to
a full word as popCnt16# is supposed to return a Word#.

For popCnt8# we could instead zero-extend the input to 32 bits
and then do a 32-bit popcnt, and not have to zero-extend the result.
LLVM produces the 16-bit popcnt sequence with two zero extensions,
though, and who am I to argue?

Test Plan:
 - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
 - then ran again adding "WAY=optasm", and verified that
   the popcnt sequences we generate match the ones produced
   by LLVM for its @llvm.ctpop.* intrinsics

Reviewers: austin, hvr, tibbe

Reviewed By: austin, hvr, tibbe

Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter

Differential Revision: https://phabricator.haskell.org/D147

GHC Trac Issues: #9435
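
The register behaviour described in the summary can be illustrated with a small pure-Haskell model. This is only a sketch for readers, not code from the patch: the names write16, popcnt16, zeroExtend16 and popcnt8Via32 are hypothetical, and a Word64 stands in for a 64-bit machine register.

```haskell
import Data.Bits (complement, popCount, (.&.), (.|.))
import Data.Word (Word16, Word64)

-- Hypothetical model: a 16-bit write into a 64-bit register replaces
-- bits 0..15 and leaves bits 16..63 as they were.
write16 :: Word64 -> Word16 -> Word64
write16 oldDst lo = (oldDst .&. complement 0xFFFF) .|. fromIntegral lo

-- Model of 'popcnt r16, r/m16': count the set bits in the low 16 bits
-- of the source and write the count into the low 16 bits of the
-- destination. Whatever was in the upper bits of oldDst survives.
popcnt16 :: Word64 -> Word64 -> Word64
popcnt16 oldDst src =
  write16 oldDst (fromIntegral (popCount (fromIntegral src :: Word16)))

-- Model of the fix-up this patch adds (MOVZxL II16 dst dst): keep the
-- low 16 bits and clear everything above them.
zeroExtend16 :: Word64 -> Word64
zeroExtend16 r = r .&. 0xFFFF

-- Model of the alternative mentioned for popCnt8#: zero-extend the
-- 8-bit input first, then use a 32-bit popcnt. On x86-64 a 32-bit
-- write clears the upper half of the register, so no fix-up is needed.
popcnt8Via32 :: Word64 -> Word64
popcnt8Via32 src = fromIntegral (popCount (src .&. 0xFF))
```

In this model, popcnt16 0xDEADBEEF00000000 0x00FF evaluates to 0xDEADBEEF00000008: the count 8 sits in the low 16 bits, but the stale upper bits make the register useless as a Word#. Applying zeroExtend16 to that value gives 8.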
Diffstat (limited to 'compiler')
-rw-r--r-- compiler/nativeGen/X86/CodeGen.hs | 12
1 files changed, 8 insertions, 4 deletions
diff --git a/compiler/nativeGen/X86/CodeGen.hs b/compiler/nativeGen/X86/CodeGen.hs
index d6fdee13f6..ce7120e24b 100644
--- a/compiler/nativeGen/X86/CodeGen.hs
+++ b/compiler/nativeGen/X86/CodeGen.hs
@@ -1743,15 +1743,19 @@ genCCall dflags is32Bit (PrimTarget (MO_PopCnt width)) dest_regs@[dst]
if sse4_2
then do code_src <- getAnyReg src
src_r <- getNewRegNat size
+ let dst_r = getRegisterReg platform False (CmmLocal dst)
return $ code_src src_r `appOL`
(if width == W8 then
-- The POPCNT instruction doesn't take a r/m8
unitOL (MOVZxL II8 (OpReg src_r) (OpReg src_r)) `appOL`
- unitOL (POPCNT II16 (OpReg src_r)
- (getRegisterReg platform False (CmmLocal dst)))
+ unitOL (POPCNT II16 (OpReg src_r) dst_r)
else
- unitOL (POPCNT size (OpReg src_r)
- (getRegisterReg platform False (CmmLocal dst))))
+ unitOL (POPCNT size (OpReg src_r) dst_r)) `appOL`
+ (if width == W8 || width == W16 then
+ -- We used a 16-bit destination register above,
+ -- so zero-extend
+ unitOL (MOVZxL II16 (OpReg dst_r) (OpReg dst_r))
+ else nilOL)
else do
targetExpr <- cmmMakeDynamicReference dflags
CallReference lbl