author | Reid Barton <rwbarton@gmail.com> | 2014-08-12 11:11:46 -0400 |
---|---|---|
committer | Reid Barton <rwbarton@gmail.com> | 2014-08-12 11:11:50 -0400 |
commit | 64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1 | |
tree | 9cc420735b86bb776efe15784f84b7f13203a7cc /compiler | |
parent | 9f285fa40f6fb0c8495dbec771d798ac6dfaabee | |
x86: zero extend the result of 16-bit popcnt instructions (#9435)
Summary:
The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
the destination register, so we have to zero-extend the result to
a full word as popCnt16# is supposed to return a Word#.
For popCnt8# we could instead zero-extend the input to 32 bits
and then do a 32-bit popcnt, and not have to zero-extend the result.
LLVM produces the 16-bit popcnt sequence with two zero extensions,
though, and who am I to argue?
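To make the summary concrete, here is a minimal sketch of why the extra zero extension is needed. It is a toy model, not code from GHC: `popcnt16Into` and `zeroExtend16` are hypothetical names, and a `Word64` stands in for a 64-bit destination register whose upper 48 bits survive a 16-bit write.

```haskell
import Data.Bits (popCount, (.&.), (.|.), complement)
import Data.Word (Word16, Word64)

-- Hypothetical model of 'popcnt r16, r/m16': only the low 16 bits of the
-- destination are overwritten; the upper 48 bits keep their old contents.
popcnt16Into :: Word64 -> Word16 -> Word64
popcnt16Into oldDst src =
    (oldDst .&. complement 0xFFFF) .|. fromIntegral (popCount src)

-- Hypothetical model of the zero extension added by this patch:
-- keep the low 16 bits and clear everything above them.
zeroExtend16 :: Word64 -> Word64
zeroExtend16 r = r .&. 0xFFFF
```

In this model, `popcnt16Into 0xDEADBEEF00000000 0x00FF` leaves the stale upper bits in place, so the register does not hold the count 8 as a full word until `zeroExtend16` is applied.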
Test Plan:
- ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
- then ran again adding "WAY=optasm", and verified that
the popcnt sequences we generate match the ones produced
by LLVM for its @llvm.ctpop.* intrinsics
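A check in the spirit of this test plan can be sketched as follows. This is not the contents of cgrun071; `slowPopCount` and `popCountAgrees` are hypothetical names, and the sketch assumes that `Data.Bits.popCount` at `Word8` and `Word16` exercises the code path being fixed.

```haskell
import Data.Bits (popCount, testBit)
import Data.Word (Word8, Word16)

-- Reference bit count: test each of the low n bits individually.
slowPopCount :: Int -> Integer -> Int
slowPopCount n x = length [ i | i <- [0 .. n - 1], testBit x i ]

-- Exhaustively compare popCount with the reference for every
-- 8-bit and every 16-bit input.
popCountAgrees :: Bool
popCountAgrees =
       all (\w -> popCount w == slowPopCount 8  (toInteger w))
           [minBound .. maxBound :: Word8]
    && all (\w -> popCount w == slowPopCount 16 (toInteger w))
           [minBound .. maxBound :: Word16]
```

Because the narrow types have so few values, an exhaustive comparison is cheap. A result with stale upper bits would show up as a mismatch against the reference count.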
Reviewers: austin, hvr, tibbe
Reviewed By: austin, hvr, tibbe
Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter
Differential Revision: https://phabricator.haskell.org/D147
GHC Trac Issues: #9435
Diffstat (limited to 'compiler')
-rw-r--r-- | compiler/nativeGen/X86/CodeGen.hs | 12 |
1 files changed, 8 insertions, 4 deletions
diff --git a/compiler/nativeGen/X86/CodeGen.hs b/compiler/nativeGen/X86/CodeGen.hs
index d6fdee13f6..ce7120e24b 100644
--- a/compiler/nativeGen/X86/CodeGen.hs
+++ b/compiler/nativeGen/X86/CodeGen.hs
@@ -1743,15 +1743,19 @@ genCCall dflags is32Bit (PrimTarget (MO_PopCnt width)) dest_regs@[dst]
     if sse4_2
         then do code_src <- getAnyReg src
                 src_r <- getNewRegNat size
+                let dst_r = getRegisterReg platform False (CmmLocal dst)
                 return $ code_src src_r `appOL`
                     (if width == W8 then
                          -- The POPCNT instruction doesn't take a r/m8
                          unitOL (MOVZxL II8 (OpReg src_r) (OpReg src_r)) `appOL`
-                         unitOL (POPCNT II16 (OpReg src_r)
-                                 (getRegisterReg platform False (CmmLocal dst)))
+                         unitOL (POPCNT II16 (OpReg src_r) dst_r)
                      else
-                         unitOL (POPCNT size (OpReg src_r)
-                                 (getRegisterReg platform False (CmmLocal dst))))
+                         unitOL (POPCNT size (OpReg src_r) dst_r)) `appOL`
+                    (if width == W8 || width == W16 then
+                         -- We used a 16-bit destination register above,
+                         -- so zero-extend
+                         unitOL (MOVZxL II16 (OpReg dst_r) (OpReg dst_r))
+                     else nilOL)
         else do
             targetExpr <- cmmMakeDynamicReference dflags
                           CallReference lbl
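The instruction selection in the patched branch can be summarised with a small standalone model. This is only a sketch: `Instr`, `MovZx`, `Popcnt` and `popcntSeq` are simplified stand-ins invented here, registers are plain strings, and none of it is GHC's actual `Instr` or `OrdList` API.

```haskell
data Width = W8 | W16 | W32 | W64
    deriving (Eq, Show)

-- Simplified stand-in instructions: operand width, source, destination.
data Instr
    = MovZx  Width String String  -- zero-extend from the given width
    | Popcnt Width String String  -- popcnt at the given operand width
    deriving (Eq, Show)

-- Mirrors the shape of the patched code: count the bits, then
-- zero-extend the destination if a 16-bit popcnt was used.
popcntSeq :: Width -> String -> String -> [Instr]
popcntSeq width src dst = count ++ extend
  where
    count
      | width == W8 = [MovZx W8 src src, Popcnt W16 src dst]  -- no r/m8 form
      | otherwise   = [Popcnt width src dst]
    extend
      | width == W8 || width == W16 = [MovZx W16 dst dst]
      | otherwise                   = []
```

In this model, `popcntSeq W8 "src" "dst"` yields three instructions (two zero extensions around a 16-bit popcnt), while `popcntSeq W32 "src" "dst"` yields a single `Popcnt`.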