author | Kevin Ryde <user42@zip.com.au> | 2001-11-15 22:45:45 +0100 |
---|---|---|
committer | Kevin Ryde <user42@zip.com.au> | 2001-11-15 22:45:45 +0100 |
commit | c8e3b035e1fdfbebf5200402983159e9491435f2 (patch) | |
tree | 4640bd200440c6c81a9eab387e8b4c7bf186275e /mpn | |
parent | 134f53e3af6cc540c398a815c39500114e9c9968 (diff) | |
download | gmp-c8e3b035e1fdfbebf5200402983159e9491435f2.tar.gz |
More of:
* mpn/x86/pentium4/README: New file.
Diffstat (limited to 'mpn')
-rw-r--r-- | mpn/x86/pentium4/README | 14 |
1 files changed, 11 insertions, 3 deletions
diff --git a/mpn/x86/pentium4/README b/mpn/x86/pentium4/README
index 777d9a6c4..72f037c74 100644
--- a/mpn/x86/pentium4/README
+++ b/mpn/x86/pentium4/README
@@ -63,9 +63,13 @@ Perhaps future chip steppings will be better.
 
 NOTES
 
-incl and decl are to be avoided, and instead add $1 and sub $1 used, since
-the carry flag is apparently not separately renamed, making incl and decl
-dependent on the last (or perhaps all) previous flags-setting instructions.
+adcl and sbbl are quite slow at 8 cycles for reg->reg. paddq of 32-bits
+within a 64-bit mmx register seems better, though the combination
+paddq/psrlq when propagating a carry is still a 4 cycle latency.
+
+incl and decl should be avoided, instead use add $1 and sub $1. Apparently
+the carry flag is not separately renamed, so incl and decl depend on all
+previous flags-setting instructions.
 
 movq mmx -> mmx does have 6 cycle latency, as noted in the documentation.
 pxor/por or similar combination at 2 cycles latency can be used instead.
@@ -84,6 +88,10 @@ fxsave/fxrestor will be needed if they're used.
 
 REFERENCES
 
+Intel Pentium-4 processor manuals,
+
+ http://developer.intel.com/design/pentium4/manuals
+
 "Intel Pentium 4 Processor Optimization Reference Manual", Intel, 2001,
 order number 248966. Available on-line:
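The paddq/psrlq idea in the NOTES hunk can be sketched in portable C. This is only an illustration of the technique, not the pentium4 assembly and not GMP's mpn_add_n: the name add_n_sketch and its signature are hypothetical, and a 64-bit integer stands in for the mmx register. Each 32-bit limb pair is summed in the 64-bit accumulator (the paddq step), the low half is stored, and a right shift by 32 (the psrlq step) leaves the carry in place for the next limb, so no adcl-style flag dependency is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; assumes 32-bit limbs, least
   significant limb first.  Stores a + b in dst and returns the carry out. */
static uint32_t add_n_sketch(uint32_t *dst, const uint32_t *a,
                             const uint32_t *b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));

    uint64_t acc = 0;   /* stands in for the 64-bit mmx register */
    for (size_t i = 0; i < n; i++) {
        /* Like paddq on zero-extended limbs: carry-in + a[i] + b[i]
           is at most 2^33 - 1, so it cannot overflow 64 bits. */
        acc += (uint64_t)a[i] + b[i];
        dst[i] = (uint32_t)acc;   /* low 32 bits are the result limb */
        acc >>= 32;               /* like psrlq $32: keep only the carry */
    }
    return (uint32_t)acc;         /* 0 or 1 */
}
```

The loop-carried dependency here is the add followed by the shift, which corresponds to the paddq/psrlq chain the README measures at 4 cycles.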