author Kevin Ryde <user42@zip.com.au> 2001-11-15 22:45:45 +0100
committer Kevin Ryde <user42@zip.com.au> 2001-11-15 22:45:45 +0100
commit c8e3b035e1fdfbebf5200402983159e9491435f2
tree 4640bd200440c6c81a9eab387e8b4c7bf186275e /mpn
parent 134f53e3af6cc540c398a815c39500114e9c9968
More of:
* mpn/x86/pentium4/README: New file.
Diffstat (limited to 'mpn')
-rw-r--r-- mpn/x86/pentium4/README 14
1 files changed, 11 insertions, 3 deletions
diff --git a/mpn/x86/pentium4/README b/mpn/x86/pentium4/README
index 777d9a6c4..72f037c74 100644
--- a/mpn/x86/pentium4/README
+++ b/mpn/x86/pentium4/README
@@ -63,9 +63,13 @@ Perhaps future chip steppings will be better.
NOTES
-incl and decl are to be avoided, and instead add $1 and sub $1 used, since
-the carry flag is apparently not separately renamed, making incl and decl
-dependent on the last (or perhaps all) previous flags-setting instructions.
+adcl and sbbl are quite slow at 8 cycles for reg->reg. paddq of 32-bits
+within a 64-bit mmx register seems better, though the combination
+paddq/psrlq when propagating a carry is still a 4 cycle latency.
+
+incl and decl should be avoided, instead use add $1 and sub $1. Apparently
+the carry flag is not separately renamed, so incl and decl depend on all
+previous flags-setting instructions.
movq mmx -> mmx does have 6 cycle latency, as noted in the documentation.
pxor/por or similar combination at 2 cycles latency can be used instead.
@@ -84,6 +88,10 @@ fxsave/fxrestor will be needed if they're used.
REFERENCES
+Intel Pentium-4 processor manuals,
+
+ http://developer.intel.com/design/pentium4/manuals
+
"Intel Pentium 4 Processor Optimization Reference Manual", Intel, 2001,
order number 248966. Available on-line:
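The NOTES above suggest propagating carries with paddq/psrlq in a 64-bit mmx register instead of adcl. A minimal sketch of that dataflow in portable C follows. It is an illustrative model, not GMP's code: a `uint64_t` stands in for the mmx register, each 32-bit limb is zero-extended into it (as a movd load would do), paddq becomes a 64-bit add, and psrlq $32 becomes a right shift that leaves the carry for the next limb. The function name `mpn_add_n_model` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of an mpn_add_n style loop that uses
   paddq/psrlq instead of adcl.  Adds the n-limb numbers {up,n} and
   {vp,n}, writes the low n limbs of the sum to {rp,n}, and returns
   the carry out (0 or 1). */
static uint32_t mpn_add_n_model(uint32_t *rp, const uint32_t *up,
                                const uint32_t *vp, size_t n)
{
    uint64_t acc = 0;  /* models the mmx register; holds the carry between limbs */

    for (size_t i = 0; i < n; i++) {
        /* paddq of two zero-extended limbs; independent of the carry */
        uint64_t sum = (uint64_t)up[i] + vp[i];

        acc += sum;             /* paddq: add the incoming carry */
        rp[i] = (uint32_t)acc;  /* movd: store the low 32 bits */
        acc >>= 32;             /* psrlq $32: carry for the next limb */
        assert(acc <= 1);       /* a 32-bit limb add carries at most 1 */
    }
    return (uint32_t)acc;
}
```

Only the `acc += sum` and `acc >>= 32` steps sit on the loop-carried dependency; the limb-by-limb sum can be formed independently. That add-then-shift pair is the paddq/psrlq combination whose latency the NOTES describe.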