author    tege <tege@gmplib.org>    2002-05-21 18:36:42 +0200
committer tege <tege@gmplib.org>    2002-05-21 18:36:42 +0200
commit    37e7a3c00c80dacaa487e0c74c36b93d1748fcc7 (patch)
tree      4219e009fd7e7ab8822e51956b2a8367f6e5d4e6
parent    9ef9e2b4ec78a88f821da3fc1f875c3d9d87896c (diff)
download  gmp-37e7a3c00c80dacaa487e0c74c36b93d1748fcc7.tar.gz
*** empty log message ***
 ChangeLog        |  4 ++++
 mpn/alpha/README | 14 +++++++-------
 2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 2c695a24f..ba5ec890c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -21,6 +21,10 @@ MA 02111-1307, USA.
2002-05-21 Torbjorn Granlund <tege@swox.com>
+ * mpz/set_str.c: Nailify.
+
+ * randlc2x.c (gmp_randinit_lc_2exp): Nailify.
+
From Jakub Jelinek:
* longlong.h (add_ssaaaa,sub_ddmmss) [64-bit sparc]:
Make it actually work.
diff --git a/mpn/alpha/README b/mpn/alpha/README
index 67ed43220..b2bcd08f4 100644
--- a/mpn/alpha/README
+++ b/mpn/alpha/README
@@ -106,16 +106,16 @@ EV5
EV6
Here we have a really parallel pipeline, capable of issuing up to 4 integer
-instructions per cycle. One integer multiply instruction can issue each cycle.
-To get optimal speed, we need to pretend we are vectorizing the code, i.e.,
-minimize the depth of recurrences. In actual practice, it is never possible to
-sustain more than 3.5 insns/cycle due to renaming register constraints.
+instructions per cycle. In actual practice, it is never possible to sustain
+more than 3.5 integer insns/cycle due to rename register shortage. One integer
+multiply instruction can issue each cycle. To get optimal speed, we need to
+pretend we are vectorizing the code, i.e., minimize the depth of recurrences.
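As a toy C sketch of what "minimize the depth of recurrences" means (not
from this patch or from GMP; the function name and the 4-way split are
arbitrary): summing with four independent accumulators cuts the dependent
add chain from n additions to roughly n/4, exposing parallel work to the
EV6's integer pipes.

    unsigned long
    sum4 (const unsigned long *p, long n)
    {
      unsigned long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
      long i;

      /* Four independent recurrences instead of one.  */
      for (i = 0; i + 4 <= n; i += 4)
        {
          s0 += p[i + 0];
          s1 += p[i + 1];
          s2 += p[i + 2];
          s3 += p[i + 3];
        }
      for (; i < n; i++)        /* leftover elements */
        s0 += p[i];

      /* Balanced final reduction keeps the tail shallow too.  */
      return (s0 + s1) + (s2 + s3);
    }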
There are two dependencies to watch out for: 1) address arithmetic
dependencies, and 2) carry propagation dependencies.
-We can avoid serializing due to address arithmetic by unrolling the loop, so
-that addresses don't depend heavily on an index variable. Avoiding serializing
+We can avoid serializing due to address arithmetic by unrolling loops, so that
+addresses don't depend heavily on an index variable. Avoiding serializing
because of carry propagation is trickier; the ultimate performance of the code
will be determined by the number of latency cycles it takes from accepting
carry-in at a vector point until we can generate carry-out.
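A hedged sketch of the unrolling idea (plain C, not actual mpn code; the
4-way factor and names are illustrative): all four loads and stores use
constant offsets from the same pointers, so only one index update per four
limb additions feeds the address arithmetic.  Carry propagation is
deliberately omitted; handling it is the hard part described above.

    /* n assumed a positive multiple of 4; carries ignored.  */
    void
    vec_add4 (unsigned long *rp, const unsigned long *up,
              const unsigned long *vp, long n)
    {
      long i;

      for (i = 0; i < n; i += 4)
        {
          /* Constant offsets from the same bases; i is bumped once
             per four additions, so addresses barely serialize.  */
          rp[i + 0] = up[i + 0] + vp[i + 0];
          rp[i + 1] = up[i + 1] + vp[i + 1];
          rp[i + 2] = up[i + 2] + vp[i + 2];
          rp[i + 3] = up[i + 3] + vp[i + 3];
        }
    }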
@@ -126,7 +126,7 @@ pipelines. Shifts only execute in U0 and U1, and multiply only in U1.
CMOV instructions split into two internal instructions, CMOV1 and CMOV2. CMOV
splits the mapping process (see pg 2-26 in cmpwrgd.pdf), suggesting that CMOV
should always be placed as the last instruction of an aligned 4-instruction
-block (?).
+block, or perhaps simply avoided.
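One way to "simply avoid" CMOV, sketched in C rather than Alpha assembly
(illustrative only; whether it actually wins on EV6 is not claimed here),
is a branch-free select through a computed mask:

    /* Return a if cond is nonzero, else b, with no branch and no CMOV.  */
    unsigned long
    sel (unsigned long cond, unsigned long a, unsigned long b)
    {
      unsigned long mask = -(unsigned long) (cond != 0); /* all ones or zero */
      return (a & mask) | (b & ~mask);
    }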
Perhaps the most important issue is the latency between the L0/U0 and L1/U1
clusters; a result obtained on either cluster has an extra cycle of latency for