diff options
author | Kevin Ryde <user42@zip.com.au> | 2001-05-26 03:00:47 +0200 |
---|---|---|
committer | Kevin Ryde <user42@zip.com.au> | 2001-05-26 03:00:47 +0200 |
commit | 92f1ae4c1e166839c06c62bfae84cb7585a4219c (patch) | |
tree | 02bdb6aad689b9d2befc0e72c92baccc9067bbbb /gmp.texi | |
parent | d7527acad191b434da6dd30e4af9ef37c013290c (diff) | |
download | gmp-92f1ae4c1e166839c06c62bfae84cb7585a4219c.tar.gz |
* gmp.texi (ABI and ISA, Reentrancy): Minor tweaks
(Notes for Package Builds): Note gmp.h is a generated file.
(Notes for Particular Systems): -march=pentiumpro is used for gcc
2.95.4 and up.
(Assembler Loop Unrolling): Mention non power-of-2 unrolling.
(Internals): New chapter.
Diffstat (limited to 'gmp.texi')
-rw-r--r-- | gmp.texi | 417 |
1 files changed, 394 insertions, 23 deletions
@@ -342,6 +342,7 @@ arithmetic library, version @value{VERSION}. * BSD Compatible Functions:: All functions found in BSD MP. * Custom Allocation:: How to customize the internal allocation. * Algorithms:: What happens behind the scenes. +* Internals:: How values are represented behind the scenes. * Contributors:: Who brings your this library? * References:: Some useful papers and books to read. @@ -908,6 +909,7 @@ as simple as configuring with a special @samp{libdir}, or it might require more than that. @table @asis +@sp 1 @need 1000 @item HPPA 2.0 (@samp{hppa2.0*}) @@ -939,17 +941,16 @@ HPPA 2.0 CPUs can run all HPPA 1.0 and 1.1 code in the 32-bit HPPA 1.0 ABI. No special compiler options are needed for applications. @end table -CPU type @samp{hppa2.0w} or @samp{hppa2.0} will use any of the above ABIs, -@samp{hppa2.0n} uses only 2.0n or 1.0. +All three ABIs are available for CPUs @samp{hppa2.0w} and @samp{hppa2.0}, but +for CPU @samp{hppa2.0n} only 2.0n or 1.0 are allowed. +@sp 1 @need 1000 @item MIPS under IRIX 6 (@samp{mips*-*-irix[6789]} -IRIX 6 and up supports the n32 and 64 ABIs, and always has a MIPS 3 or better -CPU, so a 64-bit limb is used on that system. A new enough @command{gcc} is -required (2.95 for instance). Note that MIPS GNU/Linux, as of kernel version -2.2, doesn't have the necessary support for n32 or 64 and so only uses a -32-bit limb. +IRIX 6 supports the n32 and 64 ABIs and always has a 64-bit MIPS 3 or better +CPU. In both these ABIs GMP uses a 64-bit limb. A new enough @command{gcc} +is required (2.95 for instance). @table @asis @item @samp{ABI=64} @@ -965,7 +966,7 @@ cc -64 @item @samp{ABI=n32} The n32 ABI is 32-bit pointers and integers, but with a 64-bit limb using a -@code{long long} in the 64-bit registers. Applications must be compiled with +@code{long long}. Applications must be compiled with @example gcc -mabi=n32 @@ -973,6 +974,10 @@ cc -n32 @end example @end table +Note that MIPS GNU/Linux, as of kernel version 2.2, doesn't have the necessary +support for n32 or 64 and so only gets a 32-bit limb and the MIPS 2 code. + +@sp 1 @need 1000 @item PowerPC 64 (@samp{powerpc64*}) @@ -1002,6 +1007,7 @@ This is the basic 32-bit PowerPC ABI. No special compiler options are needed for applications. @end table +@sp 1 @need 1000 @item Sparc V9 (@samp{sparcv9} and @samp{ultrasparc*}) @@ -1032,8 +1038,8 @@ cc -xarch=v8plus @command{gcc} 2.8 and earlier only supports @samp{-mv8} though. @end table -Don't be confused by the names of these sparc options, they're called -@samp{arch} but they effectively control the ABI. +Don't be confused by the names of these sparc @samp{-m} and @samp{-x} options, +they're called @samp{arch} but they effectively control the ABI. @end table @@ -1068,6 +1074,9 @@ user to omit @samp{--build} (and @samp{--host}) so @samp{./config.guess} will detect the CPU. But a way to manually specify a @samp{--build} will be wanted for systems where @samp{./config.guess} is inexact. +It should be noted that @file{gmp.h} is a generated file, and will be +architecure and ABI dependent. + @need 2000 @node Notes for Particular Systems, Known Build Problems, Notes for Package Builds, Installing GMP @@ -1159,14 +1168,14 @@ version 1.92.3 that comes with FreeBSD 2.2.8 doesn't (and unfortunately there's no newer assembler for that system). Solaris 2.6 and 2.7 @command{as} generate incorrect object code for register -to register @code{movq} instructions, making that assembler unusable. Install -a recent @command{gas} if MMX code is wanted on these systems. +to register @code{movq} instructions, and so can't be used for MMX code. +Install a recent @command{gas} if MMX code is wanted on these systems. @item x86 GCC @samp{-march=pentiumpro} -GCC 2.95.2 miscompiled some versions of @file{mpz/powm.c} when +GCC 2.95.2 and 2.95.3 miscompiled some versions of @file{mpz/powm.c} when @samp{-march=pentiumpro} was used, so for relevant CPUs that option is only in -the default @env{CFLAGS} for GCC 2.96 and up. +the default @env{CFLAGS} for GCC 2.95.4 and up. @end table @@ -1515,9 +1524,10 @@ implementation of @file{stack-alloc.c}. @item It's safe for two threads to read from the same GMP variable simultaneously, but it's not safe for one to read while the another might be writing, nor for -two threads to write simultaneously. Note that this also applies to the seed -variables when generating random numbers (@pxref{Random Number Functions}), -since the seed is updated when a number is generated. +two threads to write simultaneously. It's not safe for two threads to +generate a random number from the same @code{gmp_randstate_t} simultaneously, +since this involves an update of that variable. +generated. @item On SCO systems the default @code{<ctype.h>} macros use per-file static @@ -4446,7 +4456,7 @@ functions will be linked in even if the first thing a program does is an this is a problem. -@node Algorithms, Contributors, Custom Allocation, Top +@node Algorithms, Internals, Custom Allocation, Top @chapter Algorithms @cindex Algorithms @@ -6069,11 +6079,14 @@ corresponding factor, but it can also allow better register usage, for example alternately using one register combination and then another. Judicious use of @command{m4} macros can help avoid lots of duplication in the source code. -Unrolling is generally best done to a power of 2 multiple. This makes it -possible to calculate the number of unrolled loops and the number of remaining -limbs using a shift and mask. +Unrolling is commonly done to a power of 2 multiple so the number of unrolled +loops and the number of remaining limbs can be calculated with a shift and +mask. But other multiples can be used too, just by subtracting each @var{n} +limbs processed from a counter and waiting for less than @var{n} remaining (or +offsetting the counter by @var{n} so it goes negative when there's less than +@var{n} remaining). -Sizes not a multiple of the unrolling can be handled in various ways, for +The limbs not a multiple of the unrolling can be handled in various ways, for example @itemize @bullet @@ -6106,7 +6119,364 @@ instructions at the end whose results are unwanted. Sizes not a multiple of the unrolling can then be handled as desired. -@node Contributors, References, Algorithms, Top +@node Internals, Contributors, Algorithms, Top +@chapter Internals + +@strong{This chapter is provided only for informational purposes and the +various internals described here may change in future GMP releases. +Applications expecting to be compatible with future releases should use only +the documented interfaces described in previous chapters.} + +@menu +* Integer Internals:: +* Rational Internals:: +* Float Internals:: +* Raw Output Internals:: +@end menu + +@node Integer Internals, Rational Internals, Internals, Internals +@section Integer Internals + +@code{mpz_t} variables represent integers using sign and magnitude, in space +dynamically allocated and reallocated. The fields are as follows. + +@table @asis +@item @code{_mp_size} +The number of limbs, or the negative of that when representing a negative +integer. Zero is represented by @code{_mp_size} set to zero, in which case +the @code{_mp_d} data is unused. + +@item @code{_mp_d} +A pointer to an array of limbs which is the magnitude. These are stored +``little endian'' as per the @code{mpn} functions, so @code{_mp_d[0]} is the +least significant limb and @code{_mp_d[ABS(_mp_size)-1]} is the most +significant. Whenever @code{_mp_size} is non-zero, the most significant limb +is non-zero. + +Currently there's always at least one limb allocated, so for instance +@code{mpz_set_ui} never needs to reallocate, and @code{mpz_get_ui} can fetch +@code{_mp_d[0]} unconditionally (though its value is then only wanted if +@code{_mp_size} is non-zero). + +@item @code{_mp_alloc} +@code{_mp_alloc} is the number of limbs currently allocated at @code{_mp_d}, +and naturally @code{_mp_alloc >= ABS(_mp_size)}. When an @code{mpz} routine +is about to (or might be about to) increase @code{_mp_size}, it checks +@code{_mp_alloc} to see whether there's enough space, and reallocates if not. +@code{MPZ_REALLOC} is used for this. +@end table + +Various bitwise logical functions like @code{mpz_and} behave as if negative +values were twos complement. But sign and magnitude is always used +internally, and necessary adjustments are made during the calculations. +Sometimes this isn't pretty, but sign and magnitude are best for other +routines. + +Some internal temporary variables are setup with @code{MPZ_TMP_INIT} and these +have @code{_mp_d} space obtained from @code{TMP_ALLOC} rather than the memory +allocation functions. Care is taken to ensure that these are big enough that +no reallocation is necessary (since it would have unpredictable consequences). + + +@node Rational Internals, Float Internals, Integer Internals, Internals +@section Rational Internals + +@code{mpq_t} variables represent rationals using an @code{mpz_t} numerator and +denominator (@pxref{Integer Internals}). + +The canonical form adopted is denominator positive (and non-zero), no common +factors between numerator and denominator, and zero uniquely represented as +0/1. + +It's believed that casting out common factors at each stage of a calculation +is best in general. A GCD is an @ma{O(N^2)} operation so it's better to do a +few small ones immediately than to delay and have to do a big one later. +Knowing the numerator and denominator have no common factors can be used for +example in @code{mpq_mul} to make only two cross GCDs necessary, not four. + +This general approach to common factors is badly sub-optimal in the presence +of simple factorizations or little prospect for cancellation, but GMP has no +way to know when this will occur. As per @ref{Efficiency}, that's left to +applications. The @code{mpq_t} framework might still suit, with +@code{mpq_numref} and @code{mpq_denref} for direct access to the numerator and +denominator, or of course @code{mpz_t} variables can be used directly. + + +@node Float Internals, Raw Output Internals, Rational Internals, Internals +@section Float Internals + +Efficient calculation is the primary aim of GMP floats and the use of whole +limbs and simple rounding facilitates this. + +@code{mpf_t} floats have a variable precision mantissa and a single machine +word exponent. The mantissa is represented using sign and magnitude. + +@c FIXME: The arrow heads don't join to the lines exactly. +@tex +\global\newdimen\GMPboxwidth \GMPboxwidth=5em +\global\newdimen\GMPboxheight \GMPboxheight=3ex +\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}} +\GMPdisplay{% +\vbox{% + \hbox to 5\GMPboxwidth {most significant limb \hfil least significant limb} + \vskip 0.7ex + \def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}} + \hbox { + \hbox to 3\GMPboxwidth {% + \setbox 0 = \hbox{@code{\_mp\_exp}}% + \dimen0=3\GMPboxwidth + \advance\dimen0 by -\wd0 + \divide\dimen0 by 2 + \advance\dimen0 by -1em + \setbox1 = \hbox{$\rightarrow$}% + \dimen1=\dimen0 + \advance\dimen1 by -\wd1 + \GMPcentreline{\dimen0}% + \hfil + \box0% + \hfil + \GMPcentreline{\dimen1{}}% + \box1} + \hbox to 2\GMPboxwidth {\hfil @code{\_mp\_d}}} + \vskip 0.5ex + \vbox {% + \hrule + \hbox{% + \vrule height 2ex depth 1ex + \hbox to \GMPboxwidth {}% + \vrule + \hbox to \GMPboxwidth {}% + \vrule + \hbox to \GMPboxwidth {}% + \vrule + \hbox to \GMPboxwidth {}% + \vrule + \hbox to \GMPboxwidth {}% + \vrule} + \hrule + } + \hbox {% + \hbox to 0.8 pt {} + \hbox to 3\GMPboxwidth {% + \hfil $\cdot$} \hbox {$\leftarrow$ radix point\hfil}} + \hbox to 5\GMPboxwidth{% + \setbox 0 = \hbox{@code{\_mp\_size}}% + \dimen0 = 5\GMPboxwidth + \advance\dimen0 by -\wd0 + \divide\dimen0 by 2 + \advance\dimen0 by -1em + \dimen1 = \dimen0 + \setbox1 = \hbox{$\leftarrow$}% + \setbox2 = \hbox{$\rightarrow$}% + \advance\dimen0 by -\wd1 + \advance\dimen1 by -\wd2 + \hbox to 0.3 em {}% + \box1 + \GMPcentreline{\dimen0}% + \hfil + \box0 + \hfil + \GMPcentreline{\dimen1}% + \box2} +}} +@end tex +@ifnottex +@example + most least +significant significant + limb limb + + _mp_d + |---- _mp_exp ---> | + _____ _____ _____ _____ _____ + |_____|_____|_____|_____|_____| + . <------------ radix point + + <-------- _mp_size ---------> +@sp 1 +@end example +@end ifnottex + +@noindent +The fields are as follows. + +@table @asis +@item @code{_mp_size} +The number of limbs currently in use, or the negative of that when +representing a negative value. Zero is represented by @code{_mp_size} and +@code{_mp_exp} both set to zero, and in that case the @code{_mp_d} data is +unused. (In the future @code{_mp_exp} might be undefined when representing +zero.) + +@item @code{_mp_prec} +The precision of the mantissa, in limbs. In any calculation the aim is to +produce @code{_mp_prec} limbs of result (the most significant being non-zero). + +@item @code{_mp_d} +A pointer to the array of limbs which is the absolute value of the mantissa. +These are stored ``little endian'' as per the @code{mpn} functions, so +@code{_mp_d[0]} is the least significant limb and +@code{_mp_d[ABS(_mp_size)-1]} the most significant. + +The most significant limb is always non-zero, but there are no other +restrictions on its value, in particular the highest 1 bit can be anywhere +within the limb. + +@code{_mp_prec+1} limbs are allocated, the extra limb being for convenience +(see below). There are no reallocations during calculations, only in a change +of precision with @code{mpf_set_prec}. + +@item @code{_mp_exp} +The exponent, in limbs, determining the location of the implied radix point. +Zero means the radix point is just above the most significant limb. Positive +values mean a radix point offset towards the lower limbs and hence imply a +value @ma{@ge{} 1}, as for example in the diagram above. Negative exponents +mean a radix point further above the highest limb. + +Naturally the exponent can be any value, it doesn't have to fall within the +limbs as the diagram shows, it can be a long way above or a long way below. +Limbs other than those included in the @code{@{_mp_d,_mp_size@}} data +are treated as zero. +@end table + +@sp 1 +@noindent +The following various points should be noted. + +@table @asis +@item Low Zeros +The least significant limbs @code{_mp_d[0]} etc can be zero, though such low +zeros can always be ignored. Routines likely to produce low zeros check and +avoid them to save time in subsequent calculations, but for most routines +they're quite unlikely and aren't checked. + +@item Mantissa Size Range +The @code{_mp_size} limbs in use can be less than @code{_mp_prec} if the value +can be represented in less. This means low precision values or small integers +stored in a high precision @code{mpf_t} can still be operated on efficiently. + +@code{_mp_size} can also be greater than @code{_mp_prec}. Firstly a value is +allowed to use all of the @code{_mp_prec+1} limbs available at @code{_mp_d}, +and secondly when @code{mpf_set_prec_raw} lowers @code{_mp_prec} it leaves +@code{_mp_size} unchanged and so the size can be arbitrarily bigger than +@code{_mp_prec}. + +@item Rounding +All rounding is done on limb boundaries. Calculating @code{_mp_prec} limbs +with the high non-zero will ensure the application requested minimum precision +is obtained. + +The use of simple ``trunc'' rounding towards zero is efficient, since there's +no need to examine extra limbs and do an increment or decrement. + +@item Bit Shifts +Since the exponent is in limbs, there are no bit shifts in basic operations +like @code{mpf_add} and @code{mpf_mul}. When differing exponents are +encountered all that's needed is to adjust pointers to line up the relevant +limbs. + +Of course @code{mpf_mul_2exp} and @code{mpf_div_2exp} will require bit shifts, +but the choice is between an exponent in limbs which requires shifts there, or +one in bits which requires them almost everywhere else. + +@item Use of @code{_mp_prec+1} Limbs +The extra limb on @code{_mp_d} (@code{_mp_prec+1} rather than just +@code{_mp_prec}) helps when an @code{mpf} routine might get a carry from its +operation. @code{mpf_add} for instance will do an @code{mpn_add} of +@code{_mp_prec} limbs. If there's no carry then that's the result, but if +there is a carry then it's stored in the extra limb of space and +@code{_mp_size} becomes @code{_mp_prec+1}. + +Whenever @code{_mp_prec+1} limbs are held in a variable, the low limb is not +needed for the intended precision, only the @code{_mp_prec} high limbs. But +zeroing it out or moving the rest down is unnecessary. Subsequent routines +reading the value will simply take the high limbs they need, and this will be +@code{_mp_prec} if their target has that same precision. This is no more than +a pointer adjustment, and must be checked anyway since the destination +precision can be different from the sources. + +Copy functions like @code{mpf_set} will retain a full @code{_mp_prec+1} limbs +if available. This ensures that a variable which has @code{_mp_size} equal to +@code{_mp_prec+1} will get its full exact value copied. Strictly speaking +this is unnecessary since only @code{_mp_prec} limbs are needed for the +application's requested precision, but it's considered that an @code{mpf_set} +from one variable into another of the same precision ought to produce an exact +copy. + +@item Application Precisions +@code{__GMPF_BITS_TO_PREC} converts an application requested precision to an +@code{_mp_prec}. The value in bits is rounded up to a whole limb then an +extra limb is added since the most significant limb of @code{_mp_d} is only +non-zero and therefore might contain only one bit. + +@code{__GMPF_PREC_TO_BITS} does the reverse conversion, and removes the extra +limb from @code{_mp_prec} before converting to bits. The net effect of +reading back with @code{mpf_get_prec} is simply the precision rounded up to a +multiple of @code{mp_bits_per_limb}. + +Note that the extra limb added here for the high limb is in addition to the +extra limb allocated to @code{_mp_d}. For example with a 32-bit limb, an +application request for 250 bits will be rounded up to 8 limbs, then an extra +added for the high being only non-zero, giving an @code{_mp_prec} of 9. +@code{_mp_d} then gets 10 limbs allocated. Reading back with +@code{mpf_get_prec} will take @code{_mp_prec} subtract 1 limb and multiply by +32, giving 256 bits. + +Strictly speaking, the fact the high limb has at least one bit means that a +float with, say, 3 limbs of 32-bits each will be holding at least 65 bits, but +for the purposes of @code{mpf_t} it's considered simply to be 64 bits, a nice +multiple of the limb size. +@end table + + +@node Raw Output Internals, , Float Internals, Internals +@section Raw Output Internals + +@noindent +@code{mpz_out_raw} uses the following format. + +@tex +\global\newdimen\GMPboxwidth \GMPboxwidth=5em +\global\newdimen\GMPboxheight \GMPboxheight=3ex +\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}} +\GMPdisplay{% +\vbox{% + \def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}} + \vbox {% + \hrule + \hbox{% + \vrule height 2.5ex depth 1.5ex + \hbox to \GMPboxwidth {\hfil size\hfil}% + \vrule + \hbox to 3\GMPboxwidth {\hfil data bytes\hfil}% + \vrule} + \hrule} +}} +@end tex +@ifnottex +@example ++------+------------------------+ +| size | data bytes | ++------+------------------------+ +@end example +@end ifnottex + +The size is 4 bytes written most significant byte first, being the number of +subsequent data bytes, or the negative of that when a negative integer is +represented. The data bytes are the absolute value of the integer, written +most significant byte first. + +The most significant data byte is always non-zero, so the output is the same +on all systems, irrespective of limb size. + +In GMP 1, leading zero bytes were written to pad the data bytes to a multiple +of the limb size. @code{mpz_inp_raw} will accept this, for compatibility. + +The use of ``big endian'' for both the size and data fields is deliberate, it +makes the data easy to read in a hex dump of a file. + + +@node Contributors, References, Internals, Top @comment node-name, next, previous, up @unnumbered Contributors @cindex Contributors @@ -6329,4 +6699,5 @@ volume 43, number 8, August 1994, pp. 899-908. @c Local variables: @c fill-column: 78 +@c compile-command: "make gmp.info" @c End: |