Diffstat (limited to 'rts/gmp/mpn/sparc32/README')
-rw-r--r-- | rts/gmp/mpn/sparc32/README | 36 |
1 files changed, 36 insertions, 0 deletions
diff --git a/rts/gmp/mpn/sparc32/README b/rts/gmp/mpn/sparc32/README
new file mode 100644
index 0000000000..7c19df7bc4
--- /dev/null
+++ b/rts/gmp/mpn/sparc32/README
@@ -0,0 +1,36 @@
+This directory contains mpn functions for various SPARC chips. Code that
+runs only on version 8 SPARC implementations is in the v8 subdirectory.
+
+RELEVANT OPTIMIZATION ISSUES
+
+ Load and Store timing
+
+On most early SPARC implementations, the ST instruction takes multiple
+cycles, while a STD takes just a single cycle more than an ST. For the CPUs
+in SPARCstation I and II, the times are 3 and 4 cycles, respectively.
+Therefore, combining two ST instructions into a STD when possible is a
+significant optimization.
+
+Later SPARC implementations have single cycle ST.
+
+For SuperSPARC, we can perform just one memory instruction per cycle, even
+if up to two integer instructions can be executed in its pipeline. For
+programs that perform so many memory operations that there are not enough
+non-memory operations to issue in parallel with all memory operations, using
+LDD and STD when possible helps.
+
+STATUS
+
+1. On a SuperSPARC, mpn_lshift and mpn_rshift run at 3 cycles/limb, or 2.5
+   cycles/limb asymptotically. We could optimize speed for special counts
+   by using ADDXCC.
+
+2. On a SuperSPARC, mpn_add_n and mpn_sub_n run at 2.5 cycles/limb, or 2
+   cycles/limb asymptotically.
+
+3. mpn_mul_1 runs at what is believed to be optimal speed.
+
+4. On SuperSPARC, mpn_addmul_1 and mpn_submul_1 could both be improved by a
+   cycle by avoiding one of the add instructions. See a29k/addmul_1.
+
+The speed of the code for other SPARC implementations is uncertain.
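
As a rough illustration of the "Load and Store timing" note in the README, the following C sketch estimates the store cycles needed to write n 32-bit limbs, using the SPARCstation I and II figures quoted there (3 cycles per ST, 4 cycles per STD). The function names and the cost model are assumptions made for this example and are not part of GMP. The model counts store cycles only and ignores alignment, register pairing, and every other instruction in the loop.

```c
/* Cycle costs quoted in the README for the SPARCstation I and II CPUs. */
enum { ST_CYCLES = 3, STD_CYCLES = 4 };

/* Hypothetical helper: store cycles when every limb is written with ST. */
unsigned long st_only_cycles(unsigned long n_limbs)
{
    return n_limbs * ST_CYCLES;
}

/* Hypothetical helper: store cycles when limbs are paired into STD,
   with one trailing ST if the limb count is odd. */
unsigned long std_paired_cycles(unsigned long n_limbs)
{
    return (n_limbs / 2) * STD_CYCLES + (n_limbs % 2) * ST_CYCLES;
}
```

Under these assumptions, 8 limbs cost 24 store cycles with ST alone and 16 with STD pairs, which is the saving the README calls significant. On CPUs with single cycle ST the same model would no longer favor STD on cycle count alone.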