summaryrefslogtreecommitdiff
path: root/mpn/alpha/ev6/nails/README
blob: 187dc9bbe75211e56ee1700817243b9a076a393d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
Copyright 2002, 2005 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library.  If not, see https://www.gnu.org/licenses/.





This directory contains assembly code for nails-enabled 21264.  The code is not
very well optimized.

For addmul_N, as N grows larger, we could make multiple loads together, then do
about 3.3 i/c.  10 cycles after the last load, we can increase to 4 i/c.  This
would surely allow addmul_4 to run at 2 c/l, but the same should be possible
also for addmul_3 and perhaps even addmul_2.


		current		fair		best
Routine		c/l  unroll	c/l  unroll	c/l  i/c
mul_1		3.25		2.75		2.75 3.273
addmul_1	4.0	4	3.5	4 14	3.25 3.385
addmul_2	4.0	1	2.5	2 10	2.25 3.333
addmul_3	3.0	1	2.33	2 14	2    3.333
addmul_4	2.5	1	2.125	2 17	2    3.135

addmul_5			2	1 10
addmul_6			2	1 12
addmul_7			2	1 14

(The "best" column doesn't account for bookkeeping instructions and
thereby assumes infinite unrolling.)

Basecase usages:

1	 addmul_1
2	 addmul_2
3	 addmul_3
4	 addmul_4
5	 addmul_3 + addmul_2	2.3998
6	 addmul_4 + addmul_2
7	 addmul_4 + addmul_3