| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switch arm neon assembler routines to endianness-agnostic loads and
stores where possible to avoid modifications to the rest of the code.
This involves switching to vld1.32 for loading consecutive 32-bit words
in host endianness as well as vst1.8 for storing back to memory in
little-endian order as required by the caller. Where necessary, r3 is
used to store the precalculated offset into the source vector for the
secondary load operations. vstm is kept for little-endian platforms
because it is faster than vst1 on most ARM implementations.
vst1.x (at least on the Allwinner A20 Cortex-A7 implementation) seems to
interfer with itself on subsequent calls, slowing it down further. So we
reschedule some instructions to do stores as soon as results become
available to have some other calculations or loads before the next
vst1.x. This reliably saves two additional cycles per block on salsa20
and chacha which would otherwise be incurred.
vld1.x does not seem to suffer from this or at least not to a level
where two consecutive vld1.x run slower than an equivalent vldm.
Rescheduling them similarly did not improve performance beyond that of
vldm.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
|
|
|
|
| |
Spotted by Michael Weiser
|
| |
|
|
|
|
| |
Update arm files.
|
|
|
|
| |
chacha_3core
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
ARM assembly adjustments for big-endian systems contained armv6+-only
instructions (rev) in generic arm memxor code. Replace those with an
actual conversion of the leftover byte store routines for big-endian
systems. This also provides a slight optimisation by removing the
additional instruction as well as increased symmetry between little- and
big-endian implementations.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
|
|
|
|
|
|
|
|
| |
Rename curve functions to use curve names instead of just bits.
Otherwise function names can easily become confusing after adding other
curves.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
|
|
|
|
|
| |
There is no need to keep optimized ECC functions in public namespace
(nettle_*), move them to internal namespace (_nettle_*).
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
|
|
|
|
|
| |
In preparation to adding GOST curves support, rename source files and
use curve name as eccdata parameter.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On BCM2837B0 (Cortex-A53) @1.4GHz (Raspberry Pi 3B+),
Before:
`gnutls-cli --benchmark-ciphers`
CHACHA20-POLY1305 (16384) 51.54 MB/sec
`gnutls-cli --benchmark-tls-ciphers`:
ECDHE_RSA_CHACHA20_POLY1305 (payload 1400) 21.31 MB/sec
ECDHE_RSA_CHACHA20_POLY1305 (payload 15360) 24.60 MB/sec
`nettle-benchmark`
chacha encrypt 71.90
chacha decrypt 71.89
chacha_poly1305 encrypt 48.17
chacha_poly1305 decrypt 48.17
chacha_poly1305 update 146.03
After:
`gnutls-cli --benchmark-ciphers`
CHACHA20-POLY1305 (16384) 68.44 MB/sec
`gnutls-cli --benchmark-tls-ciphers`:
ECDHE_RSA_CHACHA20_POLY1305 (payload 1400) 27.25 MB/sec
ECDHE_RSA_CHACHA20_POLY1305 (payload 15360) 32.41 MB/sec
`nettle-benchmark`
chacha encrypt 106.00
chacha decrypt 105.94
chacha_poly1305 encrypt 65.94
chacha_poly1305 decrypt 65.96
chacha_poly1305 update 175.24
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds all exported symbols in the map files explicitly under
the following rules:
- Symbols mentioned in internal headers go in a section which is
valid only for testing, and linking with these symbols will break
in library updates.
- Symbols mentioned in installed headers go in the exported sections
and are considered part of the ABI.
- All internal symbols move to internal headers.
- The _nettle_md5_compress and _nettle_sha1_compress become exported
without the _nettle prefix, due to existing usage.
|
|
|
|
|
|
| |
Extend arm/README to provide some background on considerations to be taken into
account when writing assembly routines supposed to work in big and little memory
endianness.
|
|
|
|
|
| |
Adjust sha1-compress, sha256-compress, umac-nh, chacha-core-internal,
salsa20-core-internal and memxor for arm to work in big-endian mode.
|
|
|
|
|
|
|
| |
See: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/Cjagjjbc.html
The pre-UAL instruction is also accepted by modern assemblers.
Signed-off-by: Marcus Hoffmann <m.hoffmann@cartelsol.com>
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This allows building these files as part of a fat build, even if
the assembler by default targets a lower architecture version.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\ |
|
| | |
|
|/ |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
Renamed directory armv7 to arm. New subdirectory arm/neon, for files
using neon instructions. configure.ac hacked to make use of neon
configurable.
|