| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/chacha20-aarch64.S (clear): Use 'movi'.
* cipher/chacha20-armv7-neon.S (clear): Use 'vmov'.
* cipher/cipher-gcm-armv7-neon.S (clear): Use 'vmov'.
* cipher/cipher-gcm-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'.
* cipher/cipher-gcm-armv8-aarch64-ce.S (CLEAR_REG): Use 'movi'.
* cipher/rijndael-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'.
* cipher/sha1-armv7-neon.S (clear): Use 'vmov'.
* cipher/sha1-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'.
* cipher/sha1-armv8-aarch64-ce.S (CLEAR_REG): Use 'movi'.
* cipher/sha256-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'.
* cipher/sha256-armv8-aarch64-ce.S (CLEAR_REG): Use 'movi'.
* cipher/sha512-armv7-neon.S (CLEAR_REG): New using 'vmov'.
(_gcry_sha512_transform_armv7_neon): Use CLEAR_REG for clearing
registers.
--
Use 'vmov reg, #0' on 32-bit and 'movi reg.16b, #0' instead of
self-xoring register to break false register dependency.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'sha1-armv8-aarch32-ce.S'.
* cipher/sha1-armv7-neon.S (_gcry_sha1_transform_armv7_neon): Add
missing size.
* cipher/sha1-armv8-aarch32-ce.S: New.
* cipher/sha1.c (USE_ARM_CE): New.
(sha1_init): Check features for HWF_ARM_SHA1.
[USE_ARM_CE] (_gcry_sha1_transform_armv8_ce): New.
(transform) [USE_ARM_CE]: Use ARMv8 CE implementation if HW supports
it.
* cipher/sha1.h (SHA1_CONTEXT): Add 'use_arm_ce'.
* configure.ac: Add 'sha1-armv8-aarch32-ce.lo'.
--
Benchmark on Cortex-A53 (1152 Mhz):
Before (SHA-1 NEON):
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 6.62 ns/B 144.2 MiB/s 7.62 c/B
After (~3.8x faster):
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 1.73 ns/B 552.2 MiB/s 1.99 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/sha1-armv7-neon.S: Tweak implementation for speed-up.
--
Benchmark on Cortex-A8 1008Mhz:
New:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 7.04 ns/B 135.4 MiB/s 7.10 c/B
Old:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 7.79 ns/B 122.4 MiB/s 7.85 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-arm.S (GET_DATA_POINTER): New.
(_gcry_camellia_arm_encrypt_block): Use GET_DATA_POINTER.
(_gcry_camellia_arm_decrypt_block): Ditto.
* cipher/cast5-arm.S (GET_DATA_POINTER): New.
(_gcry_cast5_arm_encrypt_block, _gcry_cast5_arm_decrypt_block)
(_gcry_cast5_arm_enc_blk2, _gcry_cast5_arm_dec_blk2): Use
GET_DATA_POINTER.
* cipher/rijndael-arm.S (GET_DATA_POINTER): New.
(_gcry_aes_arm_encrypt_block, _gcry_aes_arm_decrypt_block): Use
GET_DATA_POINTER.
* cipher/sha1-armv7-neon.S (GET_DATA_POINTER): New.
(.LK_VEC): Move from .text to .data section.
(_gcry_sha1_transform_armv7_neon): Use GET_DATA_POINTER.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha1-armv7-neon.S'.
* cipher/sha1-armv7-neon.S: New.
* cipher/sha1.c (USE_NEON): New.
(SHA1_CONTEXT, sha1_init) [USE_NEON]: Add and initialize 'use_neon'.
[USE_NEON] (_gcry_sha1_transform_armv7_neon): New.
(transform) [USE_NEON]: Use ARM/NEON assembly if enabled.
* configure.ac: Add 'sha1-armv7-neon.lo'.
--
Patch adds ARM/NEON implementation for SHA-1.
Benchmarks show 1.72x improvement on ARM Cortex-A8, 1008 Mhz:
jussi@cubie:~/libgcrypt$ tests/bench-slope --cpu-mhz 1008 hash sha1
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 7.80 ns/B 122.3 MiB/s 7.86 c/B
=
jussi@cubie:~/libgcrypt$ tests/bench-slope --disable-hwf arm-neon --cpu-mhz 1008 hash sha1
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
SHA1 | 13.41 ns/B 71.10 MiB/s 13.52 c/B
=
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|