| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (gcry_cv_gcc_arm_platform_as_ok)
(gcry_cv_gcc_inline_asm_neon): Remove % prefix from register names.
* cipher/cipher-gcm-armv7-neon.S (vmull_p64): Prefix constant values
with # character instead of $.
* cipher/blowfish-arm.S: Remove % prefix from all register names.
* cipher/camellia-arm.S: Likewise.
* cipher/cast5-arm.S: Likewise.
* cipher/rijndael-arm.S: Likewise.
* cipher/rijndael-armv8-aarch32-ce.S: Likewise.
* cipher/sha512-arm.S: Likewise.
* cipher/sha512-armv7-neon.S: Likewise.
* cipher/twofish-arm.S: Likewise.
* mpi/arm/mpih-add1.S: Likewise.
* mpi/arm/mpih-mul1.S: Likewise.
* mpi/arm/mpih-mul2.S: Likewise.
* mpi/arm/mpih-mul3.S: Likewise.
* mpi/arm/mpih-sub1.S: Likewise.
--
Reported-by: Dmytro Kovalov <dmytro.a.kovalov@globallogic.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/chacha20-aarch64.S (clear): Use 'movi'.
* cipher/chacha20-armv7-neon.S (clear): Use 'vmov'.
* cipher/cipher-gcm-armv7-neon.S (clear): Use 'vmov'.
* cipher/cipher-gcm-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'.
* cipher/cipher-gcm-armv8-aarch64-ce.S (CLEAR_REG): Use 'movi'.
* cipher/rijndael-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'.
* cipher/sha1-armv7-neon.S (clear): Use 'vmov'.
* cipher/sha1-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'.
* cipher/sha1-armv8-aarch64-ce.S (CLEAR_REG): Use 'movi'.
* cipher/sha256-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'.
* cipher/sha256-armv8-aarch64-ce.S (CLEAR_REG): Use 'movi'.
* cipher/sha512-armv7-neon.S (CLEAR_REG): New using 'vmov'.
(_gcry_sha512_transform_armv7_neon): Use CLEAR_REG for clearing
registers.
--
Use 'vmov reg, #0' on 32-bit and 'movi reg.16b, #0' instead of
self-xoring register to break false register dependency.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/sha1.c (ASM_EXTRA_STACK): Increase by sizeof(void*)*4.
(do_sha1_transform_amd64_ssse3, do_sha1_transform_amd64_avx)
(do_sha1_transform_amd64_avx_bmi2, do_sha1_transform_intel_shaext)
(do_sha1_transform_armv7_neon, do_sha1_transform_armv8_ce): New.
(transform_blk, transform): Merge to ...
(do_transform_generic): ... this and remove calls to assembly
implementations.
(sha1_init): Select hd->bctx.bwrite based on HW features.
(_gcry_sha1_mixblock, sha1_final): Call hd->bctx.bwrite instead of
transform.
* cipher/sha1.h (SHA1_CONTEXT): Remove implementation selection bits.
* cipher/sha256.h (SHA256_CONTEXT): Remove implementation selection
bits.
(ASM_EXTRA_STACK): Increase by sizeof(void*)*4.
(do_sha256_transform_amd64_ssse3, do_sha256_transform_amd64_avx)
(do_sha256_transform_amd64_avx2, do_sha256_transform_intel_shaext)
(do_sha256_transform_armv8_ce): New.
(transform_blk, transform): Merge to ...
(do_transform_generic): ... this and remove calls to assembly
implementations.
(sha256_init, sha224_init): Select hd->bctx.bwrite based on HW
features.
(sha256_final): Call hd->bctx.bwrite instead of transform.
* cipher/sha512-armv7-neon.S
(_gcry_sha512_transform_armv7_neon): Return zero.
* cipher/sha512.h (SHA512_CONTEXT): Remove implementation selection
bits.
(ASM_EXTRA_STACK): Increase by sizeof(void*)*4.
(do_sha512_transform_armv7_neon, do_sha512_transform_amd64_ssse3)
(do_sha512_transform_amd64_avx, do_sha512_transform_amd64_avx2): New.
[USE_ARM_ASM] (do_transform_generic): New.
(transform_blk, transform): Merge to ...
[!USE_ARM_ASM] (do_transform_generic): ... this and remove calls to
assembly implementations.
(sha512_init, sha384_init): Select hd->bctx.bwrite based on HW
features.
(sha512_final): Call hd->bctx.bwrite instead of transform.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/sha512-armv7-neon.S
(_gcry_sha512_transform_armv7_neon): Byte-swap RW67q and RW1011q
correctly in multi-block loop.
* tests/basic.c (check_digests): Add large test vector for SHA512.
--
Patch fixes bug introduced to multi-block processing by commit df629ba53a6,
"Improve performance of SHA-512/ARM/NEON implementation". Patch also adds
multi-block test vector for SHA-512.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cipher/blowfish-amd64.S: Change utf-8 encoded copyright character to
'(C)'.
cipher/blowfish-arm.S: Ditto.
cipher/bufhelp.h: Ditto.
cipher/camellia-aesni-avx-amd64.S: Ditto.
cipher/camellia-aesni-avx2-amd64.S: Ditto.
cipher/camellia-arm.S: Ditto.
cipher/cast5-amd64.S: Ditto.
cipher/cast5-arm.S: Ditto.
cipher/cipher-ccm.c: Ditto.
cipher/cipher-cmac.c: Ditto.
cipher/cipher-gcm.c: Ditto.
cipher/cipher-selftest.c: Ditto.
cipher/cipher-selftest.h: Ditto.
cipher/mac-cmac.c: Ditto.
cipher/mac-gmac.c: Ditto.
cipher/mac-hmac.c: Ditto.
cipher/mac-internal.h: Ditto.
cipher/mac.c: Ditto.
cipher/rijndael-amd64.S: Ditto.
cipher/rijndael-arm.S: Ditto.
cipher/salsa20-amd64.S: Ditto.
cipher/salsa20-armv7-neon.S: Ditto.
cipher/serpent-armv7-neon.S: Ditto.
cipher/serpent-avx2-amd64.S: Ditto.
cipher/serpent-sse2-amd64.S: Ditto.
--
Avoid use of '©' for easier parsing of source for copyright information.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/sha512-armv7-neon.S (RT01q, RT23q, RT45q, RT67q): New.
(round_0_63, round_64_79): Remove.
(rounds2_0_63, rounds2_64_79): New.
(_gcry_sha512_transform_armv7_neon): Add 'nblks' input; Handle multiple
input blocks; Use new round macros.
* cipher/sha512.c [USE_ARM_NEON_ASM]
(_gcry_sha512_transform_armv7_neon): Add 'num_blks'.
(transform) [USE_ARM_NEON_ASM]: Pass nblks to assembly.
--
Benchmarks on ARM Cortex-A8:
C-language: 139.1 c/B
Old ARM/NEON: 34.30 c/B
New ARM/NEON: 24.46 c/B
New vs C: 5.68x
New vs Old: 1.40x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.am: Add 'sha512-armv7-neon.S'.
* cipher/sha512-armv7-neon.S: New file.
* cipher/sha512.c (USE_ARM_NEON_ASM): New macro.
(SHA512_CONTEXT) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(sha512_init, sha384_init) [USE_ARM_NEON_ASM]: Enable 'use_neon' if
CPU support NEON instructions.
(k): Round constant array moved outside of 'transform' function.
(__transform): Renamed from 'tranform' function.
[USE_ARM_NEON_ASM] (_gcry_sha512_transform_armv7_neon): New prototype.
(transform): New wrapper function for different transform versions.
(sha512_write, sha512_final): Burn stack by the amount returned by
transform function.
* configure.ac (sha512) [neonsupport]: Add 'sha512-armv7-neon.lo'.
--
Add NEON assembly for transform function for faster SHA512 on ARM. Major speed
up thanks to 64-bit integer registers and large register file that can hold
full input buffer.
Benchmark results on Cortex-A8, 1Ghz:
Old:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512 17050ms 18780ms 29120ms 18040ms 17190ms
SHA384 17130ms 18720ms 29160ms 18090ms 17280ms
New:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512 3600ms 5070ms 15330ms 4510ms 3480ms
SHA384 3590ms 5060ms 15350ms 4510ms 3520ms
New vs old:
SHA512 4.74x 3.70x 1.90x 4.00x 4.94x
SHA384 4.77x 3.70x 1.90x 4.01x 4.91x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|