summaryrefslogtreecommitdiff
path: root/cipher/chacha20-armv7-neon.S
Commit message (Collapse)AuthorAgeFilesLines
* Use 'vmov' and 'movi' for vector register clearing in ARM assemblyJussi Kivilinna2022-01-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | * cipher/chacha20-aarch64.S (clear): Use 'movi'. * cipher/chacha20-armv7-neon.S (clear): Use 'vmov'. * cipher/cipher-gcm-armv7-neon.S (clear): Use 'vmov'. * cipher/cipher-gcm-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'. * cipher/cipher-gcm-armv8-aarch64-ce.S (CLEAR_REG): Use 'movi'. * cipher/rijndael-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'. * cipher/sha1-armv7-neon.S (clear): Use 'vmov'. * cipher/sha1-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'. * cipher/sha1-armv8-aarch64-ce.S (CLEAR_REG): Use 'movi'. * cipher/sha256-armv8-aarch32-ce.S (CLEAR_REG): Use 'vmov'. * cipher/sha256-armv8-aarch64-ce.S (CLEAR_REG): Use 'movi'. * cipher/sha512-armv7-neon.S (CLEAR_REG): New using 'vmov'. (_gcry_sha512_transform_armv7_neon): Use CLEAR_REG for clearing registers. -- Use 'vmov reg, #0' on 32-bit and 'movi reg.16b, #0' instead of self-xoring register to break false register dependency. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* New ChaCha implementationsJussi Kivilinna2018-01-091-714/+357
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Remove 'chacha20-sse2-amd64.S', 'chacha20-ssse3-amd64.S', 'chacha20-avx2-amd64.S'; Add 'chacha20-amd64-ssse3.S', 'chacha20-amd64-avx2.S'. * cipher/chacha20-amd64-avx2.S: New. * cipher/chacha20-amd64-ssse3.S: New. * cipher/chacha20-armv7-neon.S: Rewrite. * cipher/chacha20-avx2-amd64.S: Remove. * cipher/chacha20-sse2-amd64.S: Remove. * cipher/chacha20-ssse3-amd64.S: Remove. * cipher/chacha20.c (CHACHA20_INPUT_LENGTH, USE_SSE2, USE_NEON) (ASM_EXTRA_STACK, chacha20_blocks_t, _gcry_chacha20_amd64_sse2_blocks) (_gcry_chacha20_amd64_ssse3_blocks, _gcry_chacha20_amd64_avx2_blocks) (_gcry_chacha20_armv7_neon_blocks, QROUND, QOUT, chacha20_core) (chacha20_do_encrypt_stream): Remove. (_gcry_chacha20_amd64_ssse3_blocks4, _gcry_chacha20_amd64_avx2_blocks8) (_gcry_chacha20_armv7_neon_blocks4, ROTATE, XOR, PLUS, PLUSONE) (QUARTERROUND, BUF_XOR_LE32): New. (CHACHA20_context_s, chacha20_blocks, chacha20_keysetup) (chacha20_encrypt_stream): Rewrite. (chacha20_do_setkey): Adjust for new CHACHA20_context_s. * configure.ac: Remove 'chacha20-sse2-amd64.lo', 'chacha20-ssse3-amd64.lo', 'chacha20-avx2-amd64.lo'; Add 'chacha20-amd64-ssse3.lo', 'chacha20-amd64-avx2.lo'. -- Intel Core i7-4790K CPU @ 4.00GHz (x86_64/AVX2): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.319 ns/B 2988.5 MiB/s 1.28 c/B STREAM dec | 0.318 ns/B 2995.4 MiB/s 1.27 c/B Intel Core i7-4790K CPU @ 4.00GHz (x86_64/SSSE3): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.633 ns/B 1507.4 MiB/s 2.53 c/B STREAM dec | 0.633 ns/B 1506.6 MiB/s 2.53 c/B Intel Core i7-4790K CPU @ 4.00GHz (i386): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 2.05 ns/B 465.2 MiB/s 8.20 c/B STREAM dec | 2.04 ns/B 467.5 MiB/s 8.16 c/B Cortex-A53 @ 1152Mhz (armv7/neon): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 5.29 ns/B 180.3 MiB/s 6.09 c/B STREAM dec | 5.29 ns/B 180.1 MiB/s 6.10 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* chacha20-armv7-neon: fix to use fast code path when memory is alignedJussi Kivilinna2017-05-181-1/+1
| | | | | | | | * cipher/chacha20-armv7-neon.S (UNALIGNED_LDMIA4): Uncomment instruction for jump to aligned code path. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Fix unaligned accesses with ldm/stm in ChaCha20 and Poly1305 ARM/NEONJussi Kivilinna2016-07-081-8/+48
| | | | | | | | | | | | | | | * cipher/chacha20-armv7-neon.S (UNALIGNED_STMIA8) (UNALIGNED_LDMIA4): New. (_gcry_chacha20_armv7_neon_blocks): Use new helper macros instead of ldm/stm instructions directly. * cipher/poly1305-armv7-neon.S (UNALIGNED_LDMIA2) (UNALIGNED_LDMIA4): New. (_gcry_poly1305_armv7_neon_init_ext, _gcry_poly1305_armv7_neon_blocks) (_gcry_poly1305_armv7_neon_finish_ext): Use new helper macros instead of ldm instruction directly. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* chacha20: add ARMv7/NEON implementationJussi Kivilinna2014-11-021-0/+710
* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'. * cipher/chacha20-armv7-neon.S: New. * cipher/chacha20.c (USE_NEON): New. [USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New. (chacha20_do_setkey) [USE_NEON]: Use Neon implementation if HWF_ARM_NEON flag set. (selftest): Self-test encrypting buffer byte by byte. * configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'. -- Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original source is available at: https://github.com/floodyberry/chacha-opt Benchmark on Cortex-A8 (--cpu-mhz 1008): Old: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 13.45 ns/B 70.92 MiB/s 13.56 c/B STREAM dec | 13.45 ns/B 70.90 MiB/s 13.56 c/B New: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 6.20 ns/B 153.9 MiB/s 6.25 c/B STREAM dec | 6.20 ns/B 153.9 MiB/s 6.25 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>