path: root/cipher/serpent-armv7-neon.S
Commit message | Author | Age | Files | Lines
* serpent: accelerate XTS and ECB modes | Jussi Kivilinna | 2022-10-26 | 1 | -0/+56
    * cipher/serpent-armv7-neon.S (_gcry_serpent_neon_blk8): New.
    * cipher/serpent-avx2-amd64.S (_gcry_serpent_avx2_blk16): New.
    * cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_blk8): New.
    * cipher/serpent.c (_gcry_serpent_sse2_blk8)
    (_gcry_serpent_avx2_blk16, _gcry_serpent_neon_blk8)
    (_gcry_serpent_xts_crypt, _gcry_serpent_ecb_crypt)
    (serpent_crypt_blk1_16, serpent_encrypt_blk1_16)
    (serpent_decrypt_blk1_16): New.
    (serpent_setkey): Setup XTS and ECB bulk functions.
    --

    Benchmark on AMD Ryzen 9 7900X:

    Before:
     SERPENT128     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
            ECB enc |      5.42 ns/B     176.0 MiB/s     30.47 c/B      5625
            ECB dec |      4.82 ns/B     197.9 MiB/s     27.11 c/B      5625
            XTS enc |      5.57 ns/B     171.3 MiB/s     31.31 c/B      5625
            XTS dec |      4.99 ns/B     191.1 MiB/s     28.07 c/B      5625

    After:
     SERPENT128     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
            ECB enc |     0.708 ns/B      1347 MiB/s      3.98 c/B      5625
            ECB dec |     0.694 ns/B      1373 MiB/s      3.91 c/B      5625
            XTS enc |     0.766 ns/B      1246 MiB/s      4.31 c/B      5625
            XTS dec |     0.754 ns/B      1264 MiB/s      4.24 c/B      5625

    GnuPG-bug-id: T6242
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add bulk OCB for Serpent SSE2, AVX2 and NEON implementations | Jussi Kivilinna | 2015-07-27 | 1 | -0/+255
    * cipher/cipher.c (_gcry_cipher_open_internal): Setup OCB bulk
    functions for Serpent.
    * cipher/serpent-armv7-neon.S: Add OCB assembly functions.
    * cipher/serpent-avx2-amd64.S: Add OCB assembly functions.
    * cipher/serpent-sse2-amd64.S: Add OCB assembly functions.
    * cipher/serpent.c (_gcry_serpent_sse2_ocb_enc)
    (_gcry_serpent_sse2_ocb_dec, _gcry_serpent_sse2_ocb_auth)
    (_gcry_serpent_neon_ocb_enc, _gcry_serpent_neon_ocb_dec)
    (_gcry_serpent_neon_ocb_auth, _gcry_serpent_avx2_ocb_enc)
    (_gcry_serpent_avx2_ocb_dec, _gcry_serpent_avx2_ocb_auth): New
    prototypes.
    (get_l, _gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): New.
    * src/cipher.h (_gcry_serpent_ocb_crypt)
    (_gcry_serpent_ocb_auth): New.
    * tests/basic.c (check_ocb_cipher): Add test-vector for serpent.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Change utf-8 copyright characters to '(C)' | Jussi Kivilinna | 2013-12-18 | 1 | -1/+1
    cipher/blowfish-amd64.S: Change utf-8 encoded copyright character to
    '(C)'.
    cipher/blowfish-arm.S: Ditto.
    cipher/bufhelp.h: Ditto.
    cipher/camellia-aesni-avx-amd64.S: Ditto.
    cipher/camellia-aesni-avx2-amd64.S: Ditto.
    cipher/camellia-arm.S: Ditto.
    cipher/cast5-amd64.S: Ditto.
    cipher/cast5-arm.S: Ditto.
    cipher/cipher-ccm.c: Ditto.
    cipher/cipher-cmac.c: Ditto.
    cipher/cipher-gcm.c: Ditto.
    cipher/cipher-selftest.c: Ditto.
    cipher/cipher-selftest.h: Ditto.
    cipher/mac-cmac.c: Ditto.
    cipher/mac-gmac.c: Ditto.
    cipher/mac-hmac.c: Ditto.
    cipher/mac-internal.h: Ditto.
    cipher/mac.c: Ditto.
    cipher/rijndael-amd64.S: Ditto.
    cipher/rijndael-arm.S: Ditto.
    cipher/salsa20-amd64.S: Ditto.
    cipher/salsa20-armv7-neon.S: Ditto.
    cipher/serpent-armv7-neon.S: Ditto.
    cipher/serpent-avx2-amd64.S: Ditto.
    cipher/serpent-sse2-amd64.S: Ditto.
    --

    Avoid use of '©' for easier parsing of source for copyright
    information.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add ARM NEON assembly implementation of Serpent | Jussi Kivilinna | 2013-10-28 | 1 | -0/+869
    * cipher/Makefile.am: Add 'serpent-armv7-neon.S'.
    * cipher/serpent-armv7-neon.S: New.
    * cipher/serpent.c (USE_NEON): New macro.
    (serpent_context_t) [USE_NEON]: Add 'use_neon'.
    [USE_NEON] (_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec)
    (_gcry_serpent_neon_cbc_dec): New prototypes.
    (serpent_setkey_internal) [USE_NEON]: Detect NEON support.
    (_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec)
    (_gcry_serpent_neon_cbc_dec) [USE_NEON]: Use NEON implementations
    to process eight blocks in parallel.
    * configure.ac [neonsupport]: Add 'serpent-armv7-neon.lo'.
    --

    Patch adds ARM NEON optimized implementation of Serpent cipher
    to speed up parallelizable bulk operations.

    Benchmarks on ARM Cortex-A8 (armhf, 1008 Mhz):

    Old:
     SERPENT128     |  nanosecs/byte   mebibytes/sec   cycles/byte
            CBC dec |     43.53 ns/B     21.91 MiB/s     43.88 c/B
            CFB dec |     44.77 ns/B     21.30 MiB/s     45.13 c/B
            CTR enc |     45.21 ns/B     21.10 MiB/s     45.57 c/B
            CTR dec |     45.21 ns/B     21.09 MiB/s     45.57 c/B

    New:
     SERPENT128     |  nanosecs/byte   mebibytes/sec   cycles/byte
            CBC dec |     26.26 ns/B     36.32 MiB/s     26.47 c/B
            CFB dec |     26.21 ns/B     36.38 MiB/s     26.42 c/B
            CTR enc |     26.20 ns/B     36.40 MiB/s     26.41 c/B
            CTR dec |     26.20 ns/B     36.40 MiB/s     26.41 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>