summaryrefslogtreecommitdiff
path: root/cipher/rijndael-vaes-avx2-amd64.S
Commit message (Collapse)AuthorAgeFilesLines
* aes-amd64-vaes: fix fast exit path in XTS functionJussi Kivilinna2023-02-261-2/+2
| | | | | | | | | | * cipher/rijndael-vaes-avx2-amd64.S (_gcry_vaes_avx2_xts_crypt_amd64): On fast exit path, compare number of blocks left against '1' instead of '0' as following branch is 'less than'. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* aes-vaes-avx2: improve case when only CTR needs carry handlingJussi Kivilinna2023-02-221-35/+41
| | | | | | | | | | * cipher/rijndael-vaes-avx2-amd64.S (_gcry_vaes_avx2_ctr_enc_amd64): Add handling for the case when only main counter needs carry handling but generated vector counters do not. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* amd64-asm: move constant data to read-only section for cipher algosJussi Kivilinna2023-01-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | * cipher/camellia-aesni-avx-amd64.S: Move constant data to read-only section. * cipher/camellia-aesni-avx2-amd64.h: Likewise. * cipher/camellia-gfni-avx512-amd64.S: Likewise. * cipher/chacha20-amd64-avx2.S: Likewise. * cipher/chacha20-amd64-avx512.S: Likewise. * cipher/chacha20-amd64-ssse3.S: Likewise. * cipher/des-amd64.s: Likewise. * cipher/rijndael-ssse3-amd64-asm.S: Likewise. * cipher/rijndael-vaes-avx2-amd64.S: Likewise. * cipher/serpent-avx2-amd64.S: Likewise. * cipher/sm4-aesni-avx-amd64.S: Likewise. * cipher/sm4-aesni-avx2-amd64.S: Likewise. * cipher/sm4-gfni-avx2-amd64.S: Likewise. * cipher/sm4-gfni-avx512-amd64.S: Likewise. * cipher/twofish-avx2-amd64.S: Likewise. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael-vaes: align asm functionsJussi Kivilinna2022-10-261-0/+7
| | | | | | | * cipher/rijndael-vaes-avx2-amd64.S: Align functions to 16 bytes. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael: add ECB acceleration (for benchmarking purposes)Jussi Kivilinna2022-10-261-1/+431
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/cipher-internal.h (cipher_bulk_ops): Add 'ecb_crypt'. * cipher/cipher.c (do_ecb_crypt): Use bulk function if available. * cipher/rijndael-aesni.c (do_aesni_enc_vec8): Change asm label '.Ldeclast' to '.Lenclast'. (_gcry_aes_aesni_ecb_crypt): New. * cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_ecb_enc_armv8_ce) (_gcry_aes_ecb_dec_armv8_ce): New. * cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ecb_enc_armv8_ce) (_gcry_aes_ecb_dec_armv8_ce): New. * cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce) (_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce): Change return value from void to size_t. (ocb_crypt_fn_t, xts_crypt_fn_t): Remove. (_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_xts_crypt): Remove indirect function call; Return value from called function (allows tail call optimization). (_gcry_aes_armv8_ce_ocb_auth): Return value from called function (allows tail call optimization). (_gcry_aes_ecb_enc_armv8_ce, _gcry_aes_ecb_dec_armv8_ce) (_gcry_aes_armv8_ce_ecb_crypt): New. * cipher/rijndael-vaes-avx2-amd64.S (_gcry_vaes_avx2_ecb_crypt_amd64): New. * cipher/rijndael-vaes.c (_gcry_vaes_avx2_ecb_crypt_amd64) (_gcry_aes_vaes_ecb_crypt): New. * cipher/rijndael.c (_gcry_aes_aesni_ecb_crypt) (_gcry_aes_vaes_ecb_crypt, _gcry_aes_armv8_ce_ecb_crypt): New. (do_setkey): Setup ECB bulk function for x86 AESNI/VAES and ARM CE. -- Benchmark on AMD Ryzen 9 7900X: Before (OCB for reference): AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz ECB enc | 0.128 ns/B 7460 MiB/s 0.720 c/B 5634±1 ECB dec | 0.134 ns/B 7103 MiB/s 0.753 c/B 5608 OCB enc | 0.029 ns/B 32930 MiB/s 0.163 c/B 5625 OCB dec | 0.029 ns/B 32738 MiB/s 0.164 c/B 5625 After: AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz ECB enc | 0.028 ns/B 33761 MiB/s 0.159 c/B 5625 ECB dec | 0.028 ns/B 33917 MiB/s 0.158 c/B 5625 GnuPG-bug-id: T6242 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael-vaes-avx2: perform checksumming inlineJussi Kivilinna2022-03-091-237/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/rijndael-vaes-avx2-amd64.S (_gcry_vaes_avx2_ocb_checksum): Remove. (_gcry_vaes_avx2_ocb_crypt_amd64): Add inline checksumming. -- VAES/AVX2/OCB encryption implementation had same issue with performance drop with large buffers as did AES-NI/OCB implementation, see e924ce456d5728a81c148de4a6eb23373cb70ca0 for details. Patch changes VAES/AVX2/OCB to perform checksumming inline with encryption and decryption instead of using 2-pass approach. Inline checksumming also gives nice small ~6% speed boost too. Benchmark on Intel Core i3-1115G4: Before: AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz OCB enc | 0.044 ns/B 21569 MiB/s 0.181 c/B 4089 OCB dec | 0.045 ns/B 21298 MiB/s 0.183 c/B 4089 After: AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz OCB enc | 0.042 ns/B 22922 MiB/s 0.170 c/B 4089 OCB dec | 0.042 ns/B 22676 MiB/s 0.172 c/B 4089 GnuPG-bug-id: T5875 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add straight-line speculation hardening for amd64 and i386 assemblyJussi Kivilinna2022-01-111-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/asm-common-amd64.h (ret_spec_stop): New. * cipher/arcfour-amd64.S: Use 'ret_spec_stop' for 'ret' instruction. * cipher/blake2b-amd64-avx2.S: Likewise. * cipher/blake2s-amd64-avx.S: Likewise. * cipher/blowfish-amd64.S: Likewise. * cipher/camellia-aesni-avx-amd64.S: Likewise. * cipher/camellia-aesni-avx2-amd64.h: Likewise. * cipher/cast5-amd64.S: Likewise. * cipher/chacha20-amd64-avx2.S: Likewise. * cipher/chacha20-amd64-ssse3.S: Likewise. * cipher/des-amd64.S: Likewise. * cipher/rijndael-aarch64.S: Likewise. * cipher/rijndael-amd64.S: Likewise. * cipher/rijndael-ssse3-amd64-asm.S: Likewise. * cipher/rijndael-vaes-avx2-amd64.S: Likewise. * cipher/salsa20-amd64.S: Likewise. * cipher/serpent-avx2-amd64.S: Likewise. * cipher/serpent-sse2-amd64.S: Likewise. * cipher/sha1-avx-amd64.S: Likewise. * cipher/sha1-avx-bmi2-amd64.S: Likewise. * cipher/sha1-avx2-bmi2-amd64.S: Likewise. * cipher/sha1-ssse3-amd64.S: Likewise. * cipher/sha256-avx-amd64.S: Likewise. * cipher/sha256-avx2-bmi2-amd64.S: Likewise. * cipher/sha256-ssse3-amd64.S: Likewise. * cipher/sha512-avx-amd64.S: Likewise. * cipher/sha512-avx2-bmi2-amd64.S: Likewise. * cipher/sha512-ssse3-amd64.S: Likewise. * cipher/sm3-avx-bmi2-amd64.S: Likewise. * cipher/sm4-aesni-avx-amd64.S: Likewise. * cipher/sm4-aesni-avx2-amd64.S: Likewise. * cipher/twofish-amd64.S: Likewise. * cipher/twofish-avx2-amd64.S: Likewise. * cipher/whirlpool-sse2-amd64.S: Likewise. * mpi/amd64/func_abi.h (CFI_*): Remove, include from "asm-common-amd64.h" instead. (FUNC_EXIT): Use 'ret_spec_stop' for 'ret' instruction. * mpi/asm-common-amd64.h: New. * mpi/i386/mpih-add1.S: Use 'ret_spec_stop' for 'ret' instruction. * mpi/i386/mpih-lshift.S: Likewise. * mpi/i386/mpih-mul1.S: Likewise. * mpi/i386/mpih-mul2.S: Likewise. * mpi/i386/mpih-mul3.S: Likewise. * mpi/i386/mpih-rshift.S: Likewise. * mpi/i386/mpih-sub1.S: Likewise. * mpi/i386/syntax.h (ret_spec_stop): New. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add x86 HW acceleration for GCM-SIV counter modeJussi Kivilinna2021-08-261-0/+328
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/cipher-gcm-siv.c (do_ctr_le32): Use bulk function if available. * cipher/cipher-internal.h (cipher_bulk_ops): Add 'ctr32le_enc'. * cipher/rijndael-aesni.c (_gcry_aes_aesni_ctr32le_enc): New. * cipher/rijndael-vaes-avx2-amd64.S (_gcry_vaes_avx2_ctr32le_enc_amd64, .Lle_addd_*): New. * cipher/rijndael-vaes.c (_gcry_vaes_avx2_ctr32le_enc_amd64) (_gcry_aes_vaes_ctr32le_enc): New. * cipher/rijndael.c (_gcry_aes_aesni_ctr32le_enc) (_gcry_aes_vaes_ctr32le_enc): New prototypes. (do_setkey): Add setup of 'bulk_ops->ctr32le_enc' for AES-NI and VAES. * tests/basic.c (check_gcm_siv_cipher): Add large test-vector for bulk ops testing. -- Counter mode in GCM-SIV is little-endian on first 4 bytes of of counter block, unlike regular CTR mode which works on big-endian full block. Benchmark on AMD Ryzen 7 5800X: Before: AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz GCM-SIV enc | 1.00 ns/B 953.2 MiB/s 4.85 c/B 4850 GCM-SIV dec | 1.01 ns/B 940.1 MiB/s 4.92 c/B 4850 GCM-SIV auth | 0.118 ns/B 8051 MiB/s 0.575 c/B 4850 After (~6x faster): AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz GCM-SIV enc | 0.150 ns/B 6367 MiB/s 0.727 c/B 4850 GCM-SIV dec | 0.161 ns/B 5909 MiB/s 0.783 c/B 4850 GCM-SIV auth | 0.118 ns/B 8051 MiB/s 0.574 c/B 4850 GnuPG-bug-id: T4485 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael: add x86_64 VAES/AVX2 accelerated implementationJussi Kivilinna2021-02-281-0/+2693
* cipher/Makefile.am: Add 'rijndael-vaes.c' and 'rijndael-vaes-avx2-amd64.S'. * cipher/rijndael-internal.h (USE_VAES): New. * cipher/rijndael-vaes-avx2-amd64.S: New. * cipher/rijndael-vaes.c: New. * cipher/rijndael.c (_gcry_aes_vaes_cfb_dec, _gcry_aes_vaes_cbc_dec) (_gcry_aes_vaes_ctr_enc, _gcry_aes_vaes_ocb_crypt) (_gcry_aes_vaes_xts_crypt): New. (do_setkey) [USE_VAES]: Add detection for VAES. (selftest_ctr_128, selftest_cbc_128, selftest_cfb_128) [USE_VAES]: Increase number of selftest blocks. * configure.ac: Add 'rijndael-vaes.lo' and 'rijndael-vaes-avx2-amd64.lo'. -- Patch adds VAES/AVX2 accelerated implementation for CBC-decryption, CFB-decryption, CTR-encryption, OCB-en/decryption and XTS-en/decryption. Benchmarks on AMD Ryzen 5800X: Before: AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz CBC dec | 0.067 ns/B 14314 MiB/s 0.323 c/B 4850 CFB dec | 0.067 ns/B 14322 MiB/s 0.323 c/B 4850 CTR enc | 0.066 ns/B 14429 MiB/s 0.321 c/B 4850 CTR dec | 0.066 ns/B 14433 MiB/s 0.320 c/B 4850 XTS enc | 0.087 ns/B 10910 MiB/s 0.424 c/B 4850 XTS dec | 0.088 ns/B 10856 MiB/s 0.426 c/B 4850 OCB enc | 0.070 ns/B 13633 MiB/s 0.339 c/B 4850 OCB dec | 0.069 ns/B 13911 MiB/s 0.332 c/B 4850 After (XTS ~1.7x faster, others ~1.9x faster): AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz CBC dec | 0.034 ns/B 28159 MiB/s 0.164 c/B 4850 CFB dec | 0.034 ns/B 27955 MiB/s 0.165 c/B 4850 CTR enc | 0.034 ns/B 28214 MiB/s 0.164 c/B 4850 CTR dec | 0.034 ns/B 28146 MiB/s 0.164 c/B 4850 XTS enc | 0.051 ns/B 18539 MiB/s 0.249 c/B 4850 XTS dec | 0.051 ns/B 18655 MiB/s 0.248 c/B 4850 GCM auth | 0.088 ns/B 10817 MiB/s 0.428 c/B 4850 OCB enc | 0.037 ns/B 25824 MiB/s 0.179 c/B 4850 OCB dec | 0.038 ns/B 25359 MiB/s 0.182 c/B 4850 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>