* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_xts_crypt_amd64): On the fast exit path, compare
the number of blocks left against '1' instead of '0', as the
following branch is 'less than'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

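A toy C analogue of the tail handling this commit fixes (function and variable names are illustrative, not the actual assembly): with a 'less than' exit branch, the comparison value must be 1 so that the single-block tail is skipped when zero blocks remain.

```c
#include <stddef.h>

/* Toy model of the XTS tail loop: process 4 blocks at a time, then
 * single blocks.  The fast exit must fire when nblocks < 1 (none
 * left); a 'less than 0' test would run the single-block path one
 * extra time when the count reaches exactly 0. */
static size_t
process_blocks (size_t nblocks_in)
{
  long nblocks = (long) nblocks_in;
  size_t processed = 0;

  while (nblocks >= 4)        /* wide path, 4 blocks per iteration */
    {
      processed += 4;
      nblocks -= 4;
    }

  for (;;)
    {
      if (nblocks < 1)        /* fast exit: cmp $1, n; jl .Ldone */
        break;
      processed += 1;         /* single-block tail */
      nblocks -= 1;
    }

  return processed;
}
```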
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_ctr_enc_amd64): Add handling for the case where
only the main counter needs carry handling but the generated vector
counters do not.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

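The case in question can be sketched in C with a simplified 128-bit big-endian counter held as two host integers (a model, not libgcrypt's representation): when the low word is, say, 2^64 - 4, the four vector counters +0..+3 do not wrap, but the main counter stored back as +4 does and needs the carry into the high word.

```c
#include <stdint.h>

/* Simplified model of the CTR counter block: a big-endian 128-bit
 * counter held as hi/lo 64-bit host integers. */
struct ctr128 { uint64_t hi, lo; };

/* Add n to the counter, propagating the carry from the low into the
 * high 64 bits.  The newly handled case is when the batch counters
 * lo+0 .. lo+n-1 do not wrap, but the written-back main counter
 * lo+n does. */
static struct ctr128
ctr_add (struct ctr128 c, uint64_t n)
{
  uint64_t new_lo = c.lo + n;
  if (new_lo < c.lo)          /* unsigned overflow => carry */
    c.hi++;
  c.lo = new_lo;
  return c;
}
```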
* cipher/camellia-aesni-avx-amd64.S: Move constant data to
read-only section.
* cipher/camellia-aesni-avx2-amd64.h: Likewise.
* cipher/camellia-gfni-avx512-amd64.S: Likewise.
* cipher/chacha20-amd64-avx2.S: Likewise.
* cipher/chacha20-amd64-avx512.S: Likewise.
* cipher/chacha20-amd64-ssse3.S: Likewise.
* cipher/des-amd64.S: Likewise.
* cipher/rijndael-ssse3-amd64-asm.S: Likewise.
* cipher/rijndael-vaes-avx2-amd64.S: Likewise.
* cipher/serpent-avx2-amd64.S: Likewise.
* cipher/sm4-aesni-avx-amd64.S: Likewise.
* cipher/sm4-aesni-avx2-amd64.S: Likewise.
* cipher/sm4-gfni-avx2-amd64.S: Likewise.
* cipher/sm4-gfni-avx512-amd64.S: Likewise.
* cipher/twofish-avx2-amd64.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

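For readers more familiar with C than with assembler sections: `static const` data is the C-level equivalent of what placing hand-written constants under `.section .rodata` achieves, i.e. the tables end up in a read-only, shareable mapping rather than in a writable or executable section. The table name below is hypothetical; the bytes are the first eight entries of the AES S-box.

```c
/* Compilers place 'static const' objects in the read-only data
 * section (.rodata on ELF targets), which is what the commit above
 * does by hand for constants in the assembly files. */
static const unsigned char sbox_fragment[8] =
  { 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5 };
```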
* cipher/rijndael-vaes-avx2-amd64.S: Align functions to 16 bytes.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ecb_crypt'.
* cipher/cipher.c (do_ecb_crypt): Use bulk function if available.
* cipher/rijndael-aesni.c (do_aesni_enc_vec8): Change asm label
'.Ldeclast' to '.Lenclast'.
(_gcry_aes_aesni_ecb_crypt): New.
* cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce)
(_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce): Change
return value from void to size_t.
(ocb_crypt_fn_t, xts_crypt_fn_t): Remove.
(_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_xts_crypt): Remove
indirect function call; Return value from called function (allows tail
call optimization).
(_gcry_aes_armv8_ce_ocb_auth): Return value from called function (allows
tail call optimization).
(_gcry_aes_ecb_enc_armv8_ce, _gcry_aes_ecb_dec_armv8_ce)
(_gcry_aes_armv8_ce_ecb_crypt): New.
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_ecb_crypt_amd64): New.
* cipher/rijndael-vaes.c (_gcry_vaes_avx2_ecb_crypt_amd64)
(_gcry_aes_vaes_ecb_crypt): New.
* cipher/rijndael.c (_gcry_aes_aesni_ecb_crypt)
(_gcry_aes_vaes_ecb_crypt, _gcry_aes_armv8_ce_ecb_crypt): New.
(do_setkey): Setup ECB bulk function for x86 AESNI/VAES and ARM CE.
--
Benchmark on AMD Ryzen 9 7900X:
Before (OCB for reference):
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.128 ns/B 7460 MiB/s 0.720 c/B 5634±1
ECB dec | 0.134 ns/B 7103 MiB/s 0.753 c/B 5608
OCB enc | 0.029 ns/B 32930 MiB/s 0.163 c/B 5625
OCB dec | 0.029 ns/B 32738 MiB/s 0.164 c/B 5625
After:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.028 ns/B 33761 MiB/s 0.159 c/B 5625
ECB dec | 0.028 ns/B 33917 MiB/s 0.158 c/B 5625
GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

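The "use bulk function if available" dispatch can be sketched as follows. This is a self-contained toy (a one-byte XOR stands in for AES, and all names are illustrative, not libgcrypt's actual types): at key setup a backend may install an `ecb_crypt` handler that processes many blocks per call; otherwise the generic code falls back to one call per 16-byte block.

```c
#include <stddef.h>

/* Bulk-op signature: whole buffer in one call. */
typedef void (*ecb_bulk_fn) (void *ctx, void *out, const void *in,
                             size_t nblocks, int encrypt);

struct toy_cipher
{
  ecb_bulk_fn ecb_crypt;       /* NULL when no bulk path exists */
  unsigned char key;           /* toy "key": a single XOR byte  */
};

/* Single-block fallback primitive (toy XOR "cipher"). */
static void
toy_encrypt_block (struct toy_cipher *c, unsigned char *out,
                   const unsigned char *in)
{
  for (int i = 0; i < 16; i++)
    out[i] = in[i] ^ c->key;
}

/* Bulk handler: processes all blocks in one call. */
static void
toy_ecb_bulk (void *vctx, void *vout, const void *vin,
              size_t nblocks, int encrypt)
{
  struct toy_cipher *c = vctx;
  unsigned char *out = vout;
  const unsigned char *in = vin;
  (void) encrypt;              /* XOR is its own inverse */
  for (size_t i = 0; i < 16 * nblocks; i++)
    out[i] = in[i] ^ c->key;
}

/* Generic dispatch: prefer the bulk function when installed. */
static void
do_ecb_crypt (struct toy_cipher *c, unsigned char *out,
              const unsigned char *in, size_t nblocks, int encrypt)
{
  if (c->ecb_crypt)
    c->ecb_crypt (c, out, in, nblocks, encrypt);     /* bulk path */
  else
    for (size_t i = 0; i < nblocks; i++)             /* per block */
      toy_encrypt_block (c, out + 16 * i, in + 16 * i);
}
```

The real speedup comes from the bulk handler keeping many blocks in flight in wide registers, which the per-block interface cannot do.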
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_ocb_checksum): Remove.
(_gcry_vaes_avx2_ocb_crypt_amd64): Add inline checksumming.
--
The VAES/AVX2/OCB encryption implementation had the same performance
drop with large buffers as the AES-NI/OCB implementation; see
e924ce456d5728a81c148de4a6eb23373cb70ca0 for details. This patch
changes VAES/AVX2/OCB to perform checksumming inline with encryption
and decryption instead of using a 2-pass approach. Inline checksumming
also gives a nice ~6% speed boost.
Benchmark on Intel Core i3-1115G4:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
OCB enc | 0.044 ns/B 21569 MiB/s 0.181 c/B 4089
OCB dec | 0.045 ns/B 21298 MiB/s 0.183 c/B 4089
After:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
OCB enc | 0.042 ns/B 22922 MiB/s 0.170 c/B 4089
OCB dec | 0.042 ns/B 22676 MiB/s 0.172 c/B 4089
GnuPG-bug-id: T5875
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

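The 2-pass vs. inline difference can be illustrated in C. The OCB checksum is simply the XOR of all plaintext blocks, so a separate checksum pass walks a large buffer twice and pays the memory traffic twice; folding the XOR into the en/decryption loop touches each block once, while it is hot in cache. This is a sketch with the cipher itself elided; both variants must produce the same checksum.

```c
#include <stddef.h>
#include <string.h>

/* XOR one 16-byte block into the running checksum. */
static void
xor_block (unsigned char sum[16], const unsigned char *blk)
{
  for (int i = 0; i < 16; i++)
    sum[i] ^= blk[i];
}

/* 2-pass: a dedicated checksum walk, then (elided) encryption
 * would walk the same buffer again. */
static void
checksum_two_pass (unsigned char sum[16],
                   const unsigned char *buf, size_t nblocks)
{
  memset (sum, 0, 16);
  for (size_t i = 0; i < nblocks; i++)
    xor_block (sum, buf + 16 * i);
  /* ... second loop over 'buf' would do the encryption ... */
}

/* Inline: accumulate the checksum inside the encryption loop. */
static void
checksum_inline (unsigned char sum[16],
                 const unsigned char *buf, size_t nblocks)
{
  memset (sum, 0, 16);
  for (size_t i = 0; i < nblocks; i++)
    {
      xor_block (sum, buf + 16 * i);  /* checksum ...            */
      /* ... and encrypt the same block while it is in cache.    */
    }
}
```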
* cipher/asm-common-amd64.h (ret_spec_stop): New.
* cipher/arcfour-amd64.S: Use 'ret_spec_stop' for 'ret' instruction.
* cipher/blake2b-amd64-avx2.S: Likewise.
* cipher/blake2s-amd64-avx.S: Likewise.
* cipher/blowfish-amd64.S: Likewise.
* cipher/camellia-aesni-avx-amd64.S: Likewise.
* cipher/camellia-aesni-avx2-amd64.h: Likewise.
* cipher/cast5-amd64.S: Likewise.
* cipher/chacha20-amd64-avx2.S: Likewise.
* cipher/chacha20-amd64-ssse3.S: Likewise.
* cipher/des-amd64.S: Likewise.
* cipher/rijndael-aarch64.S: Likewise.
* cipher/rijndael-amd64.S: Likewise.
* cipher/rijndael-ssse3-amd64-asm.S: Likewise.
* cipher/rijndael-vaes-avx2-amd64.S: Likewise.
* cipher/salsa20-amd64.S: Likewise.
* cipher/serpent-avx2-amd64.S: Likewise.
* cipher/serpent-sse2-amd64.S: Likewise.
* cipher/sha1-avx-amd64.S: Likewise.
* cipher/sha1-avx-bmi2-amd64.S: Likewise.
* cipher/sha1-avx2-bmi2-amd64.S: Likewise.
* cipher/sha1-ssse3-amd64.S: Likewise.
* cipher/sha256-avx-amd64.S: Likewise.
* cipher/sha256-avx2-bmi2-amd64.S: Likewise.
* cipher/sha256-ssse3-amd64.S: Likewise.
* cipher/sha512-avx-amd64.S: Likewise.
* cipher/sha512-avx2-bmi2-amd64.S: Likewise.
* cipher/sha512-ssse3-amd64.S: Likewise.
* cipher/sm3-avx-bmi2-amd64.S: Likewise.
* cipher/sm4-aesni-avx-amd64.S: Likewise.
* cipher/sm4-aesni-avx2-amd64.S: Likewise.
* cipher/twofish-amd64.S: Likewise.
* cipher/twofish-avx2-amd64.S: Likewise.
* cipher/whirlpool-sse2-amd64.S: Likewise.
* mpi/amd64/func_abi.h (CFI_*): Remove, include from "asm-common-amd64.h"
instead.
(FUNC_EXIT): Use 'ret_spec_stop' for 'ret' instruction.
* mpi/asm-common-amd64.h: New.
* mpi/i386/mpih-add1.S: Use 'ret_spec_stop' for 'ret' instruction.
* mpi/i386/mpih-lshift.S: Likewise.
* mpi/i386/mpih-mul1.S: Likewise.
* mpi/i386/mpih-mul2.S: Likewise.
* mpi/i386/mpih-mul3.S: Likewise.
* mpi/i386/mpih-rshift.S: Likewise.
* mpi/i386/mpih-sub1.S: Likewise.
* mpi/i386/syntax.h (ret_spec_stop): New.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

* cipher/cipher-gcm-siv.c (do_ctr_le32): Use bulk function if
available.
* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ctr32le_enc'.
* cipher/rijndael-aesni.c (_gcry_aes_aesni_ctr32le_enc): New.
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_ctr32le_enc_amd64, .Lle_addd_*): New.
* cipher/rijndael-vaes.c (_gcry_vaes_avx2_ctr32le_enc_amd64)
(_gcry_aes_vaes_ctr32le_enc): New.
* cipher/rijndael.c (_gcry_aes_aesni_ctr32le_enc)
(_gcry_aes_vaes_ctr32le_enc): New prototypes.
(do_setkey): Add setup of 'bulk_ops->ctr32le_enc' for AES-NI and
VAES.
* tests/basic.c (check_gcm_siv_cipher): Add large test-vector for
bulk ops testing.
--
Counter mode in GCM-SIV is little-endian on the first 4 bytes of
the counter block, unlike regular CTR mode, which operates on the
full block in big-endian.
Benchmark on AMD Ryzen 7 5800X:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM-SIV enc | 1.00 ns/B 953.2 MiB/s 4.85 c/B 4850
GCM-SIV dec | 1.01 ns/B 940.1 MiB/s 4.92 c/B 4850
GCM-SIV auth | 0.118 ns/B 8051 MiB/s 0.575 c/B 4850
After (~6x faster):
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM-SIV enc | 0.150 ns/B 6367 MiB/s 0.727 c/B 4850
GCM-SIV dec | 0.161 ns/B 5909 MiB/s 0.783 c/B 4850
GCM-SIV auth | 0.118 ns/B 8051 MiB/s 0.574 c/B 4850
GnuPG-bug-id: T4485
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

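The ctr32le increment described above can be sketched portably in C: only the first 4 bytes of the 16-byte counter block form a little-endian 32-bit counter that wraps modulo 2^32, with no carry into the remaining 12 bytes (function name is illustrative).

```c
#include <stdint.h>

/* GCM-SIV counter increment: a little-endian 32-bit counter lives
 * in bytes 0..3 of the 16-byte block; bytes 4..15 are untouched.
 * Byte-wise load/store keeps this correct on any host endianness. */
static void
ctr32le_inc (unsigned char block[16])
{
  uint32_t ctr = (uint32_t) block[0]
               | ((uint32_t) block[1] << 8)
               | ((uint32_t) block[2] << 16)
               | ((uint32_t) block[3] << 24);
  ctr++;                        /* wraps mod 2^32, no carry out */
  block[0] = ctr & 0xff;
  block[1] = (ctr >> 8) & 0xff;
  block[2] = (ctr >> 16) & 0xff;
  block[3] = (ctr >> 24) & 0xff;
}
```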
* cipher/Makefile.am: Add 'rijndael-vaes.c' and
'rijndael-vaes-avx2-amd64.S'.
* cipher/rijndael-internal.h (USE_VAES): New.
* cipher/rijndael-vaes-avx2-amd64.S: New.
* cipher/rijndael-vaes.c: New.
* cipher/rijndael.c (_gcry_aes_vaes_cfb_dec, _gcry_aes_vaes_cbc_dec)
(_gcry_aes_vaes_ctr_enc, _gcry_aes_vaes_ocb_crypt)
(_gcry_aes_vaes_xts_crypt): New.
(do_setkey) [USE_VAES]: Add detection for VAES.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128)
[USE_VAES]: Increase number of selftest blocks.
* configure.ac: Add 'rijndael-vaes.lo' and
'rijndael-vaes-avx2-amd64.lo'.
--
This patch adds a VAES/AVX2 accelerated implementation for CBC
decryption, CFB decryption, CTR encryption, OCB en-/decryption and
XTS en-/decryption.
Benchmarks on AMD Ryzen 5800X:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
CBC dec | 0.067 ns/B 14314 MiB/s 0.323 c/B 4850
CFB dec | 0.067 ns/B 14322 MiB/s 0.323 c/B 4850
CTR enc | 0.066 ns/B 14429 MiB/s 0.321 c/B 4850
CTR dec | 0.066 ns/B 14433 MiB/s 0.320 c/B 4850
XTS enc | 0.087 ns/B 10910 MiB/s 0.424 c/B 4850
XTS dec | 0.088 ns/B 10856 MiB/s 0.426 c/B 4850
OCB enc | 0.070 ns/B 13633 MiB/s 0.339 c/B 4850
OCB dec | 0.069 ns/B 13911 MiB/s 0.332 c/B 4850
After (XTS ~1.7x faster, others ~1.9x faster):
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
CBC dec | 0.034 ns/B 28159 MiB/s 0.164 c/B 4850
CFB dec | 0.034 ns/B 27955 MiB/s 0.165 c/B 4850
CTR enc | 0.034 ns/B 28214 MiB/s 0.164 c/B 4850
CTR dec | 0.034 ns/B 28146 MiB/s 0.164 c/B 4850
XTS enc | 0.051 ns/B 18539 MiB/s 0.249 c/B 4850
XTS dec | 0.051 ns/B 18655 MiB/s 0.248 c/B 4850
GCM auth | 0.088 ns/B 10817 MiB/s 0.428 c/B 4850
OCB enc | 0.037 ns/B 25824 MiB/s 0.179 c/B 4850
OCB dec | 0.038 ns/B 25359 MiB/s 0.182 c/B 4850
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
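The "detection for VAES" step in do_setkey amounts to a CPUID check at runtime; the VAES-accelerated bulk functions may only be installed when the CPU reports both VAES (CPUID leaf 7, subleaf 0, ECX bit 9) and AVX2 (leaf 7, EBX bit 5). A minimal sketch using the compiler's `<cpuid.h>` helper (the function name is illustrative, and libgcrypt uses its own HWF mechanism rather than this helper):

```c
#if defined(__x86_64__) || defined(__i386__)
# include <cpuid.h>
#endif

/* Return nonzero when both VAES and AVX2 are reported by CPUID
 * leaf 7, subleaf 0 (ECX bit 9 and EBX bit 5 respectively). */
static int
have_vaes_avx2 (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;                       /* leaf 7 not available */
  return ((ecx >> 9) & 1) && ((ebx >> 5) & 1);
#else
  return 0;                         /* not x86: no VAES */
#endif
}
```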