author | Jussi Kivilinna <jussi.kivilinna@iki.fi> | 2022-05-01 16:01:41 +0300 |
---|---|---|
committer | Jussi Kivilinna <jussi.kivilinna@iki.fi> | 2022-05-11 20:14:33 +0300 |
commit | 9ab61ba24b72bc109b7578a7868716910d2ea9d1 (patch) | |
tree | 6f0d22f7e6fe010535e78bbec83e2fc428c1f1a6 /cipher/chacha20-amd64-avx512.S | |
parent | a611e3a25d61505698e2bb38ec2db38bc6a74820 (diff) | |
download | libgcrypt-9ab61ba24b72bc109b7578a7868716910d2ea9d1.tar.gz |
camellia: add amd64 GFNI/AVX512 implementation
* cipher/Makefile.am: Add 'camellia-gfni-avx512-amd64.S'.
* cipher/bulkhelp.h (bulk_ocb_prepare_L_pointers_array_blk64): New.
* cipher/camellia-aesni-avx2-amd64.h: Rename internal functions from
"__camellia_???" to "FUNC_NAME(???)"; Minor changes to comments.
* cipher/camellia-gfni-avx512-amd64.S: New.
* cipher/camellia-glue.c (USE_GFNI_AVX512): New.
(CAMELLIA_context): Add 'use_gfni_avx512'.
(_gcry_camellia_gfni_avx512_ctr_enc, _gcry_camellia_gfni_avx512_cbc_dec)
(_gcry_camellia_gfni_avx512_cfb_dec, _gcry_camellia_gfni_avx512_ocb_enc)
(_gcry_camellia_gfni_avx512_ocb_dec)
(_gcry_camellia_gfni_avx512_enc_blk64)
(_gcry_camellia_gfni_avx512_dec_blk64, avx512_burn_stack_depth): New.
(camellia_setkey): Use GFNI/AVX512 if supported by CPU.
(camellia_encrypt_blk1_64, camellia_decrypt_blk1_64): New.
(_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec, _gcry_camellia_cfb_dec)
(_gcry_camellia_ocb_crypt) [USE_GFNI_AVX512]: Add GFNI/AVX512 code path.
(_gcry_camellia_xts_crypt): Change parallel block size from 32 to 64.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Increase test
block size.
* cipher/chacha20-amd64-avx512.S: Clear k-mask registers with xor.
* cipher/poly1305-amd64-avx512.S: Likewise.
* cipher/sha512-avx512-amd64.S: Likewise.
---
Benchmark on Intel i3-1115G4 (tigerlake):
Before (GFNI/AVX2):
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC dec |     0.356 ns/B      2679 MiB/s      1.46 c/B      4089
        CFB dec |     0.374 ns/B      2547 MiB/s      1.53 c/B      4089
        CTR enc |     0.409 ns/B      2332 MiB/s      1.67 c/B      4089
        CTR dec |     0.406 ns/B      2347 MiB/s      1.66 c/B      4089
        XTS enc |     0.430 ns/B      2216 MiB/s      1.76 c/B      4090
        XTS dec |     0.433 ns/B      2201 MiB/s      1.77 c/B      4090
        OCB enc |     0.460 ns/B      2071 MiB/s      1.88 c/B      4089
        OCB dec |     0.492 ns/B      1939 MiB/s      2.01 c/B      4089
After (GFNI/AVX512):
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC dec |     0.207 ns/B      4600 MiB/s     0.827 c/B      3989
        CFB dec |     0.207 ns/B      4610 MiB/s     0.825 c/B      3989
        CTR enc |     0.218 ns/B      4382 MiB/s     0.868 c/B      3990
        CTR dec |     0.217 ns/B      4389 MiB/s     0.867 c/B      3990
        XTS enc |     0.330 ns/B      2886 MiB/s      1.35 c/B    4097±4
        XTS dec |     0.328 ns/B      2904 MiB/s      1.35 c/B    4097±3
        OCB enc |     0.246 ns/B      3879 MiB/s     0.981 c/B      3990
        OCB dec |     0.247 ns/B      3855 MiB/s     0.987 c/B      3990
CBC dec: 70% faster
CFB dec: 80% faster
CTR: 87% faster
XTS: 31% faster
OCB: 92% faster
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
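Note on the speedup figures: they can be re-derived from the ns/B columns of the two benchmark tables in the commit message. The sketch below is not part of the commit; the helper name speedup_percent and the dictionary layout are illustrative assumptions. It treats "N% faster" as the relative gain in throughput, (before / after - 1) * 100, which is presumably how the summary was computed. Per-row results scatter around the summary figures, which look rounded and combined per mode.

```python
# ns/B values copied from the "Before (GFNI/AVX2)" and
# "After (GFNI/AVX512)" tables in the commit message.
BEFORE_NS_PER_BYTE = {
    "CBC dec": 0.356, "CFB dec": 0.374,
    "CTR enc": 0.409, "CTR dec": 0.406,
    "XTS enc": 0.430, "XTS dec": 0.433,
    "OCB enc": 0.460, "OCB dec": 0.492,
}
AFTER_NS_PER_BYTE = {
    "CBC dec": 0.207, "CFB dec": 0.207,
    "CTR enc": 0.218, "CTR dec": 0.217,
    "XTS enc": 0.330, "XTS dec": 0.328,
    "OCB enc": 0.246, "OCB dec": 0.247,
}


def speedup_percent(before_ns, after_ns):
    """Throughput gain in percent (hypothetical helper, not from libgcrypt).

    Throughput is the inverse of time per byte, so the ratio of
    throughputs is before_ns / after_ns.
    """
    return (before_ns / after_ns - 1.0) * 100.0


for mode, before_ns in BEFORE_NS_PER_BYTE.items():
    gain = speedup_percent(before_ns, AFTER_NS_PER_BYTE[mode])
    print(f"{mode}: {gain:.0f}% faster")
```

The same ratio can be taken from the MiB/s columns (after / before); both views should agree up to the rounding in the benchmark output.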
Diffstat (limited to 'cipher/chacha20-amd64-avx512.S')
-rw-r--r-- | cipher/chacha20-amd64-avx512.S | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/cipher/chacha20-amd64-avx512.S b/cipher/chacha20-amd64-avx512.S
index da24286e..8b4d7499 100644
--- a/cipher/chacha20-amd64-avx512.S
+++ b/cipher/chacha20-amd64-avx512.S
@@ -287,7 +287,7 @@ _gcry_chacha20_amd64_avx512_blocks16:
 	/* clear the used vector registers */
 	clear_zmm16_zmm31();
-	kmovd %eax, %k2;
+	kxord %k2, %k2, %k2;
 	vzeroall; /* clears ZMM0-ZMM15 */
 	/* eax zeroed by round loop. */