author | Jussi Kivilinna <jussi.kivilinna@iki.fi> | 2022-07-21 11:05:38 +0300 |
---|---|---|
committer | Jussi Kivilinna <jussi.kivilinna@iki.fi> | 2022-07-21 11:05:38 +0300 |
commit | eaed633c1662d8a98042ac146c981113f2807b22 (patch) | |
tree | 5d0977724cbf429c34f2bc52dfe6f2f32406a2c6 /configure.ac | |
parent | 2dc2654006746a25f9cb6b24786867f1725ac244 (diff) | |
download | libgcrypt-eaed633c1662d8a98042ac146c981113f2807b22.tar.gz |
sm4: add amd64 GFNI/AVX512 implementation
* cipher/Makefile.am: Add 'sm4-gfni-avx512-amd64.S'.
* cipher/sm4-gfni-avx512-amd64.S: New.
* cipher/sm4-gfni.c (USE_GFNI_AVX512): New.
(SM4_context): Add 'use_gfni_avx512' and 'crypt_blk1_16'.
(_gcry_sm4_gfni_avx512_expand_key, _gcry_sm4_gfni_avx512_ctr_enc)
(_gcry_sm4_gfni_avx512_cbc_dec, _gcry_sm4_gfni_avx512_cfb_dec)
(_gcry_sm4_gfni_avx512_ocb_enc, _gcry_sm4_gfni_avx512_ocb_dec)
(_gcry_sm4_gfni_avx512_ocb_auth, _gcry_sm4_gfni_avx512_ctr_enc_blk32)
(_gcry_sm4_gfni_avx512_cbc_dec_blk32)
(_gcry_sm4_gfni_avx512_cfb_dec_blk32)
(_gcry_sm4_gfni_avx512_ocb_enc_blk32)
(_gcry_sm4_gfni_avx512_ocb_dec_blk32)
(_gcry_sm4_gfni_avx512_crypt_blk1_16)
(_gcry_sm4_gfni_avx512_crypt_blk32, sm4_gfni_avx512_crypt_blk1_16)
(sm4_crypt_blk1_32, sm4_encrypt_blk1_32, sm4_decrypt_blk1_32): New.
(sm4_expand_key): Add GFNI/AVX512 code path.
(sm4_setkey): Use GFNI/AVX512 if supported by CPU; set up
`ctx->crypt_blk1_16`.
(sm4_encrypt, sm4_decrypt, sm4_get_crypt_blk1_16_fn, _gcry_sm4_ctr_enc)
(_gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec, _gcry_sm4_ocb_crypt)
(_gcry_sm4_ocb_auth) [USE_GFNI_AVX512]: Add GFNI/AVX512 code path.
(_gcry_sm4_xts_crypt): Change parallel block size from 16 to 32.
* configure.ac: Add 'sm4-gfni-avx512-amd64.lo'.
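The dispatch pattern described above (pick an implementation once in setkey, store it in a `crypt_blk1_16` function pointer, and let a 1..32 block helper use a dedicated 32-block routine when one is available) can be sketched roughly as follows. This is a toy illustration only, not the libgcrypt code: every `toy_*` name is hypothetical, the "cipher" is a plain XOR placeholder rather than SM4, and the CPU feature check is reduced to a caller-supplied flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 16 /* SM4 operates on 128-bit blocks */

struct toy_ctx;

/* Handles 1..16 blocks; selected once at key setup (hypothetical type). */
typedef void (*toy_blk1_16_fn)(const struct toy_ctx *ctx, uint8_t *out,
                               const uint8_t *in, unsigned int nblks);

struct toy_ctx {
  int use_wide;                 /* stands in for 'use_gfni_avx512' */
  toy_blk1_16_fn crypt_blk1_16; /* stands in for 'ctx->crypt_blk1_16' */
  uint8_t key_byte;             /* toy key, NOT real SM4 key material */
};

/* Placeholder transform: XOR every byte with the toy key. */
static void toy_xor(const struct toy_ctx *ctx, uint8_t *out,
                    const uint8_t *in, size_t nbytes)
{
  for (size_t i = 0; i < nbytes; i++)
    out[i] = in[i] ^ ctx->key_byte;
}

/* Stand-in for a portable 1..16 block routine. */
static void toy_generic_blk1_16(const struct toy_ctx *ctx, uint8_t *out,
                                const uint8_t *in, unsigned int nblks)
{
  toy_xor(ctx, out, in, (size_t)nblks * TOY_BLOCK_SIZE);
}

/* Stand-in for an accelerated 1..16 block routine. */
static void toy_wide_blk1_16(const struct toy_ctx *ctx, uint8_t *out,
                             const uint8_t *in, unsigned int nblks)
{
  toy_xor(ctx, out, in, (size_t)nblks * TOY_BLOCK_SIZE);
}

/* Stand-in for an accelerated routine that takes exactly 32 blocks. */
static void toy_wide_blk32(const struct toy_ctx *ctx, uint8_t *out,
                           const uint8_t *in)
{
  toy_xor(ctx, out, in, (size_t)32 * TOY_BLOCK_SIZE);
}

/* Key setup: choose the code path once, based on a feature flag. */
static void toy_setkey(struct toy_ctx *ctx, uint8_t key_byte, int cpu_has_wide)
{
  ctx->key_byte = key_byte;
  ctx->use_wide = cpu_has_wide;
  ctx->crypt_blk1_16 = cpu_has_wide ? toy_wide_blk1_16 : toy_generic_blk1_16;
}

/* Process 1..32 blocks: full 32-block fast path, else pieces of <= 16. */
static void toy_crypt_blk1_32(const struct toy_ctx *ctx, uint8_t *out,
                              const uint8_t *in, unsigned int nblks)
{
  assert(nblks <= 32);
  if (ctx->use_wide && nblks == 32) {
    toy_wide_blk32(ctx, out, in);
    return;
  }
  while (nblks > 0) {
    unsigned int cur = nblks > 16 ? 16 : nblks;
    ctx->crypt_blk1_16(ctx, out, in, cur);
    out += (size_t)cur * TOY_BLOCK_SIZE;
    in += (size_t)cur * TOY_BLOCK_SIZE;
    nblks -= cur;
  }
}

/* Bulk loop with a parallel block size of 32; returns the chunk count. */
static unsigned int toy_bulk_crypt(const struct toy_ctx *ctx, uint8_t *out,
                                   const uint8_t *in, size_t nblks)
{
  unsigned int chunks = 0;
  while (nblks > 0) {
    unsigned int cur = nblks > 32 ? 32 : (unsigned int)nblks;
    toy_crypt_blk1_32(ctx, out, in, cur);
    out += (size_t)cur * TOY_BLOCK_SIZE;
    in += (size_t)cur * TOY_BLOCK_SIZE;
    nblks -= cur;
    chunks++;
  }
  return chunks;
}
```

With this layout, a 70-block buffer would be handled as chunks of 32, 32 and 6 blocks, and only the full chunks can take the 32-block fast path. Keeping the selection in the context means the bulk loop itself stays free of feature checks.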
--
Benchmark on Intel i3-1115G4 (tigerlake):
Before:
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
CBC enc | 9.45 ns/B 101.0 MiB/s 38.63 c/B 4089
CBC dec | 0.647 ns/B 1475 MiB/s 2.64 c/B 4089
CFB enc | 9.43 ns/B 101.1 MiB/s 38.57 c/B 4089
CFB dec | 0.648 ns/B 1472 MiB/s 2.65 c/B 4089
CTR enc | 0.661 ns/B 1443 MiB/s 2.70 c/B 4089
CTR dec | 0.661 ns/B 1444 MiB/s 2.70 c/B 4089
XTS enc | 0.767 ns/B 1243 MiB/s 3.14 c/B 4089
XTS dec | 0.772 ns/B 1235 MiB/s 3.16 c/B 4089
OCB enc | 0.671 ns/B 1421 MiB/s 2.74 c/B 4089
OCB dec | 0.676 ns/B 1410 MiB/s 2.77 c/B 4089
OCB auth | 0.668 ns/B 1428 MiB/s 2.73 c/B 4090
After:
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
CBC enc | 7.80 ns/B 122.2 MiB/s 31.91 c/B 4090
CBC dec | 0.293 ns/B 3258 MiB/s 1.20 c/B 4095±3
CFB enc | 7.80 ns/B 122.2 MiB/s 31.90 c/B 4089
CFB dec | 0.294 ns/B 3247 MiB/s 1.20 c/B 4096±3
CTR enc | 0.306 ns/B 3120 MiB/s 1.25 c/B 4098±4
CTR dec | 0.300 ns/B 3182 MiB/s 1.23 c/B 4103±6
XTS enc | 0.431 ns/B 2211 MiB/s 1.77 c/B 4107±9
XTS dec | 0.431 ns/B 2213 MiB/s 1.77 c/B 4102±6
OCB enc | 0.324 ns/B 2946 MiB/s 1.33 c/B 4096±3
OCB dec | 0.326 ns/B 2923 MiB/s 1.34 c/B 4093±2
OCB auth | 0.536 ns/B 1779 MiB/s 2.19 c/B 4089
CBC/CFB enc: 1.20x faster
CBC/CFB dec: 2.20x faster
CTR: 2.18x faster
XTS: 1.78x faster
OCB enc/dec: 2.07x faster
OCB auth: 1.24x faster
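As a rough cross-check of the summary ratios, the speedup can be recomputed from the ns/B columns, and the c/B column from ns/B and the measured clock. The helpers below are a minimal sketch; the function names are illustrative and are not part of the benchmark tool.

```c
#include <assert.h>

/* Ratio of old to new time per byte; values above 1.0 mean "faster". */
static double speedup(double before_ns_per_byte, double after_ns_per_byte)
{
  assert(after_ns_per_byte > 0.0);
  return before_ns_per_byte / after_ns_per_byte;
}

/* ns/byte times cycles per microsecond (MHz), divided by 1000 ns/us,
   gives cycles/byte. */
static double cycles_per_byte(double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}
```

For example, `speedup(0.647, 0.293)` for CBC dec comes out at roughly 2.2, and `cycles_per_byte(0.293, 4095)` at roughly 1.20, in line with the figures listed above. Small differences in the last digit are expected because the table values are already rounded.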
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Diffstat (limited to 'configure.ac')
-rw-r--r-- | configure.ac | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/configure.ac b/configure.ac
index b55510d8..34ec058e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2952,6 +2952,7 @@ if test "$found" = "1" ; then
          GCRYPT_ASM_CIPHERS="$GCRYPT_ASM_CIPHERS sm4-aesni-avx-amd64.lo"
          GCRYPT_ASM_CIPHERS="$GCRYPT_ASM_CIPHERS sm4-aesni-avx2-amd64.lo"
          GCRYPT_ASM_CIPHERS="$GCRYPT_ASM_CIPHERS sm4-gfni-avx2-amd64.lo"
+         GCRYPT_ASM_CIPHERS="$GCRYPT_ASM_CIPHERS sm4-gfni-avx512-amd64.lo"
       ;;
       aarch64-*-*)
          # Build with the assembly implementation