| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* build-aux/db2any: Update copyright notice.
* cipher/arcfour.c, cipher/blowfish.ccipher/cast5.c: Likewise.
* cipher/crc-armv8-ce.c, cipher/crc-intel-pclmul.c: Likewise.
* cipher/crc-ppc.c, cipher/crc.c, cipher/des.c: Likewise.
* cipher/md2.c, cipher/md4.c, cipher/md5.c: Likewise.
* cipher/primegen.c, cipher/rfc2268.c, cipher/rmd160.c: Likewise.
* cipher/seed.c, cipher/serpent.c, cipher/tiger.c: Likewise.
* cipher/twofish.c: Likewise.
* mpi/alpha/mpih-add1.S, mpi/alpha/mpih-lshift.S: Likewise.
* mpi/alpha/mpih-mul1.S, mpi/alpha/mpih-mul2.S: Likewise.
* mpi/alpha/mpih-mul3.S, mpi/alpha/mpih-rshift.S: Likewise.
* mpi/alpha/mpih-sub1.S, mpi/alpha/udiv-qrnnd.S: Likewise.
* mpi/amd64/mpih-add1.S, mpi/amd64/mpih-lshift.S: Likewise.
* mpi/amd64/mpih-mul1.S, mpi/amd64/mpih-mul2.S: Likewise.
* mpi/amd64/mpih-mul3.S, mpi/amd64/mpih-rshift.S: Likewise.
* mpi/amd64/mpih-sub1.S, mpi/config.links: Likewise.
* mpi/generic/mpih-add1.c, mpi/generic/mpih-lshift.c: Likewise.
* mpi/generic/mpih-mul1.c, mpi/generic/mpih-mul2.c: Likewise.
* mpi/generic/mpih-mul3.c, mpi/generic/mpih-rshift.c: Likewise.
* mpi/generic/mpih-sub1.c, mpi/generic/udiv-w-sdiv.c: Likewise.
* mpi/hppa/mpih-add1.S, mpi/hppa/mpih-lshift.S: Likewise.
* mpi/hppa/mpih-rshift.S, mpi/hppa/mpih-sub1.S: Likewise.
* mpi/hppa/udiv-qrnnd.S, mpi/hppa1.1/mpih-mul1.S: Likewise.
* mpi/hppa1.1/mpih-mul2.S, mpi/hppa1.1/mpih-mul3.S: Likewise.
* mpi/hppa1.1/udiv-qrnnd.S, mpi/i386/mpih-add1.S: Likewise.
* mpi/i386/mpih-lshift.S, mpi/i386/mpih-mul1.S: Likewise.
* mpi/i386/mpih-mul2.S, mpi/i386/mpih-mul3.S: Likewise.
* mpi/i386/mpih-rshift.S, mpi/i386/mpih-sub1.S: Likewise.
* mpi/i386/syntax.h, mpi/longlong.h: Likewise.
* mpi/m68k/mc68020/mpih-mul1.S, mpi/m68k/mc68020/mpih-mul2.S: Likewise.
* mpi/m68k/mc68020/mpih-mul3.S, mpi/m68k/mpih-add1.S: Likewise.
* mpi/m68k/mpih-lshift.S, mpi/m68k/mpih-rshift.S: Likewise.
* mpi/m68k/mpih-sub1.S, mpi/m68k/syntax.h: Likewise.
* mpi/mips3/mpih-add1.S, mpi/mips3/mpih-lshift.S: Likewise.
* mpi/mips3/mpih-mul1.S, mpi/mips3/mpih-mul2.S: Likewise.
* mpi/mips3/mpih-mul3.S, mpi/mips3/mpih-rshift.S: Likewise.
* mpi/mips3/mpih-sub1.S, mpi/mpi-add.c: Likewise.
* mpi/mpi-bit.c, mpi/mpi-cmp.c, mpi/mpi-div.c: Likewise.
* mpi/mpi-gcd.c, mpi/mpi-inline.c, mpi/mpi-inline.h: Likewise.
* mpi/mpi-internal.h, mpi/mpi-mpow.c, mpi/mpi-mul.c: Likewise.
* mpi/mpi-scan.c, mpi/mpih-div.c, mpi/mpih-mul.c: Likewise.
* mpi/pa7100/mpih-lshift.S, mpi/pa7100/mpih-rshift.S: Likewise.
* mpi/power/mpih-add1.S, mpi/power/mpih-lshift.S: Likewise.
* mpi/power/mpih-mul1.S, mpi/power/mpih-mul2.S: Likewise.
* mpi/power/mpih-mul3.S, mpi/power/mpih-rshift.S: Likewise.
* mpi/power/mpih-sub1.S, mpi/powerpc32/mpih-add1.S: Likewise.
* mpi/powerpc32/mpih-lshift.S, mpi/powerpc32/mpih-mul1.S: Likewise.
* mpi/powerpc32/mpih-mul2.S, mpi/powerpc32/mpih-mul3.S: Likewise.
* mpi/powerpc32/mpih-rshift.S, mpi/powerpc32/mpih-sub1.S: Likewise.
* mpi/powerpc32/syntax.h, mpi/sparc32/mpih-add1.S: Likewise.
* mpi/sparc32/mpih-lshift.S, mpi/sparc32/mpih-rshift.S: Likewise.
* mpi/sparc32/udiv.S, mpi/sparc32v8/mpih-mul1.S: Likewise.
* mpi/sparc32v8/mpih-mul2.S, mpi/sparc32v8/mpih-mul3.S: Likewise.
* mpi/supersparc/udiv.S: Likewise.
* random/random.h, random/rndegd.c: Likewise.
* src/cipher.h, src/libgcrypt.def, src/libgcrypt.vers: Likewise.
* src/missing-string.c, src/mpi.h, src/secmem.h: Likewise.
* src/stdmem.h, src/types.h: Likewise.
* tests/aeswrap.c, tests/curves.c, tests/hmac.c: Likewise.
* tests/keygrip.c, tests/prime.c, tests/random.c: Likewise.
* tests/t-kdf.c, tests/testapi.c: Likewise.
--
GnuPG-bug-id: 6271
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher.c (cipher_setkey): Restore weak-key error-code
in case mode specific setkey returned success for the return code.
--
GnuPG-bug-id: 6451
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher.c (cipher_setkey): Do not reset RC.
--
This reverts commit 30840c2c45d718e0fd93cfd40771fbefa50e31f5.
GnuPG-bug-id: 6451
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
| |
* cipher/cipher.c (cipher_setkey): Reset RC.
--
GnuPG-bug-id: 6451
|
|
|
|
|
|
|
|
|
|
|
| |
cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt)
(_gcry_cipher_poly1305_decrypt) [USE_CHACHA20]: Conditionalize.
--
GnuPG-bug-id: 6384
Reported-by: Andrew Collier
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher-gcm-ppc.c (_gcry_ghash_ppc_vpmsum): Increament
'buf' pointer right after use; Use 'for' loop for inner 4-blocks
loop to allow compiler to better optimize loop.
--
Benchmark on POWER9:
Before:
| nanosecs/byte mebibytes/sec cycles/byte
GMAC_AES | 0.226 ns/B 4211 MiB/s 0.521 c/B
After:
| nanosecs/byte mebibytes/sec cycles/byte
GMAC_AES | 0.224 ns/B 4251 MiB/s 0.516 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am [ENABLE_O_FLAG_MUNGING]: Support -Oz.
* random/Makefile.am [ENABLE_O_FLAG_MUNGING]: Support -Oz.
--
GnuPG-bug-id: 6432
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
| |
* cipher/camellia-simd128.h (rol32_1_16): Use vpsrlb128 for uint8
right shift by 7 if available.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-aesni-avx2-amd64.h (IF_GFNI, IF_NOT_GFNI): New.
[CAMELLIA_GFNI_BUILD] (rol32_1_32): Add GFNI variant which uses
vgf2p8affineqb for uint8 right shift by 7.
(fls32): Load 'right shift by 7' bit-matrix on GFNI build.
[CAMELLIA_GFNI_BUILD] (.Lright_shift_by_7): New.
* cipher/camellia-gfni-avx512-amd64.S (clear_regs): Don't clear %k1.
(rol32_1_64): Use vgf2p8affineqb for uint8 right shift by 7.
(fls64): Adjust for rol32_1_64 changes.
(.Lbyte_ones): Remove.
(.Lright_shift_by_7): New.
(_gcry_camellia_gfni_avx512_ctr_enc): Clear %k1 after use.
--
Benchmark on Intel Core i3-1115G4:
Before:
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.194 ns/B 4920 MiB/s 0.794 c/B 4096±4
ECB dec | 0.194 ns/B 4916 MiB/s 0.793 c/B 4089
After (~1.7% faster)
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.190 ns/B 5008 MiB/s 0.780 c/B 4096±3
ECB dec | 0.191 ns/B 5002 MiB/s 0.781 c/B 4096±3
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
| |
* cipher/mac-hmac.c (_gcry_mac_type_spec_hmac_md5): Allow in fips mode.
* cipher/md5.c (_gcry_digest_spec_md5): Allow in fips mode.
--
GnuPG-bug-id: 6376
Signed-off-by: Tobias Heider <tobias.heider@canonical.com>
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/kdf.c (check_one): run selftests for more approved parameters
and check that wrong parameters correctly fail in FIPS mode.
--
Fixes-commit: 535a4d345872aa2cd2ab3a5f9c4411d0a0313328
GnuPG-bug-id: 5512
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/ecc.c (test_keys_fips): Replace calls to log_fatal with
return code on error.
(ecc_generate): Signal error when PCT fails in FIPS mode.
--
GnuPG-bug-id: 6397
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/ecc.c (ecc_generate): Do not allow skipping tests PCT tests
in FIPS mode.
--
The new FIPS specification requires to run the PCT without any
exceptions.
GnuPG-bug-id: 6394
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
* cipher/rijndael-ppc.c (_gcry_aes_sbox4_ppc8): Remove.
(bcast_u32_to_vec, u32_from_vec): New.
(_gcry_aes_ppc8_setkey): Use vectors for round key calculation
variables.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'sm4-ppc.c'.
* cipher/sm4-ppc.c: New.
* cipher/sm4.c (USE_PPC_CRYPTO): New.
(SM4_context): Add 'use_ppc8le' and 'use_ppc9le'.
[USE_PPC_CRYPTO] (_gcry_sm4_ppc8le_crypt_blk1_16)
(_gcry_sm4_ppc9le_crypt_blk1_16, sm4_ppc8le_crypt_blk1_16)
(sm4_ppc9le_crypt_blk1_16): New.
(sm4_setkey) [USE_PPC_CRYPTO]: Set use_ppc8le and use_ppc9le
based on HW features.
(sm4_get_crypt_blk1_16_fn) [USE_PPC_CRYPTO]: Add PowerPC
implementation selection.
--
Benchmark on POWER9:
Before:
SM4 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 14.47 ns/B 65.89 MiB/s 33.29 c/B
ECB dec | 14.47 ns/B 65.89 MiB/s 33.29 c/B
CBC enc | 35.09 ns/B 27.18 MiB/s 80.71 c/B
CBC dec | 16.69 ns/B 57.13 MiB/s 38.39 c/B
CFB enc | 35.09 ns/B 27.18 MiB/s 80.71 c/B
CFB dec | 16.76 ns/B 56.90 MiB/s 38.55 c/B
CTR enc | 16.88 ns/B 56.50 MiB/s 38.82 c/B
CTR dec | 16.88 ns/B 56.50 MiB/s 38.82 c/B
After (ECB ~4.4x faster):
SM4 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 3.26 ns/B 292.3 MiB/s 7.50 c/B
ECB dec | 3.26 ns/B 292.3 MiB/s 7.50 c/B
CBC enc | 35.10 ns/B 27.17 MiB/s 80.72 c/B
CBC dec | 3.33 ns/B 286.3 MiB/s 7.66 c/B
CFB enc | 35.10 ns/B 27.17 MiB/s 80.74 c/B
CFB dec | 3.36 ns/B 283.8 MiB/s 7.73 c/B
CTR enc | 3.47 ns/B 275.0 MiB/s 7.98 c/B
CTR dec | 3.47 ns/B 275.0 MiB/s 7.98 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-simd128.h (if_vpsrlb128)
(if_not_vpsrlb128): New.
(filter_8bit): Use 'vpsrlb128' when available on target
architecture (PowerPC and AArch64).
--
Benchmark on POWER9:
Before:
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 3.26 ns/B 292.8 MiB/s 7.49 c/B
ECB dec | 3.29 ns/B 290.0 MiB/s 7.56 c/B
After (~2% faster):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 3.16 ns/B 301.4 MiB/s 7.28 c/B
ECB dec | 3.19 ns/B 298.7 MiB/s 7.34 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* cipher/chacha20-ppc.c (HAVE_FUNC_ATTR_TARGET): New.
(_gcry_chacha20_ppc9_blocks1, _gcry_chacha20_ppc9_blocks4)
(_gcry_chacha20_poly1305_ppc8_blocks4): Use inline functions
only if HAVE_FUNC_ATTR_TARGET is defined.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/chacha20-ppc.c (chacha20_ppc_blocks1)
(chacha20_ppc_blocks4, chacha20_poly1305_ppc_blocks4): Move
'ASM_FUNC_ATTR_INLINE' right after 'static'.
* cipher/sha256-ppc.c (sha256_transform_ppc): Likewise.
* cipher/sha512-ppc.c (sha512_transform_ppc): Likewise.
--
Patch fixes these GCC warnings in PowerPC implementations:
warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'camellia-aarch64-ce.(c|o|lo)'.
(aarch64_neon_cflags): New.
* cipher/camellia-aarch64-ce.c: New.
* cipher/camellia-glue.c (USE_AARCH64_CE): New.
(CAMELLIA_context): Add 'use_aarch64ce'.
(_gcry_camellia_aarch64ce_encrypt_blk16)
(_gcry_camellia_aarch64ce_decrypt_blk16)
(_gcry_camellia_aarch64ce_keygen, camellia_aarch64ce_enc_blk16)
(camellia_aarch64ce_dec_blk16, aarch64ce_burn_stack_depth): New.
(camellia_setkey) [USE_AARCH64_CE]: Set use_aarch64ce if HW has
HWF_ARM_AES; Use AArch64/CE key generation if supported by HW.
(camellia_encrypt_blk1_32, camellia_decrypt_blk1_32)
[USE_AARCH64_CE]: Add AArch64/CE code path.
--
Patch enables 128-bit vector instrinsics implementation of Camellia
cipher for AArch64.
Benchmark on AWS Graviton2:
Before:
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 5.99 ns/B 159.2 MiB/s 14.97 c/B 2500
ECB dec | 5.99 ns/B 159.1 MiB/s 14.98 c/B 2500
CBC enc | 6.16 ns/B 154.7 MiB/s 15.41 c/B 2500
CBC dec | 6.12 ns/B 155.8 MiB/s 15.29 c/B 2499
CFB enc | 6.49 ns/B 147.0 MiB/s 16.21 c/B 2500
CFB dec | 6.05 ns/B 157.6 MiB/s 15.13 c/B 2500
CTR enc | 6.09 ns/B 156.7 MiB/s 15.22 c/B 2500
CTR dec | 6.09 ns/B 156.6 MiB/s 15.22 c/B 2500
XTS enc | 6.16 ns/B 154.9 MiB/s 15.39 c/B 2500
XTS dec | 6.16 ns/B 154.8 MiB/s 15.40 c/B 2499
GCM enc | 6.31 ns/B 151.1 MiB/s 15.78 c/B 2500
GCM dec | 6.31 ns/B 151.1 MiB/s 15.78 c/B 2500
GCM auth | 0.206 ns/B 4635 MiB/s 0.514 c/B 2500
OCB enc | 6.63 ns/B 143.9 MiB/s 16.57 c/B 2499
OCB dec | 6.63 ns/B 143.9 MiB/s 16.56 c/B 2499
OCB auth | 6.55 ns/B 145.7 MiB/s 16.37 c/B 2499
After (ecb ~2.1x faster):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 2.77 ns/B 344.2 MiB/s 6.93 c/B 2499
ECB dec | 2.76 ns/B 345.3 MiB/s 6.90 c/B 2499
CBC enc | 6.17 ns/B 154.7 MiB/s 15.41 c/B 2499
CBC dec | 2.89 ns/B 330.3 MiB/s 7.22 c/B 2500
CFB enc | 6.48 ns/B 147.1 MiB/s 16.21 c/B 2499
CFB dec | 2.84 ns/B 336.1 MiB/s 7.09 c/B 2499
CTR enc | 2.90 ns/B 328.8 MiB/s 7.25 c/B 2499
CTR dec | 2.90 ns/B 328.9 MiB/s 7.25 c/B 2500
XTS enc | 2.93 ns/B 325.3 MiB/s 7.33 c/B 2500
XTS dec | 2.92 ns/B 326.2 MiB/s 7.31 c/B 2500
GCM enc | 3.10 ns/B 307.2 MiB/s 7.76 c/B 2500
GCM dec | 3.10 ns/B 307.2 MiB/s 7.76 c/B 2499
GCM auth | 0.206 ns/B 4635 MiB/s 0.514 c/B 2500
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'camellia-simd128.h',
'camellia-ppc8le.c' and 'camellia-ppc9le.c'.
* cipher/camellia-glue.c (USE_PPC_CRYPTO): New.
(CAMELLIA_context) [USE_PPC_CRYPTO]: Add 'use_ppc', 'use_ppc8'
and 'use_ppc9'.
[USE_PPC_CRYPTO] (_gcry_camellia_ppc8_encrypt_blk16)
(_gcry_camellia_ppc8_decrypt_blk16, _gcry_camellia_ppc8_keygen)
(_gcry_camellia_ppc9_encrypt_blk16)
(_gcry_camellia_ppc9_decrypt_blk16, _gcry_camellia_ppc9_keygen)
(camellia_ppc_enc_blk16, camellia_ppc_dec_blk16)
(ppc_burn_stack_depth): New.
(camellia_setkey) [USE_PPC_CRYPTO]: Setup 'use_ppc', 'use_ppc8'
and 'use_ppc9' and use PPC key-generation if HWF is available.
(camellia_encrypt_blk1_32)
(camellia_decrypt_blk1_32) [USE_PPC_CRYPTO]: Add 'use_ppc' paths.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Enable
generic bulk path when USE_PPC_CRYPTO is defined.
* cipher/camellia-ppc8le.c: New.
* cipher/camellia-ppc9le.c: New.
* cipher/camellia-simd128.h: New.
* configure.ac: Add 'camellia-ppc8le.lo' and 'camellia-ppc9le.lo'.
--
Patch adds 128-bit vector instrinsics implementation of Camellia
cipher and enables implementation for POWER8 and POWER9.
Benchmark on POWER9:
Before:
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 13.45 ns/B 70.90 MiB/s 30.94 c/B
ECB dec | 13.45 ns/B 70.92 MiB/s 30.93 c/B
CBC enc | 15.22 ns/B 62.66 MiB/s 35.00 c/B
CBC dec | 13.54 ns/B 70.41 MiB/s 31.15 c/B
CFB enc | 15.24 ns/B 62.59 MiB/s 35.04 c/B
CFB dec | 13.53 ns/B 70.48 MiB/s 31.12 c/B
CTR enc | 13.60 ns/B 70.15 MiB/s 31.27 c/B
CTR dec | 13.62 ns/B 70.02 MiB/s 31.33 c/B
XTS enc | 13.67 ns/B 69.74 MiB/s 31.45 c/B
XTS dec | 13.74 ns/B 69.41 MiB/s 31.60 c/B
GCM enc | 18.18 ns/B 52.45 MiB/s 41.82 c/B
GCM dec | 17.76 ns/B 53.69 MiB/s 40.86 c/B
GCM auth | 4.12 ns/B 231.7 MiB/s 9.47 c/B
OCB enc | 14.40 ns/B 66.22 MiB/s 33.12 c/B
OCB dec | 14.40 ns/B 66.23 MiB/s 33.12 c/B
OCB auth | 14.37 ns/B 66.37 MiB/s 33.05 c/B
After (ECB ~4.1x faster):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 3.25 ns/B 293.7 MiB/s 7.47 c/B
ECB dec | 3.25 ns/B 293.4 MiB/s 7.48 c/B
CBC enc | 15.22 ns/B 62.68 MiB/s 35.00 c/B
CBC dec | 3.36 ns/B 284.1 MiB/s 7.72 c/B
CFB enc | 15.25 ns/B 62.55 MiB/s 35.07 c/B
CFB dec | 3.36 ns/B 284.0 MiB/s 7.72 c/B
CTR enc | 3.47 ns/B 275.1 MiB/s 7.97 c/B
CTR dec | 3.47 ns/B 275.1 MiB/s 7.97 c/B
XTS enc | 3.54 ns/B 269.0 MiB/s 8.15 c/B
XTS dec | 3.54 ns/B 269.6 MiB/s 8.14 c/B
GCM enc | 3.69 ns/B 258.2 MiB/s 8.49 c/B
GCM dec | 3.69 ns/B 258.2 MiB/s 8.50 c/B
GCM auth | 0.226 ns/B 4220 MiB/s 0.520 c/B
OCB enc | 3.81 ns/B 250.2 MiB/s 8.77 c/B
OCB dec | 4.08 ns/B 233.8 MiB/s 9.38 c/B
OCB auth | 3.53 ns/B 270.0 MiB/s 8.12 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_xts_crypt_amd64): On fast exit path, compare
number of blocks left against '1' instead of '0' as following
branch is 'less than'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (gcry_cv_clang_attribute_ppc_target): New.
* cipher/chacha20-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
* cipher/rijndael-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(FPC_OPT_ATTR): New.
* cipher/rijndael-ppc9le.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(FPC_OPT_ATTR): New.
* cipher/sha256-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
* cipher/sha512-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
(ror64): Remove unused function.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/chacha20-ppc.c (_gcry_chacha20_ppc8_blocks1): Rename to...
(chacha20_ppc_blocks1): ...this; Add 'always inline' attribute.
(_gcry_chacha20_ppc8_blocks4): Rename to...
(chacha20_ppc_blocks4): ...this; Add 'always inline' attribute.
(_gcry_chacha20_poly1305_ppc8_blocks4): Rename to...
(chacha20_poly1305_ppc_blocks4): ...this; Add 'always inline'
attribute.
(FUNC_ATTR_OPT_O2, FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
(_gcry_chacha20_ppc8_blocks1, _gcry_chacha20_ppc8_blocks4)
(_gcry_chacha20_poly1305_ppc8_blocks4): New.
(_gcry_chacha20_ppc9_blocks1, _gcry_chacha20_ppc9_blocks4)
(_gcry_chacha20_poly1305_ppc9_blocks4): New.
* cipher/chacha20.c (CHACHA20_context_t): Add 'use_p9'.
(_gcry_chacha20_ppc9_blocks1, _gcry_chacha20_ppc9_blocks4)
(_gcry_chacha20_poly1305_ppc9_blocks4): New.
(chacha20_do_setkey): Set 'use_p9' if HW has HWF_PPC_ARCH_3_00.
(chacha20_blocks, do_chacha20_encrypt_stream_tail)
(_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt) [USE_PPC_VEC]: Add 'use_p9' paths.
--
This change makes sure that chacha20-ppc gets compiled
with proper optimization level and right target setting.
Benchmark on POWER9:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 1.11 ns/B 856.0 MiB/s 2.56 c/B
STREAM dec | 1.11 ns/B 856.0 MiB/s 2.56 c/B
POLY1305 enc | 1.57 ns/B 606.2 MiB/s 3.62 c/B
POLY1305 dec | 1.56 ns/B 610.4 MiB/s 3.59 c/B
POLY1305 auth | 0.876 ns/B 1089 MiB/s 2.02 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/rijndael-ppc-functions.h: Add PPC_OPT_ATTR attribute
macro for all functions.
* cipher/rijndael-ppc.c (FUNC_ATTR_OPT, PPC_OPT_ATTR): New.
(_gcry_aes_ppc8_setkey, _gcry_aes_ppc8_prepare_decryption): Add
PPC_OPT_ATTR attribute macro.
* cipher/rijndael-ppc9le.c (FUNC_ATTR_OPT, PPC_OPT_ATTR): New.
--
This change makes sure that PPC accelerated AES gets compiled
with proper optimization level and right target setting.
Benchmark on POWER9:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 0.305 ns/B 3129 MiB/s 0.701 c/B
ECB dec | 0.305 ns/B 3127 MiB/s 0.701 c/B
CBC enc | 1.66 ns/B 575.3 MiB/s 3.81 c/B
CBC dec | 0.318 ns/B 2997 MiB/s 0.732 c/B
CFB enc | 1.66 ns/B 574.7 MiB/s 3.82 c/B
CFB dec | 0.319 ns/B 2987 MiB/s 0.734 c/B
OFB enc | 2.15 ns/B 443.4 MiB/s 4.95 c/B
OFB dec | 2.15 ns/B 443.3 MiB/s 4.95 c/B
CTR enc | 0.328 ns/B 2907 MiB/s 0.754 c/B
CTR dec | 0.328 ns/B 2906 MiB/s 0.755 c/B
XTS enc | 0.516 ns/B 1849 MiB/s 1.19 c/B
XTS dec | 0.515 ns/B 1850 MiB/s 1.19 c/B
CCM enc | 1.98 ns/B 480.6 MiB/s 4.56 c/B
CCM dec | 1.98 ns/B 480.5 MiB/s 4.56 c/B
CCM auth | 1.66 ns/B 574.9 MiB/s 3.82 c/B
EAX enc | 1.99 ns/B 480.2 MiB/s 4.57 c/B
EAX dec | 1.99 ns/B 480.2 MiB/s 4.57 c/B
EAX auth | 1.66 ns/B 575.2 MiB/s 3.81 c/B
GCM enc | 0.552 ns/B 1727 MiB/s 1.27 c/B
GCM dec | 0.552 ns/B 1728 MiB/s 1.27 c/B
GCM auth | 0.225 ns/B 4240 MiB/s 0.517 c/B
OCB enc | 0.381 ns/B 2504 MiB/s 0.876 c/B
OCB dec | 0.385 ns/B 2477 MiB/s 0.886 c/B
OCB auth | 0.356 ns/B 2682 MiB/s 0.818 c/B
SIV enc | 1.98 ns/B 480.9 MiB/s 4.56 c/B
SIV dec | 2.11 ns/B 452.9 MiB/s 4.84 c/B
SIV auth | 1.66 ns/B 575.4 MiB/s 3.81 c/B
GCM-SIV enc | 0.726 ns/B 1314 MiB/s 1.67 c/B
GCM-SIV dec | 0.843 ns/B 1131 MiB/s 1.94 c/B
GCM-SIV auth | 0.377 ns/B 2527 MiB/s 0.868 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/rijndael-ppc-functions.h (CTR32LE_ENC_FUNC): New.
* cipher/rijndael-ppc.c (_gcry_aes_ppc8_ctr32le_enc): New.
* cipher/rijndael-ppc9le.c (_gcry_aes_ppc9le_ctr32le_enc): New.
* cipher/rijndael.c (_gcry_aes_ppc8_ctr32le_enc)
(_gcry_aes_ppc9le_ctr32le_enc): New.
(do_setkey): Setup _gcry_aes_ppc8_ctr32le_enc for POWER8 and
_gcry_aes_ppc9le_ctr32le_enc for POWER9.
--
Benchmark on POWER9:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
GCM-SIV enc | 1.42 ns/B 672.2 MiB/s 3.26 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
GCM-SIV enc | 0.725 ns/B 1316 MiB/s 1.67 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/rijndael-ppc-functions.h (ECB_CRYPT_FUNC): New.
* cipher/rijndael-ppc.c (_gcry_aes_ppc8_ecb_crypt): New.
* cipher/rijndael-ppc9le.c (_gcry_aes_ppc9le_ecb_crypt): New.
* cipher/rijndael.c (_gcry_aes_ppc8_ecb_crypt)
(_gcry_aes_ppc9le_ecb_crypt): New.
(do_setkey): Set up _gcry_aes_ppc8_ecb_crypt for POWER8 and
_gcry_aes_ppc9le_ecb_crypt for POWER9.
--
Benchmark on POWER9:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 0.875 ns/B 1090 MiB/s 2.01 c/B
ECB dec | 1.06 ns/B 899.8 MiB/s 2.44 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 0.305 ns/B 3126 MiB/s 0.702 c/B
ECB dec | 0.305 ns/B 3126 MiB/s 0.702 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/sha256-ppc.c: Change to use vector registers, generate
POWER8 and POWER9 from same code with help of 'target' and
'optimize' attribute.
* cipher/sha512-ppc.c: Likewise.
* configure.ac (gcry_cv_gcc_attribute_optimize)
(gcry_cv_gcc_attribute_ppc_target): New.
--
Benchmark on POWER9:
Before:
| nanosecs/byte mebibytes/sec cycles/byte
SHA256 | 5.22 ns/B 182.8 MiB/s 12.00 c/B
SHA512 | 3.53 ns/B 269.9 MiB/s 8.13 c/B
After (sha256 ~12% faster, sha512 ~19% faster):
| nanosecs/byte mebibytes/sec cycles/byte
SHA256 | 4.65 ns/B 204.9 MiB/s 10.71 c/B
SHA512 | 2.97 ns/B 321.1 MiB/s 6.83 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-aesni-avx2-amd64.h (roundsm16, fls16): Broadcast
round key bytes directly with 'vpshufb'.
--
Benchmark on AMD Ryzen 9 7900X (turbo-freq off):
Before:
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.837 ns/B 1139 MiB/s 3.94 c/B 4700
ECB dec | 0.839 ns/B 1137 MiB/s 3.94 c/B 4700
After (~3% faster):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.808 ns/B 1180 MiB/s 3.80 c/B 4700
ECB dec | 0.810 ns/B 1177 MiB/s 3.81 c/B 4700
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-aesni-avx2-amd64.h (roundsm32, fls32): Use
'vpbroadcastb' for loading round key.
* cipher/camellia-glue.c (camellia_encrypt_blk1_32)
(camellia_decrypt_blk1_32): Adjust num_blks thresholds for AVX2
implementations, 2 blks for GFNI, 4 blks for VAES and 5 blks for AESNI.
--
Benchmark on AMD Ryzen 9 7900X (turbo-freq off):
Before:
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.213 ns/B 4469 MiB/s 1.00 c/B 4700
ECB dec | 0.215 ns/B 4440 MiB/s 1.01 c/B 4700
After (~10% faster):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.194 ns/B 4919 MiB/s 0.911 c/B 4700
ECB dec | 0.195 ns/B 4896 MiB/s 0.916 c/B 4700
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-gfni-avx512-amd64.S (roundsm64, fls64): Use
'vpbroadcastb' for loading round key.
--
Benchmark on AMD Ryzen 9 7900X (turbo-freq off):
Before:
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.173 ns/B 5514 MiB/s 0.813 c/B 4700
ECB dec | 0.176 ns/B 5432 MiB/s 0.825 c/B 4700
After (~13% faster):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.152 ns/B 6267 MiB/s 0.715 c/B 4700
ECB dec | 0.155 ns/B 6170 MiB/s 0.726 c/B 4700
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
| |
* cipher/camellia-aesni-avx2-amd64.h (enc_blk1_32, dec_blk1_32): Add
fast path for 32 block input.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-aesni-avx-amd64.S
(_gcry_camellia_aesni_avx_ctr_enc): Add byte addition fast-path.
* cipher/camellia-aesni-avx2-amd64.h (ctr_enc): Likewise.
* cipher/camellia-gfni-avx512-amd64.S
(_gcry_camellia_gfni_avx512_ctr_enc): Likewise.
* cipher/camellia-glue.c (CAMELLIA_context): Add 'use_avx2'.
(camellia_setkey, _gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec)
(_gcry_camellia_cfb_dec, _gcry_camellia_ocb_crypt)
(_gcry_camellia_ocb_auth) [USE_AESNI_AVX2]: Use 'use_avx2' to check
if any of the AVX2 implementations is enabled.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-aesni-avx-amd64.S (_gcry_camellia_aesni_avx_ecb_enc)
(_gcry_camellia_aesni_avx_ecb_dec): New.
* cipher/camellia-glue.c (_gcry_camellia_aesni_avx_ecb_enc)
(_gcry_camellia_aesni_avx_ecb_dec): New.
(camellia_setkey): Always enable XTS/ECB/CTR32LE bulk functions.
(camellia_encrypt_blk1_32, camellia_decrypt_blk1_32)
[USE_AESNI_AVX]: Add AESNI/AVX code-path.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/sm4-aesni-avx-amd64.S
(_gcry_sm4_aesni_avx_ctr_enc): Add byte addition fast-path.
* cipher/sm4-aesni-avx2-amd64.S
(_gcry_sm4_aesni_avx2_ctr_enc): Likewise.
* cipher/sm4-gfni-avx2-amd64.S
(_gcry_sm4_gfni_avx2_ctr_enc): Likewise.
* cipher/sm4-gfni-avx512-amd64.S
(_gcry_sm4_gfni_avx512_ctr_enc)
(_gcry_sm4_gfni_avx512_ctr_enc_blk32): Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_ctr_enc_amd64): Add handling for the case when
only main counter needs carry handling but generated vector counters
do not.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/aria-aesni-avx2-amd64.S (CONFIG_AS_VAES): New.
[CONFIG_AS_VAES]: Add VAES accelerated assembly macros and functions.
* cipher/aria.c (USE_VAES_AVX2): New.
(ARIA_context): Add 'use_vaes_avx2'.
(_gcry_aria_vaes_avx2_ecb_crypt_blk32)
(_gcry_aria_vaes_avx2_ctr_crypt_blk32)
(aria_avx2_ecb_crypt_blk32, aria_avx2_ctr_crypt_blk32): Add VAES/AVX2
code paths.
(aria_setkey): Enable VAES/AVX2 implementation based on HW features.
--
This patch adds VAES/AVX2 accelerated ARIA block cipher implementation.
VAES instruction set extends AESNI instructions to work on all 128-bit
lanes of 256-bit YMM and 512-bit ZMM vector registers, thus AES
operations can be executed directly on YMM registers without needing
to manually split YMM to two XMM halfs for AESNI instructions.
This improves performance on CPUs that support VAES but not GFNI, like
AMD Zen3.
Benchmark on Ryzen 7 5800X (zen3, turbo-freq off):
Before (AESNI/AVX2):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.559 ns/B 1707 MiB/s 2.12 c/B 3800
ECB dec | 0.560 ns/B 1703 MiB/s 2.13 c/B 3800
CTR enc | 0.570 ns/B 1672 MiB/s 2.17 c/B 3800
CTR dec | 0.568 ns/B 1679 MiB/s 2.16 c/B 3800
After (VAES/AVX2, ~33% faster):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.435 ns/B 2193 MiB/s 1.65 c/B 3800
ECB dec | 0.434 ns/B 2197 MiB/s 1.65 c/B 3800
CTR enc | 0.413 ns/B 2306 MiB/s 1.57 c/B 3800
CTR dec | 0.411 ns/B 2318 MiB/s 1.56 c/B 3800
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/aria-gfni-avx512-amd64.S (aria_diff_m): Use 'vpternlogq' for
3-way XOR operation.
---
Using vpternlogq gives small performance improvement on AMD Zen4. With
Intel tiger-lake speed is the same as before.
Benchmark on AMD Ryzen 9 7900X (zen4, turbo-freq off):
Before:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.203 ns/B 4703 MiB/s 0.953 c/B 4700
ECB dec | 0.204 ns/B 4675 MiB/s 0.959 c/B 4700
CTR enc | 0.207 ns/B 4609 MiB/s 0.973 c/B 4700
CTR dec | 0.207 ns/B 4608 MiB/s 0.973 c/B 4700
After (~3% faster):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.197 ns/B 4847 MiB/s 0.925 c/B 4700
ECB dec | 0.197 ns/B 4852 MiB/s 0.924 c/B 4700
CTR enc | 0.200 ns/B 4759 MiB/s 0.942 c/B 4700
CTR dec | 0.200 ns/B 4772 MiB/s 0.939 c/B 4700
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/aria-aesni-avx-amd64.S (aria_ark_8way): Use 'vmovd' for
loading key material and 'vpshufb' for broadcasting from byte
locations 3, 2, 1 and 0.
--
Benchmark on AMD Ryzen 9 7900X (zen4, turbo-freq off):
Before (GFNI/AVX):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.516 ns/B 1847 MiB/s 2.43 c/B 4700
ECB dec | 0.519 ns/B 1839 MiB/s 2.44 c/B 4700
CTR enc | 0.517 ns/B 1846 MiB/s 2.43 c/B 4700
CTR dec | 0.518 ns/B 1843 MiB/s 2.43 c/B 4700
After (GFNI/AVX, ~5% faster):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.490 ns/B 1947 MiB/s 2.30 c/B 4700
ECB dec | 0.490 ns/B 1946 MiB/s 2.30 c/B 4700
CTR enc | 0.493 ns/B 1935 MiB/s 2.32 c/B 4700
CTR dec | 0.493 ns/B 1934 MiB/s 2.32 c/B 4700
===
Benchmark on Intel Core i3-1115G4 (tiger-lake, turbo-freq off):
Before (GFNI/AVX):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.967 ns/B 986.6 MiB/s 2.89 c/B 2992
ECB dec | 0.966 ns/B 987.1 MiB/s 2.89 c/B 2992
CTR enc | 0.972 ns/B 980.8 MiB/s 2.91 c/B 2993
CTR dec | 0.971 ns/B 982.5 MiB/s 2.90 c/B 2993
After (GFNI/AVX, ~6% faster):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.908 ns/B 1050 MiB/s 2.72 c/B 2992
ECB dec | 0.903 ns/B 1056 MiB/s 2.70 c/B 2992
CTR enc | 0.913 ns/B 1045 MiB/s 2.73 c/B 2992
CTR dec | 0.910 ns/B 1048 MiB/s 2.72 c/B 2992
===
Benchmark on AMD Ryzen 7 5800X (zen3, turbo-freq off):
Before (AESNI/AVX):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.921 ns/B 1035 MiB/s 3.50 c/B 3800
ECB dec | 0.922 ns/B 1034 MiB/s 3.50 c/B 3800
CTR enc | 0.923 ns/B 1033 MiB/s 3.51 c/B 3800
CTR dec | 0.923 ns/B 1033 MiB/s 3.51 c/B 3800
After (AESNI/AVX, ~6% faster)
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.862 ns/B 1106 MiB/s 3.28 c/B 3800
ECB dec | 0.862 ns/B 1106 MiB/s 3.28 c/B 3800
CTR enc | 0.865 ns/B 1102 MiB/s 3.29 c/B 3800
CTR dec | 0.865 ns/B 1103 MiB/s 3.29 c/B 3800
===
Benchmark on AMD EPYC 7642 (zen2):
Before (AESNI/AVX):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 1.22 ns/B 784.5 MiB/s 4.01 c/B 3298
ECB dec | 1.22 ns/B 784.8 MiB/s 4.00 c/B 3292
CTR enc | 1.22 ns/B 780.1 MiB/s 4.03 c/B 3299
CTR dec | 1.22 ns/B 779.1 MiB/s 4.04 c/B 3299
After (AESNI/AVX, ~13% faster):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 1.07 ns/B 888.3 MiB/s 3.54 c/B 3299
ECB dec | 1.08 ns/B 885.3 MiB/s 3.55 c/B 3299
CTR enc | 1.07 ns/B 888.7 MiB/s 3.54 c/B 3298
CTR dec | 1.07 ns/B 887.4 MiB/s 3.55 c/B 3299
===
Benchmark on Intel Core i5-6500 (skylake):
Before (AESNI/AVX):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 1.24 ns/B 766.6 MiB/s 4.48 c/B 3598
ECB dec | 1.25 ns/B 764.9 MiB/s 4.49 c/B 3598
CTR enc | 1.25 ns/B 761.7 MiB/s 4.50 c/B 3598
CTR dec | 1.25 ns/B 761.6 MiB/s 4.51 c/B 3598
After (AESNI/AVX, ~2% faster):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 1.22 ns/B 780.0 MiB/s 4.40 c/B 3598
ECB dec | 1.22 ns/B 779.6 MiB/s 4.40 c/B 3598
CTR enc | 1.23 ns/B 776.6 MiB/s 4.42 c/B 3598
CTR dec | 1.23 ns/B 776.6 MiB/s 4.42 c/B 3598
===
Benchmark on Intel Core i5-2450M (sandy-bridge, turbo-freq off):
Before (AESNI/AVX):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 2.11 ns/B 452.7 MiB/s 5.25 c/B 2494
ECB dec | 2.10 ns/B 454.5 MiB/s 5.23 c/B 2494
CTR enc | 2.10 ns/B 453.2 MiB/s 5.25 c/B 2494
CTR dec | 2.10 ns/B 453.2 MiB/s 5.25 c/B 2494
After (AESNI/AVX, ~4% faster)
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 2.00 ns/B 475.8 MiB/s 5.00 c/B 2494
ECB dec | 2.00 ns/B 476.4 MiB/s 4.99 c/B 2494
CTR enc | 2.01 ns/B 474.7 MiB/s 5.01 c/B 2494
CTR dec | 2.01 ns/B 473.9 MiB/s 5.02 c/B 2494
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'aria-gfni-avx512-amd64.S'.
* cipher/aria-gfni-avx512-amd64.S: New.
* cipher/aria.c (USE_GFNI_AVX512): New.
[USE_GFNI_AVX512] (MAX_PARALLEL_BLKS): New.
(ARIA_context): Add 'use_gfni_avx512'.
(_gcry_aria_gfni_avx512_ecb_crypt_blk64)
(_gcry_aria_gfni_avx512_ctr_crypt_blk64)
(aria_gfni_avx512_ecb_crypt_blk64)
(aria_gfni_avx512_ctr_crypt_blk64): New.
(aria_crypt_blocks) [USE_GFNI_AVX512]: Add 64 parallel block
AVX512/GFNI processing.
(_gcry_aria_ctr_enc) [USE_GFNI_AVX512]: Add 64 parallel block
AVX512/GFNI processing.
(aria_setkey): Enable GFNI/AVX512 based on HW features.
* configure.ac: Add 'aria-gfni-avx512-amd64.lo'.
--
This patch adds AVX512/GFNI accelerated ARIA block cipher
implementation for libgcrypt. This implementation is based on
work by Taehee Yoo, with following notable changes:
- Integration to libgcrypt, use of 'aes-common-amd64.h'.
- Use round loop instead of unrolling for smaller code size and
increased performance.
- Use stack for temporary storage instead of external buffers.
- Add byte-addition fast path for CTR.
===
Benchmark on AMD Ryzen 9 7900X (zen4, turbo-freq off):
GFNI/AVX512:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.203 ns/B 4703 MiB/s 0.953 c/B 4700
ECB dec | 0.204 ns/B 4675 MiB/s 0.959 c/B 4700
CTR enc | 0.207 ns/B 4609 MiB/s 0.973 c/B 4700
CTR dec | 0.207 ns/B 4608 MiB/s 0.973 c/B 4700
===
Benchmark on Intel Core i3-1115G4 (tiger-lake, turbo-freq off):
GFNI/AVX512:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.362 ns/B 2635 MiB/s 1.08 c/B 2992
ECB dec | 0.361 ns/B 2639 MiB/s 1.08 c/B 2992
CTR enc | 0.362 ns/B 2633 MiB/s 1.08 c/B 2992
CTR dec | 0.362 ns/B 2633 MiB/s 1.08 c/B 2992
[v2]:
- Add byte-addition fast path for CTR.
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'aria-aesni-avx-amd64.S' and
'aria-aesni-avx2-amd64.S'.
* cipher/aria-aesni-avx-amd64.S: New.
* cipher/aria-aesni-avx2-amd64.S: New.
* cipher/aria.c (USE_AESNI_AVX, USE_GFNI_AVX, USE_AESNI_AVX2)
(USE_GFNI_AVX2, MAX_PARALLEL_BLKS, ASM_FUNC_ABI, ASM_EXTRA_STACK): New.
(ARIA_context): Add 'use_aesni_avx', 'use_gfni_avx',
'use_aesni_avx2' and 'use_gfni_avx2'.
(_gcry_aria_aesni_avx_ecb_crypt_blk1_16)
(_gcry_aria_aesni_avx_ctr_crypt_blk16)
(_gcry_aria_gfni_avx_ecb_crypt_blk1_16)
(_gcry_aria_gfni_avx_ctr_crypt_blk16)
(aria_avx_ecb_crypt_blk1_16, aria_avx_ctr_crypt_blk16)
(_gcry_aria_aesni_avx2_ecb_crypt_blk32)
(_gcry_aria_aesni_avx2_ctr_crypt_blk32)
(_gcry_aria_gfni_avx2_ecb_crypt_blk32)
(_gcry_aria_gfni_avx2_ctr_crypt_blk32)
(aria_avx2_ecb_crypt_blk32, aria_avx2_ctr_crypt_blk32): New.
(aria_crypt_blocks) [USE_AESNI_AVX2]: Add 32 parallel block
AVX2/AESNI/GFNI processing.
(aria_crypt_blocks) [USE_AESNI_AVX]: Add 3 to 16 parallel block
AVX/AESNI/GFNI processing.
(_gcry_aria_ctr_enc) [USE_AESNI_AVX2]: Add 32 parallel block
AVX2/AESNI/GFNI processing.
(_gcry_aria_ctr_enc) [USE_AESNI_AVX]: Add 16 parallel block
AVX/AESNI/GFNI processing.
(_gcry_aria_ctr_enc, _gcry_aria_cbc_dec, _gcry_aria_cfb_enc)
(_gcry_aria_ecb_crypt, _gcry_aria_xts_crypt, _gcry_aria_ctr32le_enc)
(_gcry_aria_ocb_crypt, _gcry_aria_ocb_auth): Use MAX_PARALLEL_BLKS
for parallel processing width.
(aria_setkey): Enable AESNI/AVX, GFNI/AVX, AESNI/AVX2, GFNI/AVX2 based
on HW features.
* configure.ac: Add 'aria-aesni-avx-amd64.lo' and
'aria-aesni-avx2-amd64.lo'.
---
This patch adds AVX/AVX2/AESNI/GFNI accelerated ARIA block cipher
implementations for libgcrypt. This implementation is based on work
by Taehee Yoo, with following notable changes:
- Integration to libgcrypt, use of 'aes-common-amd64.h'.
- Use 'vmovddup' for loading GFNI constants.
- Use round loop instead of unrolling for smaller code size and
increased performance.
- Use stack for temporary storage instead of external buffers.
- Use merge ECB encryption/decryption to single function.
- Add 1 to 15 blocks support for AVX ECB functions.
- Add byte-addition fast path for CTR.
===
Benchmark on AMD Ryzen 9 7900X (zen4, turbo-freq off):
AESNI/AVX:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.715 ns/B 1333 MiB/s 3.36 c/B 4700
ECB dec | 0.712 ns/B 1339 MiB/s 3.35 c/B 4700
CTR enc | 0.714 ns/B 1336 MiB/s 3.36 c/B 4700
CTR dec | 0.714 ns/B 1335 MiB/s 3.36 c/B 4700
GFNI/AVX:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.516 ns/B 1847 MiB/s 2.43 c/B 4700
ECB dec | 0.519 ns/B 1839 MiB/s 2.44 c/B 4700
CTR enc | 0.517 ns/B 1846 MiB/s 2.43 c/B 4700
CTR dec | 0.518 ns/B 1843 MiB/s 2.43 c/B 4700
AESNI/AVX2:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.416 ns/B 2292 MiB/s 1.96 c/B 4700
ECB dec | 0.421 ns/B 2266 MiB/s 1.98 c/B 4700
CTR enc | 0.415 ns/B 2298 MiB/s 1.95 c/B 4700
CTR dec | 0.415 ns/B 2300 MiB/s 1.95 c/B 4700
GFNI/AVX2:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.235 ns/B 4056 MiB/s 1.11 c/B 4700
ECB dec | 0.234 ns/B 4079 MiB/s 1.10 c/B 4700
CTR enc | 0.232 ns/B 4104 MiB/s 1.09 c/B 4700
CTR dec | 0.233 ns/B 4094 MiB/s 1.10 c/B 4700
===
Benchmark on Intel Core i3-1115G4 (tiger-lake, turbo-freq off):
AESNI/AVX:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 1.26 ns/B 757.6 MiB/s 3.77 c/B 2993
ECB dec | 1.27 ns/B 753.1 MiB/s 3.79 c/B 2992
CTR enc | 1.25 ns/B 760.3 MiB/s 3.75 c/B 2992
CTR dec | 1.26 ns/B 759.1 MiB/s 3.76 c/B 2992
GFNI/AVX:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.967 ns/B 986.6 MiB/s 2.89 c/B 2992
ECB dec | 0.966 ns/B 987.1 MiB/s 2.89 c/B 2992
CTR enc | 0.972 ns/B 980.8 MiB/s 2.91 c/B 2993
CTR dec | 0.971 ns/B 982.5 MiB/s 2.90 c/B 2993
AESNI/AVX2:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.817 ns/B 1167 MiB/s 2.44 c/B 2992
ECB dec | 0.819 ns/B 1164 MiB/s 2.45 c/B 2992
CTR enc | 0.819 ns/B 1164 MiB/s 2.45 c/B 2992
CTR dec | 0.819 ns/B 1164 MiB/s 2.45 c/B 2992
GFNI/AVX2:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.506 ns/B 1886 MiB/s 1.51 c/B 2992
ECB dec | 0.505 ns/B 1887 MiB/s 1.51 c/B 2992
CTR enc | 0.564 ns/B 1691 MiB/s 1.69 c/B 2992
CTR dec | 0.565 ns/B 1689 MiB/s 1.69 c/B 2992
===
Benchmark on AMD Ryzen 7 5800X (zen3, turbo-freq off):
AESNI/AVX:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.921 ns/B 1035 MiB/s 3.50 c/B 3800
ECB dec | 0.922 ns/B 1034 MiB/s 3.50 c/B 3800
CTR enc | 0.923 ns/B 1033 MiB/s 3.51 c/B 3800
CTR dec | 0.923 ns/B 1033 MiB/s 3.51 c/B 3800
AESNI/AVX2:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.559 ns/B 1707 MiB/s 2.12 c/B 3800
ECB dec | 0.560 ns/B 1703 MiB/s 2.13 c/B 3800
CTR enc | 0.570 ns/B 1672 MiB/s 2.17 c/B 3800
CTR dec | 0.568 ns/B 1679 MiB/s 2.16 c/B 3800
===
Benchmark on AMD EPYC 7642 (zen2):
AESNI/AVX:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 1.22 ns/B 784.5 MiB/s 4.01 c/B 3298
ECB dec | 1.22 ns/B 784.8 MiB/s 4.00 c/B 3292
CTR enc | 1.22 ns/B 780.1 MiB/s 4.03 c/B 3299
CTR dec | 1.22 ns/B 779.1 MiB/s 4.04 c/B 3299
AESNI/AVX2:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.735 ns/B 1298 MiB/s 2.42 c/B 3299
ECB dec | 0.738 ns/B 1292 MiB/s 2.44 c/B 3299
CTR enc | 0.732 ns/B 1303 MiB/s 2.41 c/B 3299
CTR dec | 0.732 ns/B 1303 MiB/s 2.41 c/B 3299
===
Benchmark on Intel Core i5-6500 (skylake):
AESNI/AVX:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 1.24 ns/B 766.6 MiB/s 4.48 c/B 3598
ECB dec | 1.25 ns/B 764.9 MiB/s 4.49 c/B 3598
CTR enc | 1.25 ns/B 761.7 MiB/s 4.50 c/B 3598
CTR dec | 1.25 ns/B 761.6 MiB/s 4.51 c/B 3598
AESNI/AVX2:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.829 ns/B 1150 MiB/s 2.98 c/B 3599
ECB dec | 0.831 ns/B 1147 MiB/s 2.99 c/B 3598
CTR enc | 0.829 ns/B 1150 MiB/s 2.98 c/B 3598
CTR dec | 0.828 ns/B 1152 MiB/s 2.98 c/B 3598
===
Benchmark on Intel Core i5-2450M (sandy-bridge, turbo-freq off):
AESNI/AVX:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 2.11 ns/B 452.7 MiB/s 5.25 c/B 2494
ECB dec | 2.10 ns/B 454.5 MiB/s 5.23 c/B 2494
CTR enc | 2.10 ns/B 453.2 MiB/s 5.25 c/B 2494
CTR dec | 2.10 ns/B 453.2 MiB/s 5.25 c/B 2494
[v2]
- Optimization for CTR mode: Use CTR byte-addition path when
counter carry-overflow happen only on ctr-variable but not in
generated counter vector registers.
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
| |
* cipher/asm-common-aarch64.h (SECTION_RODATA): Use .rdata for
_WIN32.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-aarch64.S: Align functions to 16 bytes.
* cipher/chacha20-aarch64.S: Likewise.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
* cipher/crc-armv8-aarch64-ce.S: Likewise.
* cipher/rijndael-aarch64.S: Likewise.
* cipher/rijndael-armv8-aarch64-ce.S: Likewise.
* cipher/sha1-armv8-aarch64-ce.S: Likewise.
* cipher/sha256-armv8-aarch64-ce.S: Likewise.
* cipher/sha512-armv8-aarch64-ce.S: Likewise.
* cipher/sm3-aarch64.S: Likewise.
* cipher/sm3-armv8-aarch64-ce.S: Likewise.
* cipher/sm4-aarch64.S: Likewise.
* cipher/sm4-armv8-aarch64-ce.S: Likewise.
* cipher/sm4-armv9-aarch64-sve-ce.S: Likewise.
* cipher/twofish-aarch64.S: Likewise.
* mpi/aarch64/mpih-add1.S: Likewise.
* mpi/aarch64/mpih-mul1.S: Likewise.
* mpi/aarch64/mpih-mul2.S: Likewise.
* mpi/aarch64/mpih-mul3.S: Likewise.
* mpi/aarch64/mpih-sub1.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/asm-common-aarch64.h (SECTION_RODATA)
(GET_DATA_POINTER): New.
(GET_LOCAL_POINTER): Remove.
* cipher/camellia-aarch64.S: Move constant data to read-only data
section; Remove unneeded '.ltorg'.
* cipher/chacha20-aarch64.S: Likewise.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
* cipher/crc-armv8-aarch64-ce.S: Likewise.
* cipher/rijndael-aarch64.S: Likewise.
* cipher/sha1-armv8-aarch64-ce.S: Likewise.
* cipher/sha256-armv8-aarch64-ce.S: Likewise.
* cipher/sm3-aarch64.S: Likewise.
* cipher/sm3-armv8-aarch64-ce.S: Likewise.
* cipher/sm4-aarch64.S: Likewise.
* cipher/sm4-armv9-aarch64-sve-ce.S: Likewise.
* cipher/twofish-aarch64.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
| |
* cipher/chacha20-s390x.S: Move constant data to read-only
section; Align functions to 16 bytes.
* cipher/poly1305-s390x.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
| |
* cipher/chacha20-p10le-8x.s: Move constant data to read-only
section.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-aesni-avx-amd64.S: Move constant data to
read-only section.
* cipher/camellia-aesni-avx2-amd64.h: Likewise.
* cipher/camellia-gfni-avx512-amd64.S: Likewise.
* cipher/chacha20-amd64-avx2.S: Likewise.
* cipher/chacha20-amd64-avx512.S: Likewise.
* cipher/chacha20-amd64-ssse3.S: Likewise.
* cipher/des-amd64.s: Likewise.
* cipher/rijndael-ssse3-amd64-asm.S: Likewise.
* cipher/rijndael-vaes-avx2-amd64.S: Likewise.
* cipher/serpent-avx2-amd64.S: Likewise.
* cipher/sm4-aesni-avx-amd64.S: Likewise.
* cipher/sm4-aesni-avx2-amd64.S: Likewise.
* cipher/sm4-gfni-avx2-amd64.S: Likewise.
* cipher/sm4-gfni-avx512-amd64.S: Likewise.
* cipher/twofish-avx2-amd64.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/blowfish-amd64.S: Align functions to 16 bytes.
* cipher/camellia-aesni-avx-amd64.S: Likewise.
* cipher/camellia-aesni-avx2-amd64.h: Likewise.
* cipher/camellia-gfni-avx512-amd64.S: Likewise.
* cipher/cast5-amd64.S: Likewise.
* cipher/chacha20-amd64-avx2.S: Likewise.
* cipher/chacha20-amd64-ssse3.S: Likewise.
* cipher/des-amd64.s: Likewise.
* cipher/rijndael-amd64.S: Likewise.
* cipher/rijndael-ssse3-amd64-asm.S: Likewise.
* cipher/salsa20-amd64.S: Likewise.
* cipher/serpent-avx2-amd64.S: Likewise.
* cipher/serpent-sse2-amd64.S: Likewise.
* cipher/sm4-aesni-avx-amd64.S: Likewise.
* cipher/sm4-aesni-avx2-amd64.S: Likewise.
* cipher/sm4-gfni-avx2-amd64.S: Likewise.
* cipher/twofish-amd64.S: Likewise.
* cipher/twofish-avx2-amd64.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/asm-common-amd64.h (SECTION_RODATA): New.
* cipher/blake2b-amd64-avx2.S: Use read-only section for constant
data.
* cipher/blake2b-amd64-avx512.S: Likewise.
* cipher/blake2s-amd64-avx.S: Likewise.
* cipher/blake2s-amd64-avx512.S: Likewise.
* cipher/poly1305-amd64-avx512.S: Likewise.
* cipher/sha1-avx-amd64.S: Likewise.
* cipher/sha1-avx-bmi2-amd64.S: Likewise.
* cipher/sha1-avx2-bmi2-amd64.S: Likewise.
* cipher/sha1-ssse3-amd64.S: Likewise.
* cipher/sha256-avx-amd64.S: Likewise.
* cipher/sha256-avx2-bmi2-amd64.S: Likewise.
* cipher/sha256-ssse3-amd64.S: Likewise.
* cipher/sha512-avx-amd64.S: Likewise.
* cipher/sha512-avx2-bmi2-amd64.S: Likewise.
* cipher/sha512-avx512-amd64.S: Likewise.
* cipher/sha512-ssse3-amd64.S: Likewise.
* cipher/sha3-avx-bmi2-amd64.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/asm-common-amd64.h (spec_stop_avx512): Clear ymm16
before and after vpopcntb.
* cipher/camellia-gfni-avx512-amd64.S (clear_zmm16_zmm31): Clear
YMM16-YMM31 registers instead of XMM16-XMM31.
* cipher/chacha20-amd64-avx512.S (clear_zmm16_zmm31): Likewise.
* cipher/keccak-amd64-avx512.S (clear_regs): Likewise.
(clear_avx512_4regs): Clear all 4 registers with XOR.
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
(_gcry_polyval_intel_pclmul): Clear YMM16-YMM19 registers instead of
ZMM16-ZMM19.
* cipher/poly1305-amd64-avx512.S (POLY1305_BLOCKS): Clear YMM16-YMM31
registers after vector processing instead of XMM16-XMM31.
* cipher/sha512-avx512-amd64.S
(_gcry_sha512_transform_amd64_avx512): Likewise.
--
Clear zmm16-zmm31 registers with 256bit XOR instead of 128bit
as this is better for AMD Zen4. Also clear xmm16 register
after vpopcnt in avx512 spec-stop so we do not leave any zmm
register state which might end up unnecessarily using CPU
resources.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/aria.c (ARIA_context): Add 'bulk_prefetch_ready'.
(aria_crypt_2blks, aria_crypt_blocks, aria_enc_blocks, aria_dec_blocks)
(_gcry_aria_ctr_enc, _gcry_aria_cbc_enc, _gcry_aria_cbc_dec)
(_gcry_aria_cfb_enc, _gcry_aria_cfb_dec, _gcry_aria_ecb_crypt)
(_gcry_aria_xts_crypt, _gcry_aria_ctr32le_enc, _gcry_aria_ocb_crypt)
(_gcry_aria_ocb_auth): New.
(aria_setkey): Setup 'bulk_ops' function pointers.
--
Patch adds 2-way parallel generic ARIA implementation for modest
performance increase.
Benchmark on AMD Ryzen 9 7900X (x86-64) shows ~40% performance
improvement for parallelizable modes:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 2.62 ns/B 364.0 MiB/s 14.74 c/B 5625
ECB dec | 2.61 ns/B 365.2 MiB/s 14.69 c/B 5625
CBC enc | 3.62 ns/B 263.7 MiB/s 20.34 c/B 5625
CBC dec | 2.63 ns/B 363.0 MiB/s 14.78 c/B 5625
CFB enc | 3.59 ns/B 265.3 MiB/s 20.22 c/B 5625
CFB dec | 2.63 ns/B 362.0 MiB/s 14.82 c/B 5625
OFB enc | 3.98 ns/B 239.7 MiB/s 22.38 c/B 5625
OFB dec | 4.00 ns/B 238.2 MiB/s 22.52 c/B 5625
CTR enc | 2.64 ns/B 360.6 MiB/s 14.87 c/B 5624
CTR dec | 2.65 ns/B 360.0 MiB/s 14.90 c/B 5625
XTS enc | 2.68 ns/B 355.8 MiB/s 15.08 c/B 5625
XTS dec | 2.67 ns/B 356.9 MiB/s 15.03 c/B 5625
CCM enc | 6.24 ns/B 152.7 MiB/s 35.12 c/B 5625
CCM dec | 6.25 ns/B 152.5 MiB/s 35.18 c/B 5625
CCM auth | 3.59 ns/B 265.4 MiB/s 20.21 c/B 5625
EAX enc | 6.23 ns/B 153.0 MiB/s 35.06 c/B 5625
EAX dec | 6.23 ns/B 153.1 MiB/s 35.05 c/B 5625
EAX auth | 3.59 ns/B 265.4 MiB/s 20.22 c/B 5625
GCM enc | 2.68 ns/B 355.8 MiB/s 15.08 c/B 5625
GCM dec | 2.69 ns/B 354.7 MiB/s 15.12 c/B 5625
GCM auth | 0.031 ns/B 30832 MiB/s 0.174 c/B 5625
OCB enc | 2.71 ns/B 351.4 MiB/s 15.27 c/B 5625
OCB dec | 2.74 ns/B 347.6 MiB/s 15.43 c/B 5625
OCB auth | 2.64 ns/B 360.8 MiB/s 14.87 c/B 5625
SIV enc | 6.24 ns/B 152.9 MiB/s 35.08 c/B 5625
SIV dec | 6.24 ns/B 152.8 MiB/s 35.10 c/B 5625
SIV auth | 3.59 ns/B 266.0 MiB/s 20.17 c/B 5625
GCM-SIV enc | 2.67 ns/B 356.7 MiB/s 15.04 c/B 5625
GCM-SIV dec | 2.68 ns/B 355.7 MiB/s 15.08 c/B 5625
GCM-SIV auth | 0.034 ns/B 28303 MiB/s 0.190 c/B 5625
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|