| Commit message (Collapse) | Author | Age | Files | Lines |
| |
* cipher/asm-common-amd64.h (spec_stop_avx512): Clear ymm16
before and after vpopcntb.
* cipher/camellia-gfni-avx512-amd64.S (clear_zmm16_zmm31): Clear
YMM16-YMM31 registers instead of XMM16-XMM31.
* cipher/chacha20-amd64-avx512.S (clear_zmm16_zmm31): Likewise.
* cipher/keccak-amd64-avx512.S (clear_regs): Likewise.
(clear_avx512_4regs): Clear all 4 registers with XOR.
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
(_gcry_polyval_intel_pclmul): Clear YMM16-YMM19 registers instead of
ZMM16-ZMM19.
* cipher/poly1305-amd64-avx512.S (POLY1305_BLOCKS): Clear YMM16-YMM31
registers after vector processing instead of XMM16-XMM31.
* cipher/sha512-avx512-amd64.S
(_gcry_sha512_transform_amd64_avx512): Likewise.
--
Clear zmm16-zmm31 registers with 256-bit XOR instead of 128-bit
as this is better for AMD Zen4. Also clear the ymm16 register
after vpopcntb in the AVX512 spec-stop so we do not leave any zmm
register state which might end up unnecessarily using CPU
resources.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/aria.c (ARIA_context): Add 'bulk_prefetch_ready'.
(aria_crypt_2blks, aria_crypt_blocks, aria_enc_blocks, aria_dec_blocks)
(_gcry_aria_ctr_enc, _gcry_aria_cbc_enc, _gcry_aria_cbc_dec)
(_gcry_aria_cfb_enc, _gcry_aria_cfb_dec, _gcry_aria_ecb_crypt)
(_gcry_aria_xts_crypt, _gcry_aria_ctr32le_enc, _gcry_aria_ocb_crypt)
(_gcry_aria_ocb_auth): New.
(aria_setkey): Setup 'bulk_ops' function pointers.
--
Patch adds 2-way parallel generic ARIA implementation for modest
performance increase.
Benchmark on AMD Ryzen 9 7900X (x86-64) shows ~40% performance
improvement for parallelizable modes:
ARIA128      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |     2.62 ns/B    364.0 MiB/s    14.74 c/B      5625
     ECB dec |     2.61 ns/B    365.2 MiB/s    14.69 c/B      5625
     CBC enc |     3.62 ns/B    263.7 MiB/s    20.34 c/B      5625
     CBC dec |     2.63 ns/B    363.0 MiB/s    14.78 c/B      5625
     CFB enc |     3.59 ns/B    265.3 MiB/s    20.22 c/B      5625
     CFB dec |     2.63 ns/B    362.0 MiB/s    14.82 c/B      5625
     OFB enc |     3.98 ns/B    239.7 MiB/s    22.38 c/B      5625
     OFB dec |     4.00 ns/B    238.2 MiB/s    22.52 c/B      5625
     CTR enc |     2.64 ns/B    360.6 MiB/s    14.87 c/B      5624
     CTR dec |     2.65 ns/B    360.0 MiB/s    14.90 c/B      5625
     XTS enc |     2.68 ns/B    355.8 MiB/s    15.08 c/B      5625
     XTS dec |     2.67 ns/B    356.9 MiB/s    15.03 c/B      5625
     CCM enc |     6.24 ns/B    152.7 MiB/s    35.12 c/B      5625
     CCM dec |     6.25 ns/B    152.5 MiB/s    35.18 c/B      5625
    CCM auth |     3.59 ns/B    265.4 MiB/s    20.21 c/B      5625
     EAX enc |     6.23 ns/B    153.0 MiB/s    35.06 c/B      5625
     EAX dec |     6.23 ns/B    153.1 MiB/s    35.05 c/B      5625
    EAX auth |     3.59 ns/B    265.4 MiB/s    20.22 c/B      5625
     GCM enc |     2.68 ns/B    355.8 MiB/s    15.08 c/B      5625
     GCM dec |     2.69 ns/B    354.7 MiB/s    15.12 c/B      5625
    GCM auth |    0.031 ns/B    30832 MiB/s    0.174 c/B      5625
     OCB enc |     2.71 ns/B    351.4 MiB/s    15.27 c/B      5625
     OCB dec |     2.74 ns/B    347.6 MiB/s    15.43 c/B      5625
    OCB auth |     2.64 ns/B    360.8 MiB/s    14.87 c/B      5625
     SIV enc |     6.24 ns/B    152.9 MiB/s    35.08 c/B      5625
     SIV dec |     6.24 ns/B    152.8 MiB/s    35.10 c/B      5625
    SIV auth |     3.59 ns/B    266.0 MiB/s    20.17 c/B      5625
 GCM-SIV enc |     2.67 ns/B    356.7 MiB/s    15.04 c/B      5625
 GCM-SIV dec |     2.68 ns/B    355.7 MiB/s    15.08 c/B      5625
GCM-SIV auth |    0.034 ns/B    28303 MiB/s    0.190 c/B      5625
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/Makefile.am: Add 'aria.c'.
* cipher/aria.c: New.
* cipher/cipher.c (cipher_list, cipher_list_algo301): Add ARIA cipher
specs.
* cipher/mac-cmac.c (map_mac_algo_to_cipher): Add GCRY_MAC_CMAC_ARIA.
(_gcry_mac_type_spec_cmac_aria): New.
* cipher/mac-gmac.c (map_mac_algo_to_cipher): Add GCRY_MAC_GMAC_ARIA.
(_gcry_mac_type_spec_gmac_aria): New.
* cipher/mac-internal.h (_gcry_mac_type_spec_cmac_aria)
(_gcry_mac_type_spec_gmac_aria)
(_gcry_mac_type_spec_poly1305mac_aria): New.
* cipher/mac-poly1305.c (poly1305mac_open): Add GCRY_MAC_POLY1305_ARIA.
(_gcry_mac_type_spec_poly1305mac_aria): New.
* cipher/mac.c (mac_list, mac_list_algo201, mac_list_algo401)
(mac_list_algo501): Add ARIA MAC specs.
* configure.ac (available_ciphers): Add 'aria'.
(GCRYPT_CIPHERS): Add 'aria.lo'.
(USE_ARIA): New.
* doc/gcrypt.texi: Add GCRY_CIPHER_ARIA128, GCRY_CIPHER_ARIA192,
GCRY_CIPHER_ARIA256, GCRY_MAC_CMAC_ARIA, GCRY_MAC_GMAC_ARIA and
GCRY_MAC_POLY1305_ARIA.
* src/cipher.h (_gcry_cipher_spec_aria128, _gcry_cipher_spec_aria192)
(_gcry_cipher_spec_aria256): New.
* src/gcrypt.h.in (gcry_cipher_algos): Add GCRY_CIPHER_ARIA128,
GCRY_CIPHER_ARIA192 and GCRY_CIPHER_ARIA256.
(gcry_mac_algos): Add GCRY_MAC_CMAC_ARIA, GCRY_MAC_GMAC_ARIA and
GCRY_MAC_POLY1305_ARIA.
* tests/basic.c (check_ecb_cipher, check_ctr_cipher)
(check_cfb_cipher, check_ocb_cipher) [USE_ARIA]: Add ARIA test-vectors.
(check_ciphers) [USE_ARIA]: Add GCRY_CIPHER_ARIA128, GCRY_CIPHER_ARIA192
and GCRY_CIPHER_ARIA256.
(main): Also run 'check_bulk_cipher_modes' for 'cipher_modes_only'-mode.
* tests/bench-slope.c (bench_mac_init): Add GCRY_MAC_POLY1305_ARIA
setiv-handling.
* tests/benchmark.c (mac_bench): Likewise.
--
This patch adds the ARIA block cipher to libgcrypt. The implementation
is based on work by Taehee Yoo, with the following notable changes:
- Integration to libgcrypt, use of bithelp.h and bufhelp.h helper
functions where possible.
- Added lookup table prefetching as is done in AES, GCM and SM4
implementations.
- Changed `get_u8` to return `u32` as returning `byte` caused
sub-optimal code generation with gcc-12/x86-64 (zero extending
from 8-bit to 32-bit register, followed by extraneous sign
extending from 32-bit to 64-bit register).
- Changed 'aria_crypt' loop structure a bit for tiny performance
increase (~1% seen with gcc-12/x86-64/zen4).
Benchmark on AMD Ryzen 9 7900X (x86-64):
ARIA128      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |     3.99 ns/B    239.1 MiB/s    22.43 c/B      5625
     ECB dec |     4.00 ns/B    238.4 MiB/s    22.50 c/B      5625
Benchmark on AMD Ryzen 9 7900X (win32):
ARIA128      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |     4.57 ns/B    208.7 MiB/s    25.31 c/B      5538
     ECB dec |     4.66 ns/B    204.8 MiB/s    25.39 c/B      5453
Benchmark on ARM Cortex-A53 (aarch64):
ARIA128      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |    74.69 ns/B    12.77 MiB/s    48.40 c/B     647.9
     ECB dec |    74.99 ns/B    12.72 MiB/s    48.58 c/B     647.9
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
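The `get_u8` change described in the ARIA commit above can be pictured with the following minimal sketch. It is not the code from cipher/aria.c: the byte-numbering convention and the `lookup_byte` helper are assumptions made for illustration. The point is only that the extracted byte is returned as a 32-bit value, so it can be used as a table index without a further widening step.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Extract byte number 'n' of a 32-bit word (0 = most significant byte;
 * this numbering is an assumption of the sketch).  Returning u32 rather
 * than an 8-bit type keeps the result in a 32-bit register. */
static inline u32
get_u8 (u32 x, unsigned int n)
{
  return (x >> (24 - 8 * n)) & 0xff;
}

/* Hypothetical caller: index a 256-entry lookup table with one byte. */
static inline u32
lookup_byte (const u32 table[256], u32 x, unsigned int n)
{
  return table[get_u8 (x, n)];
}
```

Whether the narrower return type really costs extra instructions depends on the compiler and target; the commit reports the effect for gcc-12 on x86-64 only.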
| |
* cipher/sm4.c (_gcry_sm4_ocb_crypt) [USE_GFNI_AVX512]: Add 16-way
GFNI-AVX512 handling.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/bulkhelp.h (bulk_crypt_fn_t): Make 'ctx' non-constant and
change 'num_blks' from 'unsigned int' to 'size_t'.
* cipher/camellia-glue.c (camellia_encrypt_blk1_32)
(camellia_encrypt_blk1_64, camellia_decrypt_blk1_32)
(camellia_decrypt_blk1_64): Adjust to match 'bulk_crypt_fn_t'.
* cipher/serpent.c (serpent_crypt_blk1_16, serpent_encrypt_blk1_16)
(serpent_decrypt_blk1_16): Likewise.
* cipher/sm4.c (crypt_blk1_16_fn_t, _gcry_sm4_aesni_avx_crypt_blk1_8)
(sm4_aesni_avx_crypt_blk1_16, _gcry_sm4_aesni_avx2_crypt_blk1_16)
(sm4_aesni_avx2_crypt_blk1_16, _gcry_sm4_gfni_avx2_crypt_blk1_16)
(sm4_gfni_avx2_crypt_blk1_16, _gcry_sm4_gfni_avx512_crypt_blk1_16)
(_gcry_sm4_gfni_avx512_crypt_blk32, sm4_gfni_avx512_crypt_blk1_16)
(_gcry_sm4_aarch64_crypt_blk1_8, sm4_aarch64_crypt_blk1_16)
(_gcry_sm4_armv8_ce_crypt_blk1_8, sm4_armv8_ce_crypt_blk1_16)
(_gcry_sm4_armv9_sve_ce_crypt, sm4_armv9_sve_ce_crypt_blk1_16)
(sm4_crypt_blocks, sm4_crypt_blk1_32, sm4_encrypt_blk1_32)
(sm4_decrypt_blk1_32): Likewise.
* cipher/twofish.c (twofish_crypt_blk1_16, twofish_encrypt_blk1_16)
(twofish_decrypt_blk1_16): Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/cipher.c (cipher_list_algo301): Remove comma at the end
of last entry.
* cipher/mac-gmac.c (map_mac_algo_to_cipher): Add SM4.
(_gcry_mac_type_spec_gmac_sm4): New.
* cipher/mac-internal.h (_gcry_mac_type_spec_gmac_sm4)
(_gcry_mac_type_spec_poly1305mac_sm4): New.
* cipher/mac-poly1305.c (poly1305mac_open): Add SM4.
(_gcry_mac_type_spec_poly1305mac_sm4): New.
* cipher/mac.c (mac_list, mac_list_algo401, mac_list_algo501): Add
GMAC-SM4 and Poly1305-SM4.
(mac_list_algo101): Remove comma at the end of last entry.
* cipher/md.c (digest_list_algo301): Remove comma at the end of
last entry.
* doc/gcrypt.texi: Add GCRY_MAC_GMAC_SM4 and GCRY_MAC_POLY1305_SM4.
* src/gcrypt.h.in (GCRY_MAC_GMAC_SM4, GCRY_MAC_POLY1305_SM4): New.
* tests/bench-slope.c (bench_mac_init): Setup IV for
GCRY_MAC_POLY1305_SM4.
* tests/benchmark.c (mac_bench): Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-ppc-common.h (asm_sbox_be): New.
* cipher/rijndael-ppc.c (_gcry_aes_sbox4_ppc8): Use 'asm_sbox_be'
instead of 'vec_sbox_be' since this intrinsic has a different
prototype definition on GCC and Clang ('vector uchar' vs 'vector
ulong long').
* cipher/sha256-ppc.c (vec_ror_u32): Remove unused function.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* configure.ac (gcry_cv_gcc_arm_platform_as_ok)
(gcry_cv_gcc_inline_asm_neon): Remove % prefix from register names.
* cipher/cipher-gcm-armv7-neon.S (vmull_p64): Prefix constant values
with # character instead of $.
* cipher/blowfish-arm.S: Remove % prefix from all register names.
* cipher/camellia-arm.S: Likewise.
* cipher/cast5-arm.S: Likewise.
* cipher/rijndael-arm.S: Likewise.
* cipher/rijndael-armv8-aarch32-ce.S: Likewise.
* cipher/sha512-arm.S: Likewise.
* cipher/sha512-armv7-neon.S: Likewise.
* cipher/twofish-arm.S: Likewise.
* mpi/arm/mpih-add1.S: Likewise.
* mpi/arm/mpih-mul1.S: Likewise.
* mpi/arm/mpih-mul2.S: Likewise.
* mpi/arm/mpih-mul3.S: Likewise.
* mpi/arm/mpih-sub1.S: Likewise.
--
Reported-by: Dmytro Kovalov <dmytro.a.kovalov@globallogic.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-ppc-function.h (CBC_ENC_FUNC): Fix outiv constraint.
--
Noticed when trying to compile with powerpc64le clang. GCC accepted the
buggy constraint without complaints.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/asm-common-amd64.h (spec_stop_avx512_intel_syntax): New.
* cipher/poly1305-amd64-avx512.S: Use spec_stop_avx512_intel_syntax
instead of spec_stop_avx512.
* cipher/sha512-avx512-amd64.S: Likewise.
--
Reported-by: Clemens Lang <cllang@redhat.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* m4/ax_cc_for_build.m4: Fix for no arg.
* m4/noexecstack.m4: Likewise.
--
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
| |
* configure.ac: More fixes for other architecture.
--
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
| |
* configure.ac: Add function declarations for asm functions.
--
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
| |
* cipher/cipher-gcm-intel-pclmul.c: Use xmm registers for AVX512
spec stop.
* cipher/asm-common-amd64.h (spec_stop_avx512): New.
* cipher/blake2b-amd64-avx512.S: Use spec_stop_avx512.
* cipher/blake2s-amd64-avx512.S: Likewise.
* cipher/camellia-gfni-avx512-amd64.S: Likewise.
* cipher/chacha20-amd64-avx512.S: Likewise.
* cipher/keccak-amd64-avx512.S: Likewise.
* cipher/poly1305-amd64-avx512.S: Likewise.
* cipher/sha512-avx512-amd64.S: Likewise.
* cipher/sm4-gfni-avx512-amd64.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
code a bit
* cipher/chacha20-amd64-avx512.S: Add tail handling for 8/4/2/1
blocks; Rename `_gcry_chacha20_amd64_avx512_blocks16` to
`_gcry_chacha20_amd64_avx512_blocks`; Tweak 16 parallel block processing
for small speed improvement.
* cipher/chacha20.c (_gcry_chacha20_amd64_avx512_blocks16): Rename to ...
(_gcry_chacha20_amd64_avx512_blocks): ... this.
(chacha20_blocks) [USE_AVX512]: Add AVX512 code-path.
(do_chacha20_encrypt_stream_tail) [USE_AVX512]: Change to handle any
number of full input blocks instead of multiples of 16.
--
Patch improves performance of ChaCha20-AVX512 implementation on small
input buffer sizes (less than 64*16B = 1024B).
Following benchmarks show improvement in 16 parallel blocks processing
performance.
===
Benchmark on AMD Ryzen 9 7900X:
Before:
CHACHA20     | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
  STREAM enc |    0.130 ns/B     7330 MiB/s    0.716 c/B      5500
  STREAM dec |    0.128 ns/B     7426 MiB/s    0.713 c/B      5555
POLY1305 enc |    0.175 ns/B     5444 MiB/s    0.964 c/B      5500
POLY1305 dec |    0.175 ns/B     5455 MiB/s    0.962 c/B      5500
After:
CHACHA20     | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
  STREAM enc |    0.123 ns/B     7767 MiB/s    0.691 c/B      5625
  STREAM dec |    0.123 ns/B     7736 MiB/s    0.693 c/B      5625
POLY1305 enc |    0.168 ns/B     5679 MiB/s    0.945 c/B      5625
POLY1305 dec |    0.167 ns/B     5708 MiB/s    0.940 c/B      5625
===
Benchmark on Intel Core i3-1115G4:
Before:
CHACHA20     | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
  STREAM enc |    0.161 ns/B     5934 MiB/s    0.658 c/B    4097±3
  STREAM dec |    0.160 ns/B     5951 MiB/s    0.656 c/B    4097±4
POLY1305 enc |    0.220 ns/B     4333 MiB/s    0.902 c/B    4096±3
POLY1305 dec |    0.220 ns/B     4325 MiB/s    0.903 c/B    4096±3
After:
CHACHA20     | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
  STREAM enc |    0.152 ns/B     6267 MiB/s    0.623 c/B    4097±3
  STREAM dec |    0.152 ns/B     6287 MiB/s    0.621 c/B    4097±3
POLY1305 enc |    0.215 ns/B     4443 MiB/s    0.879 c/B    4096±3
POLY1305 dec |    0.214 ns/B     4452 MiB/s    0.878 c/B    4096±3
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
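The tail handling for 8/4/2/1 blocks in the ChaCha20 commit above can be illustrated with a small C sketch. This is not the assembly from cipher/chacha20-amd64-avx512.S; the function name `split_blocks` and the `calls` counters are hypothetical, and the sketch only shows how an arbitrary number of full blocks might be split into 16-way chunks followed by progressively narrower ones.

```c
#include <stddef.h>

/* Split 'nblks' full blocks into 16-way chunks plus a tail of 8, 4, 2
 * and 1 blocks.  calls[i] counts how often width widths[i] was used;
 * a real implementation would run the matching N-way code path there.
 * Returns the number of blocks consumed, which is always nblks. */
static size_t
split_blocks (size_t nblks, size_t calls[5])
{
  static const size_t widths[5] = { 16, 8, 4, 2, 1 };
  size_t done = 0;
  size_t i;

  for (i = 0; i < 5; i++)
    while (nblks - done >= widths[i])
      {
        calls[i]++;
        done += widths[i];
      }

  return done;
}
```

With this split, an input of 45 blocks would take two 16-way passes and then one pass each of 8, 4 and 1 blocks, so no input size falls back to a different implementation for its remainder.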
| |
--
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
| |
* cipher/rsa.c (rsa_generate): Do not accept use-x931 or derive-parms
in FIPS mode.
* tests/pubkey.c (get_keys_x931_new): Expect failure in FIPS mode.
(check_run): Skip checking X9.31 keys in FIPS mode.
* doc/gcrypt.texi: Document "test-parms" and clarify some cases around
the X9.31 keygen.
--
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
| |
* cipher/rsa-common.c (_gcry_rsa_pss_encode): Prevent usage of large
salt lengths.
(_gcry_rsa_pss_verify): Ditto.
* tests/basic.c (check_pubkey_sign): Check longer salt length fails in
FIPS mode.
* tests/t-rsa-pss.c (one_test_sexp): Fix function name in error message.
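A minimal sketch of the kind of salt-length check the RSA-PSS entry above describes. The commit message does not state the actual bound; the sketch ASSUMES that in FIPS mode the salt may not be longer than the hash digest. The function name `pss_check_saltlen` and its return convention are hypothetical and are not libgcrypt API.

```c
#include <stddef.h>

/* Returns 0 when the salt length is acceptable, -1 otherwise.
 * ASSUMPTION: in FIPS mode the salt must not exceed the digest length;
 * outside FIPS mode no limit is applied by this sketch. */
static int
pss_check_saltlen (int fips_mode, size_t saltlen, size_t hashlen)
{
  if (fips_mode && saltlen > hashlen)
    return -1;
  return 0;
}
```

The same check would have to run on both the encode and the verify path, which matches the two functions named in the entry.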
| |
* random/rndw32.c (slow_gatherer): Suppress emitting by log_info.
--
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
| |
* src/fips.c (_gcry_fips_indicator_cipher): Add key wrapping mode as
approved.
--
GnuPG-bug-id: 5512
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
| |
* cipher/kdf.c (_gcry_kdf_pkdf2): Require 8 chars passphrase for FIPS.
Set bounds for salt length and iteration count in FIPS mode.
--
GnuPG-bug-id: 6039
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
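The FIPS-mode parameter checks named in the PBKDF2 entry above could look roughly like the sketch below. Only the 8-character passphrase minimum comes from the commit message. The salt and iteration bounds are ASSUMPTIONS chosen for illustration (a 128-bit salt and 1000 iterations), and `pbkdf2_params_ok` is a hypothetical helper, not a libgcrypt function.

```c
#include <stddef.h>

#define FIPS_MIN_PASSPHRASE_LEN 8     /* stated in the commit message */
#define FIPS_MIN_SALT_LEN       16    /* ASSUMPTION: 128-bit salt */
#define FIPS_MIN_ITERATIONS     1000  /* ASSUMPTION */

/* Returns 1 when the parameters are acceptable, 0 otherwise.
 * Outside FIPS mode this sketch applies no limits at all. */
static int
pbkdf2_params_ok (int fips_mode, size_t passphraselen, size_t saltlen,
                  unsigned long iterations)
{
  if (!fips_mode)
    return 1;
  if (passphraselen < FIPS_MIN_PASSPHRASE_LEN)
    return 0;
  if (saltlen < FIPS_MIN_SALT_LEN)
    return 0;
  if (iterations < FIPS_MIN_ITERATIONS)
    return 0;
  return 1;
}
```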
| |
--
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
| |
* src/libgcrypt.m4: Overriding the decision by
--with-libgcrypt-prefix, use gpgrt-config libgcrypt when gpgrt-config
is available.
--
This may offer better migration.
GnuPG-bug-id: 5034
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
| |
* cipher/keccak.c (_gcry_keccak_absorb_blocks_avx512): Change size_t
to u64; change 'const byte **new_lanes' to 'u64 *new_lanes'.
(keccak_absorb_lanes64_avx512): Get new lanes pointer from assembly
through 'u64' type.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/serpent-armv7-neon.S (_gcry_serpent_neon_blk8): New.
* cipher/serpent-avx2-amd64.S (_gcry_serpent_avx2_blk16): New.
* cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_blk8): New.
* cipher/serpent.c (_gcry_serpent_sse2_blk8)
(_gcry_serpent_avx2_blk16, _gcry_serpent_neon_blk8)
(_gcry_serpent_xts_crypt, _gcry_serpent_ecb_crypt)
(serpent_crypt_blk1_16, serpent_encrypt_blk1_16)
(serpent_decrypt_blk1_16): New.
(serpent_setkey): Setup XTS and ECB bulk functions.
--
Benchmark on AMD Ryzen 9 7900X:
Before:
SERPENT128   | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |     5.42 ns/B    176.0 MiB/s    30.47 c/B      5625
     ECB dec |     4.82 ns/B    197.9 MiB/s    27.11 c/B      5625
     XTS enc |     5.57 ns/B    171.3 MiB/s    31.31 c/B      5625
     XTS dec |     4.99 ns/B    191.1 MiB/s    28.07 c/B      5625
After:
SERPENT128   | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |    0.708 ns/B     1347 MiB/s     3.98 c/B      5625
     ECB dec |    0.694 ns/B     1373 MiB/s     3.91 c/B      5625
     XTS enc |    0.766 ns/B     1246 MiB/s     4.31 c/B      5625
     XTS dec |    0.754 ns/B     1264 MiB/s     4.24 c/B      5625
GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/serpent.c (_gcry_serpent_ocb_crypt)
(_gcry_serpent_ocb_auth) [USE_NEON]: Cast "Ls" to 'const void **'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/twofish-amd64.S (_gcry_twofish_amd64_blk3): New.
* cipher/twofish-avx2-amd64.S (_gcry_twofish_avx2_blk16): New.
* cipher/twofish.c (_gcry_twofish_xts_crypt, _gcry_twofish_ecb_crypt)
(_gcry_twofish_avx2_blk16, _gcry_twofish_amd64_blk3)
(twofish_crypt_blk1_16, twofish_encrypt_blk1_16)
(twofish_decrypt_blk1_16): New.
(twofish_setkey): Setup XTS and ECB bulk functions.
--
Benchmark on AMD Ryzen 9 7900X:
Before:
TWOFISH      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |     2.52 ns/B    378.2 MiB/s    14.18 c/B      5625
     ECB dec |     2.51 ns/B    380.2 MiB/s    14.11 c/B      5625
     XTS enc |     2.65 ns/B    359.9 MiB/s    14.91 c/B      5625
     XTS dec |     2.63 ns/B    362.0 MiB/s    14.60 c/B      5541
After:
TWOFISH      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |     1.60 ns/B    594.8 MiB/s     9.02 c/B      5625
     ECB dec |     1.60 ns/B    594.8 MiB/s     9.02 c/B      5625
     XTS enc |     1.66 ns/B    573.9 MiB/s     9.35 c/B      5625
     XTS dec |     1.67 ns/B    569.6 MiB/s     9.41 c/B    5619±2
GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/sm4.c (_gcry_sm4_ecb_crypt): New.
(sm4_setkey): Setup ECB bulk function.
--
Benchmark on AMD Ryzen 9 7900X:
Before:
SM4          | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |     4.75 ns/B    200.6 MiB/s    26.74 c/B      5625
     ECB dec |     4.79 ns/B    199.3 MiB/s    26.92 c/B      5625
After (OCB for reference):
SM4          | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |    0.252 ns/B     3782 MiB/s     1.42 c/B      5624
     ECB dec |    0.253 ns/B     3770 MiB/s     1.42 c/B      5625
     OCB enc |    0.277 ns/B     3446 MiB/s     1.56 c/B      5625
     OCB dec |    0.281 ns/B     3399 MiB/s     1.54 c/B      5500
GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/sm4.c (sm4_expand_key): Prefetch sbox table.
(sm4_get_crypt_blk1_16_fn): Do not prefetch sbox table.
(sm4_expand_key, _gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec)
(_gcry_sm4_cfb_dec): Prefetch sbox table if table look-up
implementation is used.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/bulkhelp.h (bulk_ecb_crypt_128): New.
* cipher/camellia-glue.c (_gcry_camellia_ecb_crypt): New.
(camellia_setkey): Select ECB bulk function with AESNI/AVX2, VAES/AVX2
and GFNI/AVX2.
--
Benchmark on AMD Ryzen 9 7900X:
Before:
CAMELLIA128  | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |     3.27 ns/B    291.8 MiB/s    18.38 c/B      5625
     ECB dec |     3.25 ns/B    293.3 MiB/s    18.29 c/B      5625
After (OCB for reference):
CAMELLIA128  | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |    0.146 ns/B     6533 MiB/s    0.803 c/B      5500
     ECB dec |    0.149 ns/B     6384 MiB/s    0.822 c/B      5500
     OCB enc |    0.170 ns/B     5608 MiB/s    0.957 c/B      5625
     OCB dec |    0.175 ns/B     5452 MiB/s    0.984 c/B      5625
GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-vaes-avx2-amd64.S: Align functions to 16 bytes.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ecb_crypt'.
* cipher/cipher.c (do_ecb_crypt): Use bulk function if available.
* cipher/rijndael-aesni.c (do_aesni_enc_vec8): Change asm label
'.Ldeclast' to '.Lenclast'.
(_gcry_aes_aesni_ecb_crypt): New.
* cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce)
(_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce): Change
return value from void to size_t.
(ocb_crypt_fn_t, xts_crypt_fn_t): Remove.
(_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_xts_crypt): Remove
indirect function call; Return value from called function (allows tail
call optimization).
(_gcry_aes_armv8_ce_ocb_auth): Return value from called function (allows
tail call optimization).
(_gcry_aes_ecb_enc_armv8_ce, _gcry_aes_ecb_dec_armv8_ce)
(_gcry_aes_armv8_ce_ecb_crypt): New.
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_ecb_crypt_amd64): New.
* cipher/rijndael-vaes.c (_gcry_vaes_avx2_ecb_crypt_amd64)
(_gcry_aes_vaes_ecb_crypt): New.
* cipher/rijndael.c (_gcry_aes_aesni_ecb_crypt)
(_gcry_aes_vaes_ecb_crypt, _gcry_aes_armv8_ce_ecb_crypt): New.
(do_setkey): Setup ECB bulk function for x86 AESNI/VAES and ARM CE.
--
Benchmark on AMD Ryzen 9 7900X:
Before (OCB for reference):
AES          | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |    0.128 ns/B     7460 MiB/s    0.720 c/B    5634±1
     ECB dec |    0.134 ns/B     7103 MiB/s    0.753 c/B      5608
     OCB enc |    0.029 ns/B    32930 MiB/s    0.163 c/B      5625
     OCB dec |    0.029 ns/B    32738 MiB/s    0.164 c/B      5625
After:
AES          | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     ECB enc |    0.028 ns/B    33761 MiB/s    0.159 c/B      5625
     ECB dec |    0.028 ns/B    33917 MiB/s    0.158 c/B      5625
GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
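The "use bulk function if available" dispatch from the ECB commit above can be sketched as follows. This is an illustration under stated assumptions, not the code in cipher/cipher.c: `struct cipher_handle`, the two function-pointer types, and the toy XOR "cipher" are all hypothetical stand-ins. The sketch shows one call handing the whole buffer to a bulk routine when one is registered, and a per-block loop otherwise.

```c
#include <stddef.h>

#define BLOCKSIZE 16

typedef void (*block_fn_t) (void *ctx, unsigned char *out,
                            const unsigned char *in);
typedef void (*ecb_bulk_fn_t) (void *ctx, unsigned char *out,
                               const unsigned char *in, size_t nblocks);

/* Hypothetical cipher handle; 'ecb_crypt' is NULL when the selected
 * implementation offers no bulk ECB routine. */
struct cipher_handle
{
  void *ctx;
  block_fn_t encrypt_block;
  ecb_bulk_fn_t ecb_crypt;
};

/* Toy single-block transform (XOR with a constant), illustration only. */
static void
toy_encrypt_block (void *ctx, unsigned char *out, const unsigned char *in)
{
  size_t i;

  (void) ctx;
  for (i = 0; i < BLOCKSIZE; i++)
    out[i] = in[i] ^ 0x5a;
}

/* Toy bulk routine: the same transform over all blocks in one call. */
static void
toy_ecb_bulk (void *ctx, unsigned char *out, const unsigned char *in,
              size_t nblocks)
{
  size_t i;

  (void) ctx;
  for (i = 0; i < nblocks * BLOCKSIZE; i++)
    out[i] = in[i] ^ 0x5a;
}

/* Prefer the bulk routine; fall back to one call per block. */
static void
ecb_encrypt (struct cipher_handle *c, unsigned char *out,
             const unsigned char *in, size_t nblocks)
{
  size_t n;

  if (c->ecb_crypt)
    {
      c->ecb_crypt (c->ctx, out, in, nblocks);
      return;
    }

  for (n = 0; n < nblocks; n++)
    c->encrypt_block (c->ctx, out + n * BLOCKSIZE, in + n * BLOCKSIZE);
}
```

Both paths must produce identical output; the benefit of the bulk path is that a vectorised implementation sees many independent blocks at once instead of one per call.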
| |
* mpi/longlong.h [__powerpc__, __powerpc64__]: Update macros.
--
Update longlong.h powerpc macros with more up to date versions
from GCC's longlong.h. Note, GCC's version is licensed under
LGPLv2.1+.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* src/hwf-x86.c (detect_x86_gnuc): Move model based checks and
forced soft hwfeatures enablement at end; Enable VPGATHER for
AMD CPUs with AVX512.
--
AMD Zen4 is able to benefit from VPGATHER based table-lookup for
Twofish.
Benchmark on Ryzen 9 7900X:
Before:
TWOFISH      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     CTR enc |     1.79 ns/B    532.8 MiB/s    10.07 c/B      5625
     CTR dec |     1.79 ns/B    532.6 MiB/s    10.07 c/B      5625
After (~10% faster):
TWOFISH      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
     CTR enc |     1.61 ns/B    593.5 MiB/s     9.05 c/B    5631±2
     CTR dec |     1.61 ns/B    590.8 MiB/s     9.08 c/B      5625
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/sha512.c (sha512_init_common): Enable AVX512 implementation
only for Intel CPUs.
--
SHA512-AVX512 implementation is slightly slower than AVX2 variant
on AMD Zen4 (AVX512 4.88 cpb, AVX2 4.35 cpb). This is likely
because AVX512 implementation uses vector registers for round
function unlike AVX2 where general purpose registers are used
for round function. On Zen4, message expansion and round function
then end up competing for the narrower vector execution bandwidth,
which gives slower performance.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* src/visibility.c (gcry_md_setkey): Add the check here, too.
--
GnuPG-bug-id: 6039
Fixes-commit: 58c92098d053aae7c78cc42bdd7c80c13efc89bb
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
| |
* cipher/kdf.c (_gcry_kdf_pkdf2): Remove the length limitation of
passphrase input length.
--
This reverts commit 857e6f467d0fc9fd858a73d84122695425970075.
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
| |
* m4/gpg-error.m4: Update from libgpg-error 1.46.
--
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
| |
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
| |
* cipher/rsa.c (selftests_rsa): Skip encryption selftest as this
operation is not claimed as part of the certification.
| |
This reverts commit f736f3c70182d9c948f9105eb769c47c5578df35. The pubkey
encryption already has a separate explicit FIPS service indicator.
| |
This reverts commit c7709f7b23848abf4ba65cb99cb2a9e9c7ebdefc. The pubkey
encryption already has a separate explicit FIPS service indicator.
| |
This reverts commit 249ca431ef881d510b90a5d3db9cd8507c4d697b. The pubkey
encryption already has a separate explicit FIPS service indicator.
| |
This reverts commit e552e37983da0c54840786eeff34481685fde1e9. The pubkey
encryption already has a separate explicit FIPS service indicator.
| |
* src/fips.c (_gcry_fips_indicator_function): Add
gcry_pk_encrypt/decrypt as non-approved.
--
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
| |
* src/fips.c (_gcry_fips_indicator_function): Fix typo in sign/verify
function names.
--
Fixes-commit: 05a9c9d1ba1db6c1cd160fba979e9ddf4700a0c0
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
| |
* doc/gcrypt.texi: Fix GCM-SIV RFC reference to RFC-8452.
--
GnuPG-bug-id: 6232
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* mpi/longlong.h [__i386__] (count_trailing_zeros): Add 'rep' prefix
for 'bsfl'.
--
"rep;bsf" aka "tzcnt" is a new instruction with well-defined operation
on zero input and as a result is faster on new CPUs. On old CPUs, "tzcnt"
functions as the old "bsf" with undefined behaviour on zero input.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* mpi/longlong.h [__x86_64__] (count_trailing_zeros): Add 'rep' prefix
for 'bsfq'.
--
"rep;bsf" aka "tzcnt" is a new instruction with well-defined operation
on zero input and as a result is faster on new CPUs. On old CPUs, "tzcnt"
functions as the old "bsf" with undefined behaviour on zero input.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
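The difference between a count-trailing-zeros operation that is undefined for zero and one that is well defined can be shown in portable C. The sketch below does not use the inline assembly from mpi/longlong.h; it relies on the GCC/Clang builtin `__builtin_ctzll`, whose result is undefined for a zero argument, and wraps it so that zero yields the operand width. The wrapper name `ctz64` and the choice of 64 as the zero-input result are assumptions of this sketch.

```c
#include <stdint.h>

/* Count trailing zero bits of a 64-bit value.  The builtin is undefined
 * for x == 0, so that case is handled explicitly and returns the operand
 * width (an assumption chosen for this sketch). */
static inline unsigned int
ctz64 (uint64_t x)
{
  if (x == 0)
    return 64;
  return (unsigned int) __builtin_ctzll (x);
}
```

Callers that can never pass zero do not need the guard, which is why a macro with undefined zero-input behaviour remains usable on old CPUs.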
| |
* mpi/longlong.h [!umul_ppmm] (smul_ppmm): Change ifdef
from !defined(umul_ppmm) to !defined(smul_ppmm).
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
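Why the guard has to test the macro that is about to be defined can be seen in a reduced sketch. This is not the text of mpi/longlong.h: the 32-bit word size, the `umul_ppmm` stand-in built on 64-bit arithmetic, and the local variable names are assumptions. With a guard of `!defined (umul_ppmm)`, the generic signed fallback would presumably be skipped on every platform that already provides `umul_ppmm`, leaving `smul_ppmm` undefined.

```c
#include <stdint.h>

typedef uint32_t UWtype;
#define W_TYPE_SIZE 32

/* Stand-in for an architecture-specific unsigned 32x32->64 multiply. */
#define umul_ppmm(w1, w0, u, v) \
  do { \
    uint64_t p_ = (uint64_t) (u) * (uint64_t) (v); \
    (w1) = (UWtype) (p_ >> 32); \
    (w0) = (UWtype) p_; \
  } while (0)

/* Generic signed fallback, guarded by the macro it defines.  The high
 * word of the unsigned product is corrected for negative operands. */
#if !defined (smul_ppmm)
#define smul_ppmm(w1, w0, u, v) \
  do { \
    UWtype hi_; \
    UWtype a_ = (u), b_ = (v); \
    umul_ppmm (hi_, w0, a_, b_); \
    (w1) = hi_ - (-(a_ >> (W_TYPE_SIZE - 1)) & b_) \
               - (-(b_ >> (W_TYPE_SIZE - 1)) & a_); \
  } while (0)
#endif
```

For example, multiplying -2 by 3 through this fallback should give the two-word representation of -6, with an all-ones high word.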