Commit message | Author | Age | Files | Lines
* avx512: tweak zmm16-zmm31 register clearing | Jussi Kivilinna | 2023-01-17 | 7 | -37/+39
  * cipher/asm-common-amd64.h (spec_stop_avx512): Clear ymm16 before and after vpopcntb.
  * cipher/camellia-gfni-avx512-amd64.S (clear_zmm16_zmm31): Clear YMM16-YMM31 registers instead of XMM16-XMM31.
  * cipher/chacha20-amd64-avx512.S (clear_zmm16_zmm31): Likewise.
  * cipher/keccak-amd64-avx512.S (clear_regs): Likewise.
    (clear_avx512_4regs): Clear all 4 registers with XOR.
  * cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
    (_gcry_polyval_intel_pclmul): Clear YMM16-YMM19 registers instead of ZMM16-ZMM19.
  * cipher/poly1305-amd64-avx512.S (POLY1305_BLOCKS): Clear YMM16-YMM31 registers after vector processing instead of XMM16-XMM31.
  * cipher/sha512-avx512-amd64.S (_gcry_sha512_transform_amd64_avx512): Likewise.
  --
  Clear zmm16-zmm31 registers with 256bit XOR instead of 128bit as this is better for AMD Zen4. Also clear xmm16 register after vpopcnt in avx512 spec-stop so we do not leave any zmm register state which might end up unnecessarily using CPU resources.

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* aria: add generic 2-way bulk processing | Jussi Kivilinna | 2023-01-06 | 1 | -2/+477
  * cipher/aria.c (ARIA_context): Add 'bulk_prefetch_ready'.
    (aria_crypt_2blks, aria_crypt_blocks, aria_enc_blocks, aria_dec_blocks)
    (_gcry_aria_ctr_enc, _gcry_aria_cbc_enc, _gcry_aria_cbc_dec)
    (_gcry_aria_cfb_enc, _gcry_aria_cfb_dec, _gcry_aria_ecb_crypt)
    (_gcry_aria_xts_crypt, _gcry_aria_ctr32le_enc, _gcry_aria_ocb_crypt)
    (_gcry_aria_ocb_auth): New.
    (aria_setkey): Setup 'bulk_ops' function pointers.
  --
  Patch adds 2-way parallel generic ARIA implementation for modest performance increase.

  Benchmark on AMD Ryzen 9 7900X (x86-64) shows ~40% performance improvement for parallelizable modes:

   ARIA128      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |     2.62 ns/B    364.0 MiB/s    14.74 c/B      5625
   ECB dec      |     2.61 ns/B    365.2 MiB/s    14.69 c/B      5625
   CBC enc      |     3.62 ns/B    263.7 MiB/s    20.34 c/B      5625
   CBC dec      |     2.63 ns/B    363.0 MiB/s    14.78 c/B      5625
   CFB enc      |     3.59 ns/B    265.3 MiB/s    20.22 c/B      5625
   CFB dec      |     2.63 ns/B    362.0 MiB/s    14.82 c/B      5625
   OFB enc      |     3.98 ns/B    239.7 MiB/s    22.38 c/B      5625
   OFB dec      |     4.00 ns/B    238.2 MiB/s    22.52 c/B      5625
   CTR enc      |     2.64 ns/B    360.6 MiB/s    14.87 c/B      5624
   CTR dec      |     2.65 ns/B    360.0 MiB/s    14.90 c/B      5625
   XTS enc      |     2.68 ns/B    355.8 MiB/s    15.08 c/B      5625
   XTS dec      |     2.67 ns/B    356.9 MiB/s    15.03 c/B      5625
   CCM enc      |     6.24 ns/B    152.7 MiB/s    35.12 c/B      5625
   CCM dec      |     6.25 ns/B    152.5 MiB/s    35.18 c/B      5625
   CCM auth     |     3.59 ns/B    265.4 MiB/s    20.21 c/B      5625
   EAX enc      |     6.23 ns/B    153.0 MiB/s    35.06 c/B      5625
   EAX dec      |     6.23 ns/B    153.1 MiB/s    35.05 c/B      5625
   EAX auth     |     3.59 ns/B    265.4 MiB/s    20.22 c/B      5625
   GCM enc      |     2.68 ns/B    355.8 MiB/s    15.08 c/B      5625
   GCM dec      |     2.69 ns/B    354.7 MiB/s    15.12 c/B      5625
   GCM auth     |    0.031 ns/B    30832 MiB/s    0.174 c/B      5625
   OCB enc      |     2.71 ns/B    351.4 MiB/s    15.27 c/B      5625
   OCB dec      |     2.74 ns/B    347.6 MiB/s    15.43 c/B      5625
   OCB auth     |     2.64 ns/B    360.8 MiB/s    14.87 c/B      5625
   SIV enc      |     6.24 ns/B    152.9 MiB/s    35.08 c/B      5625
   SIV dec      |     6.24 ns/B    152.8 MiB/s    35.10 c/B      5625
   SIV auth     |     3.59 ns/B    266.0 MiB/s    20.17 c/B      5625
   GCM-SIV enc  |     2.67 ns/B    356.7 MiB/s    15.04 c/B      5625
   GCM-SIV dec  |     2.68 ns/B    355.7 MiB/s    15.08 c/B      5625
   GCM-SIV auth |    0.034 ns/B    28303 MiB/s    0.190 c/B      5625

  Cc: Taehee Yoo <ap420073@gmail.com>
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add ARIA block cipher | Jussi Kivilinna | 2023-01-06 | 15 | -8/+1495
  * cipher/Makefile.am: Add 'aria.c'.
  * cipher/aria.c: New.
  * cipher/cipher.c (cipher_list, cipher_list_algo301): Add ARIA cipher specs.
  * cipher/mac-cmac.c (map_mac_algo_to_cipher): Add GCRY_MAC_CMAC_ARIA.
    (_gcry_mac_type_spec_cmac_aria): New.
  * cipher/mac-gmac.c (map_mac_algo_to_cipher): Add GCRY_MAC_GMAC_ARIA.
    (_gcry_mac_type_spec_gmac_aria): New.
  * cipher/mac-internal.h (_gcry_mac_type_spec_cmac_aria)
    (_gcry_mac_type_spec_gmac_aria)
    (_gcry_mac_type_spec_poly1305mac_aria): New.
  * cipher/mac-poly1305.c (poly1305mac_open): Add GCRY_MAC_GMAC_ARIA.
    (_gcry_mac_type_spec_poly1305mac_aria): New.
  * cipher/mac.c (mac_list, mac_list_algo201, mac_list_algo401)
    (mac_list_algo501): Add ARIA MAC specs.
  * configure.ac (available_ciphers): Add 'aria'.
    (GCRYPT_CIPHERS): Add 'aria.lo'.
    (USE_ARIA): New.
  * doc/gcrypt.texi: Add GCRY_CIPHER_ARIA128, GCRY_CIPHER_ARIA192, GCRY_CIPHER_ARIA256, GCRY_MAC_CMAC_ARIA, GCRY_MAC_GMAC_ARIA and GCRY_MAC_POLY1305_ARIA.
  * src/cipher.h (_gcry_cipher_spec_aria128, _gcry_cipher_spec_aria192)
    (_gcry_cipher_spec_aria256): New.
  * src/gcrypt.h.in (gcry_cipher_algos): Add GCRY_CIPHER_ARIA128, GCRY_CIPHER_ARIA192 and GCRY_CIPHER_ARIA256.
    (gcry_mac_algos): GCRY_MAC_CMAC_ARIA, GCRY_MAC_GMAC_ARIA and GCRY_MAC_POLY1305_ARIA.
  * tests/basic.c (check_ecb_cipher, check_ctr_cipher)
    (check_cfb_cipher, check_ocb_cipher) [USE_ARIA]: Add ARIA test-vectors.
    (check_ciphers) [USE_ARIA]: Add GCRY_CIPHER_ARIA128, GCRY_CIPHER_ARIA192 and GCRY_CIPHER_ARIA256.
    (main): Also run 'check_bulk_cipher_modes' for 'cipher_modes_only'-mode.
  * tests/bench-slope.c (bench_mac_init): Add GCRY_MAC_POLY1305_ARIA setiv-handling.
  * tests/benchmark.c (mac_bench): Likewise.
  --
  This patch adds ARIA block cipher for libgcrypt. This implementation is based on work by Taehee Yoo, with following notable changes:
  - Integration to libgcrypt, use of bithelp.h and bufhelp.h helper functions where possible.
  - Added lookup table prefetching as is done in AES, GCM and SM4 implementations.
  - Changed `get_u8` to return `u32` as returning `byte` caused sub-optimal code generation with gcc-12/x86-64 (zero extending from 8-bit to 32-bit register, followed by extraneous sign extending from 32-bit to 64-bit register).
  - Changed 'aria_crypt' loop structure a bit for tiny performance increase (~1% seen with gcc-12/x86-64/zen4).

  Benchmark on AMD Ryzen 9 7900X (x86-64):

   ARIA128      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |     3.99 ns/B    239.1 MiB/s    22.43 c/B      5625
   ECB dec      |     4.00 ns/B    238.4 MiB/s    22.50 c/B      5625

  Benchmark on AMD Ryzen 9 7900X (win32):

   ARIA128      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |     4.57 ns/B    208.7 MiB/s    25.31 c/B      5538
   ECB dec      |     4.66 ns/B    204.8 MiB/s    25.39 c/B      5453

  Benchmark on ARM Cortex-A53 (aarch64):

   ARIA128      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |    74.69 ns/B    12.77 MiB/s    48.40 c/B     647.9
   ECB dec      |    74.99 ns/B    12.72 MiB/s    48.58 c/B     647.9

  Cc: Taehee Yoo <ap420073@gmail.com>
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
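The `get_u8` return-type change in the ARIA commit above can be pictured with a small sketch. The signature and byte ordering used here are assumptions for illustration only (the real helper in cipher/aria.c may index bytes differently), and `get_u8_narrow` and `lookup` are hypothetical names.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint8_t byte;

/* Hypothetical "before" shape: the byte is returned in a narrow type,
   so the caller has to widen it again before using it as an index. */
static inline byte get_u8_narrow(u32 x, unsigned int y)
{
  return (byte)(x >> ((3 - y) * 8));
}

/* Hypothetical "after" shape: the same byte, returned as u32.  The value
   is already in a 32-bit type when it is used as a table index. */
static inline u32 get_u8(u32 x, unsigned int y)
{
  return (x >> ((3 - y) * 8)) & 0xff;
}

/* Typical use: index a 256-entry lookup table with one byte of a word. */
static inline u32 lookup(const u32 table[256], u32 x, unsigned int y)
{
  return table[get_u8(x, y)];
}
```

Both variants return the same value; whether a given compiler produces different code for them depends on the compiler version and target.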
* sm4: add missing OCB 16-way GFNI-AVX512 path | Jussi Kivilinna | 2023-01-04 | 1 | -0/+20
  * cipher/sm4.c (_gcry_sm4_ocb_crypt) [USE_GFNI_AVX512]: Add 16-way GFNI-AVX512 handling.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* bulkhelp: change bulk function definition to allow modifying context | Jussi Kivilinna | 2023-01-04 | 5 | -61/+59
  * cipher/bulkhelp.h (bulk_crypt_fn_t): Make 'ctx' non-constant and change 'num_blks' from 'unsigned int' to 'size_t'.
  * cipher/camellia-glue.c (camellia_encrypt_blk1_32)
    (camellia_encrypt_blk1_64, camellia_decrypt_blk1_32)
    (camellia_decrypt_blk1_64): Adjust to match 'bulk_crypt_fn_t'.
  * cipher/serpent.c (serpent_crypt_blk1_16, serpent_encrypt_blk1_16)
    (serpent_decrypt_blk1_16): Likewise.
  * cipher/sm4.c (crypt_blk1_16_fn_t, _gcry_sm4_aesni_avx_crypt_blk1_8)
    (sm4_aesni_avx_crypt_blk1_16, _gcry_sm4_aesni_avx2_crypt_blk1_16)
    (sm4_aesni_avx2_crypt_blk1_16, _gcry_sm4_gfni_avx2_crypt_blk1_16)
    (sm4_gfni_avx2_crypt_blk1_16, _gcry_sm4_gfni_avx512_crypt_blk1_16)
    (_gcry_sm4_gfni_avx512_crypt_blk32, sm4_gfni_avx512_crypt_blk1_16)
    (_gcry_sm4_aarch64_crypt_blk1_8, sm4_aarch64_crypt_blk1_16)
    (_gcry_sm4_armv8_ce_crypt_blk1_8, sm4_armv8_ce_crypt_blk1_16)
    (_gcry_sm4_armv9_sve_ce_crypt, sm4_armv9_sve_ce_crypt_blk1_16)
    (sm4_crypt_blocks, sm4_crypt_blk1_32, sm4_encrypt_blk1_32)
    (sm4_decrypt_blk1_32): Likewise.
  * cipher/twofish.c (twofish_crypt_blk1_16, twofish_encrypt_blk1_16)
    (twofish_decrypt_blk1_16): Likewise.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
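A minimal sketch of what the `bulk_crypt_fn_t` change above permits. The exact typedef in cipher/bulkhelp.h is not quoted in the commit message, so the parameter order and return type below are assumptions; `struct toy_ctx` and `toy_crypt_blk1_16` are hypothetical names used only for illustration.

```c
#include <stddef.h>

typedef unsigned char byte;

/* Assumed shape after the change: 'ctx' is a non-const pointer and
   'num_blks' is a size_t rather than an unsigned int. */
typedef unsigned int (*bulk_crypt_fn_t)(void *ctx, byte *out,
                                        const byte *in, size_t num_blks);

/* Hypothetical context that caches a one-time setup flag. */
struct toy_ctx
{
  byte key;
  int prefetch_ready;
};

/* Toy callback: XORs every byte with the key.  Because 'priv' is not
   const, the callback may update state in the context. */
static unsigned int toy_crypt_blk1_16(void *priv, byte *out,
                                      const byte *in, size_t num_blks)
{
  struct toy_ctx *ctx = priv;
  size_t i;

  if (!ctx->prefetch_ready)
    ctx->prefetch_ready = 1;   /* would not compile with a const context */

  for (i = 0; i < num_blks * 16; i++)
    out[i] = in[i] ^ ctx->key;

  return 0;   /* return value is not used in this sketch */
}
```

A writable context is what lets a cipher record a flag such as the 'bulk_prefetch_ready' field that the later ARIA bulk-processing commit adds to its context.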
* Add GMAC-SM4 and Poly1305-SM4 | Jussi Kivilinna | 2023-01-04 | 10 | -12/+58
  * cipher/cipher.c (cipher_list_algo301): Remove comma at the end of last entry.
  * cipher/mac-gmac.c (map_mac_algo_to_cipher): Add SM4.
    (_gcry_mac_type_spec_gmac_sm4): New.
  * cipher/mac-internal.h (_gcry_mac_type_spec_gmac_sm4)
    (_gcry_mac_type_spec_poly1305mac_sm4): New.
  * cipher/mac-poly1305.c (poly1305mac_open): Add SM4.
    (_gcry_mac_type_spec_poly1305mac_sm4): New.
  * cipher/mac.c (mac_list, mac_list_algo401, mac_list_algo501): Add GMAC-SM4 and Poly1305-SM4.
    (mac_list_algo101): Remove comma at the end of last entry.
  * cipher/md.c (digest_list_algo301): Remove comma at the end of last entry.
  * doc/gcrypt.texi: Add GCRY_MAC_GMAC_SM4 and GCRY_MAC_POLY1305_SM4.
  * src/gcrypt.h.in (GCRY_MAC_GMAC_SM4, GCRY_MAC_POLY1305_SM4): New.
  * tests/bench-slope.c (bench_mac_init): Setup IV for GCRY_MAC_POLY1305_SM4.
  * tests/benchmark.c (mac_bench): Likewise.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Fix compiler warnings seen with clang-powerpc64le target | Jussi Kivilinna | 2023-01-04 | 3 | -9/+12
  * cipher/rijndael-ppc-common.h (asm_sbox_be): New.
  * cipher/rijndael-ppc.c (_gcry_aes_sbox4_ppc8): Use 'asm_sbox_be' instead of 'vec_sbox_be' since this intrinsic has a different prototype definition on GCC and Clang ('vector uchar' vs 'vector ulong long').
  * cipher/sha256-ppc.c (vec_ror_u32): Remove unused function.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add clang support for ARM 32-bit assembly | Jussi Kivilinna | 2022-12-14 | 15 | -682/+682
  * configure.ac (gcry_cv_gcc_arm_platform_as_ok)
    (gcry_cv_gcc_inline_asm_neon): Remove % prefix from register names.
  * cipher/cipher-gcm-armv7-neon.S (vmull_p64): Prefix constant values with # character instead of $.
  * cipher/blowfish-arm.S: Remove % prefix from all register names.
  * cipher/camellia-arm.S: Likewise.
  * cipher/cast5-arm.S: Likewise.
  * cipher/rijndael-arm.S: Likewise.
  * cipher/rijndael-armv8-aarch32-ce.S: Likewise.
  * cipher/sha512-arm.S: Likewise.
  * cipher/sha512-armv7-neon.S: Likewise.
  * cipher/twofish-arm.S: Likewise.
  * mpi/arm/mpih-add1.S: Likewise.
  * mpi/arm/mpih-mul1.S: Likewise.
  * mpi/arm/mpih-mul2.S: Likewise.
  * mpi/arm/mpih-mul3.S: Likewise.
  * mpi/arm/mpih-sub1.S: Likewise.
  --
  Reported-by: Dmytro Kovalov <dmytro.a.kovalov@globallogic.com>
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael-ppc: fix wrong inline assembly constraint | Jussi Kivilinna | 2022-12-14 | 1 | -1/+1
  * cipher/rijndael-ppc-function.h (CBC_ENC_FUNC): Fix outiv constraint.
  --
  Noticed when trying to compile with powerpc64le clang. GCC accepted the buggy constraint without complaints.

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Fix building AVX512 Intel-syntax assembly with x86-64 clang | Jussi Kivilinna | 2022-12-14 | 3 | -2/+6
  * cipher/asm-common-amd64.h (spec_stop_avx512_intel_syntax): New.
  * cipher/poly1305-amd64-avx512.S: Use spec_stop_avx512_intel_syntax instead of spec_stop_avx512.
  * cipher/sha512-avx512-amd64.S: Likewise.
  --
  Reported-by: Clemens Lang <cllang@redhat.com>
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* build: Fix m4 macros for strict C compiler. | NIIBE Yutaka | 2022-12-14 | 2 | -2/+2
  * m4/ax_cc_for_build.m4: Fix for no arg.
  * m4/noexecstack.m4: Likewise.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build: Fix configure.ac for strict C99. | NIIBE Yutaka | 2022-12-14 | 1 | -0/+3
  * configure.ac: More fixes for other architecture.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build: Fix configure.ac for strict C99. | NIIBE Yutaka | 2022-12-13 | 1 | -29/+43
  * configure.ac: Add function declarations for asm functions.
  --
  Suggested-by: Florian Weimer <fweimer@redhat.com>
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* avx512: tweak AVX512 spec stop, use common macro in assembly | Jussi Kivilinna | 2022-12-12 | 10 | -20/+44
  * cipher/cipher-gcm-intel-pclmul.c: Use xmm registers for AVX512 spec stop.
  * cipher/asm-common-amd64.h (spec_stop_avx512): New.
  * cipher/blake2b-amd64-avx512.S: Use spec_stop_avx512.
  * cipher/blake2s-amd64-avx512.S: Likewise.
  * cipher/camellia-gfni-avx512-amd64.S: Likewise.
  * cipher/chacha20-avx512-amd64.S: Likewise.
  * cipher/keccak-amd64-avx512.S: Likewise.
  * cipher/poly1305-amd64-avx512.S: Likewise.
  * cipher/sha512-avx512-amd64.S: Likewise.
  * cipher/sm4-gfni-avx512-amd64.S: Likewise.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* chacha20-avx512: add handling for any input block count and tweak 16 block code a bit | Jussi Kivilinna | 2022-12-12 | 2 | -55/+496
  * cipher/chacha20-amd64-avx512.S: Add tail handling for 8/4/2/1 blocks; Rename `_gcry_chacha20_amd64_avx512_blocks16` to `_gcry_chacha20_amd64_avx512_blocks`; Tweak 16 parallel block processing for small speed improvement.
  * cipher/chacha20.c (_gcry_chacha20_amd64_avx512_blocks16): Rename to ...
    (_gcry_chacha20_amd64_avx512_blocks): ... this.
    (chacha20_blocks) [USE_AVX512]: Add AVX512 code-path.
    (do_chacha20_encrypt_stream_tail) [USE_AVX512]: Change to handle any number of full input blocks instead of multiples of 16.
  --
  Patch improves performance of ChaCha20-AVX512 implementation on small input buffer sizes (less than 64*16B = 1024B).

  Following benchmarks show improvement in 16 parallel blocks processing performance.

  === Benchmark on AMD Ryzen 9 7900X:

  Before:
   CHACHA20     | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   STREAM enc   |    0.130 ns/B     7330 MiB/s    0.716 c/B      5500
   STREAM dec   |    0.128 ns/B     7426 MiB/s    0.713 c/B      5555
   POLY1305 enc |    0.175 ns/B     5444 MiB/s    0.964 c/B      5500
   POLY1305 dec |    0.175 ns/B     5455 MiB/s    0.962 c/B      5500

  After:
   CHACHA20     | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   STREAM enc   |    0.123 ns/B     7767 MiB/s    0.691 c/B      5625
   STREAM dec   |    0.123 ns/B     7736 MiB/s    0.693 c/B      5625
   POLY1305 enc |    0.168 ns/B     5679 MiB/s    0.945 c/B      5625
   POLY1305 dec |    0.167 ns/B     5708 MiB/s    0.940 c/B      5625

  === Benchmark on Intel Core i3-1115G4:

  Before:
   CHACHA20     | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   STREAM enc   |    0.161 ns/B     5934 MiB/s    0.658 c/B    4097±3
   STREAM dec   |    0.160 ns/B     5951 MiB/s    0.656 c/B    4097±4
   POLY1305 enc |    0.220 ns/B     4333 MiB/s    0.902 c/B    4096±3
   POLY1305 dec |    0.220 ns/B     4325 MiB/s    0.903 c/B    4096±3

  After:
   CHACHA20     | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   STREAM enc   |    0.152 ns/B     6267 MiB/s    0.623 c/B    4097±3
   STREAM dec   |    0.152 ns/B     6287 MiB/s    0.621 c/B    4097±3
   POLY1305 enc |    0.215 ns/B     4443 MiB/s    0.879 c/B    4096±3
   POLY1305 dec |    0.214 ns/B     4452 MiB/s    0.878 c/B    4096±3

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
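The 16-block main loop with 8/4/2/1 tail handling described in the ChaCha20 commit above amounts to splitting an arbitrary block count into power-of-two chunks. The sketch below shows only that bookkeeping in plain C; `split_blocks` is a hypothetical helper, not a function from cipher/chacha20.c, and the real work is done in assembly.

```c
#include <stddef.h>

/* Split 'nblks' full blocks into chunks of 16, 8, 4, 2 and 1 blocks.
   counts[0..4] receive how many chunks of width 16, 8, 4, 2 and 1 are
   needed.  Only the 16-wide entry can exceed 1. */
static void split_blocks(size_t nblks, size_t counts[5])
{
  static const size_t widths[5] = { 16, 8, 4, 2, 1 };
  size_t i;

  for (i = 0; i < 5; i++)
    {
      counts[i] = nblks / widths[i];
      nblks %= widths[i];
    }
}
```

For example, 45 blocks split into two 16-block chunks, then one chunk each of 8, 4 and 1 blocks. With 64-byte ChaCha20 blocks, any input shorter than the 1024 bytes mentioned in the commit message is handled entirely by the tail widths.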
* doc: Minor fix up. | NIIBE Yutaka | 2022-12-06 | 1 | -3/+3
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* fips,rsa: Prevent usage of X9.31 keygen in FIPS mode. | Jakub Jelen | 2022-12-06 | 3 | -7/+54
  * cipher/rsa.c (rsa_generate): Do not accept use-x931 or derive-parms in FIPS mode.
  * tests/pubkey.c (get_keys_x931_new): Expect failure in FIPS mode.
    (check_run): Skip checking X9.31 keys in FIPS mode.
  * doc/gcrypt.texi: Document "test-parms" and clarify some cases around the X9.31 keygen.
  --
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* rsa: Prevent usage of long salt in FIPS mode | Jakub Jelen | 2022-11-30 | 3 | -2/+33
  * cipher/rsa-common.c (_gcry_rsa_pss_encode): Prevent usage of large salt lengths.
    (_gcry_rsa_pss_verify): Ditto.
  * tests/basic.c (check_pubkey_sign): Check longer salt length fails in FIPS mode.
  * tests/t-rsa-pss.c (one_test_sexp): Fix function name in error message.
* random:w32: Don't emit message for diskperf when it's not useful. | NIIBE Yutaka | 2022-11-21 | 1 | -2/+9
  * random/rndw32.c (slow_gatherer): Suppress emitting by log_info.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* fips: Mark AES key wrapping as approved. | Jakub Jelen | 2022-11-18 | 1 | -0/+1
  * src/fips.c (_gcry_fips_indicator_cipher): Add key wrapping mode as approved.
  --
  GnuPG-bug-id: 5512
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* pkdf2: Add checks for FIPS. | Jakub Jelen | 2022-11-18 | 1 | -0/+12
  * cipher/kdf.c (_gcry_kdf_pkdf2): Require 8 chars passphrase for FIPS. Set bounds for salt length and iteration count in FIPS mode.
  --
  GnuPG-bug-id: 6039
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
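The FIPS-mode parameter checks described in the commit above could look roughly like the sketch below. Only the 8-character passphrase minimum comes from the commit message; the salt and iteration bounds are not stated there, so `MIN_SALT_LEN` and `MIN_ITERATIONS` are placeholder assumptions, and `pbkdf2_params_ok` is a hypothetical helper rather than libgcrypt code.

```c
#include <stddef.h>

enum
{
  MIN_PASSPHRASE_LEN = 8,     /* from the commit message */
  MIN_SALT_LEN       = 16,    /* assumed placeholder, not from the source */
  MIN_ITERATIONS     = 1000   /* assumed placeholder, not from the source */
};

/* Return 1 if the parameters are acceptable, 0 otherwise.  Outside FIPS
   mode nothing is restricted in this sketch. */
static int pbkdf2_params_ok(int fips_mode, size_t passlen,
                            size_t saltlen, unsigned long iterations)
{
  if (!fips_mode)
    return 1;
  if (passlen < MIN_PASSPHRASE_LEN)
    return 0;
  if (saltlen < MIN_SALT_LEN)
    return 0;
  if (iterations < MIN_ITERATIONS)
    return 0;
  return 1;
}
```

Keeping the checks behind a single `fips_mode` test leaves non-FIPS callers unaffected, which matches the scope stated in the commit title.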
* doc: Update document for pkg-config and libgcrypt.m4. | NIIBE Yutaka | 2022-11-15 | 1 | -28/+18
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build: Prefer gpgrt-config when available. | NIIBE Yutaka | 2022-11-01 | 1 | -2/+2
  * src/libgcrypt.m4: Overriding the decision by --with-libgcrypt-prefix, use gpgrt-config libgcrypt when gpgrt-config is available.
  --
  This may offer better migration.

  GnuPG-bug-id: 5034
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* sha3-avx512: fix for "x32" target | Jussi Kivilinna | 2022-10-26 | 1 | -3/+6
  * cipher/keccak.c (_gcry_keccak_absorb_blocks_avx512): Change size_t to u64; change 'const byte **new_lanes' to 'u64 *new_lanes'.
    (keccak_absorb_lanes64_avx512): Get new lanes pointer from assembly through 'u64' type.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* serpent: accelerate XTS and ECB modes | Jussi Kivilinna | 2022-10-26 | 4 | -1/+317
  * cipher/serpent-armv7-neon.S (_gcry_serpent_neon_blk8): New.
  * cipher/serpent-avx2-amd64.S (_gcry_serpent_avx2_blk16): New.
  * cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_blk8): New.
  * cipher/serpent.c (_gcry_serpent_sse2_blk8)
    (_gcry_serpent_avx2_blk16, _gcry_serpent_neon_blk8)
    (_gcry_serpent_xts_crypt, _gcry_serpent_ecb_crypt)
    (serpent_crypt_blk1_16, serpent_encrypt_blk1_16)
    (serpent_decrypt_blk1_16): New.
    (serpent_setkey): Setup XTS and ECB bulk functions.
  --
  Benchmark on AMD Ryzen 9 7900X:

  Before:
   SERPENT128   | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |     5.42 ns/B    176.0 MiB/s    30.47 c/B      5625
   ECB dec      |     4.82 ns/B    197.9 MiB/s    27.11 c/B      5625
   XTS enc      |     5.57 ns/B    171.3 MiB/s    31.31 c/B      5625
   XTS dec      |     4.99 ns/B    191.1 MiB/s    28.07 c/B      5625

  After:
   SERPENT128   | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |    0.708 ns/B     1347 MiB/s     3.98 c/B      5625
   ECB dec      |    0.694 ns/B     1373 MiB/s     3.91 c/B      5625
   XTS enc      |    0.766 ns/B     1246 MiB/s     4.31 c/B      5625
   XTS dec      |    0.754 ns/B     1264 MiB/s     4.24 c/B      5625

  GnuPG-bug-id: T6242
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* serpent: fix compiler warning on 32-bit ARM | Jussi Kivilinna | 2022-10-26 | 1 | -3/+4
  * cipher/serpent.c (_gcry_serpent_ocb_crypt)
    (_gcry_serpent_ocb_auth) [USE_NEON]: Cast "Ls" to 'const void **'.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* twofish: accelerate XTS and ECB modes | Jussi Kivilinna | 2022-10-26 | 3 | -3/+264
  * cipher/twofish-amd64.S (_gcry_twofish_amd64_blk3): New.
  * cipher/twofish-avx2-amd64.S (_gcry_twofish_avx2_blk16): New.
    (_gcry_twofish_xts_crypt, _gcry_twofish_ecb_crypt)
    (_gcry_twofish_avx2_blk16, _gcry_twofish_amd64_blk3)
    (twofish_crypt_blk1_16, twofish_encrypt_blk1_16)
    (twofish_decrypt_blk1_16): New.
    (twofish_setkey): Setup XTS and ECB bulk functions.
  --
  Benchmark on AMD Ryzen 9 7900X:

  Before:
   TWOFISH      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |     2.52 ns/B    378.2 MiB/s    14.18 c/B      5625
   ECB dec      |     2.51 ns/B    380.2 MiB/s    14.11 c/B      5625
   XTS enc      |     2.65 ns/B    359.9 MiB/s    14.91 c/B      5625
   XTS dec      |     2.63 ns/B    362.0 MiB/s    14.60 c/B      5541

  After:
   TWOFISH      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |     1.60 ns/B    594.8 MiB/s     9.02 c/B      5625
   ECB dec      |     1.60 ns/B    594.8 MiB/s     9.02 c/B      5625
   XTS enc      |     1.66 ns/B    573.9 MiB/s     9.35 c/B      5625
   XTS dec      |     1.67 ns/B    569.6 MiB/s     9.41 c/B    5619±2

  GnuPG-bug-id: T6242
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* sm4: accelerate ECB (for benchmarking) | Jussi Kivilinna | 2022-10-26 | 1 | -0/+32
  * cipher/sm4.c (_gcry_sm4_ecb_crypt): New.
    (sm4_setkey): Setup ECB bulk function.
  --
  Benchmark on AMD Ryzen 9 7900X:

  Before:
   SM4          | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |     4.75 ns/B    200.6 MiB/s    26.74 c/B      5625
   ECB dec      |     4.79 ns/B    199.3 MiB/s    26.92 c/B      5625

  After (OCB for reference):
   SM4          | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |    0.252 ns/B     3782 MiB/s     1.42 c/B      5624
   ECB dec      |    0.253 ns/B     3770 MiB/s     1.42 c/B      5625
   OCB enc      |    0.277 ns/B     3446 MiB/s     1.56 c/B      5625
   OCB dec      |    0.281 ns/B     3399 MiB/s     1.54 c/B      5500

  GnuPG-bug-id: T6242
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* sm4: fix lookup-table prefetching | Jussi Kivilinna | 2022-10-26 | 1 | -2/+16
  * cipher/sm4.c (sm4_expand_key): Prefetch sbox table.
    (sm4_get_crypt_blk1_16_fn): Do not prefetch sbox table.
    (sm4_expand_key, _gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec)
    (_gcry_sm4_cfb_dec): Prefetch sbox table if table look-up implementation is used.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* camellia: accelerate ECB (for benchmarking) | Jussi Kivilinna | 2022-10-26 | 2 | -4/+53
  * cipher/bulkhelp.h (bulk_ecb_crypt_128): New.
  * cipher/camellia-glue.c (_gcry_camellia_ecb_crypt): New.
    (camellia_setkey): Select ECB bulk function with AESNI/AVX2, VAES/AVX2 and GFNI/AVX2.
  --
  Benchmark on AMD Ryzen 9 7900X:

  Before:
   CAMELLIA128  | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |     3.27 ns/B    291.8 MiB/s    18.38 c/B      5625
   ECB dec      |     3.25 ns/B    293.3 MiB/s    18.29 c/B      5625

  After (OCB for reference):
   CAMELLIA128  | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |    0.146 ns/B     6533 MiB/s    0.803 c/B      5500
   ECB dec      |    0.149 ns/B     6384 MiB/s    0.822 c/B      5500
   OCB enc      |    0.170 ns/B     5608 MiB/s    0.957 c/B      5625
   OCB dec      |    0.175 ns/B     5452 MiB/s    0.984 c/B      5625

  GnuPG-bug-id: T6242
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael-vaes: align asm functions | Jussi Kivilinna | 2022-10-26 | 1 | -0/+7
  * cipher/rijndael-vaes-avx2-amd64.S: Align functions to 16 bytes.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael: add ECB acceleration (for benchmarking purposes) | Jussi Kivilinna | 2022-10-26 | 9 | -77/+997
  * cipher/cipher-internal.h (cipher_bulk_ops): Add 'ecb_crypt'.
  * cipher/cipher.c (do_ecb_crypt): Use bulk function if available.
  * cipher/rijndael-aesni.c (do_aesni_enc_vec8): Change asm label '.Ldeclast' to '.Lenclast'.
    (_gcry_aes_aesni_ecb_crypt): New.
  * cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_ecb_enc_armv8_ce)
    (_gcry_aes_ecb_dec_armv8_ce): New.
  * cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ecb_enc_armv8_ce)
    (_gcry_aes_ecb_dec_armv8_ce): New.
  * cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce)
    (_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce): Change return value from void to size_t.
    (ocb_crypt_fn_t, xts_crypt_fn_t): Remove.
    (_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_xts_crypt): Remove indirect function call; Return value from called function (allows tail call optimization).
    (_gcry_aes_armv8_ce_ocb_auth): Return value from called function (allows tail call optimization).
    (_gcry_aes_ecb_enc_armv8_ce, _gcry_aes_ecb_dec_armv8_ce)
    (_gcry_aes_armv8_ce_ecb_crypt): New.
  * cipher/rijndael-vaes-avx2-amd64.S (_gcry_vaes_avx2_ecb_crypt_amd64): New.
  * cipher/rijndael-vaes.c (_gcry_vaes_avx2_ecb_crypt_amd64)
    (_gcry_aes_vaes_ecb_crypt): New.
  * cipher/rijndael.c (_gcry_aes_aesni_ecb_crypt)
    (_gcry_aes_vaes_ecb_crypt, _gcry_aes_armv8_ce_ecb_crypt): New.
    (do_setkey): Setup ECB bulk function for x86 AESNI/VAES and ARM CE.
  --
  Benchmark on AMD Ryzen 9 7900X:

  Before (OCB for reference):
   AES          | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |    0.128 ns/B     7460 MiB/s    0.720 c/B    5634±1
   ECB dec      |    0.134 ns/B     7103 MiB/s    0.753 c/B      5608
   OCB enc      |    0.029 ns/B    32930 MiB/s    0.163 c/B      5625
   OCB dec      |    0.029 ns/B    32738 MiB/s    0.164 c/B      5625

  After:
   AES          | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   ECB enc      |    0.028 ns/B    33761 MiB/s    0.159 c/B      5625
   ECB dec      |    0.028 ns/B    33917 MiB/s    0.158 c/B      5625

  GnuPG-bug-id: T6242
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
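The "use bulk function if available" dispatch in the rijndael commit above follows a common pattern: call an optional multi-block routine when a function pointer has been set, otherwise fall back to one block at a time. The sketch below illustrates only that pattern; `struct toy_cipher`, the callback types and the XOR "cipher" are hypothetical stand-ins, not the actual `cipher_bulk_ops` layout or `do_ecb_crypt` code.

```c
#include <stddef.h>

typedef unsigned char byte;
#define BLOCKSIZE 16

typedef void (*block_fn_t)(void *ctx, byte *out, const byte *in);
typedef void (*ecb_bulk_fn_t)(void *ctx, byte *out, const byte *in,
                              size_t nblocks, int encrypt);

/* Hypothetical cipher handle with an optional bulk ECB routine. */
struct toy_cipher
{
  void *ctx;
  block_fn_t encrypt_block;
  ecb_bulk_fn_t ecb_crypt;   /* NULL when no accelerated path exists */
};

/* Toy single-block routine: XOR each byte with a one-byte key. */
static void toy_block(void *ctx, byte *out, const byte *in)
{
  size_t i;
  for (i = 0; i < BLOCKSIZE; i++)
    out[i] = in[i] ^ *(const byte *)ctx;
}

/* Toy bulk routine: same transform over all blocks in one call. */
static void toy_bulk(void *ctx, byte *out, const byte *in,
                     size_t nblocks, int encrypt)
{
  size_t i;
  (void)encrypt;   /* XOR is its own inverse in this toy */
  for (i = 0; i < nblocks * BLOCKSIZE; i++)
    out[i] = in[i] ^ *(const byte *)ctx;
}

/* Dispatch: prefer the bulk routine, else loop block by block. */
static void toy_ecb_encrypt(struct toy_cipher *c, byte *out,
                            const byte *in, size_t nblocks)
{
  size_t i;

  if (c->ecb_crypt)
    {
      c->ecb_crypt(c->ctx, out, in, nblocks, 1);
      return;
    }

  for (i = 0; i < nblocks; i++)
    c->encrypt_block(c->ctx, out + i * BLOCKSIZE, in + i * BLOCKSIZE);
}
```

Both paths must produce identical output; the bulk path only removes per-block call overhead and lets an implementation process several blocks in parallel.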
* mpi/longlong: update powerpc macros from GCC | Jussi Kivilinna | 2022-10-26 | 1 | -131/+81
  * mpi/longlong.h [__powerpc__, __powerpc64__]: Update macros.
  --
  Update longlong.h powerpc macros with more up to date versions from GCC's longlong.h. Note, GCC's version is licensed under LGPLv2.1+.

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* hwf-x86: enable VPGATHER usage for AMD CPUs with AVX512 | Jussi Kivilinna | 2022-10-26 | 1 | -74/+83
  * src/hwf-x86.c (detect_x86_gnuc): Move model based checks and forced soft hwfeatures enablement at end; Enable VPGATHER for AMD CPUs with AVX512.
  --
  AMD Zen4 is able to benefit from VPGATHER based table-lookup for Twofish.

  Benchmark on Ryzen 9 7900X:

  Before:
   TWOFISH      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   CTR enc      |     1.79 ns/B    532.8 MiB/s    10.07 c/B      5625
   CTR dec      |     1.79 ns/B    532.6 MiB/s    10.07 c/B      5625

  After (~10% faster):
   TWOFISH      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
   CTR enc      |     1.61 ns/B    593.5 MiB/s     9.05 c/B    5631±2
   CTR dec      |     1.61 ns/B    590.8 MiB/s     9.08 c/B      5625

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* sha512-avx512: enable only on Intel CPUs for now | Jussi Kivilinna | 2022-10-26 | 1 | -1/+1
  * cipher/sha512.c (sha512_init_common): Enable AVX512 implementation only for Intel CPUs.
  --
  SHA512-AVX512 implementation is slightly slower than AVX2 variant on AMD Zen4 (AVX512 4.88 cpb, AVX2 4.35 cpb). This is likely because AVX512 implementation uses vector registers for round function unlike AVX2 where general purpose registers are used for round function. On Zen4, message expansion and round function then end up competing for narrower vector execution bandwidth and gives slower performance.

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* hmac,hkdf: Check the HMAC key length in FIPS mode. | Jakub Jelen | 2022-10-26 | 1 | -0/+4
  * src/visibility.c (gcry_md_setkey): Add the check here, too.
  --
  GnuPG-bug-id: 6039
  Fixes-commit: 58c92098d053aae7c78cc42bdd7c80c13efc89bb
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* Revert "kdf:pkdf2: Require longer input when FIPS mode." | Jakub Jelen | 2022-10-26 | 1 | -4/+0
  * cipher/kdf.c (_gcry_kdf_pkdf2): Remove the length limitation of passphrase input length.
  --
  This reverts commit 857e6f467d0fc9fd858a73d84122695425970075.

  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* build: Update gpg-error.m4. | NIIBE Yutaka | 2022-10-24 | 1 | -1/+5
  * m4/gpg-error.m4: Update from libgpg-error 1.46.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* tests: Use proper format string for size_t | Jakub Jelen | 2022-10-19 | 1 | -2/+2
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* cipher: Do not run RSA encryption selftest by default | Jakub Jelen | 2022-10-19 | 1 | -4/+7
  * cipher/rsa.c (selftests_rsa): Skip encryption selftest as this operation is not claimed as part of the certification.
* Revert "tests: Expect the RSA PKCS #1.5 encryption to fail in FIPS mode" | Jakub Jelen | 2022-10-19 | 2 | -20/+5
  This reverts commit f736f3c70182d9c948f9105eb769c47c5578df35. The pubkey encryption has already separate explicit FIPS service indicator.
* Revert "Do not allow PKCS #1.5 padding for encryption in FIPS" | Jakub Jelen | 2022-10-19 | 2 | -9/+1
  This reverts commit c7709f7b23848abf4ba65cb99cb2a9e9c7ebdefc. The pubkey encryption has already separate explicit FIPS service indicator.
* Revert "tests: Expect the OEAP tests to fail in FIPS mode." | Jakub Jelen | 2022-10-19 | 2 | -22/+5
  This reverts commit 249ca431ef881d510b90a5d3db9cd8507c4d697b. The pubkey encryption has already separate explicit FIPS service indicator.
* Revert "fips: Disable RSA-OAEP padding in FIPS mode." | Jakub Jelen | 2022-10-19 | 2 | -6/+2
  This reverts commit e552e37983da0c54840786eeff34481685fde1e9. The pubkey encryption has already separate explicit FIPS service indicator.
* fips: Mark gcry_pk_encrypt/decrypt function non-approved. | Jakub Jelen | 2022-10-19 | 1 | -1/+3
  * src/fips.c (_gcry_fips_indicator_function): Add gcry_pk_encrypt/decrypt as non-approved.
  --
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* fips: Fix fips indicator function. | Jakub Jelen | 2022-10-19 | 1 | -2/+2
  * src/fips.c (_gcry_fips_indicator_function): Fix typo in sign/verify function names.
  --
  Fixes-commit: 05a9c9d1ba1db6c1cd160fba979e9ddf4700a0c0
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* doc: fix RFC reference for GCM-SIV | Jussi Kivilinna | 2022-10-08 | 1 | -1/+1
  * doc/gcrypt.texi: Fix GCM-SIV RFC reference to RFC-8452.
  --
  GnuPG-bug-id: 6232
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/longlong.h: i386: use tzcnt instruction for trailing zeros | Jussi Kivilinna | 2022-10-08 | 1 | -1/+1
  * mpi/longlong.h [__i386__] (count_trailing_zeros): Add 'rep' prefix for 'bsfq'.
  --
  "rep;bsf" aka "tzcnt" is new instruction with well defined operation on zero input and as result is faster on new CPUs. On old CPUs, "tzcnt" functions as old "bsf" with undefined behaviour on zero input.

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/longlong.h: x86-64: use tzcnt instruction for trailing zeros | Jussi Kivilinna | 2022-10-08 | 1 | -1/+1
  * mpi/longlong.h [__x86_64__] (count_trailing_zeros): Add 'rep' prefix for 'bsfq'.
  --
  "rep;bsf" aka "tzcnt" is new instruction with well defined operation on zero input and as result is faster on new CPUs. On old CPUs, "tzcnt" functions as old "bsf" with undefined behaviour on zero input.

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
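As a reference for what `count_trailing_zeros` computes in the two tzcnt commits above, here is a portable C sketch. It is not the inline assembly from mpi/longlong.h; `ctz64_portable` is a hypothetical name, and returning 64 for a zero input is this sketch's choice, picked to mirror the defined result of `tzcnt` on a 64-bit operand.

```c
#include <stdint.h>

/* Count trailing zero bits of a 64-bit value.  For x == 0 this sketch
   returns 64, which is what "tzcnt" yields for a 64-bit operand; plain
   "bsf" leaves the result undefined in that case, so callers that may
   run on older CPUs should not rely on the zero-input result. */
static unsigned int ctz64_portable(uint64_t x)
{
  unsigned int n = 0;

  if (x == 0)
    return 64;

  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}
```

For nonzero inputs both instructions return the same bit index, which is why adding the prefix is safe on CPUs that decode it as plain `bsf`.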
* mpi/longlong: fix generic smul_ppmm ifdef | Jussi Kivilinna | 2022-10-08 | 1 | -1/+1
  * mpi/longlong.h [!umul_ppmm] (smul_ppmm): Change ifdef from !defined(umul_ppmm) to !defined(smul_ppmm).
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
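The guard fix in the smul_ppmm commit above concerns the usual "define a generic fallback only if the macro is missing" pattern, where the guard has to test the macro being defined. The sketch below is a self-contained illustration with a 32-bit word; the `umul_ppmm` stand-in built on a 64-bit product and the local variable names are assumptions, not the definitions from mpi/longlong.h.

```c
#include <stdint.h>

typedef uint32_t UWtype;
#define W_TYPE_SIZE 32

/* Stand-in unsigned multiply: (w1:w0) = u * v, via a 64-bit product. */
#define umul_ppmm(w1, w0, u, v)                          \
  do {                                                   \
    uint64_t p__ = (uint64_t)(u) * (v);                  \
    (w1) = (UWtype)(p__ >> W_TYPE_SIZE);                 \
    (w0) = (UWtype)p__;                                  \
  } while (0)

/* Generic signed multiply, provided only when no platform-specific
   smul_ppmm exists.  The guard tests smul_ppmm itself. */
#if !defined(smul_ppmm)
#define smul_ppmm(w1, w0, u, v)                                  \
  do {                                                           \
    UWtype w1__;                                                 \
    UWtype m0__ = (u), m1__ = (v);                               \
    umul_ppmm(w1__, w0, m0__, m1__);                             \
    /* Correct the high word for operands with the sign bit set. */ \
    (w1) = w1__ - (-(m0__ >> (W_TYPE_SIZE - 1)) & m1__)          \
                - (-(m1__ >> (W_TYPE_SIZE - 1)) & m0__);         \
  } while (0)
#endif
```

With this definition, multiplying -2 by 3 gives a high word of 0xFFFFFFFF and a low word of 0xFFFFFFFA, the two's-complement halves of -6.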