Commit message | Author | Age | Files | Lines
* ppc: enable P10 assembly with ENABLE_FORCE_SOFT_HWFEATURES on arch-3.00 | Jussi Kivilinna | 2022-06-12 | 3 | -2/+21
  * cipher/chacha20.c (chacha20_do_setkey) [USE_PPC_VEC]: Enable P10 assembly for HWF_PPC_ARCH_3_00 if ENABLE_FORCE_SOFT_HWFEATURES is defined.
  * cipher/poly1305.c (poly1305_init) [POLY1305_USE_PPC_VEC]: Likewise.
  * cipher/rijndael.c (do_setkey) [USE_PPC_CRYPTO_WITH_PPC9LE]: Likewise.
  ---
  This change allows testing P10 implementations with P9 and with QEMU-PPC.
  GnuPG-bug-id: 6006
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
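The gating rule described in this commit can be sketched as a small predicate. This is only an illustration: the flag bit values and the `force_soft` parameter are hypothetical stand-ins (the real code keys off the compile-time macro `ENABLE_FORCE_SOFT_HWFEATURES` and libgcrypt's own `HWF_PPC_*` constants).

```c
#include <assert.h>

/* Hypothetical bit values; libgcrypt defines its own HWF_PPC_* constants. */
#define HWF_PPC_ARCH_3_00 (1u << 0)   /* POWER9 ISA level */
#define HWF_PPC_ARCH_3_10 (1u << 1)   /* POWER10 ISA level */

/* Return 1 when the P10 assembly path should be selected.
   force_soft stands in for ENABLE_FORCE_SOFT_HWFEATURES being defined. */
static int
use_p10_asm (unsigned int hwf, int force_soft)
{
  if (hwf & HWF_PPC_ARCH_3_10)
    return 1;                          /* native P10: always eligible */
  if (force_soft && (hwf & HWF_PPC_ARCH_3_00))
    return 1;                          /* testing build: accept P9 as well */
  return 0;
}
```

With `force_soft` set to 0 the predicate behaves like a regular build, where only an arch-3.10 flag selects the P10 code.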
* Chacha20/poly1305 - Optimized chacha20/poly1305 for P10 operation | Danny Tsen | 2022-06-12 | 7 | -3/+1804
  * configure.ac: Added chacha20 and poly1305 assembly implementations.
  * cipher/chacha20-p10le-8x.s: (New) - support 8 blocks (512 bytes) unrolling.
  * cipher/poly1305-p10le.s: (New) - support 4 blocks (128 bytes) unrolling.
  * cipher/Makefile.am: Added new chacha20 and poly1305 files.
  * cipher/chacha20.c: Added PPC p10 le support for 8x chacha20.
  * cipher/poly1305.c: Added PPC p10 le support for 4x poly1305.
  * cipher/poly1305-internal.h: Added PPC p10 le support for poly1305.
  ---
  GnuPG-bug-id: 6006
  Signed-off-by: Danny Tsen <dtsen@us.ibm.com>
  [jk: cosmetic changes to C code]
  [jk: fix building on ppc64be]
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* kdf: Add support for One-Step KDF with MAC. | NIIBE Yutaka | 2022-06-08 | 3 | -10/+227
  * src/gcrypt.h.in (GCRY_KDF_ONESTEP_KDF_MAC): New.
  * cipher/kdf.c (onestep_kdf_mac_open, onestep_kdf_mac_compute): New.
  (onestep_kdf_mac_final, onestep_kdf_mac_close): New.
  (_gcry_kdf_open, _gcry_kdf_compute, _gcry_kdf_final, _gcry_kdf_close): Add support for GCRY_KDF_ONESTEP_KDF_MAC.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* kdf: Add One-Step KDF with hash. | NIIBE Yutaka | 2022-06-07 | 3 | -6/+237
  * src/gcrypt.h.in (GCRY_KDF_ONESTEP_KDF): New.
  * cipher/kdf.c (onestep_kdf_open, onestep_kdf_compute): New.
  (onestep_kdf_final): New.
  (_gcry_kdf_open, _gcry_kdf_compute, _gcry_kdf_final): Add GCRY_KDF_ONESTEP_KDF support.
  * tests/t-kdf.c (check_onestep_kdf): Add the test.
  (main): Call check_onestep_kdf.
  --
  GnuPG-bug-id: 5964
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
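For orientation, the hash-based One-Step KDF of NIST SP 800-56C derives output blocks as H(counter || Z || FixedInfo), with a 32-bit big-endian counter starting at 1. The sketch below shows only that loop. It is not the code in cipher/kdf.c: the function names, the callback type and the toy hash are all assumptions made for illustration, and the toy hash is not cryptographic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Caller-supplied hash: writes the digest of IN to OUT. */
typedef void (*hash_fn_t) (const unsigned char *in, size_t inlen,
                           unsigned char *out);

/* Toy 4-byte FNV-1a "hash", used only to exercise the loop. */
static void
toy_hash (const unsigned char *in, size_t inlen, unsigned char *out)
{
  uint32_t h = 2166136261u;
  size_t i;

  for (i = 0; i < inlen; i++)
    h = (h ^ in[i]) * 16777619u;
  out[0] = (unsigned char) (h >> 24);
  out[1] = (unsigned char) (h >> 16);
  out[2] = (unsigned char) (h >> 8);
  out[3] = (unsigned char) h;
}

/* Derive OUTLEN bytes as K(1) || K(2) || ..., where
   K(i) = H(counter_i || Z || FixedInfo).  Returns 0 on success. */
static int
onestep_kdf_sketch (hash_fn_t hash, size_t hashlen,
                    const unsigned char *z, size_t zlen,
                    const unsigned char *info, size_t infolen,
                    unsigned char *out, size_t outlen)
{
  size_t inlen = 4 + zlen + infolen;
  unsigned char *in, *block;
  uint32_t counter = 1;
  size_t done = 0;

  assert (hashlen > 0);
  in = malloc (inlen);
  block = malloc (hashlen);
  if (!in || !block)
    {
      free (in);
      free (block);
      return -1;
    }
  memcpy (in + 4, z, zlen);
  memcpy (in + 4 + zlen, info, infolen);

  while (done < outlen)
    {
      size_t n = outlen - done < hashlen ? outlen - done : hashlen;

      /* 32-bit big-endian counter in front of Z. */
      in[0] = (unsigned char) (counter >> 24);
      in[1] = (unsigned char) (counter >> 16);
      in[2] = (unsigned char) (counter >> 8);
      in[3] = (unsigned char) counter;
      hash (in, inlen, block);
      memcpy (out + done, block, n);   /* last block may be truncated */
      done += n;
      counter++;
    }
  free (in);
  free (block);
  return 0;
}
```

Passing the hash as a callback keeps the sketch independent of any particular digest library; a real caller would plug in an approved hash instead of `toy_hash`.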
* Fix for struct gcry_thread_cbs. | NIIBE Yutaka | 2022-06-07 | 1 | -1/+1
  * src/gcrypt.h.in (struct gcry_thread_cbs): Since it is no longer used, even internally, use _GCRY_GCC_ATTR_DEPRECATED instead.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* secmem: Remove RISC OS support. | NIIBE Yutaka | 2022-06-01 | 1 | -6/+0
  * src/secmem.c [__riscos__]: Remove.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* secmem: Clean up ERRNO handling. | NIIBE Yutaka | 2022-06-01 | 1 | -10/+6
  * src/secmem.c (lock_pool_pages): Use ERR only for the return value from mlock.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* secmem: Remove getting cap_ipc_lock by capabilities support. | NIIBE Yutaka | 2022-06-01 | 1 | -42/+1
  * src/secmem.c (lock_pool_pages): Remove escalation of the capability.
  --
  With CAP_SETPCAP, this might have made sense before Linux 2.6.24, when file capabilities were not supported, but not any more.
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* tests: Fix copy paste error | Jakub Jelen | 2022-05-31 | 1 | -1/+1
  --
  * tests/basic.c (check_ocb_cipher_checksum): Check the right value for errors
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* Fix memory leaks in tests | Jakub Jelen | 2022-05-31 | 11 | -19/+63
  * tests/aeswrap.c (check_one_with_padding): Free hd on error paths
  * tests/basic.c (check_ccm_cipher): Free context on error paths
  (check_ocb_cipher_checksum): Ditto.
  (do_check_xts_cipher): Ditto.
  (check_gost28147_cipher_basic): Ditto.
  * tests/bench-slope.c (bench_ecc_init): Free memory on invalid input.
  * tests/t-cv25519.c (test_it): Free memory on error path
  * tests/t-dsa.c (hex2buffer): Free memory on error path
  * tests/t-ecdsa.c (hex2buffer): Free memory on error path
  (one_test_sexp): Cleanup memory on exit
  * tests/t-mpi-point.c (check_ec_mul): Free memory on error
  (check_ec_mul_reduction): Ditto
  * tests/t-rsa-15.c (hex2buffer): Ditto
  * tests/t-rsa-pss.c (hex2buffer): Ditto
  * tests/t-x448.c (test_it): Free memory on error path
  * tests/testdrv.c (my_spawn): Free memory on error paths
  --
  GnuPG-bug-id: 5973
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
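The kind of leak fixed in the `hex2buffer` helpers can be illustrated with a standalone hex decoder that releases its buffer before returning on bad input. This is a sketch written for this log, not the code from the test files; the names `hexval` and `hex2buffer_sketch` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if C is not a hex digit. */
static int
hexval (int c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  c = tolower (c);
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  return -1;
}

/* Convert a hex string into a malloc'ed buffer and store its length
   in *R_LEN.  Returns NULL on invalid input or allocation failure. */
static unsigned char *
hex2buffer_sketch (const char *s, size_t *r_len)
{
  size_t n = strlen (s), i;
  unsigned char *buf;

  if (n % 2)
    return NULL;                /* nothing allocated yet */
  buf = malloc (n / 2 + 1);     /* +1 so an empty string still allocates */
  if (!buf)
    return NULL;
  for (i = 0; i < n; i += 2)
    {
      int hi = hexval ((unsigned char) s[i]);
      int lo = hexval ((unsigned char) s[i + 1]);

      if (hi < 0 || lo < 0)
        {
          free (buf);           /* the error path must not leak the buffer */
          return NULL;
        }
      buf[i / 2] = (unsigned char) (hi * 16 + lo);
    }
  *r_len = n / 2;
  return buf;
}
```

The caller owns the returned buffer and releases it with `free`.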
* cipher: Allow verification of small RSA signatures in FIPS mode | Jakub Jelen | 2022-05-19 | 1 | -2/+24
  * cipher/rsa.c (rsa_check_keysize): Formatting.
  (rsa_check_verify_keysize): New function.
  (rsa_verify): Allow using smaller keys for verification.
  --
  GnuPG-bug-id: 5975
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* Fix internal declaration of _gcry_kdf_compute. | NIIBE Yutaka | 2022-05-17 | 1 | -2/+2
  * src/gcrypt-int.h (_gcry_kdf_compute): Return gcry_err_code_t.
  --
  GnuPG-bug-id: 5980
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* mpi: Allow building with --disable-asm for HPPA. | NIIBE Yutaka | 2022-05-17 | 1 | -2/+2
  * mpi/longlong.h [__hppa] (udiv_qrnnd): Only define when assembler is enabled.
  --
  GnuPG-bug-id: 5976
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* aarch64-asm: use ADR for getting pointers for local labels | Jussi Kivilinna | 2022-05-15 | 10 | -29/+18
  * cipher/asm-common-aarch64.h (GET_DATA_POINTER): Remove.
  (GET_LOCAL_POINTER): New.
  * cipher/camellia-aarch64.S: Use GET_LOCAL_POINTER instead of ADR instruction directly.
  * cipher/chacha20-aarch64.S: Use GET_LOCAL_POINTER instead of GET_DATA_POINTER.
  * cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
  * cipher/crc-armv8-aarch64-ce.S: Likewise.
  * cipher/sha1-armv8-aarch64-ce.S: Likewise.
  * cipher/sha256-armv8-aarch64-ce.S: Likewise.
  * cipher/sm3-aarch64.S: Likewise.
  * cipher/sm3-armv8-aarch64-ce.S: Likewise.
  * cipher/sm4-aarch64.S: Likewise.
  ---
  Switch to use ADR instead of ADRP/LDR or ADRP/ADD for getting data pointers within assembly files. ADR is more portable across targets and does not require labels to be declared in GOT tables.
  Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher: move CBC/CFB/CTR self-tests to tests/basic | Jussi Kivilinna | 2022-05-11 | 17 | -1050/+780
  * cipher/Makefile.am: Remove 'cipher-selftest.c' and 'cipher-selftest.h'.
  * cipher/cipher-selftest.c: Remove (refactor these tests to tests/basic.c).
  * cipher/cipher-selftest.h: Remove.
  * cipher/blowfish.c (selftest_ctr, selftest_cbc, selftest_cfb): Remove.
  (selftest): Remove CTR/CBC/CFB bulk self-tests.
  * cipher/camellia-glue.c (selftest_ctr_128, selftest_cbc_128)
  (selftest_cfb_128): Remove.
  (selftest): Remove CTR/CBC/CFB bulk self-tests.
  * cipher/cast5.c (selftest_ctr, selftest_cbc, selftest_cfb): Remove.
  (selftest): Remove CTR/CBC/CFB bulk self-tests.
  * cipher/des.c (bulk_selftest_setkey, selftest_ctr, selftest_cbc)
  (selftest_cfb): Remove.
  (selftest): Remove CTR/CBC/CFB bulk self-tests.
  * cipher/rijndael.c (selftest_basic_128, selftest_basic_192)
  (selftest_basic_256): Allocate context from stack instead of heap and handle alignment manually.
  (selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Remove.
  (selftest): Remove CTR/CBC/CFB bulk self-tests.
  * cipher/serpent.c (selftest_ctr_128, selftest_cbc_128)
  (selftest_cfb_128): Remove.
  (selftest): Remove CTR/CBC/CFB bulk self-tests.
  * cipher/sm4.c (selftest_ctr_128, selftest_cbc_128)
  (selftest_cfb_128): Remove.
  (selftest): Remove CTR/CBC/CFB bulk self-tests.
  * cipher/twofish.c (selftest_ctr, selftest_cbc, selftest_cfb): Remove.
  (selftest): Remove CTR/CBC/CFB bulk self-tests.
  * tests/basic.c (buf_xor, cipher_cbc_bulk_test, buf_xor_2dst)
  (cipher_cfb_bulk_test, cipher_ctr_bulk_test): New.
  (check_ciphers): Run cipher_cbc_bulk_test(), cipher_cfb_bulk_test() and cipher_ctr_bulk_test() for block ciphers.
  ---
  CBC/CFB/CTR bulk self-tests are quite computationally heavy and slow down use cases where an application opens a cipher context once, does its processing and exits. A better place for these tests is `tests/basic`.
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* camellia: add amd64 GFNI/AVX512 implementation | Jussi Kivilinna | 2022-05-11 | 9 | -43/+1873
  * cipher/Makefile.am: Add 'camellia-gfni-avx512-amd64.S'.
  * cipher/bulkhelp.h (bulk_ocb_prepare_L_pointers_array_blk64): New.
  * cipher/camellia-aesni-avx2-amd64.h: Rename internal functions from "__camellia_???" to "FUNC_NAME(???)"; Minor changes to comments.
  * cipher/camellia-gfni-avx512-amd64.S: New.
  * cipher/camellia-gfni.c (USE_GFNI_AVX512): New.
  (CAMELLIA_context): Add 'use_gfni_avx512'.
  (_gcry_camellia_gfni_avx512_ctr_enc, _gcry_camellia_gfni_avx512_cbc_dec)
  (_gcry_camellia_gfni_avx512_cfb_dec, _gcry_camellia_gfni_avx512_ocb_enc)
  (_gcry_camellia_gfni_avx512_ocb_dec)
  (_gcry_camellia_gfni_avx512_enc_blk64)
  (_gcry_camellia_gfni_avx512_dec_blk64, avx512_burn_stack_depth): New.
  (camellia_setkey): Use GFNI/AVX512 if supported by CPU.
  (camellia_encrypt_blk1_64, camellia_decrypt_blk1_64): New.
  (_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec, _gcry_camellia_cfb_dec)
  (_gcry_camellia_ocb_crypt) [USE_GFNI_AVX512]: Add GFNI/AVX512 code path.
  (_gcry_camellia_xts_crypt): Change parallel block size from 32 to 64.
  (selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Increase test block size.
  * cipher/chacha20-amd64-avx512.S: Clear k-mask registers with xor.
  * cipher/poly1305-amd64-avx512.S: Likewise.
  * cipher/sha512-avx512-amd64.S: Likewise.
  ---
  Benchmark on Intel i3-1115G4 (tigerlake):

  Before (GFNI/AVX2):
  CAMELLIA128   | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC dec |    0.356 ns/B      2679 MiB/s      1.46 c/B      4089
        CFB dec |    0.374 ns/B      2547 MiB/s      1.53 c/B      4089
        CTR enc |    0.409 ns/B      2332 MiB/s      1.67 c/B      4089
        CTR dec |    0.406 ns/B      2347 MiB/s      1.66 c/B      4089
        XTS enc |    0.430 ns/B      2216 MiB/s      1.76 c/B      4090
        XTS dec |    0.433 ns/B      2201 MiB/s      1.77 c/B      4090
        OCB enc |    0.460 ns/B      2071 MiB/s      1.88 c/B      4089
        OCB dec |    0.492 ns/B      1939 MiB/s      2.01 c/B      4089

  After (GFNI/AVX512):
  CAMELLIA128   | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC dec |    0.207 ns/B      4600 MiB/s     0.827 c/B      3989
        CFB dec |    0.207 ns/B      4610 MiB/s     0.825 c/B      3989
        CTR enc |    0.218 ns/B      4382 MiB/s     0.868 c/B      3990
        CTR dec |    0.217 ns/B      4389 MiB/s     0.867 c/B      3990
        XTS enc |    0.330 ns/B      2886 MiB/s      1.35 c/B    4097±4
        XTS dec |    0.328 ns/B      2904 MiB/s      1.35 c/B    4097±3
        OCB enc |    0.246 ns/B      3879 MiB/s     0.981 c/B      3990
        OCB dec |    0.247 ns/B      3855 MiB/s     0.987 c/B      3990

  CBC dec: 70% faster
  CFB dec: 80% faster
  CTR: 87% faster
  XTS: 31% faster
  OCB: 92% faster

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi: Fix for 64-bit for _gcry_mpih_cmp_ui. | NIIBE Yutaka | 2022-05-10 | 1 | -1/+8
  * mpi/mpih-const-time.c (_gcry_mpih_cmp_ui): Compare 64-bit value correctly.
  --
  Reported-by: Guido Vranken <guidovranken@gmail.com>
  GnuPG-bug-id: 5970
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* random: Fix rndjent for Windows. | NIIBE Yutaka | 2022-05-10 | 2 | -1/+25
  * random/jitterentropy-base-user.h [HAVE_W32_SYSTEM] (jent_ncpu): Implement.
  * random/rndjent.c (_WIN32_WINNT): Define for GetNativeSystemInfo.
  (EOPNOTSUPP): Define when not available.
  --
  Reported-by: Eli Zaretskii
  GnuPG-bug-id: 5891
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* tests/basic: add testing for partial bulk processing code paths | Jussi Kivilinna | 2022-04-30 | 1 | -10/+23
  * tests/basic.c (check_one_cipher_core): Add 'split_mode' parameter and handling for split_mode==1.
  (check_one_cipher): Use split_mode==0 for existing check_one_cipher_core calls; Add new large buffer check with split_mode==1.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
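The idea behind a split-mode check is that feeding a buffer in uneven pieces must give the same result as a single call, which exercises the partial-block tails of bulk code. The sketch below shows this with a toy stateful cipher. How the real test splits its buffer is not stated in this log, so the growing chunk sizes, the toy cipher and all names here are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stateful stream "cipher": XORs data with a running counter. */
typedef struct
{
  unsigned char ctr;
} toy_ctx;

static void
toy_crypt (toy_ctx *ctx, unsigned char *out, const unsigned char *in,
           size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    out[i] = (unsigned char) (in[i] ^ ctx->ctr++);
}

/* Process the buffer in growing, uneven pieces (1, 2, 3, ... bytes). */
static void
crypt_split (toy_ctx *ctx, unsigned char *out, const unsigned char *in,
             size_t n)
{
  size_t pos = 0, piece = 1;

  while (pos < n)
    {
      size_t len = n - pos < piece ? n - pos : piece;

      toy_crypt (ctx, out + pos, in + pos, len);
      pos += len;
      piece++;
    }
}

/* Return 1 if split processing matches one-shot processing. */
static int
check_split_matches_oneshot (const unsigned char *in, size_t n)
{
  unsigned char a[256], b[256];
  toy_ctx c1 = { 0 }, c2 = { 0 };

  assert (n <= sizeof a);
  toy_crypt (&c1, a, in, n);    /* whole buffer in one call */
  crypt_split (&c2, b, in, n);  /* same data, uneven pieces */
  return memcmp (a, b, n) == 0;
}
```

A mismatch between the two outputs would point at state that is not carried correctly across calls.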
* sm4-aesni-avx2: add generic 1 to 16 block bulk processing function | Jussi Kivilinna | 2022-04-30 | 2 | -13/+95
  * cipher/sm4-aesni-avx2-amd64.S: Remove unnecessary vzeroupper at function entries.
  (_gcry_sm4_aesni_avx2_crypt_blk1_16): New.
  * cipher/sm4.c (_gcry_sm4_aesni_avx2_crypt_blk1_16)
  (sm4_aesni_avx2_crypt_blk1_16): New.
  (sm4_get_crypt_blk1_16_fn) [USE_AESNI_AVX2]: Add 'sm4_aesni_avx2_crypt_blk1_16'.
  --
  Benchmark AMD Ryzen 5800X:

  Before:
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     1.48 ns/B     643.2 MiB/s      7.19 c/B      4850
        XTS dec |     1.48 ns/B     644.3 MiB/s      7.18 c/B      4850

  After (1.37x faster):
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     1.07 ns/B     888.7 MiB/s      5.21 c/B      4850
        XTS dec |     1.07 ns/B     889.4 MiB/s      5.20 c/B      4850

  Benchmark on Intel i5-6200U 2.30GHz:

  Before:
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     2.95 ns/B     323.0 MiB/s      8.25 c/B      2792
        XTS dec |     2.95 ns/B     323.0 MiB/s      8.24 c/B      2792

  After (1.64x faster):
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     1.79 ns/B     531.4 MiB/s      5.01 c/B      2791
        XTS dec |     1.79 ns/B     531.6 MiB/s      5.01 c/B      2791

  Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add SM4 x86-64/GFNI/AVX2 implementation | Jussi Kivilinna | 2022-04-30 | 5 | -42/+1467
  * cipher/Makefile.am: Add 'sm4-gfni-avx2-amd64.S'.
  * cipher/sm4-gfni-avx2-amd64.S: New.
  * cipher/sm4.c (USE_GFNI_AVX2): New.
  (SM4_context): Add 'use_gfni_avx2'.
  (crypt_blk1_8_fn_t): Rename to...
  (crypt_blk1_16_fn_t): ...this.
  (sm4_aesni_avx_crypt_blk1_8): Rename to...
  (sm4_aesni_avx_crypt_blk1_16): ...this and add handling for 9 to 16 input blocks.
  (_gcry_sm4_gfni_avx_expand_key, _gcry_sm4_gfni_avx2_ctr_enc)
  (_gcry_sm4_gfni_avx2_cbc_dec, _gcry_sm4_gfni_avx2_cfb_dec)
  (_gcry_sm4_gfni_avx2_ocb_enc, _gcry_sm4_gfni_avx2_ocb_dec)
  (_gcry_sm4_gfni_avx2_ocb_auth, _gcry_sm4_gfni_avx2_crypt_blk1_16)
  (sm4_gfni_avx2_crypt_blk1_16): New.
  (sm4_aarch64_crypt_blk1_8): Rename to...
  (sm4_aarch64_crypt_blk1_16): ...this and add handling for 9 to 16 input blocks.
  (sm4_armv8_ce_crypt_blk1_8): Rename to...
  (sm4_armv8_ce_crypt_blk1_16): ...this and add handling for 9 to 16 input blocks.
  (sm4_expand_key): Add GFNI/AVX2 path.
  (sm4_setkey): Enable GFNI/AVX2 implementation if HW features available; Disable AESNI implementations when GFNI implementation is enabled.
  (sm4_encrypt) [USE_GFNI_AVX2]: New.
  (sm4_decrypt) [USE_GFNI_AVX2]: New.
  (sm4_get_crypt_blk1_8_fn): Rename to...
  (sm4_get_crypt_blk1_16_fn): ...this; Update to use *_blk1_16 functions; Add GFNI/AVX2 selection.
  (_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
  (_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth): Add GFNI/AVX2 path; Widen generic bulk processing from 8 blocks to 16 blocks.
  (_gcry_sm4_xts_crypt): Widen generic bulk processing from 8 blocks to 16 blocks.
  --
  Benchmark on Intel i3-1115G4 (tigerlake):

  Before:
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        ECB enc |    10.34 ns/B     92.21 MiB/s     42.29 c/B      4089
        ECB dec |    10.34 ns/B     92.24 MiB/s     42.29 c/B      4090
        CBC enc |    11.06 ns/B     86.26 MiB/s     45.21 c/B      4090
        CBC dec |     1.13 ns/B     844.8 MiB/s      4.62 c/B      4090
        CFB enc |    11.06 ns/B     86.27 MiB/s     45.22 c/B      4090
        CFB dec |     1.13 ns/B     846.0 MiB/s      4.61 c/B      4090
        CTR enc |     1.14 ns/B     834.3 MiB/s      4.67 c/B      4089
        CTR dec |     1.14 ns/B     834.5 MiB/s      4.67 c/B      4089
        XTS enc |     1.93 ns/B     494.1 MiB/s      7.89 c/B      4090
        XTS dec |     1.94 ns/B     492.5 MiB/s      7.92 c/B      4090
        OCB enc |     1.16 ns/B     823.3 MiB/s      4.74 c/B      4090
        OCB dec |     1.16 ns/B     818.8 MiB/s      4.76 c/B      4089
       OCB auth |     1.15 ns/B     831.0 MiB/s      4.69 c/B      4089

  After:
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        ECB enc |     8.39 ns/B     113.6 MiB/s     34.33 c/B      4090
        ECB dec |     8.40 ns/B     113.5 MiB/s     34.35 c/B      4090
        CBC enc |     9.45 ns/B     101.0 MiB/s     38.63 c/B      4089
        CBC dec |    0.650 ns/B      1468 MiB/s      2.66 c/B      4090
        CFB enc |     9.44 ns/B     101.1 MiB/s     38.59 c/B      4090
        CFB dec |    0.660 ns/B      1444 MiB/s      2.70 c/B      4090
        CTR enc |    0.664 ns/B      1437 MiB/s      2.71 c/B      4090
        CTR dec |    0.664 ns/B      1437 MiB/s      2.71 c/B      4090
        XTS enc |    0.756 ns/B      1262 MiB/s      3.09 c/B      4090
        XTS dec |    0.757 ns/B      1260 MiB/s      3.10 c/B      4090
        OCB enc |    0.673 ns/B      1417 MiB/s      2.75 c/B      4090
        OCB dec |    0.675 ns/B      1413 MiB/s      2.76 c/B      4090
       OCB auth |    0.672 ns/B      1418 MiB/s      2.75 c/B      4090

  ECB: 1.2x faster
  CBC-enc / CFB-enc: 1.17x faster
  CBC-dec / CFB-dec / CTR / OCB: 1.7x faster
  XTS: 2.5x faster

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* sm4: add XTS bulk processing | Jussi Kivilinna | 2022-04-30 | 1 | -0/+35
  * cipher/sm4.c (_gcry_sm4_xts_crypt): New.
  (sm4_setkey): Set XTS bulk function.
  --
  Benchmark on Ryzen 5800X:

  Before:
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     7.28 ns/B     131.0 MiB/s     35.31 c/B      4850
        XTS dec |     7.29 ns/B     130.9 MiB/s     35.34 c/B      4850

  After (4.8x faster):
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     1.49 ns/B     638.6 MiB/s      7.24 c/B      4850
        XTS dec |     1.49 ns/B     639.3 MiB/s      7.24 c/B      4850

  Benchmark on Intel i5-6200U 2.30GHz:

  Before:
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |    13.41 ns/B     71.10 MiB/s     37.45 c/B      2792
        XTS dec |    13.43 ns/B     71.03 MiB/s     37.49 c/B      2792

  After (4.54x faster):
  SM4           | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     2.96 ns/B     322.7 MiB/s      8.25 c/B      2792
        XTS dec |     2.96 ns/B     322.5 MiB/s      8.26 c/B      2792

  Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* camellia-avx2: add bulk processing for XTS mode | Jussi Kivilinna | 2022-04-29 | 2 | -0/+107
  * cipher/bulkhelp.h (bulk_xts_crypt_128): New.
  * cipher/camellia-glue.c (_gcry_camellia_xts_crypt): New.
  (camellia_set_key) [USE_AESNI_AVX2]: Set XTS bulk function if AVX2 implementation is available.
  --
  Benchmark on AMD Ryzen 5800X:

  Before:
  CAMELLIA128   | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     3.79 ns/B     251.8 MiB/s     18.37 c/B      4850
        XTS dec |     3.77 ns/B     253.2 MiB/s     18.27 c/B      4850

  After (6.8x faster):
  CAMELLIA128   | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |    0.554 ns/B      1720 MiB/s      2.69 c/B      4850
        XTS dec |    0.541 ns/B      1762 MiB/s      2.63 c/B      4850

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
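XTS bulk processing pays off because the per-block tweaks can be computed up front, which leaves the block cipher calls free to run in parallel. The sketch below shows the standard tweak update (multiplication by x in GF(2^128), little-endian byte order) and a helper that precomputes a run of tweaks. It is not the code of `bulk_xts_crypt_128`; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Multiply a 128-bit XTS tweak by x in GF(2^128).  Bytes are in
   little-endian order; the reduction polynomial is
   x^128 + x^7 + x^2 + x + 1, hence the 0x87 feedback byte. */
static void
xts_tweak_double (uint8_t t[16])
{
  uint8_t carry = 0;
  int i;

  for (i = 0; i < 16; i++)
    {
      uint8_t next = (uint8_t) (t[i] >> 7);

      t[i] = (uint8_t) ((t[i] << 1) | carry);
      carry = next;
    }
  if (carry)
    t[0] ^= 0x87;
}

/* Store NBLKS consecutive tweaks into TWEAKS, starting with the
   current one, and leave T set to the tweak for the following block. */
static void
xts_prepare_tweaks (uint8_t t[16], uint8_t tweaks[][16], unsigned int nblks)
{
  unsigned int i;

  for (i = 0; i < nblks; i++)
    {
      memcpy (tweaks[i], t, 16);
      xts_tweak_double (t);
    }
}
```

Each data block is then XORed with its tweak before and after the block cipher call.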
* camellia-avx2: add partial parallel block processing | Jussi Kivilinna | 2022-04-29 | 2 | -80/+438
  * cipher/camellia-aesni-avx2-amd64.h: Remove unnecessary vzeroupper from function entry.
  (enc_blk1_32, dec_blk1_32): New.
  * cipher/camellia-glue.c (avx_burn_stack_depth)
  (avx2_burn_stack_depth): Move outside of bulk functions to deduplicate.
  (camellia_setkey): Disable AESNI & VAES implementation when GFNI implementation is enabled.
  (_gcry_camellia_aesni_avx2_enc_blk1_32)
  (_gcry_camellia_aesni_avx2_dec_blk1_32)
  (_gcry_camellia_vaes_avx2_enc_blk1_32)
  (_gcry_camellia_vaes_avx2_dec_blk1_32)
  (_gcry_camellia_gfni_avx2_enc_blk1_32)
  (_gcry_camellia_gfni_avx2_dec_blk1_32, camellia_encrypt_blk1_32)
  (camellia_decrypt_blk1_32): New.
  (_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec, _gcry_camellia_cfb_dec)
  (_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Use new bulk processing helpers from 'bulkhelp.h' and 'camellia_encrypt_blk1_32' and 'camellia_decrypt_blk1_32' for partial parallel processing.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/bulkhelp: add functions for CTR/CBC/CFB/OCB bulk processing | Jussi Kivilinna | 2022-04-24 | 2 | -149/+260
  * cipher/bulkhelp.h (bulk_crypt_fn_t, bulk_ctr_enc_128)
  (bulk_cbc_dec_128, bulk_cfb_dec_128, bulk_ocb_crypt_128)
  (bulk_ocb_auth_128): New.
  * cipher/sm4.c (_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec)
  (_gcry_sm4_cfb_dec, _gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth): Switch to use helper functions from 'bulkhelp.h'.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Move bulk OCB L pointer array setup code to common header | Jussi Kivilinna | 2022-04-24 | 5 | -248/+132
  * cipher/bulkhelp.h: New.
  * cipher/camellia-glue.c (_gcry_camellia_ocb_crypt)
  (_gcry_camellia_ocb_auth): Use new `bulk_ocb_prepare_L_pointers_array_blkXX` function for OCB L pointer array setup.
  * cipher/serpent.c (_gcry_serpent_ocb_crypt)
  (_gcry_serpent_ocb_auth): Likewise.
  * cipher/sm4.c (_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth): Likewise.
  * cipher/twofish.c (_gcry_twofish_ocb_crypt)
  (_gcry_twofish_ocb_auth): Likewise.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* sm4: deduplicate bulk processing function selection | Jussi Kivilinna | 2022-04-24 | 1 | -145/+45
  * cipher/sm4.c (crypt_blk1_8_fn_t): New.
  (sm4_aesni_avx_crypt_blk1_8, sm4_aarch64_crypt_blk1_8)
  (sm4_armv8_ce_crypt_blk1_8, sm4_crypt_blocks): Change first parameter to void pointer type.
  (sm4_get_crypt_blk1_8_fn): New.
  (_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
  (_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth): Use sm4_get_crypt_blk1_8_fn for selecting crypt_blk1_8.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add GFNI/AVX2 implementation of Camellia | Jussi Kivilinna | 2022-04-24 | 5 | -63/+398
  * cipher/Makefile.am: Add "camellia-gfni-avx2-amd64.S".
  * cipher/camellia-aesni-avx2-amd64.h [CAMELLIA_GFNI_BUILD]: Add GFNI support.
  * cipher/camellia-gfni-avx2-amd64.S: New.
  * cipher/camellia-glue.c (USE_GFNI_AVX2): New.
  (CAMELLIA_context) [USE_AESNI_AVX2]: New member "use_gfni_avx2".
  [USE_GFNI_AVX2] (_gcry_camellia_gfni_avx2_ctr_enc)
  (_gcry_camellia_gfni_avx2_cbc_dec, _gcry_camellia_gfni_avx2_cfb_dec)
  (_gcry_camellia_gfni_avx2_ocb_enc, _gcry_camellia_gfni_avx2_ocb_dec)
  (_gcry_camellia_gfni_avx2_ocb_auth): New.
  (camellia_setkey) [USE_GFNI_AVX2]: Enable GFNI if supported by HW.
  (_gcry_camellia_ctr_enc) [USE_GFNI_AVX2]: Add GFNI support.
  (_gcry_camellia_cbc_dec) [USE_GFNI_AVX2]: Add GFNI support.
  (_gcry_camellia_cfb_dec) [USE_GFNI_AVX2]: Add GFNI support.
  (_gcry_camellia_ocb_crypt) [USE_GFNI_AVX2]: Add GFNI support.
  (_gcry_camellia_ocb_auth) [USE_GFNI_AVX2]: Add GFNI support.
  * configure.ac: Add "camellia-gfni-avx2-amd64.lo".
  --
  Benchmark on Intel Core i3-1115G4 (tigerlake):

  Before (VAES/AVX2 implementation):
  CAMELLIA128   | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC dec |    0.579 ns/B      1646 MiB/s      2.37 c/B      4090
        CFB dec |    0.579 ns/B      1648 MiB/s      2.37 c/B      4089
        CTR enc |    0.586 ns/B      1628 MiB/s      2.40 c/B      4090
        CTR dec |    0.587 ns/B      1626 MiB/s      2.40 c/B      4090
        OCB enc |    0.607 ns/B      1570 MiB/s      2.48 c/B      4089
        OCB dec |    0.611 ns/B      1561 MiB/s      2.50 c/B      4089
       OCB auth |    0.602 ns/B      1585 MiB/s      2.46 c/B      4089

  After (~80% faster):
  CAMELLIA128   | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC dec |    0.299 ns/B      3186 MiB/s      1.22 c/B      4090
        CFB dec |    0.314 ns/B      3039 MiB/s      1.28 c/B      4089
        CTR enc |    0.322 ns/B      2962 MiB/s      1.32 c/B      4090
        CTR dec |    0.321 ns/B      2970 MiB/s      1.31 c/B      4090
        OCB enc |    0.339 ns/B      2817 MiB/s      1.38 c/B      4089
        OCB dec |    0.346 ns/B      2756 MiB/s      1.41 c/B      4089
       OCB auth |    0.337 ns/B      2831 MiB/s      1.38 c/B      4089

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add detection for HW feature "intel-gfni" | Jussi Kivilinna | 2022-04-24 | 5 | -1/+52
  * configure.ac (gfnisupport, gcry_cv_gcc_inline_asm_gfni)
  (ENABLE_GFNI_SUPPORT): New.
  * src/g10lib.h (HWF_INTEL_GFNI): New.
  * src/hwf-x86.c (detect_x86_gnuc): Add GFNI detection.
  * src/hwfeatures.c (hwflist): Add "intel-gfni".
  * doc/gcrypt.texi: Add "intel-gfni" to HW features list.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
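On x86, GFNI is advertised through CPUID leaf 7, sub-leaf 0, in bit 8 of ECX. A minimal sketch of the bit test follows. The helper and macro names are hypothetical, the register value is passed in so the example does not depend on a particular CPUID intrinsic, and the actual logic in src/hwf-x86.c may apply further conditions (for example OS support for the vector state).

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):ECX bit 8 reports GFNI. */
#define CPUID7_ECX_GFNI (UINT32_C (1) << 8)

/* ECX is the value CPUID returned in ECX for leaf 7, sub-leaf 0. */
static int
cpuid7_ecx_has_gfni (uint32_t ecx)
{
  return (ecx & CPUID7_ECX_GFNI) != 0;
}
```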
* tests: Expect the RSA PKCS #1.5 encryption to fail in FIPS mode | Jakub Jelen | 2022-04-21 | 2 | -5/+20
  * tests/basic.c (check_pubkey_crypt): Expect RSA PKCS #1.5 encryption to fail in FIPS mode. Expect failure when wrong padding is selected
  * tests/pkcs1v2.c (check_v15crypt): Expect RSA PKCS #1.5 encryption to fail in FIPS mode
  --
  GnuPG-bug-id: 5918
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* tests: Replace custom bit with more generic flags | Jakub Jelen | 2022-04-21 | 1 | -9/+10
  * tests/basic.c (global): New flag FLAG_SPECIAL
  (check_pubkey_crypt): Change to use bitfield flags
  --
  GnuPG-bug-id: 5918
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* Do not allow PKCS #1.5 padding for encryption in FIPS | Jakub Jelen | 2022-04-21 | 2 | -1/+9
  * cipher/pubkey-util.c (_gcry_pk_util_data_to_mpi): Block PKCS #1.5 padding for encryption in FIPS mode
  * cipher/rsa.c (rsa_decrypt): Block PKCS #1.5 decryption in FIPS mode
  --
  GnuPG-bug-id: 5918
  Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* random: Not use secure memory for DRBG instance. | NIIBE Yutaka | 2022-04-21 | 1 | -4/+4
  * random/random-drbg.c (drbg_instance): New at BSS.
  (_drbg_init_internal): Don't allocate at secure memory.
  (_gcry_rngdrbg_close_fds): Follow the change.
  --
  GnuPG-bug-id: 5933
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* cipher: Change the bounds for RSA key generation round. | NIIBE Yutaka | 2022-04-20 | 1 | -4/+4
  * cipher/rsa.c (generate_fips): Use 10 for p, 20 for q.
  --
  Constants from FIPS 186-5-draft.
  GnuPG-bug-id: 5919
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Use offsetof instead of null ptr calculation. | NIIBE Yutaka | 2022-04-19 | 1 | -1/+1
  * src/secmem.c (_gcry_secmem_realloc_internal): Use offsetof.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
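The difference between the two idioms can be shown on a small stand-in structure. The struct layout and names below are hypothetical and only loosely modelled on a memory-block header; they are not the definitions from src/secmem.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block header followed by its payload. */
typedef struct memblock_sketch
{
  unsigned int size;
  int flags;
  unsigned char payload[16];
} memblock_sketch_t;

/* Recover the block header from a pointer to its payload. */
static memblock_sketch_t *
block_from_payload (void *p)
{
  /* offsetof is the portable spelling of the older
     (size_t) &((memblock_sketch_t *) 0)->payload idiom, which
     formally computes an address through a null pointer. */
  return (memblock_sketch_t *)
    ((char *) p - offsetof (memblock_sketch_t, payload));
}
```

Because `offsetof` is a constant expression from `<stddef.h>`, compilers and sanitizers accept it without the warnings the null-pointer form can trigger.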
* cipher: Fix rsa key generation. | NIIBE Yutaka | 2022-04-18 | 1 | -0/+2
  * cipher/rsa.c (generate_fips): Set the least significant bit.
  --
  GnuPG-bug-id: 5919
  Fixes-commit: 5f9b3c2e220ca6d0eaff32324a973ef67933a844
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build: Fix make dist after socklen.m4 removal | Clemens Lang | 2022-04-12 | 1 | -1/+1
  * m4/Makefile.am: Remove socklen.m4 from EXTRA_DIST
  --
  Signed-off-by: Clemens Lang <cllang@redhat.com>
* build: Remove configure checking for socklen_t. | NIIBE Yutaka | 2022-04-08 | 2 | -78/+0
  * configure.ac (gl_TYPE_SOCKLEN_T): Remove.
  * m4/socklen.m4: Remove.
  --
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* doc: Fix missing ARM hardware features | Tianjia Zhang | 2022-04-06 | 1 | -0/+4
  * doc/gcrypt.texi: Add sha3/sm3/sm4/sha512 to ARM hardware features.
  --
  Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
* build: Fix for arm crypto support | Tianjia Zhang | 2022-04-06 | 1 | -1/+1
  * configure.ac: Correct wrong variable names.
  --
  Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
* chacha20: add AVX512 implementation | Jussi Kivilinna | 2022-04-06 | 4 | -6/+357
  * cipher/Makefile.am: Add 'chacha20-amd64-avx512.S'.
  * cipher/chacha20-amd64-avx512.S: New.
  * cipher/chacha20.c (USE_AVX512): New.
  (CHACHA20_context_s): Add 'use_avx512'.
  [USE_AVX512] (_gcry_chacha20_amd64_avx512_blocks16): New.
  (chacha20_do_setkey) [USE_AVX512]: Setup 'use_avx512' based on HW features.
  (do_chacha20_encrypt_stream_tail) [USE_AVX512]: Use AVX512 implementation if supported.
  (_gcry_chacha20_poly1305_encrypt) [USE_AVX512]: Disable stitched chacha20-poly1305 implementations if AVX512 implementation is used.
  (_gcry_chacha20_poly1305_decrypt) [USE_AVX512]: Disable stitched chacha20-poly1305 implementations if AVX512 implementation is used.
  --
  Benchmark on Intel Core i3-1115G4 (tigerlake):

  Before:
                | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     STREAM enc |    0.276 ns/B      3451 MiB/s      1.13 c/B      4090
     STREAM dec |    0.284 ns/B      3359 MiB/s      1.16 c/B      4090
   POLY1305 enc |    0.411 ns/B      2320 MiB/s      1.68 c/B    4098±3
   POLY1305 dec |    0.408 ns/B      2338 MiB/s      1.67 c/B    4091±1
  POLY1305 auth |    0.060 ns/B     15785 MiB/s     0.247 c/B    4090±1

  After (stream 1.7x faster, poly1305-aead 1.8x faster):
                | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     STREAM enc |    0.162 ns/B      5869 MiB/s     0.665 c/B    4092±1
     STREAM dec |    0.162 ns/B      5884 MiB/s     0.664 c/B    4096±3
   POLY1305 enc |    0.221 ns/B      4306 MiB/s     0.907 c/B    4097±3
   POLY1305 dec |    0.220 ns/B      4342 MiB/s     0.900 c/B    4096±3
  POLY1305 auth |    0.060 ns/B     15797 MiB/s     0.247 c/B    4085±2

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* poly1305: add AVX512 implementation | Jussi Kivilinna | 2022-04-06 | 6 | -3/+1720
  * LICENSES: Add 3-clause BSD license for poly1305-amd64-avx512.S.
  * cipher/Makefile.am: Add 'poly1305-amd64-avx512.S'.
  * cipher/poly1305-amd64-avx512.S: New.
  * cipher/poly1305-internal.h (POLY1305_USE_AVX512): New.
  (poly1305_context_s): Add 'use_avx512'.
  * cipher/poly1305.c (ASM_FUNC_ABI, ASM_FUNC_WRAPPER_ATTR): New.
  [POLY1305_USE_AVX512] (_gcry_poly1305_amd64_avx512_blocks)
  (poly1305_amd64_avx512_blocks): New.
  (poly1305_init): Use AVX512 if HW feature available (set use_avx512).
  [USE_MPI_64BIT] (poly1305_blocks): Rename to ...
  [USE_MPI_64BIT] (poly1305_blocks_generic): ... this.
  [USE_MPI_64BIT] (poly1305_blocks): New.
  --
  Patch adds AMD64 AVX512-FMA52 implementation for Poly1305.

  Benchmark on Intel Core i3-1115G4 (tigerlake):

  Before:
                | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
       POLY1305 |    0.306 ns/B      3117 MiB/s      1.25 c/B      4090

  After (5.0x faster):
                | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
       POLY1305 |    0.061 ns/B     15699 MiB/s     0.249 c/B    4095±3

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* doc: Update yat2m from libgpg-error. | NIIBE Yutaka | 2022-04-05 | 1 | -47/+278
  * doc/yat2m.c: Update.
  --
  Stderr output of "writing '<THE PAGE NAME>'" will be suppressed unless --verbose is specified.
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Add SM3 ARMv8/AArch64/CE assembly implementation | Tianjia Zhang | 2022-04-04 | 4 | -1/+248
  * cipher/Makefile.am: Add 'sm3-armv8-aarch64-ce.S'.
  * cipher/sm3-armv8-aarch64-ce.S: New.
  * cipher/sm3.c (USE_ARM_CE): New.
  [USE_ARM_CE] (_gcry_sm3_transform_armv8_ce)
  (do_sm3_transform_armv8_ce): New.
  (sm3_init) [USE_ARM_CE]: New.
  * configure.ac: Add 'sm3-armv8-aarch64-ce.lo'.
  --
  Benchmark on T-Head Yitian-710 2.75 GHz:

  Before:
                | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
            SM3 |     2.84 ns/B     335.3 MiB/s      7.82 c/B      2749

  After (~55% faster):
                | nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
            SM3 |     1.84 ns/B     518.1 MiB/s      5.06 c/B      2749

  Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
* hwf-ppc: fix missing HWF_PPC_ARCH_3_10 in HW feature | Jussi Kivilinna | 2022-04-01 | 1 | -0/+1
  * src/hwf-ppc.c (ppc_features): Add HWF_PPC_ARCH_3_10.
  --
  GnuPG-bug-id: T5913
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* random:drbg: Fix the behavior for child process. | NIIBE Yutaka | 2022-03-31 | 1 | -0/+3
  * random/random-drbg.c (_gcry_rngdrbg_randomize): Update change of PID detection.
  --
  Without this change, a child process calls drbg_reseed again and again.
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
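The failure mode described here, reseeding on every call in a child process, is what happens when a fork check compares against a stored PID but never refreshes it. A minimal sketch of the corrected pattern follows. The state struct and function names are hypothetical, and the PID is passed in as a parameter (rather than read with `getpid`) so the logic can be exercised without forking.

```c
#include <assert.h>

typedef struct
{
  long seen_pid;          /* PID recorded at the last (re)seed */
  unsigned int reseeds;   /* number of reseeds performed so far */
} drbg_state_sketch;

/* Call before producing output.  Returns 1 if a reseed was triggered. */
static int
reseed_if_forked (drbg_state_sketch *st, long current_pid)
{
  if (st->seen_pid == current_pid)
    return 0;
  st->reseeds++;                /* stand-in for the real reseed call */
  st->seen_pid = current_pid;   /* without this update, every later call
                                   in the child would reseed again */
  return 1;
}
```

After the first call in the child the stored PID matches again, so later calls skip the reseed.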
* build: When no gpg-error-config, not install libgcrypt-config. | NIIBE Yutaka | 2022-03-31 | 2 | -0/+7
  * configure.ac (USE_GPGRT_CONFIG): New.
  * src/Makefile.am [USE_GPGRT_CONFIG]: Conditionalize the install of libgcrypt-config.
  --
  Once a system has migrated to gpgrt-config and removed gpg-error-config, libgcrypt-config will not be installed (libgcrypt.pc is used through gpgrt-config instead).
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* tests: Add brainpoolP256r1 to bench-slope. | Werner Koch | 2022-03-30 | 1 | -0/+16
  * tests/bench-slope.c (ECC_ALGO_BRAINP256R1): New.
  (ecc_algo_fips_allowed): Support this curve.
  (ecc_algo_name): Ditto.
  (ecc_algo_curve): Ditto.
  (ecc_nbits): Ditto.
  (bench_ecc_init): Ditto.
* configure: fix avx512 check for i386 | Jussi Kivilinna | 2022-03-29 | 1 | -3/+3
  * configure.ac (gcry_cv_gcc_inline_asm_avx512): Do not use ZMM22 register; Check for broadcast memory source.
  --
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Fix configure.ac error of intel-avx512 | Tianjia Zhang | 2022-03-29 | 1 | -0/+6
  * configure.ac: Correctly set value for avx512support.
  --
  Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>