path: root/configure.ac
Commit message (Author, Age, Files, Lines)
* random: Remove random-daemon use remained. (NIIBE Yutaka, 2021-12-17, 1 file, -1/+1)
    * configure.ac (--enable-random-daemon): Fix the message.
    * random/random-csprng.c [USE_RANDOM_DAEMON] (initialize_basics): Remove the dependency to random daemon.
    * random/random.h [USE_RANDOM_DAEMON]: Likewise.
    --
    GnuPG-bug-id: 5706
    Fixes-commit: 754ad5815b5bb7462260414f2bc5f449bee0b1c6
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Add SM3 x86-64 AVX/BMI2 assembly implementation (Jussi Kivilinna, 2021-12-14, 1 file, -6/+13)
    * cipher/Makefile.am: Add 'sm3-avx-bmi2-amd64.S'.
    * cipher/sm3-avx-bmi2-amd64.S: New.
    * cipher/sm3.c (USE_AVX_BMI2, ASM_FUNC_ABI, ASM_EXTRA_STACK): New.
    (SM3_CONTEXT): Define 'h' as array instead of separate fields 'h1', 'h2', etc.
    [USE_AVX_BMI2] (_gcry_sm3_transform_amd64_avx_bmi2)
    (do_sm3_transform_amd64_avx_bmi2): New.
    (sm3_init): Select AVX/BMI2 transform function if supported by HW; Update to use 'hd->h' as array.
    (transform_blk, sm3_final): Update to use 'hd->h' as array.
    * configure.ac: Add 'sm3-avx-bmi2-amd64.lo'.
    --
    Benchmark on AMD Zen3:

    Before:
        | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    SM3 |     2.18 ns/B    436.6 MiB/s    10.59 c/B      4850

    After (~43% faster):
        | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    SM3 |     1.52 ns/B    627.4 MiB/s     7.37 c/B      4850

    Benchmark on Intel Skylake:

    Before:
        | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    SM3 |     4.35 ns/B    219.2 MiB/s    13.48 c/B      3098

    After (~34% faster):
        | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    SM3 |     3.24 ns/B    294.4 MiB/s    10.04 c/B      3098

    Benchmark on AMD Zen2:

    Before:
        | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    SM3 |     2.73 ns/B    348.9 MiB/s    11.86 c/B      4339

    After (~38% faster):
        | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    SM3 |     1.97 ns/B    483.0 MiB/s     8.52 c/B      4318

    Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
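    Note on the "~N% faster" figures: they can be reproduced from the
    nanosecs/byte column of the SM3 benchmark in the commit message. The
    following is a minimal sketch in C, not part of the commit; the helper
    name speedup_percent is hypothetical and introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical helper (not part of libgcrypt): percentage speed-up implied
   by a drop in nanoseconds per byte.  Lower ns/B means higher throughput,
   so the speed-up is (before / after - 1) * 100.  */
static double
speedup_percent (double before_ns_per_byte, double after_ns_per_byte)
{
  assert (before_ns_per_byte > 0.0 && after_ns_per_byte > 0.0);
  return (before_ns_per_byte / after_ns_per_byte - 1.0) * 100.0;
}
```

    With the Zen3 figures, speedup_percent (2.18, 1.52) comes out at a little
    over 43, which agrees with the "~43% faster" quoted in the message.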
* Do not build 'cipher/' assembly files when --disable-asm used (Jussi Kivilinna, 2021-11-18, 1 file, -90/+96)
    * configure.ac: Collect assembly implementation *.lo files under GCRYPT_ASM_CIPHERS and GCRYPT_ASM_DIGEST for --disable-asm selection.
    --
    GnuPG-bug-id: 5694
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Do not build poly1305-s390x.S on foreign architectures (Jussi Kivilinna, 2021-11-18, 1 file, -0/+7)
    * configure.ac [host=s390x-*-*]: Add 'poly1305-s390x.lo'.
    * cipher/Makefile.am: Move 'poly1305-s390x.S' to 'EXTRA_libcipher_la_SOURCES'.
    --
    GnuPG-bug-id: 5694
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* build: Fix excess quotation to enable config.status --recheck works. (NIIBE Yutaka, 2021-11-18, 1 file, -1/+1)
    * configure.ac (DEF_HMAC_BINARY_CHECK): Fix quotation.
    --
    GnuPG-bug-id: 5550
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build: Support rndgetentropy random module. (NIIBE Yutaka, 2021-11-15, 1 file, -2/+10)
    * configure.ac: Add getentropy random module.
    * random/Makefile.am (EXTRA_librandom_la_SOURCES): Add.
    --
    GnuPG-bug-id: 5636
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build: Support specifying HMAC key by --enable-hmac-binary-check. (NIIBE Yutaka, 2021-10-12, 1 file, -4/+10)
    * configure.ac (DEF_HMAC_BINARY_CHECK): New SUBSTITUTION.
    (DL_LIBS): Fix the condition.
    * src/Makefile.am (libgcrypt_la_CFLAGS): Use DEF_HMAC_BINARY_CHECK.
    (hmac256_CFLAGS): Likewise.
    --
    GnuPG-bug-id: 5550
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build,gcrypt.h: Don't define gcry_socklen_t. (NIIBE Yutaka, 2021-10-05, 1 file, -15/+0)
    * configure.ac (FALLBACK_SOCKLEN_T): Remove.
    * src/gcrypt.h.in: Remove FALLBACK_SOCKLEN_T.
    --
    GnuPG-bug-id: 5637
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build,gcrypt.h: Remove INSERT_SYS_SELECT_H. (NIIBE Yutaka, 2021-10-05, 1 file, -6/+1)
    * configure.ac (INSERT_SYS_SELECT_H): Remove. Remove checking sys/select.h.
    * src/gcrypt.h.in: Remove INSERT_SYS_SELECT_H.
    --
    It is no longer of any use.
    GnuPG-bug-id: 5637
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Allow passing FIPS module version (Jakub Jelen, 2021-09-20, 1 file, -0/+7)
    * README: Document new --with-fips-module-version=version switch
    * configure.ac: Implementation of the --with-fips-module-version
    * src/global.c (print_config): Print FIPS module version from above
    --
    Signed-off-by: Jakub Jelen <jjelen@redhat.com>

    Moved the module version to a 3rd field to keep the semantics of that line.
    Signed-off-by: Werner Koch <wk@gnupg.org>
    GnuPG-bug-id: 1600
* build: Generate hash for integrity check with hmac256. (NIIBE Yutaka, 2021-08-18, 1 file, -0/+2)
    * configure.ac [ENABLE_HMAC_BINARY_CHECK]: Check objcopy.
    (USE_HMAC_BINARY_CHECK): New Automake conditional.
    * src/Makefile.am (libgcrypt.la.done): New target.
    [USE_HMAC_BINARY_CHECK] (libgcrypt.so.hmac): Compute the hash.
    [USE_HMAC_BINARY_CHECK] (libgcrypt.la.done): Add .hmac section.
    --
    GnuPG-bug-id: 5550
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build: Update checking headers. (NIIBE Yutaka, 2021-08-05, 1 file, -1/+1)
    * configure.ac (AC_CHECK_HEADERS): Remove sys/msg.h.

    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* mpi/ec: add fast reduction functions for NIST curves (Jussi Kivilinna, 2021-06-19, 1 file, -0/+3)
    * configure.ac (ASM_DISABLED): New.
    * mpi/Makefile.am: Add 'ec-nist.c' and 'ec-inline.h'.
    * mpi/ec-nist.c: New.
    * mpi/ec-inline.h: New.
    * mpi/ec-internal.h (_gcry_mpi_ec_nist192_mod)
    (_gcry_mpi_ec_nist224_mod, _gcry_mpi_ec_nist256_mod)
    (_gcry_mpi_ec_nist384_mod, _gcry_mpi_ec_nist521_mod): New.
    * mpi/ec.c (ec_addm, ec_subm, ec_mulm, ec_mul2): Use 'ctx->mod'.
    (field_table): Add 'mod' function; Add NIST reduction functions.
    (ec_p_init): Setup ctx->mod; Setup function pointers from field_table only if pointer is not NULL; Resize ctx->a and ctx->b only if set.
    * mpi/mpi-internal.h (RESIZE_AND_CLEAR_IF_NEEDED): New.
    * mpi/mpiutil.c (_gcry_mpi_resize): Clear all unused limbs also in realloc case.
    * src/ec-context.h (mpi_ec_ctx_s): Add 'mod' function.
    --
    Benchmark on AMD Ryzen 7 5800X (x86_64):

    Before:
    NIST-P192 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        283346      1369473      4833
    keygen    |       1688442      8185744      4848
    sign      |        549683      2662984      4845
    verify    |        615284      2984325      4850
    =
    NIST-P224 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        516443      2501173      4843
    keygen    |       2859746     13866802      4849
    sign      |        918472      4455043      4850
    verify    |       1057940      5131372      4850
    =
    NIST-P256 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        423536      2054040      4850
    keygen    |       2383097     11557572      4850
    sign      |        774346      3754243      4848
    verify    |        864934      4196315      4852
    =
    NIST-P384 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        929985      4511881      4852
    keygen    |       5230788     25367299      4850
    sign      |       1671432      8109726      4852
    verify    |       1902729      9228568      4850
    =
    NIST-P521 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |       2123546     10300952      4851
    keygen    |      12019340     58297774      4850
    sign      |       3886988     18853054      4850
    verify    |       4507885     21864015      4850

    After:
    NIST-P192 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        186679       905603      4851      +51%
    keygen    |       1161423      5623822      4842      +46%
    sign      |        389531      1887557      4846      +41%
    verify    |        412936      2000461      4844      +49%
    =
    NIST-P224 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        260621      1256327      4821      +99%
    keygen    |       1557845      7531677      4835      +84%
    sign      |        521678      2527083      4844      +76%
    verify    |        554084      2677949      4833      +92%
    =
    NIST-P256 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        319045      1542061      4833      +33%
    keygen    |       1834822      8898950      4850      +30%
    sign      |        612866      2972630      4850      +26%
    verify    |        664821      3222597      4847      +30%
    =
    NIST-P384 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        593894      2875260      4841      +57%
    keygen    |       3526600     17089717      4846      +48%
    sign      |       1178098      5710151      4847      +42%
    verify    |       1260185      6107449      4846      +51%
    =
    NIST-P521 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |       1160220      5621946      4846      +83%
    keygen    |       6862975     33247351      4844      +75%
    sign      |       2287366     11096711      4851      +70%
    verify    |       2455858     11888045      4841      +84%

    Benchmark on AMD Ryzen 7 5800X (i386):

    Before:
    NIST-P192 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        648039      3143236      4850
    keygen    |       3554452     17244822      4852
    sign      |       1163173      5641932      4850
    verify    |       1300076      6305673      4850
    =
    NIST-P224 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        798607      3874405      4851
    keygen    |       4657604     22589864      4850
    sign      |       1515803      7352049      4850
    verify    |       1635470      7935373      4852
    =
    NIST-P256 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        927033      4496283      4850
    keygen    |       5313601     25771983      4850
    sign      |       1735795      8418514      4850
    verify    |       1945804      9438212      4851
    =
    NIST-P384 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |       2301781     11164473      4850
    keygen    |      12856001     62353242      4850
    sign      |       4161041     20180651      4850
    verify    |       4705961     22827478      4851
    =
    NIST-P521 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |       6066635     29422721      4850
    keygen    |      32995868    160046407      4850
    sign      |      10503306     50945387      4850
    verify    |      12225252     59294323      4850

    After:
    NIST-P192 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        413605      2007498      4854      +57%
    keygen    |       2479429     12010926      4844      +44%
    sign      |        825111      3997147      4844      +41%
    verify    |        890206      4318723      4851      +46%
    =
    NIST-P224 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        551703      2676454      4851      +45%
    keygen    |       3257022     15781844      4845      +43%
    sign      |       1085678      5258894      4844      +40%
    verify    |       1172195      5678499      4844      +40%
    =
    NIST-P256 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        720395      3497486      4855      +29%
    keygen    |       4217758     20461257      4851      +26%
    sign      |       1404350      6814131      4852      +24%
    verify    |       1515136      7353955      4854      +28%
    =
    NIST-P384 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |       1525742      7400771      4851      +51%
    keygen    |       9046660     43877889      4850      +42%
    sign      |       2974641     14408703      4844      +40%
    verify    |       3265285     15834951      4849      +44%
    =
    NIST-P521 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |       3289348     15968678      4855      +84%
    keygen    |      19354174     93873531      4850      +70%
    sign      |       6351493     30830140      4854      +65%
    verify    |       6979292     33854215      4851      +75%

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* build: _DARWIN_C_SOURCE should be 1. (NIIBE Yutaka, 2021-05-27, 1 file, -1/+1)
    * configure.ac (*-apple-darwin*): Set _DARWIN_C_SOURCE 1.
    --
    GnuPG-bug-id: 5440
    Reported-by: Jay Freeman
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Post release updates. [libgcrypt-1.10-base] (Werner Koch, 2021-04-19, 1 file, -1/+1)
    --
* Release 1.9.3 [libgcrypt-1.9.3] (Werner Koch, 2021-04-19, 1 file, -1/+1)
* Compile arch specific GCM implementations only on target arch (Jussi Kivilinna, 2021-03-07, 1 file, -6/+10)
    * cipher/Makefile.am: Move arch specific 'cipher-gcm-*.[cS]' files from libcipher_la_SOURCES to EXTRA_libcipher_la_SOURCES.
    * configure.ac: Add 'cipher-gcm-intel-pclmul.lo' and 'cipher-gcm-arm*.lo'.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* configure.ac: fix digest implementations going to cipher list (Jussi Kivilinna, 2021-03-07, 1 file, -11/+11)
    * configure.ac: Add 'crc-arm*.lo', 'crc-ppc.lo', 'sha*-ppc.lo' to GCRYPT_DIGESTS instead of GCRYPT_CIPHERS.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* VPMSUMD acceleration for GCM mode on PPC (Shawn Landden, 2021-03-07, 1 file, -0/+13)
    * cipher/Makefile.am: Add 'cipher-gcm-ppc.c'.
    * cipher/cipher-gcm-ppc.c: New.
    * cipher/cipher-gcm.c [GCM_USE_PPC_VPMSUM] (_gcry_ghash_setup_ppc_vpmsum)
    (_gcry_ghash_ppc_vpmsum, ghash_setup_ppc_vpsum, ghash_ppc_vpmsum): New.
    (setupM) [GCM_USE_PPC_VPMSUM]: Select ppc-vpmsum implementation if HW feature "ppc-vcrypto" is available.
    * cipher/cipher-internal.h (GCM_USE_PPC_VPMSUM): New.
    (gcry_cipher_handle): Move 'ghash_fn' at end of 'gcm' block to align 'gcm_table' to 16 bytes.
    * configure.ac: Add 'cipher-gcm-ppc.lo'.
    * tests/basic.c (_check_gcm_cipher): New AES256 test vector.
    * AUTHORS: Add 'CRYPTOGAMS'.
    * LICENSES: Add original license to 3-clause-BSD section.
    --
    https://dev.gnupg.org/D501:

    10-20X speed. However this Power 9 machine is faster than the last Power 9 benchmarks on the optimized versions, so while better than the last patch, it is not all due to the code.

    Before:
    GCM enc  |     4.23 ns/B    225.3 MiB/s        - c/B
    GCM dec  |     3.58 ns/B    266.2 MiB/s        - c/B
    GCM auth |     3.34 ns/B    285.3 MiB/s        - c/B

    After:
    GCM enc  |    0.370 ns/B     2578 MiB/s        - c/B
    GCM dec  |    0.371 ns/B     2571 MiB/s        - c/B
    GCM auth |    0.159 ns/B     6003 MiB/s        - c/B

    Signed-off-by: Shawn Landden <shawn@git.icu>
    [jk: coding style fixes, Makefile.am integration, patch from Differential to git, commit changelog, fixed few compiler warnings]
    GnuPG-bug-id: 5040
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* hwf-x86: add "intel-vaes-vpclmul" HW feature (Jussi Kivilinna, 2021-02-28, 1 file, -0/+32)
    * configure.ac (HAVE_GCC_INLINE_ASM_VAES_VPCLMUL): New.
    * src/g10lib.h (HWF_INTEL_VAES_VPCLMUL): New.
    * src/hwf-x86.c (detect_x86_gnuc): Check for VAES and VPCLMUL.
    * src/hwfeatures.c (hwflist): Add "intel-vaes-vpclmul".
    --
    Detect support for VAES and VPCLMUL instruction sets, which allow use of AES and PCLMUL instruction with 256-bit and 512-bit vector registers.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Post release updates (Werner Koch, 2021-02-17, 1 file, -1/+1)
    --
* Release 1.9.2 [libgcrypt-1.9.2] (Werner Koch, 2021-02-17, 1 file, -1/+1)
* Post release updates (Werner Koch, 2021-01-29, 1 file, -1/+1)
    --
* Release 1.9.1 [libgcrypt-1.9.1] (Werner Koch, 2021-01-29, 1 file, -1/+1)
    * configure.ac: Bump LT version to C23/A3/R1.
* build: Check spawn.h for MacOS X Tiger. (NIIBE Yutaka, 2021-01-27, 1 file, -0/+1)
    * configure.ac: Add check for spawn.h.
    * tests/random.c: Only use posix_spawn if available.
    --
    Since older version doesn't have SIP or it is not enabled, no problem using system(3).

    GnuPG-bug-id: 5159
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* sha512/sha256: remove assembler macros from AMD64 implementations (Jussi Kivilinna, 2021-01-26, 1 file, -15/+5)
    * configure.ac (gcry_cv_gcc_platform_as_ok_for_intel_syntax): Remove assembler macro check from Intel syntax assembly support check.
    * cipher/sha256-avx-amd64.S: Replace assembler macros with C preprocessor counterparts.
    * cipher/sha256-avx2-bmi2-amd64.S: Ditto.
    * cipher/sha256-ssse3-amd64.S: Ditto.
    * cipher/sha512-avx-amd64.S: Ditto.
    * cipher/sha512-avx2-bmi2-amd64.S: Ditto.
    * cipher/sha512-ssse3-amd64.S: Ditto.
    --
    Removing GNU assembler macros allows building these implementations with clang.

    GnuPG-bug-id: 5255
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* configure.ac: run assembler checks through linker for better LTO support (Jussi Kivilinna, 2021-01-26, 1 file, -46/+65)
    * configure.ac (gcry_cv_gcc_arm_platform_as_ok)
    (gcry_cv_gcc_aarch64_platform_as_ok)
    (gcry_cv_gcc_inline_asm_ssse3, gcry_cv_gcc_inline_asm_pclmul)
    (gcry_cv_gcc_inline_asm_shaext, gcry_cv_gcc_inline_asm_sse41)
    (gcry_cv_gcc_inline_asm_avx, gcry_cv_gcc_inline_asm_avx2)
    (gcry_cv_gcc_inline_asm_bmi2, gcry_cv_gcc_as_const_division_ok)
    (gcry_cv_gcc_as_const_division_with_wadivide_ok)
    (gcry_cv_gcc_amd64_platform_as_ok, gcry_cv_gcc_win64_platform_as_ok)
    (gcry_cv_gcc_platform_as_ok_for_intel_syntax)
    (gcry_cv_gcc_inline_asm_neon, gcry_cv_gcc_inline_asm_aarch32_crypto)
    (gcry_cv_gcc_inline_asm_aarch64_neon)
    (gcry_cv_gcc_inline_asm_aarch64_crypto)
    (gcry_cv_gcc_inline_asm_ppc_altivec)
    (gcry_cv_gcc_inline_asm_ppc_arch_3_00)
    (gcry_cv_gcc_inline_asm_s390x, gcry_cv_gcc_inline_asm_s390x): Use AC_LINK_IFELSE check instead of AC_COMPILE_IFELSE.
    --
    LTO may defer assembly checking to linker stage, thus we need to use AC_LINK_IFELSE instead of AC_COMPILE_IFELSE for these checks.

    GnuPG-bug-id: 5255
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add configure option to force enable 'soft' HW feature bits (Jussi Kivilinna, 2021-01-26, 1 file, -0/+14)
    * configure.ac (force_soft_hwfeatures)
    (ENABLE_FORCE_SOFT_HWFEATURES): New.
    * src/hwf-x86.c (detect_x86_gnuc): Enable HWF_INTEL_FAST_SHLD and HWF_INTEL_FAST_VPGATHER if ENABLE_FORCE_SOFT_HWFEATURES enabled.
    --
    Patch allows enabling HW features that are fast only on select CPU models on all CPUs. For example, the SHLD instruction is fast on only select Intel processors and should not be used on others. This configuration option allows enabling these 'soft' HW features for testing purposes on all CPUs.

    Current 'soft' HW features are:
    - "intel-fast-shld": supported by all x86 (but very slow on most)
    - "intel-fast-vpgather": supported by all x86 with AVX2 (but slow on most)

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Merge branch 'LIBGCRYPT-1.9-BRANCH' (Werner Koch, 2021-01-21, 1 file, -2/+2)
|\
    --
    Master is missing latest NEWS and some other last minute changes from the 1.9.0 release.
| * Post release updates (Werner Koch, 2021-01-19, 1 file, -1/+1)
    --
| * Release 1.9.0 [libgcrypt-1.9.0] (Werner Koch, 2021-01-19, 1 file, -1/+1)
* | mpi/longlong: make use of compiler provided __builtin_ctz/__builtin_clz (Jussi Kivilinna, 2021-01-20, 1 file, -0/+45)
    * configure.ac (gcry_cv_have_builtin_ctzl, gcry_cv_have_builtin_clz)
    (gcry_cv_have_builtin_clzl): New checks.
    * mpi/longlong.h (count_leading_zeros, count_trailing_zeros): Use __builtin_clz[l]/__builtin_ctz[l] if available and bit counting macros not yet provided by inline assembly.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
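    Note on the fallback pattern described in this commit: it can be sketched
    roughly as below. This is an illustrative sketch in C, not the actual
    mpi/longlong.h code. The macro names HAVE_BUILTIN_CLZL and
    HAVE_BUILTIN_CTZL and all function names are assumptions made for the
    example; only __builtin_clzl and __builtin_ctzl are real GCC/clang
    builtins, and both are undefined for a zero argument.

```c
#include <assert.h>
#include <limits.h>

/* Portable fallback: shift left until the top bit is set.  */
static unsigned int
clz_fallback (unsigned long x)
{
  const unsigned long top = 1UL << (sizeof (unsigned long) * CHAR_BIT - 1);
  unsigned int n = 0;

  assert (x != 0);
  while (!(x & top))
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Portable fallback: shift right until the lowest bit is set.  */
static unsigned int
ctz_fallback (unsigned long x)
{
  unsigned int n = 0;

  assert (x != 0);
  while (!(x & 1UL))
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* HAVE_BUILTIN_CLZL is a hypothetical configure-provided macro.  */
static unsigned int
count_leading_zeros_ul (unsigned long x)
{
  assert (x != 0);
#ifdef HAVE_BUILTIN_CLZL
  return (unsigned int) __builtin_clzl (x);
#else
  return clz_fallback (x);
#endif
}

/* HAVE_BUILTIN_CTZL is a hypothetical configure-provided macro.  */
static unsigned int
count_trailing_zeros_ul (unsigned long x)
{
  assert (x != 0);
#ifdef HAVE_BUILTIN_CTZL
  return (unsigned int) __builtin_ctzl (x);
#else
  return ctz_fallback (x);
#endif
}
```

    Selecting the implementation at configure time keeps the portable loop
    available for compilers without the builtins, while letting the compiler
    emit a single bit-count instruction where it can.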
* | Add s390x/zSeries implementation of Poly1305 [cipher-s390x-optimizations] (Jussi Kivilinna, 2020-12-30, 1 file, -0/+8)
    * cipher/Makefile.am: Add 'poly1305-s390x.S' and 'asm-poly1305-s390x.h'.
    * cipher/asm-poly1305-s390x.h: New
    * cipher/chacha20-s390x.S (_gcry_chacha20_poly1305_s390x_vx_blocks8)
    (_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New, stitched chacha20-poly1305 implementation.
    * cipher/chacha20.c (USE_S390X_VX_POLY1305): New.
    (_gcry_chacha20_poly1305_s390x_vx_blocks8)
    (_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New prototypes.
    (_gcry_chacha20_poly1305_encrypt, _gcry_chacha20_poly1305_decrypt): Add s390x/VX stitched chacha20-poly1305 code-path.
    * cipher/poly1305-s390x.S: New.
    * cipher/poly1305.c (USE_S390X_ASM, HAVE_ASM_POLY1305_BLOCKS): New.
    [USE_S390X_ASM] (_gcry_poly1305_s390x_blocks1, poly1305_blocks): New.
    * configure.ac (gcry_cv_gcc_inline_asm_s390x): Check for 'risbgn' and 'algrk' instructions.
    * tests/basic.c (_check_poly1305_cipher): Add large chacha20-poly1305 test vector.
    --
    Patch adds Poly1305 and stitched ChaCha20-Poly1305 implementation for zSeries. Stitched implementation interleaves ChaCha20 and Poly1305 processing for higher instruction level parallelism and better utilization of execution units.

    Benchmark on z15 (4504 Mhz):

    Before:
    CHACHA20      | nanosecs/byte  mebibytes/sec  cycles/byte
    POLY1305 enc  |     1.16 ns/B    823.2 MiB/s     5.22 c/B
    POLY1305 dec  |     1.16 ns/B    823.2 MiB/s     5.22 c/B
    POLY1305 auth |    0.736 ns/B     1295 MiB/s     3.32 c/B

    After (chacha20-poly1305 ~71% faster, poly1305 ~29% faster):
    CHACHA20      | nanosecs/byte  mebibytes/sec  cycles/byte
    POLY1305 enc  |    0.677 ns/B     1409 MiB/s     3.05 c/B
    POLY1305 dec  |    0.655 ns/B     1456 MiB/s     2.95 c/B
    POLY1305 auth |    0.569 ns/B     1675 MiB/s     2.56 c/B

    GnuPG-bug-id: 5202
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* | Add s390x/zSeries implementation of ChaCha20 (Jussi Kivilinna, 2020-12-30, 1 file, -1/+6)
    * cipher/Makefile.am: Add 'asm-common-s390x.h' and 'chacha20-s390x.S'.
    * cipher/asm-common-s390x.h: New.
    * cipher/chacha20-s390x.S: New.
    * cipher/chacha20.c (USE_S390X_VX): New.
    (CHACHA20_context_t): Change 'use_*' bit-field to unsigned type; Add 'use_s390x'.
    (_gcry_chacha20_s390x_vx_blocks8)
    (_gcry_chacha20_s390x_vx_blocks4_2_1): New.
    (chacha20_do_setkey): Add HW feature detect for s390x/VX.
    (chacha20_blocks, do_chacha20_encrypt_stream_tail): Add s390x/VX code-path.
    * configure.ac: Add 'chacha20-s390x.lo'.
    --
    Patch adds VX vector instruction set accelerated ChaCha20 implementation for zSeries.

    Benchmark on z15 (4504 Mhz):

    Before:
    CHACHA20   | nanosecs/byte  mebibytes/sec  cycles/byte
    STREAM enc |     2.62 ns/B    364.0 MiB/s    11.80 c/B
    STREAM dec |     2.62 ns/B    363.8 MiB/s    11.81 c/B

    After (~5x faster):
    CHACHA20   | nanosecs/byte  mebibytes/sec  cycles/byte
    STREAM enc |    0.505 ns/B     1888 MiB/s     2.28 c/B
    STREAM dec |    0.506 ns/B     1887 MiB/s     2.28 c/B

    GnuPG-bug-id: 5201
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* | hwf-s390x: add VX vector instruction set detection (Jussi Kivilinna, 2020-12-30, 1 file, -0/+30)
    * configure.ac (gcry_cv_gcc_inline_asm_s390x_vx): New check.
    * src/g10lib.h (HWF_S390X_VX): New.
    * src/hwf-s390x.c (HWCAP_S390_VXRS): New.
    (s390x_features) [HAVE_GCC_INLINE_ASM_S390X_VX]: Add VX feature check.
    * src/hwfeatures.c (hwflist): Add "s390x-vx".
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* | Add s390x/zSeries acceleration for AES (Jussi Kivilinna, 2020-12-18, 1 file, -0/+5)
    * configure.ac: Add 'rijndael-s390x.lo'.
    * cipher/Makefile.am: Add 'rijndael-s390x.c'.
    * cipher/rijndael-internal.c (USE_S390X_CRYPTO): New.
    (RIJNDAEL_context_s) [USE_S390X_CRYPTO]: New 'km*_func' members.
    * cipher/rijndael-s390x.c: New.
    * cipher/rijndael.c (_gcry_aes_s390x_setup_acceleration)
    (_gcry_aes_s390x_setup_setkey)
    (_gcry_aes_s390x_setup_prepare_decryption, _gcry_aes_s390x_encrypt)
    (_gcry_aes_s390x_decrypt): New.
    (do_setkey) [USE_S390X_CRYPTO]: Add s390x acceleration setup.
    --
    Patch adds acceleration for single-block AES and the following modes:
    - CBC, CBC-MAC, CFB, OFB, CTR, XTS and OCB

    Benchmarks (z15, 5.2Ghz):

    Before:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    ECB enc  |     3.81 ns/B    250.2 MiB/s    19.82 c/B
    ECB dec  |     4.13 ns/B    231.1 MiB/s    21.46 c/B
    CBC enc  |     3.69 ns/B    258.5 MiB/s    19.19 c/B
    CBC dec  |     3.71 ns/B    257.1 MiB/s    19.29 c/B
    CFB enc  |     3.69 ns/B    258.7 MiB/s    19.17 c/B
    CFB dec  |     3.56 ns/B    267.8 MiB/s    18.52 c/B
    OFB enc  |     3.85 ns/B    247.8 MiB/s    20.01 c/B
    OFB dec  |     3.85 ns/B    247.9 MiB/s    20.01 c/B
    CTR enc  |     3.65 ns/B    261.6 MiB/s    18.96 c/B
    CTR dec  |     3.64 ns/B    261.6 MiB/s    18.95 c/B
    XTS enc  |     3.66 ns/B    260.8 MiB/s    19.02 c/B
    XTS dec  |     3.75 ns/B    254.2 MiB/s    19.51 c/B
    CCM enc  |     7.34 ns/B    129.9 MiB/s    38.19 c/B
    CCM dec  |     7.34 ns/B    129.9 MiB/s    38.19 c/B
    CCM auth |     3.70 ns/B    257.6 MiB/s    19.25 c/B
    EAX enc  |     7.34 ns/B    129.8 MiB/s    38.19 c/B
    EAX dec  |     7.35 ns/B    129.8 MiB/s    38.20 c/B
    EAX auth |     3.70 ns/B    257.8 MiB/s    19.24 c/B
    GCM enc  |     6.22 ns/B    153.3 MiB/s    32.36 c/B
    GCM dec  |     6.23 ns/B    153.0 MiB/s    32.42 c/B
    GCM auth |     2.59 ns/B    368.9 MiB/s    13.44 c/B
    OCB enc  |     3.82 ns/B    249.7 MiB/s    19.86 c/B
    OCB dec  |     3.90 ns/B    244.2 MiB/s    20.31 c/B
    OCB auth |     3.88 ns/B    245.5 MiB/s    20.20 c/B

    After:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    ECB enc  |     2.10 ns/B    453.1 MiB/s    10.94 c/B
    ECB dec  |     2.11 ns/B    453.0 MiB/s    10.95 c/B
    CBC enc  |    0.182 ns/B     5240 MiB/s    0.946 c/B
    CBC dec  |    0.044 ns/B    21581 MiB/s    0.230 c/B
    CFB enc  |    0.206 ns/B     4623 MiB/s     1.07 c/B
    CFB dec  |    0.140 ns/B     6826 MiB/s    0.727 c/B
    OFB enc  |    0.183 ns/B     5222 MiB/s    0.950 c/B
    OFB dec  |    0.182 ns/B     5252 MiB/s    0.944 c/B
    CTR enc  |    0.059 ns/B    16095 MiB/s    0.308 c/B
    CTR dec  |    0.059 ns/B    16045 MiB/s    0.309 c/B
    XTS enc  |    0.043 ns/B    21998 MiB/s    0.225 c/B
    XTS dec  |    0.043 ns/B    22012 MiB/s    0.225 c/B
    CCM enc  |    0.239 ns/B     3989 MiB/s     1.24 c/B
    CCM dec  |    0.239 ns/B     3987 MiB/s     1.24 c/B
    CCM auth |    0.180 ns/B     5288 MiB/s    0.938 c/B
    EAX enc  |    0.242 ns/B     3940 MiB/s     1.26 c/B
    EAX dec  |    0.243 ns/B     3926 MiB/s     1.26 c/B
    EAX auth |    0.183 ns/B     5218 MiB/s    0.950 c/B
    GCM enc  |     2.64 ns/B    361.6 MiB/s    13.71 c/B
    GCM dec  |     2.64 ns/B    361.3 MiB/s    13.72 c/B
    GCM auth |     2.58 ns/B    370.1 MiB/s    13.40 c/B
    OCB enc  |    0.186 ns/B     5132 MiB/s    0.966 c/B
    OCB dec  |    0.176 ns/B     5414 MiB/s    0.916 c/B
    OCB auth |    0.149 ns/B     6394 MiB/s    0.776 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* | hwf: add detection of s390x/zSeries hardware features (Jussi Kivilinna, 2020-12-18, 1 file, -0/+56)
|/
    * configure.ac (gcry_cv_gcc_inline_asm_s390x)
    (HAVE_CPU_ARCH_S390X): Add s390x detection support.
    * mpi/config.links: Add setup for s390x links.
    * src/Makefile.am: Add 'hwf-s390x.c'.
    * src/g10lib.h (HWF_S390X_MSA, HWF_S390X_MSA_4, HWF_S390X_MSA_8): New.
    * src/hwf_common.h (_gcry_hwf_detect_s390x): New.
    * src/hwf-s390x.c: New.
    * src/hwfeatures.c: Add "s390x-msa", "s390x-msa-4" and "s390x-msa-8".
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* aarch64: use configure check for assembly ELF directives support (Jussi Kivilinna, 2020-12-18, 1 file, -0/+20)
    * configure.ac (gcry_cv_gcc_asm_elf_directives): New check.
    (HAVE_GCC_ASM_ELF_DIRECTIVES): New 'config.h' macro.
    * cipher/asm-common-aarch64.h (ELF): Change feature macro check from __ELF__ to HAVE_GCC_ASM_ELF_DIRECTIVES.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* tests: Put a work around to tests/random for macOS. (NIIBE Yutaka, 2020-12-03, 1 file, -0/+2)
    * configure.ac [*-apple-darwin*] (USE_POSIX_SPAWN_FOR_TESTS): New.
    * tests/random.c [USE_POSIX_SPAWN_FOR_TESTS] (run_all_rng_tests): New.
    --
    GnuPG-bug-id: 5159
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build: Update to newer autoconf constructs. (NIIBE Yutaka, 2020-11-18, 1 file, -49/+47)
    * acinclude.m4 (GNUPG_SYS_SYMBOL_UNDERSCORE): Use AS_MESSAGE_LOG_FD instead of AC_FD_CC.
    (GNUPG_CHECK_MLOCK): Use AC_LINK_IFELSE instead of AC_TRY_LINK. Use AC_RUN_IFELSE instead of AC_TRY_RUN.
    * configure.ac (AC_ISC_POSIX): Replace by AC_SEARCH_LIBS. Use AC_USE_SYSTEM_EXTENSIONS instead of AC_GNU_SOURCE. Use AS_HELP_STRING instead of AC_HELP_STRING.
    (AC_TYPE_SIGNAL): Remove.
    (AC_DECL_SYS_SIGLIST): Remove.
    * m4/Makefile.am (EXTRA_DIST): Update.
    * m4/onceonly.m4: Remove.
    * m4/socklen.m4: Update from gnulib.
    * m4/libtool.m4: Update from libgpg-error.
    * m4/gpg-error.m4: Update from libgpg-error.
    * m4/noexecstack.m4: Use AS_HELP_STRING instead of AC_HELP_STRING.

    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* build: Use modern Autoconf check for type. (NIIBE Yutaka, 2020-11-18, 1 file, -4/+1)
    * configure.ac (byte, ushort, u16, u32, u64): Use AC_CHECK_TYPES.
    * cipher/poly1305.c: Use HAVE_TYPE_U64.
    * src/hmac256.c: Use HAVE_TYPE_U32.
    * src/types.h: Use HAVE_TYPE_BYTE, HAVE_TYPE_USHORT, HAVE_TYPE_U16, HAVE_TYPE_U32, and HAVE_TYPE_U64.

    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* tests/bench-slope: improve CPU frequency auto-detection (Jussi Kivilinna, 2020-07-23, 1 file, -2/+9)
    * configure.ac (gcry_cv_have_asm_volatile_memory): Check also if assembly memory barrier with input/output register is supported.
    * tests/bench-slope.c (auto_ghz_bench): Change to use base operation that takes two CPU cycles and unroll loop by 1024 operations.
    --
    CPU frequency is now correctly detected on AWS Graviton CPU (2.3Ghz).

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Enable jitter entropy also on non-x86 architectures (Jussi Kivilinna, 2020-07-23, 1 file, -1/+0)
    * configure.ac: Do not force jentsupport to "n/a" on non-x86 architectures.
    --
    GnuPG-bug-id: 4966
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add SM4 x86-64/AES-NI/AVX2 implementation (Jussi Kivilinna, 2020-06-20, 1 file, -0/+1)
    * cipher/Makefile.am: Add 'sm4-aesni-avx2-amd64.S'.
    * cipher/sm4-aesni-avx2-amd64.S: New.
    * cipher/sm4.c (USE_AESNI_AVX2): New.
    (SM4_context) [USE_AESNI_AVX2]: Add 'use_aesni_avx2'.
    [USE_AESNI_AVX2] (_gcry_sm4_aesni_avx2_ctr_enc)
    (_gcry_sm4_aesni_avx2_cbc_dec, _gcry_sm4_aesni_avx2_cfb_dec)
    (_gcry_sm4_aesni_avx2_ocb_enc, _gcry_sm4_aesni_avx2_ocb_dec)
    (_gcry_sm4_aesni_avx_ocb_auth): New.
    (sm4_setkey): Enable AES-NI/AVX2 if supported by HW.
    (_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
    (_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_AESNI_AVX2]: Add AES-NI/AVX2 bulk functions.
    * configure.ac: Add 'sm4-aesni-avx2-amd64.lo'.
    --
    This patch adds x86-64/AES-NI/AVX2 bulk encryption/decryption. Bulk functions process 16 blocks in parallel.

    Benchmark on AMD Ryzen 7 3700X:

    Before:
    SM4      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    CBC enc  |     8.98 ns/B    106.2 MiB/s    38.62 c/B      4300
    CBC dec  |     1.55 ns/B    613.7 MiB/s     6.64 c/B      4275
    CFB enc  |     8.96 ns/B    106.4 MiB/s    38.52 c/B      4300
    CFB dec  |     1.54 ns/B    617.4 MiB/s     6.60 c/B      4275
    CTR enc  |     1.57 ns/B    607.8 MiB/s     6.75 c/B      4300
    CTR dec  |     1.57 ns/B    608.9 MiB/s     6.74 c/B      4300
    OCB enc  |     1.58 ns/B    603.8 MiB/s     6.75 c/B      4275
    OCB dec  |     1.57 ns/B    605.7 MiB/s     6.73 c/B      4275
    OCB auth |     1.53 ns/B    624.5 MiB/s     6.57 c/B      4300

    After (~56% faster than AES-NI/AVX impl.):
    SM4      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    CBC enc  |     8.93 ns/B    106.8 MiB/s    38.61 c/B      4326
    CBC dec  |    0.984 ns/B    969.5 MiB/s     4.23 c/B      4300
    CFB enc  |     8.93 ns/B    106.8 MiB/s    38.62 c/B      4325
    CFB dec  |    0.983 ns/B    970.3 MiB/s     4.23 c/B      4300
    CTR enc  |    0.998 ns/B    955.1 MiB/s     4.29 c/B      4300
    CTR dec  |    0.996 ns/B    957.4 MiB/s     4.28 c/B      4300
    OCB enc  |     1.00 ns/B    951.8 MiB/s     4.31 c/B      4300
    OCB dec  |     1.00 ns/B    951.8 MiB/s     4.31 c/B      4300
    OCB auth |    0.993 ns/B    960.2 MiB/s     4.28 c/B    4304±2

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add SM4 x86-64/AES-NI/AVX implementation (Jussi Kivilinna, 2020-06-20, 1 file, -0/+7)
    * cipher/Makefile.am: Add 'sm4-aesni-avx-amd64.S'.
    * cipher/sm4-aesni-avx-amd64.S: New.
    * cipher/sm4.c (USE_AESNI_AVX, ASM_FUNC_ABI): New.
    (SM4_context) [USE_AESNI_AVX]: Add 'use_aesni_avx'.
    [USE_AESNI_AVX] (_gcry_sm4_aesni_avx_expand_key)
    (_gcry_sm4_aesni_avx_crypt_blk1_8, _gcry_sm4_aesni_avx_ctr_enc)
    (_gcry_sm4_aesni_avx_cbc_dec, _gcry_sm4_aesni_avx_cfb_dec)
    (_gcry_sm4_aesni_avx_ocb_enc, _gcry_sm4_aesni_avx_ocb_dec)
    (_gcry_sm4_aesni_avx_ocb_auth, sm4_aesni_avx_crypt_blk1_8): New.
    (sm4_expand_key) [USE_AESNI_AVX]: Use AES-NI/AVX key setup.
    (sm4_setkey): Enable AES-NI/AVX if supported by HW.
    (_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
    (_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_AESNI_AVX]: Add AES-NI/AVX bulk functions.
    * configure.ac: Add 'sm4-aesni-avx-amd64.lo'.
    --
    This patch adds x86-64/AES-NI/AVX bulk encryption/decryption and key setup for SM4 cipher. Bulk functions process eight blocks in parallel.

    Benchmark on AMD Ryzen 7 3700X:

    Before:
    SM4      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    CBC enc  |     8.94 ns/B    106.7 MiB/s    38.66 c/B      4325
    CBC dec  |     4.78 ns/B    199.7 MiB/s    20.42 c/B      4275
    CFB enc  |     8.95 ns/B    106.5 MiB/s    38.72 c/B      4325
    CFB dec  |     4.81 ns/B    198.2 MiB/s    20.57 c/B      4275
    CTR enc  |     4.81 ns/B    198.2 MiB/s    20.69 c/B      4300
    CTR dec  |     4.80 ns/B    198.8 MiB/s    20.63 c/B      4300
    GCM auth |    0.116 ns/B     8232 MiB/s    0.504 c/B      4351
    OCB enc  |     4.88 ns/B    195.5 MiB/s    20.86 c/B      4275
    OCB dec  |     4.85 ns/B    196.6 MiB/s    20.86 c/B      4301
    OCB auth |     4.80 ns/B    198.9 MiB/s    20.62 c/B      4301

    After (~3.0x faster):
    SM4      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    CBC enc  |     8.98 ns/B    106.2 MiB/s    38.62 c/B      4300
    CBC dec  |     1.55 ns/B    613.7 MiB/s     6.64 c/B      4275
    CFB enc  |     8.96 ns/B    106.4 MiB/s    38.52 c/B      4300
    CFB dec  |     1.54 ns/B    617.4 MiB/s     6.60 c/B      4275
    CTR enc  |     1.57 ns/B    607.8 MiB/s     6.75 c/B      4300
    CTR dec  |     1.57 ns/B    608.9 MiB/s     6.74 c/B      4300
    OCB enc  |     1.58 ns/B    603.8 MiB/s     6.75 c/B      4275
    OCB dec  |     1.57 ns/B    605.7 MiB/s     6.73 c/B      4275
    OCB auth |     1.53 ns/B    624.5 MiB/s     6.57 c/B      4300

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

    sm4 avx fix
    sm4 avx fix
* Add SM4 symmetric cipher algorithm (Tianjia Zhang, 2020-06-16, 1 file, -0/+7)
    * cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add sm4.c.
    * cipher/cipher.c (cipher_list, cipher_list_algo301): Add _gcry_cipher_spec_sm4.
    * cipher/mac-cmac.c (map_mac_algo_to_cipher): Add cmac SM4.
    (_gcry_mac_type_spec_cmac_sm4): Add cmac SM4.
    * cipher/mac-internal.h: Declare spec_cmac_sm4.
    * cipher/mac.c (mac_list, mac_list_algo201): Add cmac SM4.
    * cipher/sm4.c: New.
    * configure.ac (available_ciphers): Add sm4.
    * doc/gcrypt.texi: Add SM4 document.
    * src/cipher.h: Add declarations for SM4 and cmac SM4.
    * src/gcrypt.h.in (gcry_cipher_algos): Add algorithm ID for SM4.
    --
    Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
    [jk: add missing mapping in mac-cmac.c:map_mac_algo_to_cipher]
    [jk: add GCRY_MAC_CMAC_SM4 to gcrypt.texi]
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Disable all assembly modules with --disable-asm | Jussi Kivilinna | 2020-06-08 | 1 | -30/+56
  * configure.ac (try_asm_modules): Update description, "MPI" => "MPI
  and cipher".
  (gcry_cv_gcc_arm_platform_as_ok, gcry_cv_gcc_aarch64_platform_as_ok)
  (gcry_cv_gcc_inline_asm_ssse3, gcry_cv_gcc_inline_asm_pclmul)
  (gcry_cv_gcc_inline_asm_shaext, gcry_cv_gcc_inline_asm_sse41)
  (gcry_cv_gcc_inline_asm_avx, gcry_cv_gcc_inline_asm_avx2)
  (gcry_cv_gcc_inline_asm_bmi2, gcry_cv_gcc_amd64_platform_as_ok)
  (gcry_cv_gcc_platform_as_ok_for_intel_syntax)
  (gcry_cv_cc_arm_arch_is_v6, gcry_cv_gcc_inline_asm_neon)
  (gcry_cv_gcc_inline_asm_aarch32_crypto)
  (gcry_cv_gcc_inline_asm_aarch64_neon)
  (gcry_cv_gcc_inline_asm_aarch64_crypto)
  (gcry_cv_cc_ppc_altivec, gcry_cv_gcc_inline_asm_ppc_altivec)
  (gcry_cv_gcc_inline_asm_ppc_arch_3_00): Check for "try_asm_modules".
  * mpi/config.links: Set "mpi_cpu_arch" to "disabled" with --disable-asm.
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* ppc: avoid using vec_vsx_ld/vec_vsx_st for 2x64-bit vectors | Jussi Kivilinna | 2020-04-04 | 1 | -2/+6
  * cipher/crc-ppc.c (CRC_VEC_U64_LOAD, CRC_VEC_U64_LOAD_LE)
  (CRC_VEC_U64_LOAD_BE): Remove vec_vsx_ld usage.
  (asm_vec_u64_load, asm_vec_u64_load_le): New.
  * cipher/sha512-ppc.c (vec_vshasigma_u64): Use '__asm__' instead of
  'asm' for assembly block.
  (vec_u64_load, vec_u64_store): New.
  (_gcry_sha512_transform_ppc8): Use vec_u64_load/store instead of
  vec_vsx_ld/vec_vsx_st.
  * configure.ac (gcy_cv_cc_ppc_altivec)
  (gcy_cv_cc_ppc_altivec_cflags): Add check for vec_vsx_ld with
  'unsigned int *' pointer type.
  --

  GCC 7.5 and clang 8.0 do not support vec_vsx_ld with 'unsigned long long *'
  pointer type. Switch code to use inline assembly instead. As vec_vsx_ld is
  still used with 'unsigned int *' pointers, add new check for this in
  configure.ac.

  GnuPG-bug-id: 4906
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* build: More accurate dependency to -lgpg-error. | NIIBE Yutaka | 2020-02-25 | 1 | -1/+0
  * configure.ac (LIBGCRYPT_CONFIG_LIBS): Remove DL_LIBS.
  * src/libgcrypt.c.in: Distinguish static link use case.
  * tests/Makefile.am: Fix use of -lgpg-error.

  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Add POWER9 little-endian variant of PPC AES implementation | Jussi Kivilinna | 2020-02-02 | 1 | -0/+1
  * configure.ac: Add 'rijndael-ppc9le.lo'.
  * cipher/Makefile.am: Add 'rijndael-ppc9le.c', 'rijndael-ppc-common.h'
  and 'rijndael-ppc-functions.h'.
  * cipher/rijndael-internal.h (USE_PPC_CRYPTO_WITH_PPC9LE): New.
  (RIJNDAEL_context_s): Add 'use_ppc9le_crypto'.
  * cipher/rijndael.c (_gcry_aes_ppc9le_encrypt)
  (_gcry_aes_ppc9le_decrypt, _gcry_aes_ppc9le_cfb_enc)
  (_gcry_aes_ppc9le_cfb_dec, _gcry_aes_ppc9le_ctr_enc)
  (_gcry_aes_ppc9le_cbc_enc, _gcry_aes_ppc9le_cbc_dec)
  (_gcry_aes_ppc9le_ocb_crypt, _gcry_aes_ppc9le_ocb_auth)
  (_gcry_aes_ppc9le_xts_crypt): New.
  (do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
  (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
  (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth, _gcry_aes_xts_crypt)
  [USE_PPC_CRYPTO_WITH_PPC9LE]: New.
  * cipher/rijndael-ppc.c: Split common code to headers
  'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
  * cipher/rijndael-ppc-common.h: Split from 'rijndael-ppc.c'.
  (asm_add_uint64, asm_sra_int64, asm_swap_uint64_halfs): New.
  * cipher/rijndael-ppc-functions.h: Split from 'rijndael-ppc.c'.
  (CFB_ENC_FUNC, CBC_ENC_FUNC): Unroll loop by 2.
  (XTS_CRYPT_FUNC, GEN_TWEAK): Tweak generation without vperm
  instruction.
  * cipher/rijndael-ppc9le.c: New.
  --

  Provide POWER9 little-endian optimized variant of PPC vcrypto AES
  implementation. This implementation uses 'lxvb16x' and 'stxvb16x'
  instructions to load/store vectors directly in big-endian order.

  Benchmark on POWER9 (~3.8Ghz):

  Before:
   AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
   CBC enc  |      1.04 ns/B     918.7 MiB/s      3.94 c/B
   CBC dec  |     0.222 ns/B      4292 MiB/s     0.844 c/B
   CFB enc  |      1.04 ns/B     916.9 MiB/s      3.95 c/B
   CFB dec  |     0.224 ns/B      4252 MiB/s     0.852 c/B
   CTR enc  |     0.226 ns/B      4218 MiB/s     0.859 c/B
   CTR dec  |     0.225 ns/B      4233 MiB/s     0.856 c/B
   XTS enc  |     0.500 ns/B      1907 MiB/s      1.90 c/B
   XTS dec  |     0.494 ns/B      1932 MiB/s      1.88 c/B
   OCB enc  |     0.288 ns/B      3312 MiB/s      1.09 c/B
   OCB dec  |     0.292 ns/B      3266 MiB/s      1.11 c/B
   OCB auth |     0.267 ns/B      3567 MiB/s      1.02 c/B

  After (ctr & ocb & cbc-dec & cfb-dec ~15% and xts ~8% faster):
   AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
   CBC enc  |      1.04 ns/B     914.2 MiB/s      3.96 c/B
   CBC dec  |     0.191 ns/B      4984 MiB/s     0.727 c/B
   CFB enc  |      1.03 ns/B     930.0 MiB/s      3.90 c/B
   CFB dec  |     0.194 ns/B      4906 MiB/s     0.739 c/B
   CTR enc  |     0.196 ns/B      4868 MiB/s     0.744 c/B
   CTR dec  |     0.197 ns/B      4834 MiB/s     0.750 c/B
   XTS enc  |     0.460 ns/B      2075 MiB/s      1.75 c/B
   XTS dec  |     0.455 ns/B      2097 MiB/s      1.73 c/B
   OCB enc  |     0.250 ns/B      3812 MiB/s     0.951 c/B
   OCB dec  |     0.253 ns/B      3764 MiB/s     0.963 c/B
   OCB auth |     0.232 ns/B      4106 MiB/s     0.883 c/B

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
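
  The "~15%" and "~8%" figures in the POWER9 benchmark can be cross-checked
  from the MiB/s columns. The sketch below is not part of the commit: the
  dictionary and the helper name percent_gain are illustrative assumptions,
  and only a subset of the modes is copied in.

```python
# MiB/s figures copied from the POWER9 AES benchmark: (before, after).
# Only a subset of modes is listed; the selection is illustrative.
throughput = {
    "CBC dec": (4292, 4984),
    "CFB dec": (4252, 4906),
    "CTR enc": (4218, 4868),
    "OCB enc": (3312, 3812),
    "XTS enc": (1907, 2075),
    "XTS dec": (1932, 2097),
}


def percent_gain(before, after):
    """Relative throughput increase in percent (hypothetical helper)."""
    return (after / before - 1.0) * 100.0


# Compute the gain for every listed mode and print one line per mode.
gains = {mode: percent_gain(b, a) for mode, (b, a) in throughput.items()}
for mode, gain in gains.items():
    print(f"{mode}: {gain:.1f}% faster")
```

  With these inputs the CBC-dec, CFB-dec, CTR and OCB rows come out at
  roughly 15 to 16 percent and the XTS rows at roughly 8 to 9 percent,
  which agrees with the summary line of the commit message.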