path: root/mpi
Commit message | Author | Age | Files | Lines
* Update copyright notices to use URL. | NIIBE Yutaka | 2023-04-27 | 95 | -208/+212
    * build-aux/db2any: Update copyright notice.
    * cipher/arcfour.c, cipher/blowfish.c, cipher/cast5.c: Likewise.
    * cipher/crc-armv8-ce.c, cipher/crc-intel-pclmul.c: Likewise.
    * cipher/crc-ppc.c, cipher/crc.c, cipher/des.c: Likewise.
    * cipher/md2.c, cipher/md4.c, cipher/md5.c: Likewise.
    * cipher/primegen.c, cipher/rfc2268.c, cipher/rmd160.c: Likewise.
    * cipher/seed.c, cipher/serpent.c, cipher/tiger.c: Likewise.
    * cipher/twofish.c: Likewise.
    * mpi/alpha/mpih-add1.S, mpi/alpha/mpih-lshift.S: Likewise.
    * mpi/alpha/mpih-mul1.S, mpi/alpha/mpih-mul2.S: Likewise.
    * mpi/alpha/mpih-mul3.S, mpi/alpha/mpih-rshift.S: Likewise.
    * mpi/alpha/mpih-sub1.S, mpi/alpha/udiv-qrnnd.S: Likewise.
    * mpi/amd64/mpih-add1.S, mpi/amd64/mpih-lshift.S: Likewise.
    * mpi/amd64/mpih-mul1.S, mpi/amd64/mpih-mul2.S: Likewise.
    * mpi/amd64/mpih-mul3.S, mpi/amd64/mpih-rshift.S: Likewise.
    * mpi/amd64/mpih-sub1.S, mpi/config.links: Likewise.
    * mpi/generic/mpih-add1.c, mpi/generic/mpih-lshift.c: Likewise.
    * mpi/generic/mpih-mul1.c, mpi/generic/mpih-mul2.c: Likewise.
    * mpi/generic/mpih-mul3.c, mpi/generic/mpih-rshift.c: Likewise.
    * mpi/generic/mpih-sub1.c, mpi/generic/udiv-w-sdiv.c: Likewise.
    * mpi/hppa/mpih-add1.S, mpi/hppa/mpih-lshift.S: Likewise.
    * mpi/hppa/mpih-rshift.S, mpi/hppa/mpih-sub1.S: Likewise.
    * mpi/hppa/udiv-qrnnd.S, mpi/hppa1.1/mpih-mul1.S: Likewise.
    * mpi/hppa1.1/mpih-mul2.S, mpi/hppa1.1/mpih-mul3.S: Likewise.
    * mpi/hppa1.1/udiv-qrnnd.S, mpi/i386/mpih-add1.S: Likewise.
    * mpi/i386/mpih-lshift.S, mpi/i386/mpih-mul1.S: Likewise.
    * mpi/i386/mpih-mul2.S, mpi/i386/mpih-mul3.S: Likewise.
    * mpi/i386/mpih-rshift.S, mpi/i386/mpih-sub1.S: Likewise.
    * mpi/i386/syntax.h, mpi/longlong.h: Likewise.
    * mpi/m68k/mc68020/mpih-mul1.S, mpi/m68k/mc68020/mpih-mul2.S: Likewise.
    * mpi/m68k/mc68020/mpih-mul3.S, mpi/m68k/mpih-add1.S: Likewise.
    * mpi/m68k/mpih-lshift.S, mpi/m68k/mpih-rshift.S: Likewise.
    * mpi/m68k/mpih-sub1.S, mpi/m68k/syntax.h: Likewise.
    * mpi/mips3/mpih-add1.S, mpi/mips3/mpih-lshift.S: Likewise.
    * mpi/mips3/mpih-mul1.S, mpi/mips3/mpih-mul2.S: Likewise.
    * mpi/mips3/mpih-mul3.S, mpi/mips3/mpih-rshift.S: Likewise.
    * mpi/mips3/mpih-sub1.S, mpi/mpi-add.c: Likewise.
    * mpi/mpi-bit.c, mpi/mpi-cmp.c, mpi/mpi-div.c: Likewise.
    * mpi/mpi-gcd.c, mpi/mpi-inline.c, mpi/mpi-inline.h: Likewise.
    * mpi/mpi-internal.h, mpi/mpi-mpow.c, mpi/mpi-mul.c: Likewise.
    * mpi/mpi-scan.c, mpi/mpih-div.c, mpi/mpih-mul.c: Likewise.
    * mpi/pa7100/mpih-lshift.S, mpi/pa7100/mpih-rshift.S: Likewise.
    * mpi/power/mpih-add1.S, mpi/power/mpih-lshift.S: Likewise.
    * mpi/power/mpih-mul1.S, mpi/power/mpih-mul2.S: Likewise.
    * mpi/power/mpih-mul3.S, mpi/power/mpih-rshift.S: Likewise.
    * mpi/power/mpih-sub1.S, mpi/powerpc32/mpih-add1.S: Likewise.
    * mpi/powerpc32/mpih-lshift.S, mpi/powerpc32/mpih-mul1.S: Likewise.
    * mpi/powerpc32/mpih-mul2.S, mpi/powerpc32/mpih-mul3.S: Likewise.
    * mpi/powerpc32/mpih-rshift.S, mpi/powerpc32/mpih-sub1.S: Likewise.
    * mpi/powerpc32/syntax.h, mpi/sparc32/mpih-add1.S: Likewise.
    * mpi/sparc32/mpih-lshift.S, mpi/sparc32/mpih-rshift.S: Likewise.
    * mpi/sparc32/udiv.S, mpi/sparc32v8/mpih-mul1.S: Likewise.
    * mpi/sparc32v8/mpih-mul2.S, mpi/sparc32v8/mpih-mul3.S: Likewise.
    * mpi/supersparc/udiv.S: Likewise.
    * random/random.h, random/rndegd.c: Likewise.
    * src/cipher.h, src/libgcrypt.def, src/libgcrypt.vers: Likewise.
    * src/missing-string.c, src/mpi.h, src/secmem.h: Likewise.
    * src/stdmem.h, src/types.h: Likewise.
    * tests/aeswrap.c, tests/curves.c, tests/hmac.c: Likewise.
    * tests/keygrip.c, tests/prime.c, tests/random.c: Likewise.
    * tests/t-kdf.c, tests/testapi.c: Likewise.
    --
    GnuPG-bug-id: 6271
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Update m4 files and Makefiles. | NIIBE Yutaka | 2023-04-27 | 1 | -2/+2
    * acinclude.m4: Use URL and add SPDX identifier.
    * m4/noexecstack.m4: Likewise.
    * Makefile.am: Likewise.
    * doc/Makefile.am: Likewise.
    * mpi/Makefile.am: Likewise.
    * tests/Makefile.am: Likewise.
    --
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* mpi: optimize mpi_rshift and mpi_lshift to avoid extra MPI copying | Jussi Kivilinna | 2023-04-23 | 1 | -87/+51
    * mpi/mpi-bit.c (_gcry_mpi_rshift): Refactor so that _gcry_mpih_rshift
    is used to do the copying along with shifting when copying is needed
    and refactor so that same code-path is used for both in-place and
    copying operation.
    (_gcry_mpi_lshift): Refactor so that _gcry_mpih_lshift is used to do
    the copying along with shifting when copying is needed and refactor
    so that same code-path is used for both in-place and copying
    operation.
    --
    Benchmark on AMD Ryzen 9 7900X:

    Before:
             | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    rshift3  |    0.039 ns/B    24662 MiB/s    0.182 c/B      4700
    lshift3  |    0.108 ns/B     8832 MiB/s    0.508 c/B      4700
    rshift65 |    0.137 ns/B     6968 MiB/s    0.643 c/B      4700
    lshift65 |    0.109 ns/B     8776 MiB/s    0.511 c/B      4700

    After:
             | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    rshift3  |    0.038 ns/B    25049 MiB/s    0.179 c/B      4700
    lshift3  |    0.039 ns/B    24709 MiB/s    0.181 c/B      4700
    rshift65 |    0.038 ns/B    24942 MiB/s    0.180 c/B      4700
    lshift65 |    0.040 ns/B    23671 MiB/s    0.189 c/B      4700

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/amd64: optimize add_n and sub_n | Jussi Kivilinna | 2023-04-23 | 2 | -25/+136
    * mpi/amd64/mpih-add1.S (_gcry_mpih_add_n): New implementation with
    4x unrolled fast-path loop.
    * mpi/amd64/mpih-sub1.S (_gcry_mpih_sub_n): Likewise.
    --
    Benchmark on AMD Ryzen 9 7900X:

    Before:
             | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    add      |    0.035 ns/B    27559 MiB/s    0.163 c/B      4700
    sub      |    0.034 ns/B    28332 MiB/s    0.158 c/B      4700

    After (~26% faster):
             | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    add      |    0.027 ns/B    35271 MiB/s    0.127 c/B      4700
    sub      |    0.027 ns/B    35206 MiB/s    0.127 c/B      4700

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/amd64: fix use of 'movd' for 64-bit register move in lshift&rshift | Jussi Kivilinna | 2023-04-23 | 2 | -2/+2
    * mpi/amd64/mpih-lshift.S: Use 'movq' instead of 'movd' for moving
    value to %rax.
    * mpi/amd64/mpih-rshift.S: Likewise.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi: avoid MPI copy at gcry_mpi_sub | Jussi Kivilinna | 2023-04-23 | 1 | -8/+11
    * mpi/mpi-add.c (_gcry_mpi_add): Rename function...
    (_gcry_mpi_add_inv_sign): ... to this and add parameter for inverting
    sign of second operand.
    (_gcry_mpi_add): New.
    (_gcry_mpi_sub): Remove mpi_copy and instead use new
    '_gcry_mpi_add_inv_sign' function with inverted sign for second
    operand.
    --
    Benchmark on AMD Ryzen 9 7900X:

    Before:
             | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    add      |    0.052 ns/B    18301 MiB/s    0.287 c/B      5500
    sub      |    0.098 ns/B     9768 MiB/s    0.537 c/B      5500

    After:
             | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
    add      |    0.030 ns/B    31771 MiB/s    0.165 c/B      5500
    sub      |    0.031 ns/B    31187 MiB/s    0.168 c/B      5500

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
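The sign-inversion idea in the entry above can be illustrated with a toy signed-magnitude type. This is only a sketch under stated assumptions, not libgcrypt's code: `smallnum`, `add_inv_sign`, `num_add` and `num_sub` are hypothetical names, the magnitude is a single 32-bit word, and overflow is ignored.

```c
#include <stdint.h>

/* Hypothetical toy number: magnitude plus sign flag (1 = negative). */
typedef struct {
  uint32_t mag;
  int sign;
} smallnum;

/* Compute u + v, or u - v when inv_v_sign is 1. The sign of v is
   flipped on the fly, so no negated copy of v has to be made. */
static smallnum add_inv_sign(smallnum u, smallnum v, int inv_v_sign)
{
  int vsign = v.sign ^ inv_v_sign;
  smallnum w;

  if (u.sign == vsign) {          /* same sign: add magnitudes */
    w.mag = u.mag + v.mag;
    w.sign = u.sign;
  } else if (u.mag >= v.mag) {    /* different signs, |u| >= |v| */
    w.mag = u.mag - v.mag;
    w.sign = u.sign;
  } else {                        /* different signs, |u| < |v| */
    w.mag = v.mag - u.mag;
    w.sign = vsign;
  }
  if (w.mag == 0)
    w.sign = 0;                   /* normalize zero to non-negative */
  return w;
}

static smallnum num_add(smallnum u, smallnum v)
{
  return add_inv_sign(u, v, 0);
}

static smallnum num_sub(smallnum u, smallnum v)
{
  return add_inv_sign(u, v, 1);
}
```

With this shape, subtraction is just addition with one extra flag, which is the property the commit uses to drop the temporary copy.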
* aarch64-asm: align functions to 16 bytes | Jussi Kivilinna | 2023-01-19 | 5 | -0/+5
    * cipher/camellia-aarch64.S: Align functions to 16 bytes.
    * cipher/chacha20-aarch64.S: Likewise.
    * cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
    * cipher/crc-armv8-aarch64-ce.S: Likewise.
    * cipher/rijndael-aarch64.S: Likewise.
    * cipher/rijndael-armv8-aarch64-ce.S: Likewise.
    * cipher/sha1-armv8-aarch64-ce.S: Likewise.
    * cipher/sha256-armv8-aarch64-ce.S: Likewise.
    * cipher/sha512-armv8-aarch64-ce.S: Likewise.
    * cipher/sm3-aarch64.S: Likewise.
    * cipher/sm3-armv8-aarch64-ce.S: Likewise.
    * cipher/sm4-aarch64.S: Likewise.
    * cipher/sm4-armv8-aarch64-ce.S: Likewise.
    * cipher/sm4-armv9-aarch64-sve-ce.S: Likewise.
    * cipher/twofish-aarch64.S: Likewise.
    * mpi/aarch64/mpih-add1.S: Likewise.
    * mpi/aarch64/mpih-mul1.S: Likewise.
    * mpi/aarch64/mpih-mul2.S: Likewise.
    * mpi/aarch64/mpih-mul3.S: Likewise.
    * mpi/aarch64/mpih-sub1.S: Likewise.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/amd64: align functions and inner loops to 16 bytes | Jussi Kivilinna | 2023-01-19 | 7 | -8/+14
    * mpi/amd64/mpih-add1.S: Align function and inner loop to 16 bytes.
    * mpi/amd64/mpih-lshift.S: Likewise.
    * mpi/amd64/mpih-mul1.S: Likewise.
    * mpi/amd64/mpih-mul2.S: Likewise.
    * mpi/amd64/mpih-mul3.S: Likewise.
    * mpi/amd64/mpih-rshift.S: Likewise.
    * mpi/amd64/mpih-sub1.S: Likewise.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add clang support for ARM 32-bit assembly | Jussi Kivilinna | 2022-12-14 | 5 | -163/+163
    * configure.ac (gcry_cv_gcc_arm_platform_as_ok)
    (gcry_cv_gcc_inline_asm_neon): Remove % prefix from register names.
    * cipher/cipher-gcm-armv7-neon.S (vmull_p64): Prefix constant values
    with # character instead of $.
    * cipher/blowfish-arm.S: Remove % prefix from all register names.
    * cipher/camellia-arm.S: Likewise.
    * cipher/cast5-arm.S: Likewise.
    * cipher/rijndael-arm.S: Likewise.
    * cipher/rijndael-armv8-aarch32-ce.S: Likewise.
    * cipher/sha512-arm.S: Likewise.
    * cipher/sha512-armv7-neon.S: Likewise.
    * cipher/twofish-arm.S: Likewise.
    * mpi/arm/mpih-add1.S: Likewise.
    * mpi/arm/mpih-mul1.S: Likewise.
    * mpi/arm/mpih-mul2.S: Likewise.
    * mpi/arm/mpih-mul3.S: Likewise.
    * mpi/arm/mpih-sub1.S: Likewise.
    --
    Reported-by: Dmytro Kovalov <dmytro.a.kovalov@globallogic.com>
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/longlong: update powerpc macros from GCC | Jussi Kivilinna | 2022-10-26 | 1 | -131/+81
    * mpi/longlong.h [__powerpc__, __powerpc64__]: Update macros.
    --
    Update longlong.h powerpc macros with more up to date versions from
    GCC's longlong.h. Note, GCC's version is licensed under LGPLv2.1+.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/longlong.h: i386: use tzcnt instruction for trailing zeros | Jussi Kivilinna | 2022-10-08 | 1 | -1/+1
    * mpi/longlong.h [__i386__] (count_trailing_zeros): Add 'rep' prefix
    for 'bsfq'.
    --
    "rep;bsf" aka "tzcnt" is new instruction with well defined operation
    on zero input and as result is faster on new CPUs. On old CPUs,
    "tzcnt" functions as old "bsf" with undefined behaviour on zero input.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/longlong.h: x86-64: use tzcnt instruction for trailing zeros | Jussi Kivilinna | 2022-10-08 | 1 | -1/+1
    * mpi/longlong.h [__x86_64__] (count_trailing_zeros): Add 'rep' prefix
    for 'bsfq'.
    --
    "rep;bsf" aka "tzcnt" is new instruction with well defined operation
    on zero input and as result is faster on new CPUs. On old CPUs,
    "tzcnt" functions as old "bsf" with undefined behaviour on zero input.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
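A minimal sketch of the 'rep;bsf' trick described above, assuming a GCC-compatible compiler. `count_trailing_zeros64` is a hypothetical wrapper, not the macro from mpi/longlong.h. The explicit zero check is added here so that the result is defined on every CPU, including older ones that execute the instruction as plain `bsf`; on other architectures a simple loop is used instead.

```c
#include <stdint.h>

/* Hypothetical helper: number of trailing zero bits in x (64 for x == 0). */
static unsigned count_trailing_zeros64(uint64_t x)
{
  if (x == 0)
    return 64; /* plain 'bsf' leaves the destination undefined for zero */

#if defined(__x86_64__) && defined(__GNUC__)
  uint64_t count;
  /* 'rep;bsfq' is decoded as 'tzcnt' on CPUs that support it and as
     'bsfq' (prefix ignored) on CPUs that do not. */
  __asm__ ("rep;bsfq %1, %0" : "=r" (count) : "rm" (x) : "cc");
  return (unsigned)count;
#else
  /* Portable fallback for other targets. */
  unsigned count = 0;
  while ((x & 1) == 0) {
    x >>= 1;
    count++;
  }
  return count;
#endif
}
```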
* mpi/longlong: fix generic smul_ppmm ifdef | Jussi Kivilinna | 2022-10-08 | 1 | -1/+1
    * mpi/longlong.h [!umul_ppmm] (smul_ppmm): Change ifdef from
    !defined(umul_ppmm) to !defined(smul_ppmm).
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/longlong: provide generic implementation using double word type | Jussi Kivilinna | 2022-10-08 | 1 | -8/+67
    * configure.ac: Add check for 'unsigned __int128'.
    * mpi/longlong.h (UDWtype): Define for 32-bit or 64-bit when
    'unsigned long long' or 'unsigned __int128' is available.
    (add_ssaaaa, sub_ddmmss, umul_ppmm, udiv_qrnnd) [UDWtype]: New.
    --
    New generic longlong.h implementation by using 'unsigned long long'
    on 32-bit and 'unsigned __int128' on 64-bit (for new architectures
    like RISC-V).

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
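The double-word approach can be sketched for the 32-bit case, where a 64-bit integer serves as the double word. The macro names `umul_ppmm`, `add_ssaaaa` and the type name `UDWtype` come from the entry above; the macro bodies below are an illustrative assumption, not the text of mpi/longlong.h.

```c
#include <stdint.h>

typedef uint32_t UWtype;   /* single word (one limb), 32-bit case */
typedef uint64_t UDWtype;  /* double word */

/* (ph, pl) = a * b: full 64-bit product split into high and low words. */
#define umul_ppmm(ph, pl, a, b)                                   \
  do {                                                            \
    UDWtype p_ = (UDWtype)(a) * (UDWtype)(b);                     \
    (ph) = (UWtype)(p_ >> 32);                                    \
    (pl) = (UWtype)p_;                                            \
  } while (0)

/* (sh, sl) = (ah, al) + (bh, bl): two-word addition, carry handled
   by the wider type. */
#define add_ssaaaa(sh, sl, ah, al, bh, bl)                        \
  do {                                                            \
    UDWtype s_ = (((UDWtype)(ah) << 32) | (UDWtype)(al))          \
               + (((UDWtype)(bh) << 32) | (UDWtype)(bl));         \
    (sh) = (UWtype)(s_ >> 32);                                    \
    (sl) = (UWtype)s_;                                            \
  } while (0)
```

The appeal of this form is that the compiler, rather than hand-written assembly, picks the multiply and add-with-carry instructions, which is useful on architectures that have no dedicated longlong.h section.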
* mpi/ec: remove VLA usage | Jussi Kivilinna | 2022-10-02 | 2 | -23/+23
    * mpi/ec-nist.c (_gcry_mpi_ec_nist192_mod, _gcry_mpi_ec_nist224_mod)
    (_gcry_mpi_ec_nist256_mod, _gcry_mpi_ec_nist384_mod)
    (_gcry_mpi_ec_nist521_mod): Avoid VLA for arrays.
    * mpi/ec.c (ec_secp256k1_mod): Avoid VLA for arrays.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
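The commit message does not say how the VLAs were removed. One common way, shown here purely as an assumption, is to size the array with an integer constant expression: in C, an array whose length is a `const` variable is still a variable-length array, while an enum constant or macro gives a fixed-size array. `WSIZE` and `sum_limbs` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* An enum constant is an integer constant expression, so arrays sized
   with it are ordinary fixed-size arrays. */
enum { WSIZE = 4 };

/* Hypothetical helper: copy WSIZE limbs into a scratch buffer and
   return their sum (modulo 2^32). */
static uint32_t sum_limbs(const uint32_t *in)
{
  /* A declaration such as
   *   const size_t wsize = 4;  uint32_t s[wsize];
   * would make 's' a VLA in C, because 'wsize' is not a constant
   * expression. */
  uint32_t s[WSIZE];
  uint32_t total = 0;
  size_t i;

  for (i = 0; i < WSIZE; i++)
    s[i] = in[i];
  for (i = 0; i < WSIZE; i++)
    total += s[i];
  return total;
}
```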
* More clean up. | NIIBE Yutaka | 2022-09-16 | 1 | -1/+1
    * cipher/cipher-ccm.c (_gcry_cipher_ccm_tag): Add static qualifier.
    * mpi/ec-ed25519.c: Include ec-internal.h.
    * src/secmem.c (MB_WIPE_OUT): Remove extra semicolon.
    --
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Minor clean up. | NIIBE Yutaka | 2022-09-16 | 3 | -4/+4
    * mpi/mpi-internal.h: Remove extra semicolon from the macro.
    * mpi/mpih-mul.c: Likewise.
    * src/cipher-proto.h: Remove duplication for enum pk_encoding.
    * mpi/mpi-pow.c (_gcry_mpi_powm): Initialize XSIZE.
    --
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* mpi: Allow building with --disable-asm for HPPA. | NIIBE Yutaka | 2022-05-17 | 1 | -2/+2
    * mpi/longlong.h [__hppa] (udiv_qrnnd): Only define when assembler is
    enabled.
    --
    GnuPG-bug-id: 5976
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* mpi: Fix for 64-bit for _gcry_mpih_cmp_ui. | NIIBE Yutaka | 2022-05-10 | 1 | -1/+8
    * mpi/mpih-const-time.c (_gcry_mpih_cmp_ui): Compare 64-bit value
    correctly.
    --
    Reported-by: Guido Vranken <guidovranken@gmail.com>
    GnuPG-bug-id: 5970
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Clean up for removal of memory guard support. | NIIBE Yutaka | 2022-02-10 | 1 | -7/+0
    * mpi/mpiutil.c (_gcry_mpi_m_check): Remove.
    * src/g10lib.h (_gcry_check_heap): Remove.
    * src/global.c (_gcry_check_heap): Remove.
    * src/mpi.h (mpi_m_check): Remove.
    --
    GnuPG-bug-id: 5822
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* mpi: Add missing header file to the tarball | Jakub Jelen | 2022-01-25 | 1 | -1/+1
    * mpi/Makefile.am: Add missing header file.
    --
    Signed-off-by: Jakub Jelen <jjelen@redhat.com>
* mpi/amd64: remove extra 'ret' from assembly functions | Jussi Kivilinna | 2022-01-11 | 7 | -7/+0
    * mpi/amd64/mpih-add1.S: Remove 'ret' as it is already included by
    FUNC_EXIT macro.
    * mpi/amd64/mpih-lshift.S: Likewise.
    * mpi/amd64/mpih-mul1.S: Likewise.
    * mpi/amd64/mpih-mul2.S: Likewise.
    * mpi/amd64/mpih-mul3.S: Likewise.
    * mpi/amd64/mpih-rshift.S: Likewise.
    * mpi/amd64/mpih-sub1.S: Likewise.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/config.links: merge i586 targets with rest i*86 targets | Jussi Kivilinna | 2022-01-11 | 1 | -49/+15
    * mpi/config.links: Merge i586 targets with rest i[3467]86 targets.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi: remove unused i586 and pentium4 assembly | Jussi Kivilinna | 2022-01-11 | 21 | -2645/+4
    * mpi/config.links: Remove 'i586' from paths.
    * mpi/i586*: Remove.
    * mpi/pentium4/*: Remove.
    --
    Current x86 targets (i686) have been defaulting on mpi/i386 assembly
    for quite some time now. Remove mpi/i586 as it is no longer used.
    While at it, remove mpi/pentium4 assembly also as obsolete.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add straight-line speculation hardening for aarch64 assembly | Jussi Kivilinna | 2022-01-11 | 5 | -6/+6
    * cipher/asm-common-aarch64.h (ret_spec_stop): New.
    * cipher/asm-poly1305-aarch64.h: Use 'ret_spec_stop' for 'ret'
    instruction.
    * cipher/camellia-aarch64.S: Likewise.
    * cipher/chacha20-aarch64.S: Likewise.
    * cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
    * cipher/crc-armv8-aarch64-ce.S: Likewise.
    * cipher/rijndael-aarch64.S: Likewise.
    * cipher/rijndael-armv8-aarch64-ce.S: Likewise.
    * cipher/sha1-armv8-aarch64-ce.S: Likewise.
    * cipher/sha256-armv8-aarch64-ce.S: Likewise.
    * cipher/sm3-aarch64.S: Likewise.
    * cipher/twofish-aarch64.S: Likewise.
    * mpi/aarch64/mpih-add1.S: Likewise.
    * mpi/aarch64/mpih-mul1.S: Likewise.
    * mpi/aarch64/mpih-mul2.S: Likewise.
    * mpi/aarch64/mpih-mul3.S: Likewise.
    * mpi/aarch64/mpih-sub1.S: Likewise.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add straight-line speculation hardening for amd64 and i386 assembly | Jussi Kivilinna | 2022-01-11 | 10 | -34/+42
    * cipher/asm-common-amd64.h (ret_spec_stop): New.
    * cipher/arcfour-amd64.S: Use 'ret_spec_stop' for 'ret' instruction.
    * cipher/blake2b-amd64-avx2.S: Likewise.
    * cipher/blake2s-amd64-avx.S: Likewise.
    * cipher/blowfish-amd64.S: Likewise.
    * cipher/camellia-aesni-avx-amd64.S: Likewise.
    * cipher/camellia-aesni-avx2-amd64.h: Likewise.
    * cipher/cast5-amd64.S: Likewise.
    * cipher/chacha20-amd64-avx2.S: Likewise.
    * cipher/chacha20-amd64-ssse3.S: Likewise.
    * cipher/des-amd64.S: Likewise.
    * cipher/rijndael-aarch64.S: Likewise.
    * cipher/rijndael-amd64.S: Likewise.
    * cipher/rijndael-ssse3-amd64-asm.S: Likewise.
    * cipher/rijndael-vaes-avx2-amd64.S: Likewise.
    * cipher/salsa20-amd64.S: Likewise.
    * cipher/serpent-avx2-amd64.S: Likewise.
    * cipher/serpent-sse2-amd64.S: Likewise.
    * cipher/sha1-avx-amd64.S: Likewise.
    * cipher/sha1-avx-bmi2-amd64.S: Likewise.
    * cipher/sha1-avx2-bmi2-amd64.S: Likewise.
    * cipher/sha1-ssse3-amd64.S: Likewise.
    * cipher/sha256-avx-amd64.S: Likewise.
    * cipher/sha256-avx2-bmi2-amd64.S: Likewise.
    * cipher/sha256-ssse3-amd64.S: Likewise.
    * cipher/sha512-avx-amd64.S: Likewise.
    * cipher/sha512-avx2-bmi2-amd64.S: Likewise.
    * cipher/sha512-ssse3-amd64.S: Likewise.
    * cipher/sm3-avx-bmi2-amd64.S: Likewise.
    * cipher/sm4-aesni-avx-amd64.S: Likewise.
    * cipher/sm4-aesni-avx2-amd64.S: Likewise.
    * cipher/twofish-amd64.S: Likewise.
    * cipher/twofish-avx2-amd64.S: Likewise.
    * cipher/whirlpool-sse2-amd64.S: Likewise.
    * mpi/amd64/func_abi.h (CFI_*): Remove, include from
    "asm-common-amd64.h" instead.
    (FUNC_EXIT): Use 'ret_spec_stop' for 'ret' instruction.
    * mpi/asm-common-amd64.h: New.
    * mpi/i386/mpih-add1.S: Use 'ret_spec_stop' for 'ret' instruction.
    * mpi/i386/mpih-lshift.S: Likewise.
    * mpi/i386/mpih-mul1.S: Likewise.
    * mpi/i386/mpih-mul2.S: Likewise.
    * mpi/i386/mpih-mul3.S: Likewise.
    * mpi/i386/mpih-rshift.S: Likewise.
    * mpi/i386/mpih-sub1.S: Likewise.
    * mpi/i386/syntax.h (ret_spec_stop): New.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* gcry_mpi_sub_ui: fix subtracting from negative value | Jussi Kivilinna | 2021-12-01 | 1 | -0/+1
    * mpi/mpi-add.c (_gcry_mpi_sub_ui): Set output sign bit when 'u' is
    negative.
    * tests/mpitests.c (test_add): Additional tests for mpi_add_ui; Check
    test output and fail if output does not match expected.
    (test_sub): Additional tests for mpi_sub_ui; Check test output and
    fail if output does not match expected.
    (test_mul): Additional tests for mpi_mul_ui; Check test output and
    fail if output does not match expected.
    --
    Reported-by: Guido Vranken <guidovranken@gmail.com>
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi: Allow opaque MPI with zero length. | NIIBE Yutaka | 2021-10-29 | 1 | -4/+7
    * mpi/mpiutil.c (_gcry_mpi_copy): Support zero length.
    --
    Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* mpi/longlong: fix variable shadowing from MIPS umul_ppmm macros | Jussi Kivilinna | 2021-08-26 | 1 | -9/+9
    * mpi/longlong.h [__mips__ && W_TYPE_SIZE == 32] (umul_ppmm): Rename
    temporary variable '_r' to '__r'.
    [__mips && W_TYPE_SIZE == 64] (umul_ppmm): Ditto.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* ec: add zSeries/s390x accelerated scalar multiplication | Jussi Kivilinna | 2021-07-02 | 4 | -2/+431
    * cipher/asm-inline-s390x.h (PCC_FUNCTION_*): New.
    (pcc_query, pcc_scalar_multiply): New.
    * mpi/Makefile.am: Add 'ec-hw-s390x.c'.
    * mpi/ec-hw-s390x.c: New.
    * mpi/ec-internal.h (_gcry_s390x_ec_hw_mul_point)
    (mpi_ec_hw_mul_point): New.
    * mpi/ec.c (_gcry_mpi_ec_mul_point): Call 'mpi_ec_hw_mul_point'.
    * src/g10lib.h (HWF_S390X_MSA_9): New.
    * src/hwf-s390x.c (s390x_features): Add MSA9.
    * src/hwfeatures.c (hwflist): Add 's390x-msa-9'.
    --
    Patch adds ECC scalar multiplication acceleration using s390x's PCC
    instruction. Following curves are supported:
     - Ed25519
     - Ed448
     - X25519
     - X448
     - NIST curves P-256, P-384 and P-521

    Benchmark on z15 (5.2Ghz):

    Before:
    Ed25519   | nanosecs/iter  cycles/iter
    mult      |        389791      2026916
    keygen    |        572017      2974487
    sign      |        636603      3310336
    verify    |       1189097      6183305
    =
    X25519    | nanosecs/iter  cycles/iter
    mult      |        296805      1543385
    =
    Ed448     | nanosecs/iter  cycles/iter
    mult      |       1693373      8805541
    keygen    |       2382473     12388858
    sign      |       2609562     13569725
    verify    |       5177606     26923552
    =
    X448      | nanosecs/iter  cycles/iter
    mult      |       1136178      5908127
    =
    NIST-P256 | nanosecs/iter  cycles/iter
    mult      |        792620      4121625
    keygen    |       4627835     24064740
    sign      |       1528268      7946991
    verify    |       1678205      8726664
    =
    NIST-P384 | nanosecs/iter  cycles/iter
    mult      |       1766418      9185373
    keygen    |      10158485     52824123
    sign      |       3341172     17374095
    verify    |       3694750     19212700
    =
    NIST-P521 | nanosecs/iter  cycles/iter
    mult      |       3172566     16497346
    keygen    |      18184747     94560683
    sign      |       6039956     31407771
    verify    |       6480882     33700588

    After:
    Ed25519   | nanosecs/iter  cycles/iter  speed-up
    mult      |         25913       134746       15x
    keygen    |         44447       231124       12x
    sign      |        106928       556028        6x
    verify    |        164681       856341        7x
    =
    X25519    | nanosecs/iter  cycles/iter  speed-up
    mult      |         17761        92358       16x
    =
    Ed448     | nanosecs/iter  cycles/iter  speed-up
    mult      |         50808       264199       33x
    keygen    |         68644       356951       34x
    sign      |        317446      1650720        8x
    verify    |        457115      2376997       11x
    =
    X448      | nanosecs/iter  cycles/iter  speed-up
    mult      |         35637       185313       31x
    =
    NIST-P256 | nanosecs/iter  cycles/iter  speed-up
    mult      |         30678       159528       25x
    keygen    |        323722      1683356       14x
    sign      |        114176       593713       13x
    verify    |        169901       883487        9x
    =
    NIST-P384 | nanosecs/iter  cycles/iter  speed-up
    mult      |         59966       311822       29x
    keygen    |        607778      3160445       16x
    sign      |        209832      1091128       16x
    verify    |        329506      1713431       11x
    =
    NIST-P521 | nanosecs/iter  cycles/iter  speed-up
    mult      |         98230       510797       32x
    keygen    |       1131686      5884765       16x
    sign      |        397777      2068442       15x
    verify    |        623076      3239998       10x

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi: optimizations for MPI scanning and printing | Jussi Kivilinna | 2021-07-01 | 1 | -107/+203
    * mpi/mpicoder.c (mpi_read_from_buffer): Add word-size buffer reading
    loop using 'buf_get_be(32|64)'.
    (mpi_fromstr): Use look-up tables for HEX conversion; Add fast-path
    loop for converting 8 hex-characters at once; Add string length
    parameter.
    (do_get_buffer): Use 'buf_put_be(32|64)' instead of byte writes; Add
    fast-path for reversing buffer with 'buf_get_(be64|be32|le64|le32)'.
    (_gcry_mpi_set_buffer): Use 'buf_get_be(32|64)' instead of byte reads.
    (twocompl): Use _gcry_ctz instead of open-coded if-clauses to get
    first bit set; Add fast-path for inverting buffer with
    'buf_get_(he64|he32)'.
    (_gcry_mpi_scan): Use 'buf_get_be32' where possible; Provide string
    length to 'mpi_fromstr'.
    (_gcry_mpi_print): Use 'buf_put_be32' where possible; Use look-up
    table for HEX conversion; Add fast-path loop for converting to 8
    hex-characters at once.
    * tests/t-convert.c (check_formats): Add new tests for larger values.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
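A look-up table for hex conversion, as mentioned for mpi_fromstr above, might look roughly like the following. This is a sketch under stated assumptions: `hex_value` and `hex_to_byte` are hypothetical helpers, and the table layout (value plus one, so that zero-initialized entries mean "invalid") is a choice made for this example, not something taken from mpi/mpicoder.c.

```c
/* Hypothetical helper: return 0..15 for a hex digit, -1 otherwise.
   Table entries hold value + 1 so that the implicit zero in all
   unlisted slots marks an invalid character. */
static int hex_value(unsigned char c)
{
  static const signed char lut[256] = {
    ['0'] = 1,  ['1'] = 2,  ['2'] = 3,  ['3'] = 4,  ['4'] = 5,
    ['5'] = 6,  ['6'] = 7,  ['7'] = 8,  ['8'] = 9,  ['9'] = 10,
    ['a'] = 11, ['b'] = 12, ['c'] = 13, ['d'] = 14, ['e'] = 15, ['f'] = 16,
    ['A'] = 11, ['B'] = 12, ['C'] = 13, ['D'] = 14, ['E'] = 15, ['F'] = 16,
  };
  return lut[c] - 1;
}

/* Hypothetical helper: convert two hex characters at s[0], s[1] into
   one byte. Returns 0 on success, -1 on an invalid digit. */
static int hex_to_byte(const char *s, unsigned char *out)
{
  int hi = hex_value((unsigned char)s[0]);
  int lo = hex_value((unsigned char)s[1]);

  if (hi < 0 || lo < 0)
    return -1;
  *out = (unsigned char)((hi << 4) | lo);
  return 0;
}
```

A table replaces the chain of range comparisons per character with a single indexed load, which is the usual motivation for this kind of change.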
* mpi/ec: cache converted field_table MPIs | Jussi Kivilinna | 2021-07-01 | 1 | -6/+16
    * mpi/ec.c (field_table_mpis): New.
    (ec_p_init): Cache converted field table MPIs.
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi_ec_get_affine: fast path for Z==1 case | Jussi Kivilinna | 2021-07-01 | 1 | -0/+18
    * mpi/ec.c (_gcry_mpi_ec_get_affine): Return X and Y as is if Z is 1
    (for Weierstrass and Edwards curves).
    --
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* ec-nist: fix 'mod p' carry adjustment and output masking | Jussi Kivilinna | 2021-06-30 | 2 | -53/+99
    * mpi/ec-inline.h (MASK_AND64, LIMB_OR64): New.
    [__x86_64__]: Use "rme" operand type instead of "g" to fix use of
    large 32-bit constants.
    * mpi/ec-nist.c (_gcry_mpi_ec_nist192_mod, _gcry_mpi_ec_nist224_mod)
    (_gcry_mpi_ec_nist256_mod, _gcry_mpi_ec_nist384_mod): At end, check if
    's[]' is negative instead of result of last addition, for output
    masks; Use 'p_mult' table entry for P instead of 'ctx->p'.
    (_gcry_mpi_ec_nist256_mod): Handle corner case where 2*P needs to be
    added after carry based subtraction.
    * tests/t-mpi-point.c (check_ec_mul_reduction): New.
    (main): Call 'check_ec_mul_reduction'.
    --
    GnuPG-bug-id: T5510
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/ec: add fast reduction for secp256k1 | Jussi Kivilinna | 2021-06-19 | 1 | -0/+62
    * mpi/ec.c (ec_secp256k1_mod): New.
    (field_table): Add 'secp256k1'.
    * tests/t-mpi-point.c (check_ec_mul): Add secp256k1 test vectors.
    --
    Benchmark on Ryzen 7 5800X (x86_64):

    Before:
    secp256k1 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        482336      2340443      4852

    After (~20% faster):
    secp256k1 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        392941      1906540      4852

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/ec: add fast reduction functions for NIST curves | Jussi Kivilinna | 2021-06-19 | 7 | -18/+1939
    * configure.ac (ASM_DISABLED): New.
    * mpi/Makefile.am: Add 'ec-nist.c' and 'ec-inline.h'.
    * mpi/ec-nist.c: New.
    * mpi/ec-inline.h: New.
    * mpi/ec-internal.h (_gcry_mpi_ec_nist192_mod)
    (_gcry_mpi_ec_nist224_mod, _gcry_mpi_ec_nist256_mod)
    (_gcry_mpi_ec_nist384_mod, _gcry_mpi_ec_nist521_mod): New.
    * mpi/ec.c (ec_addm, ec_subm, ec_mulm, ec_mul2): Use 'ctx->mod'.
    (field_table): Add 'mod' function; Add NIST reduction functions.
    (ec_p_init): Setup ctx->mod; Setup function pointers from field_table
    only if pointer is not NULL; Resize ctx->a and ctx->b only if set.
    * mpi/mpi-internal.h (RESIZE_AND_CLEAR_IF_NEEDED): New.
    * mpi/mpiutil.c (_gcry_mpi_resize): Clear all unused limbs also in
    realloc case.
    * src/ec-context.h (mpi_ec_ctx_s): Add 'mod' function.
    --
    Benchmark on AMD Ryzen 7 5800X (x86_64):

    Before:
    NIST-P192 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        283346      1369473      4833
    keygen    |       1688442      8185744      4848
    sign      |        549683      2662984      4845
    verify    |        615284      2984325      4850
    =
    NIST-P224 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        516443      2501173      4843
    keygen    |       2859746     13866802      4849
    sign      |        918472      4455043      4850
    verify    |       1057940      5131372      4850
    =
    NIST-P256 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        423536      2054040      4850
    keygen    |       2383097     11557572      4850
    sign      |        774346      3754243      4848
    verify    |        864934      4196315      4852
    =
    NIST-P384 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        929985      4511881      4852
    keygen    |       5230788     25367299      4850
    sign      |       1671432      8109726      4852
    verify    |       1902729      9228568      4850
    =
    NIST-P521 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |       2123546     10300952      4851
    keygen    |      12019340     58297774      4850
    sign      |       3886988     18853054      4850
    verify    |       4507885     21864015      4850

    After:
    NIST-P192 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        186679       905603      4851      +51%
    keygen    |       1161423      5623822      4842      +46%
    sign      |        389531      1887557      4846      +41%
    verify    |        412936      2000461      4844      +49%
    =
    NIST-P224 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        260621      1256327      4821      +99%
    keygen    |       1557845      7531677      4835      +84%
    sign      |        521678      2527083      4844      +76%
    verify    |        554084      2677949      4833      +92%
    =
    NIST-P256 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        319045      1542061      4833      +33%
    keygen    |       1834822      8898950      4850      +30%
    sign      |        612866      2972630      4850      +26%
    verify    |        664821      3222597      4847      +30%
    =
    NIST-P384 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        593894      2875260      4841      +57%
    keygen    |       3526600     17089717      4846      +48%
    sign      |       1178098      5710151      4847      +42%
    verify    |       1260185      6107449      4846      +51%
    =
    NIST-P521 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |       1160220      5621946      4846      +83%
    keygen    |       6862975     33247351      4844      +75%
    sign      |       2287366     11096711      4851      +70%
    verify    |       2455858     11888045      4841      +84%

    Benchmark on AMD Ryzen 7 5800X (i386):

    Before:
    NIST-P192 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        648039      3143236      4850
    keygen    |       3554452     17244822      4852
    sign      |       1163173      5641932      4850
    verify    |       1300076      6305673      4850
    =
    NIST-P224 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        798607      3874405      4851
    keygen    |       4657604     22589864      4850
    sign      |       1515803      7352049      4850
    verify    |       1635470      7935373      4852
    =
    NIST-P256 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |        927033      4496283      4850
    keygen    |       5313601     25771983      4850
    sign      |       1735795      8418514      4850
    verify    |       1945804      9438212      4851
    =
    NIST-P384 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |       2301781     11164473      4850
    keygen    |      12856001     62353242      4850
    sign      |       4161041     20180651      4850
    verify    |       4705961     22827478      4851
    =
    NIST-P521 | nanosecs/iter  cycles/iter  auto Mhz
    mult      |       6066635     29422721      4850
    keygen    |      32995868    160046407      4850
    sign      |      10503306     50945387      4850
    verify    |      12225252     59294323      4850

    After:
    NIST-P192 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        413605      2007498      4854      +57%
    keygen    |       2479429     12010926      4844      +44%
    sign      |        825111      3997147      4844      +41%
    verify    |        890206      4318723      4851      +46%
    =
    NIST-P224 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        551703      2676454      4851      +45%
    keygen    |       3257022     15781844      4845      +43%
    sign      |       1085678      5258894      4844      +40%
    verify    |       1172195      5678499      4844      +40%
    =
    NIST-P256 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |        720395      3497486      4855      +29%
    keygen    |       4217758     20461257      4851      +26%
    sign      |       1404350      6814131      4852      +24%
    verify    |       1515136      7353955      4854      +28%
    =
    NIST-P384 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |       1525742      7400771      4851      +51%
    keygen    |       9046660     43877889      4850      +42%
    sign      |       2974641     14408703      4844      +40%
    verify    |       3265285     15834951      4849      +44%
    =
    NIST-P521 | nanosecs/iter  cycles/iter  auto Mhz  speed-up
    mult      |       3289348     15968678      4855      +84%
    keygen    |      19354174     93873531      4850      +70%
    sign      |       6351493     30830140      4854      +65%
    verify    |       6979292     33854215      4851      +75%

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/ec: small optimization for ec_mulm_448 | Jussi Kivilinna | 2021-06-19 | 1 | -54/+22
    * mpi/ec.c (ec_addm_448, ec_subm_448): Change order of sub_n and
    set_cond to remove need to clear 'n'.
    (ec_mulm_448): Use memcpy where possible; Use mpih_rshift where
    possible; Use mpih_lshift for doubling a3; Remove one addition at end.
    --
    Benchmarks on AMD Ryzen 7 5800X:

    Before:
    Ed448     | nanosecs/iter  cycles/iter  auto Mhz
    keygen    |        893096      4343326      4863
    sign      |        988422      4795694      4852
    verify    |       1899706      9215952      4851

    After (~5% faster):
    Ed448     | nanosecs/iter  cycles/iter  auto Mhz
    keygen    |        822078      3987952      4851
    sign      |        947327      4595433      4851
    verify    |       1776259      8616675      4851

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/ec: small optimization for ec_mulm_25519 | Jussi Kivilinna | 2021-06-19 | 1 | -29/+12
    * mpi/ec.c (ec_addm_25519): Remove one addition.
    (ec_subm_25519): Change order of add_n and set_cond to remove need to
    clear 'n'.
    (ec_mulm_25519): Avoid extra memory copies; Use _gcry_mpih_addmul_1
    for multiplying by 19 and adding; Remove one addition at end.
    --
    Benchmarks on AMD Ryzen 7 5800X:

    Before:
    Ed25519   | nanosecs/iter  cycles/iter  auto Mhz
    keygen    |        304980      1478913      4849
    sign      |        328657      1589657      4837
    verify    |        625133      3032355      4851

    After (~22% faster):
    Ed25519   | nanosecs/iter  cycles/iter  auto Mhz
    keygen    |        244288      1184862      4850
    sign      |        267831      1298934      4850
    verify    |        504745      2449106      4852

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi/longlong.h: fix missing macro parameter parentheses | Jussi Kivilinna | 2021-06-19 | 1 | -7/+7
    * mpi/longlong.h [__alpha] (umul_ppmm): Add parentheses around used
    parameters.
    [__i370__] (sdiv_qrnnd): Ditto.
    [__mips__] (umul_ppmm): Ditto.
    [__vax__] (sdiv_qrnnd): Ditto.
    --
    Noticed issue after wrong results on mips64 with new mpi/ec code.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
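Why missing parentheses around macro parameters produce wrong results can be shown with a deliberately small example. The macros below are hypothetical and unrelated to the actual longlong.h definitions; they only demonstrate the failure mode when an expression is passed as an argument.

```c
/* Parameter used without parentheses: the argument expression is pasted
   in textually, so operator precedence can regroup it. */
#define DOUBLE_UNSAFE(x) (x * 2)

/* Parameter wrapped in parentheses: the argument is evaluated as a unit. */
#define DOUBLE_SAFE(x) ((x) * 2)

static int double_unsafe_of_sum(int a, int b)
{
  return DOUBLE_UNSAFE(a + b); /* expands to (a + b * 2) */
}

static int double_safe_of_sum(int a, int b)
{
  return DOUBLE_SAFE(a + b);   /* expands to ((a + b) * 2) */
}
```

With a = 1 and b = 2 the unsafe form yields 5 while the safe form yields 6, which is the kind of silent miscomputation that only shows up once a caller passes something other than a plain variable.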
* mpi: harden add_n_cond, sub_n_cond and abs_cond against EM leakage | Jussi Kivilinna | 2021-04-09 | 1 | -14/+20
    * mpi/mpih-const-time.c (_gcry_mpih_add_n_cond)
    (_gcry_mpih_sub_n_cond): Always perform calculation with both UP and
    VP; Use two masks for selecting output.
    (_gcry_mpih_abs_cond): Always calculate absolute value of UP; Use two
    masks for selecting output.
    --
    GnuPG-bug-id: T5330
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi: harden set_cond functions against EM leakage  Jussi Kivilinna  2021-04-09  2  -12/+18
  * mpi/mpih-const-time.c (_gcry_mpih_set_cond): Use two masks for
  selecting output.
  * mpi/mpiutil.c (_gcry_mpi_set_cond): Use two masks for selecting
  output.
  --

  GnuPG-bug-id: T5330
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* mpi: harden swap_cond functions against EM leakage  Jussi Kivilinna  2021-04-09  2  -16/+35
  * mpi/mpih-const-time.c (vzero, vone): New.
  (_gcry_mpih_swap_cond): Use two masks for selecting output.
  * mpi/mpiutil.c (vzero, vone): New.
  (_gcry_mpi_swap_cond): Use two masks for selecting output.
  --

  GnuPG-bug-id: T5330
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
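  The "two masks" selection named in these hardening commits can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the libgcrypt source: the names `limb_t` and `ct_swap_cond` are hypothetical, and only `vzero` and `vone` are names mentioned in the commit message. The idea is that both inputs are always read and combined with a pair of complementary masks, so the work done does not depend on the condition bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* hypothetical limb type for this sketch */

/* Volatile constants keep the compiler from folding the masks into a
   branch or a conditional move chosen at compile time. */
static const volatile limb_t vzero = 0;
static const volatile limb_t vone = 1;

/* Swap a[0..n-1] and b[0..n-1] when enable is 1; leave both unchanged
   when enable is 0.  Every limb is read and written in either case. */
void ct_swap_cond(limb_t *a, limb_t *b, size_t n, unsigned long enable)
{
  limb_t mask_on, mask_off;
  size_t i;

  assert(enable <= 1);

  mask_on = vzero - (limb_t) enable;   /* all ones if enable == 1, else 0 */
  mask_off = (limb_t) enable - vone;   /* all ones if enable == 0, else 0 */

  for (i = 0; i < n; i++)
    {
      limb_t x = a[i];
      limb_t y = b[i];

      a[i] = (x & mask_off) | (y & mask_on);
      b[i] = (y & mask_off) | (x & mask_on);
    }
}
```

  Compared with a single-mask XOR swap, each output here is formed from both inputs under complementary masks, which is presumably what makes the data-dependent signal harder to observe.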
* mpi/aarch64: use C_SYMBOL_NAME for assembly function names  Jussi Kivilinna  2021-04-01  5  -20/+20
  * mpi/aarch64/mpih-add1.S: Add missing C_SYMBOL_NAME.
  * mpi/aarch64/mpih-mul1.S: Add missing C_SYMBOL_NAME.
  * mpi/aarch64/mpih-mul2.S: Add missing C_SYMBOL_NAME.
  * mpi/aarch64/mpih-mul3.S: Add missing C_SYMBOL_NAME.
  * mpi/aarch64/mpih-sub1.S: Add missing C_SYMBOL_NAME.
  --

  GnuPG-bug-id: T5370
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* ecc: Fix the regression of gcry_mpi_ec_add.  NIIBE Yutaka  2021-03-30  1  -12/+12
  * mpi/ec.c (_gcry_mpi_ec_point_resize): Export the routine for
  internal use.
  (add_points_edwards, _gcry_mpi_ec_mul_point): Use mpi_point_resize.
  * src/gcrypt-int.h (_gcry_mpi_ec_point_resize): Declare.
  * src/visibility.c (gcry_mpi_ec_dup, gcry_mpi_ec_add): Make sure for
  the size of limb before calling the internal functions.
  (gcry_mpi_ec_sub): Likewise.
  --

  GnuPG-bug-id: 5372
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Fix ubsan warnings for i386 build  Jussi Kivilinna  2021-02-03  1  -8/+8
  * mpi/mpicoder.c (_gcry_mpi_set_buffer) [BYTES_PER_MPI_LIMB == 4]: Cast
  "*p--" values to mpi_limb_t before left shifting.
  * tests/t-lock.c (main): Cast 'time(NULL)' to unsigned type.
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
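  The cast-before-shift fix can be illustrated with a small sketch. The function `load_be32` and the type `limb_t` are hypothetical stand-ins, not code from mpi/mpicoder.c. In C, an `unsigned char` operand is promoted to `int` before shifting, so shifting a byte of 0x80 or more left by 24 overflows a 32-bit signed `int`, which is undefined behavior and what UBSan reports. Casting to the unsigned limb type first keeps the shift well defined.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 4-byte limb for this sketch */

/* Assemble a big-endian 32-bit limb from four bytes.  Each byte is cast
   to limb_t before the shift so that the shift happens in an unsigned
   32-bit type rather than in a promoted signed int. */
limb_t load_be32(const unsigned char *buf)
{
  limb_t limb;

  assert(buf != 0);

  limb  = (limb_t) buf[3];
  limb |= (limb_t) buf[2] << 8;
  limb |= (limb_t) buf[1] << 16;
  limb |= (limb_t) buf[0] << 24;  /* without the cast: int overflow if buf[0] >= 0x80 */
  return limb;
}
```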
* mpi: Fix _gcry_mpih_mod implementation.  NIIBE Yutaka  2021-01-27  1  -2/+3
  * mpi/mpih-const-time.c (_gcry_mpih_mod): Handle the overflow.
  --

  GnuPG-bug-id: 5269
  Reported-by: Guido Vranken <guidovranken@gmail.com>
  Fixes-commit: 95bdfd9ce9e114f447f3639e551e8f4f63d024fe
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* mpi/longlong: make use of compiler provided __builtin_ctz/__builtin_clz  Jussi Kivilinna  2021-01-20  1  -0/+20
  * configure.ac (gcry_cv_have_builtin_ctzl, gcry_cv_have_builtin_clz)
  (gcry_cv_have_builtin_clzl): New checks.
  * mpi/longlong.h (count_leading_zeros, count_trailing_zeros): Use
  __builtin_clz[l]/__builtin_ctz[l] if available and bit counting macros
  not yet provided by inline assembly.
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
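  A rough sketch of the fallback pattern described above: use the compiler builtin when it is available, otherwise fall back to a portable loop. The function names here are hypothetical, and the `__GNUC__` test is only a stand-in assumption for the configure-time checks the commit adds; the real macros in mpi/longlong.h may be structured differently.

```c
#include <assert.h>
#include <limits.h>

#define ULONG_BITS ((unsigned int) (sizeof(unsigned long) * CHAR_BIT))

/* Number of zero bits above the most significant set bit of x.
   x must be nonzero: the builtin's result is undefined for 0. */
unsigned int clz_ulong(unsigned long x)
{
  assert(x != 0);
#if defined(__GNUC__)
  return (unsigned int) __builtin_clzl(x);
#else
  {
    unsigned int n = 0;
    unsigned long top = 1UL << (ULONG_BITS - 1);

    while (!(x & top))
      {
        x <<= 1;
        n++;
      }
    return n;
  }
#endif
}

/* Number of zero bits below the least significant set bit of x.
   x must be nonzero for the same reason. */
unsigned int ctz_ulong(unsigned long x)
{
  assert(x != 0);
#if defined(__GNUC__)
  return (unsigned int) __builtin_ctzl(x);
#else
  {
    unsigned int n = 0;

    while (!(x & 1UL))
      {
        x >>= 1;
        n++;
      }
    return n;
  }
#endif
}
```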
* mpi/longlong: add s390x/zSeries macros  Jussi Kivilinna  2020-12-30  1  -0/+48
  * mpi/longlong.h [__s390x__] (add_ssaaaa, sub_ddmmss, UTItype)
  (umul_ppmm, udiv_qrnnd): New.
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* hwf: add detection of s390x/zSeries hardware features  Jussi Kivilinna  2020-12-18  1  -0/+5
  * configure.ac (gcry_cv_gcc_inline_asm_s390x)
  (HAVE_CPU_ARCH_S390X): Add s390x detection support.
  * mpi/config.links: Add setup for s390x links.
  * src/Makefile.am: Add 'hwf-s390x.c'.
  * src/g10lib.h (HWF_S390X_MSA, HWF_S390X_MSA_4, HWF_S390X_8): New.
  * src/hwf_common.h (_gcry_hwf_detect_s390x): New.
  * src/hwf-s390x.c: New.
  * src/hwfeatures.c: Add "s390x-msa", "s390x-msa-4" and "s390x-msa-8".
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* aarch64: mpi/longlong.h: fix operand size mismatch  Jussi Kivilinna  2020-12-18  1  -3/+7
  * mpi/longlong.h [__aarch64__] (count_leading_zeros): Use correctly
  sized temporary variable for asm output.
  --

  Patch fixes clang-8 warning about differently sized inline assembly
  operands seen on aarch64.

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>