| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (--enable-random-daemon): Fix the message.
* random/random-csprng.c [USE_RANDOM_DAEMON] (initialize_basics):
Remove the dependency to random daemon.
* random/random.h [USE_RANDOM_DAEMON]: Likewise.
--
GnuPG-bug-id: 5706
Fixes-commit: 754ad5815b5bb7462260414f2bc5f449bee0b1c6
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'sm3-avx-bmi2-amd64.S'.
* cipher/sm3-avx-bmi2-amd64.S: New.
* cipher/sm3.c (USE_AVX_BMI2, ASM_FUNC_ABI, ASM_EXTRA_STACK): New.
(SM3_CONTEXT): Define 'h' as array instead of separate fields 'h1',
'h2', etc.
[USE_AVX_BMI2] (_gcry_sm3_transform_amd64_avx_bmi2)
(do_sm3_transform_amd64_avx_bmi2): New.
(sm3_init): Select AVX/BMI2 transform function if support by HW; Update
to use 'hd->h' as array.
(transform_blk, sm3_final): Update to use 'hd->h' as array.
* configure.ac: Add 'sm3-avx-bmi2-amd64.lo'.
--
Benchmark on AMD Zen3:
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SM3 | 2.18 ns/B 436.6 MiB/s 10.59 c/B 4850
After (~43% faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SM3 | 1.52 ns/B 627.4 MiB/s 7.37 c/B 4850
Benchmark on Intel Skylake:
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SM3 | 4.35 ns/B 219.2 MiB/s 13.48 c/B 3098
After (~34% faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SM3 | 3.24 ns/B 294.4 MiB/s 10.04 c/B 3098
Benchmark on AMD Zen2:
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SM3 | 2.73 ns/B 348.9 MiB/s 11.86 c/B 4339
After (~38% faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SM3 | 1.97 ns/B 483.0 MiB/s 8.52 c/B 4318
Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac: Collect assembly implementation *.lo files under
GCRYPT_ASM_CIPHERS and GCRYPT_ASM_DIGEST for --disable-asm
selection.
--
GnuPG-bug-id: 5694
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac [host=s390x-*-*]: Add 'poly1305-s390x.lo'.
* cipher/Makefile.am: Move 'poly1305-s390x.S' to
'EXTRA_libcipher_la_SOURCES'.
--
GnuPG-bug-id: 5694
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
| |
* configure.ac (DEF_HMAC_BINARY_CHECK): Fix quatation.
--
GnuPG-bug-id: 5550
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac: Add getentropy random module.
* random/Makefile.am (EXTRA_librandom_la_SOURCES): Add.
--
GnuPG-bug-id: 5636
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (DEF_HMAC_BINARY_CHECK): New SUBSTITUTION.
(DL_LIBS): Fix the condition.
* src/Makefile.am (libgcrypt_la_CFLAGS): Use DEF_HMAC_BINARY_CHECK.
(hmac256_CFLAGS): Likewise.
--
GnuPG-bug-id: 5550
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (FALLBACK_SOCKLEN_T): Remove.
* src/gcrypt.h.in: Remove FALLBACK_SOCKLEN_T.
--
GnuPG-bug-id: 5637
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (INSERT_SYS_SELECT_H): Remove.
Remove checking sys/select.h.
* src/gcrypt.h.in: Remove INSERT_SYS_SELECT_H.
--
It has been no use any more.
GnuPG-bug-id: 5637
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* README: Document new --with-fips-module-version=version switch
* configure.ac: Implementation of the --with-fips-module-version
* src/global.c (print_config): Print FIPS module version from above
--
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Moved the module version to a 3rd field to keep the semantics of that
line.
Signed-off-by: Werner Koch <wk@gnupg.org>
GnuPG-bug-id: 1600
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac [ENABLE_HMAC_BINARY_CHECK]: Check objcopy.
(USE_HMAC_BINARY_CHECK): New Automake conditional.
* src/Makefile.am (libgcrypt.la.done): New target.
[USE_HMAC_BINARY_CHECK] (libgcrypt.so.hmac): Compute the hash.
[USE_HMAC_BINARY_CHECK] (libgcrypt.la.done): Add .hmac section.
--
GnuPG-bug-id: 5550
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
| |
* configure.ac (AC_CHECK_HEADERS): Remove sys/msg.h.
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (ASM_DISABLED): New.
* mpi/Makefile.am: Add 'ec-nist.c' and 'ec-inline.h'.
* mpi/ec-nist.c: New.
* mpi/ec-inline.h: New.
* mpi/ec-internal.h (_gcry_mpi_ec_nist192_mod)
(_gcry_mpi_ec_nist224_mod, _gcry_mpi_ec_nist256_mod)
(_gcry_mpi_ec_nist384_mod, _gcry_mpi_ec_nist521_mod): New.
* mpi/ec.c (ec_addm, ec_subm, ec_mulm, ec_mul2): Use
'ctx->mod'.
(field_table): Add 'mod' function; Add NIST reduction
functions.
(ec_p_init): Setup ctx->mod; Setup function pointers
from field_table only if pointer is not NULL; Resize
ctx->a and ctx->b only if set.
* mpi/mpi-internal.h (RESIZE_AND_CLEAR_IF_NEEDED): New.
* mpi/mpiutil.c (_gcry_mpi_resize): Clear all unused
limbs also in realloc case.
* src/ec-context.h (mpi_ec_ctx_s): Add 'mod' function.
--
Benchmark on AMD Ryzen 7 5800X (x86_64):
Before:
NIST-P192 | nanosecs/iter cycles/iter auto Mhz
mult | 283346 1369473 4833
keygen | 1688442 8185744 4848
sign | 549683 2662984 4845
verify | 615284 2984325 4850
=
NIST-P224 | nanosecs/iter cycles/iter auto Mhz
mult | 516443 2501173 4843
keygen | 2859746 13866802 4849
sign | 918472 4455043 4850
verify | 1057940 5131372 4850
=
NIST-P256 | nanosecs/iter cycles/iter auto Mhz
mult | 423536 2054040 4850
keygen | 2383097 11557572 4850
sign | 774346 3754243 4848
verify | 864934 4196315 4852
=
NIST-P384 | nanosecs/iter cycles/iter auto Mhz
mult | 929985 4511881 4852
keygen | 5230788 25367299 4850
sign | 1671432 8109726 4852
verify | 1902729 9228568 4850
=
NIST-P521 | nanosecs/iter cycles/iter auto Mhz
mult | 2123546 10300952 4851
keygen | 12019340 58297774 4850
sign | 3886988 18853054 4850
verify | 4507885 21864015 4850
After:
NIST-P192 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 186679 905603 4851 +51%
keygen | 1161423 5623822 4842 +46%
sign | 389531 1887557 4846 +41%
verify | 412936 2000461 4844 +49%
=
NIST-P224 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 260621 1256327 4821 +99%
keygen | 1557845 7531677 4835 +84%
sign | 521678 2527083 4844 +76%
verify | 554084 2677949 4833 +92%
=
NIST-P256 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 319045 1542061 4833 +33%
keygen | 1834822 8898950 4850 +30%
sign | 612866 2972630 4850 +26%
verify | 664821 3222597 4847 +30%
=
NIST-P384 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 593894 2875260 4841 +57%
keygen | 3526600 17089717 4846 +48%
sign | 1178098 5710151 4847 +42%
verify | 1260185 6107449 4846 +51%
=
NIST-P521 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 1160220 5621946 4846 +83%
keygen | 6862975 33247351 4844 +75%´
sign | 2287366 11096711 4851 +70%
verify | 2455858 11888045 4841 +84%
Benchmark on AMD Ryzen 7 5800X (i386):
Before:
NIST-P192 | nanosecs/iter cycles/iter auto Mhz
mult | 648039 3143236 4850
keygen | 3554452 17244822 4852
sign | 1163173 5641932 4850
verify | 1300076 6305673 4850
=
NIST-P224 | nanosecs/iter cycles/iter auto Mhz
mult | 798607 3874405 4851
keygen | 4657604 22589864 4850
sign | 1515803 7352049 4850
verify | 1635470 7935373 4852
=
NIST-P256 | nanosecs/iter cycles/iter auto Mhz
mult | 927033 4496283 4850
keygen | 5313601 25771983 4850
sign | 1735795 8418514 4850
verify | 1945804 9438212 4851
=
NIST-P384 | nanosecs/iter cycles/iter auto Mhz
mult | 2301781 11164473 4850
keygen | 12856001 62353242 4850
sign | 4161041 20180651 4850
verify | 4705961 22827478 4851
=
NIST-P521 | nanosecs/iter cycles/iter auto Mhz
mult | 6066635 29422721 4850
keygen | 32995868 160046407 4850
sign | 10503306 50945387 4850
verify | 12225252 59294323 4850
After:
NIST-P192 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 413605 2007498 4854 +57%
keygen | 2479429 12010926 4844 +44%
sign | 825111 3997147 4844 +41%
verify | 890206 4318723 4851 +46%
=
NIST-P224 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 551703 2676454 4851 +45%
keygen | 3257022 15781844 4845 +43%
sign | 1085678 5258894 4844 +40%
verify | 1172195 5678499 4844 +40%
=
NIST-P256 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 720395 3497486 4855 +29%
keygen | 4217758 20461257 4851 +26%
sign | 1404350 6814131 4852 +24%
verify | 1515136 7353955 4854 +28%
=
NIST-P384 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 1525742 7400771 4851 +51%
keygen | 9046660 43877889 4850 +42%
sign | 2974641 14408703 4844 +40%
verify | 3265285 15834951 4849 +44%
=
NIST-P521 | nanosecs/iter cycles/iter auto Mhz speed-up
mult | 3289348 15968678 4855 +84%
keygen | 19354174 93873531 4850 +70%
sign | 6351493 30830140 4854 +65%
verify | 6979292 33854215 4851 +75%
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (*-apple-darwin*): Set _DARWIN_C_SOURCE 1.
--
GnuPG-bug-id: 5440
Reported-by: Jay Freeman
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
| |
--
|
| |
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Move arch specific 'cipher-gcm-*.[cS]' files
from libcipher_la_SOURCES to EXTRA_libcipher_la_SOURCES.
* configure.ac: Add 'cipher-gcm-intel-pclmul.lo' and
'cipher-gcm-arm*.lo'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
| |
* configure.ac: Add 'crc-arm*.lo', 'crc-ppc.lo', 'sha*-ppc.lo' to
GCRYPT_DIGESTS instead of GCRYPT_CIPHERS.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'cipher-gcm-ppc.c'.
* cipher/cipher-gcm-ppc.c: New.
* cipher/cipher-gcm.c [GCM_USE_PPC_VPMSUM] (_gcry_ghash_setup_ppc_vpmsum)
(_gcry_ghash_ppc_vpmsum, ghash_setup_ppc_vpsum, ghash_ppc_vpmsum): New.
(setupM) [GCM_USE_PPC_VPMSUM]: Select ppc-vpmsum implementation if
HW feature "ppc-vcrypto" is available.
* cipher/cipher-internal.h (GCM_USE_PPC_VPMSUM): New.
(gcry_cipher_handle): Move 'ghash_fn' at end of 'gcm' block to align
'gcm_table' to 16 bytes.
* configure.ac: Add 'cipher-gcm-ppc.lo'.
* tests/basic.c (_check_gcm_cipher): New AES256 test vector.
* AUTHORS: Add 'CRYPTOGAMS'.
* LICENSES: Add original license to 3-clause-BSD section.
--
https://dev.gnupg.org/D501:
10-20X speed.
However this Power 9 machine is faster than the last Power 9 benchmarks
on the optimized versions, so while better than the last patch, it is
not all due to the code.
Before:
GCM enc | 4.23 ns/B 225.3 MiB/s - c/B
GCM dec | 3.58 ns/B 266.2 MiB/s - c/B
GCM auth | 3.34 ns/B 285.3 MiB/s - c/B
After:
GCM enc | 0.370 ns/B 2578 MiB/s - c/B
GCM dec | 0.371 ns/B 2571 MiB/s - c/B
GCM auth | 0.159 ns/B 6003 MiB/s - c/B
Signed-off-by: Shawn Landden <shawn@git.icu>
[jk: coding style fixes, Makefile.am integration, patch from Differential
to git, commit changelog, fixed few compiler warnings]
GnuPG-bug-id: 5040
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (HAVE_GCC_INLINE_ASM_VAES_VPCLMUL): New.
* src/g10lib.h (HWF_INTEL_VAES_VPCLMUL): New.
* src/hwf-x86.c (detect_x86_gnuc): Check for VAES and VPCLMUL.
* src/hwfeatures.c (hwflist): Add "intel-vaes-vpclmul".
--
Detect support for VAES and VPCLMUL instruction sets, which allow
use of AES and PCLMUL instruction with 256-bit and 512-bit vector
registers.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
| |
--
|
| |
|
|
|
|
| |
--
|
|
|
|
| |
* configure.ac: Bump LT version to C23/A3/R1.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac: Add check for spawn.h.
* tests/random.c: Only use posix_spawn if available.
--
Since older version doesn't have SIP or it is not enabled, no problem
using system(3).
GnuPG-bug-id: 5159
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (gcry_cv_gcc_platform_as_ok_for_intel_syntax): Remove
assembler macro check from Intel syntax assembly support check.
* cipher/sha256-avx-amd64.S: Replace assembler macros with C
preprocessor counterparts.
* cipher/sha256-avx2-bmi2-amd64.S: Ditto.
* cipher/sha256-ssse3-amd64.S: Ditto.
* cipher/sha512-avx-amd64.S: Ditto.
* cipher/sha512-avx2-bmi2-amd64.S: Ditto.
* cipher/sha512-ssse3-amd64.S: Ditto.
--
Removing GNU assembler macros allows building these implementations with
clang.
GnuPG-bug-id: 5255
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (gcry_cv_gcc_arm_platform_as_ok)
(gcry_cv_gcc_aarch64_platform_as_ok)
(gcry_cv_gcc_inline_asm_ssse3, gcry_cv_gcc_inline_asm_pclmul)
(gcry_cv_gcc_inline_asm_shaext, gcry_cv_gcc_inline_asm_sse41)
(gcry_cv_gcc_inline_asm_avx, gcry_cv_gcc_inline_asm_avx2)
(gcry_cv_gcc_inline_asm_bmi2, gcry_cv_gcc_as_const_division_ok)
(gcry_cv_gcc_as_const_division_with_wadivide_ok)
(gcry_cv_gcc_amd64_platform_as_ok, gcry_cv_gcc_win64_platform_as_ok)
(gcry_cv_gcc_platform_as_ok_for_intel_syntax)
(gcry_cv_gcc_inline_asm_neon, gcry_cv_gcc_inline_asm_aarch32_crypto)
(gcry_cv_gcc_inline_asm_aarch64_neon)
(gcry_cv_gcc_inline_asm_aarch64_crypto)
(gcry_cv_gcc_inline_asm_ppc_altivec)
(gcry_cv_gcc_inline_asm_ppc_arch_3_00)
(gcry_cv_gcc_inline_asm_s390x, gcry_cv_gcc_inline_asm_s390x): Use
AC_LINK_IFELSE check instead of AC_COMPILE_IFELSE.
--
LTO may defer assembly checking to linker stage, thus we need to use
AC_LINK_IFELSE instead of AC_COMPILE_IFELSE for these checks.
GnuPG-bug-id: 5255
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (force_soft_hwfeatures)
(ENABLE_FORCE_SOFT_HWFEATURES): New.
* src/hwf-x86.c (detect_x86_gnuc): Enable HWF_INTEL_FAST_SHLD
and HWF_INTEL_FAST_VPGATHER if ENABLE_FORCE_SOFT_HWFEATURES enabled.
--
Patch allows enabling HW features, that are fast only select CPU models,
on all CPUs. For example, SHLD instruction is fast on only select Intel
processors and should not be used on others. This configuration option
allows enabling these 'soft' HW features for testing purposes on all
CPUs.
Current 'soft' HW features are:
- "intel-fast-shld": supported by all x86 (but very slow on most)
- "intel-fast-vpgather": supported by all x86 with AVX2 (but slow on
most)
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|\
| |
| |
| |
| |
| |
| | |
--
Master is missing latest NEWS and some other last minute changes from
the 1.9.0 release.
|
| |
| |
| |
| | |
--
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* configure.ac (gcry_cv_have_builtin_ctzl, gcry_cv_have_builtin_clz)
(gcry_cv_have_builtin_clzl): New checks.
* mpi/longlong.h (count_leading_zeros, count_trailing_zeros): Use
__buildin_clz[l]/__builtin_ctz[l] if available and bit counting
macros not yet provided by inline assembly.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* cipher/Makefile.am: Add 'poly1305-s390x.S' and
'asm-poly1305-s390x.h'.
* cipher/asm-poly1305-s390x.h: New
* cipher/chacha20-s390x.S (_gcry_chacha20_poly1305_s390x_vx_blocks8)
(_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New, stitched
chacha20-poly1305 implementation.
* cipher/chacha20.c (USE_S390X_VX_POLY1305): New.
(_gcry_chacha20_poly1305_s390x_vx_blocks8)
(_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New prototypes.
(_gcry_chacha20_poly1305_encrypt, _gcry_chacha20_poly1305_decrypt): Add
s390x/VX stitched chacha20-poly1305 code-path.
* cipher/poly1305-s390x.S: New.
* cipher/poly1305.c (USE_S390X_ASM, HAVE_ASM_POLY1305_BLOCKS): New.
[USE_S390X_ASM] (_gcry_poly1305_s390x_blocks1, poly1305_blocks): New.
* configure.ac (gcry_cv_gcc_inline_asm_s390x): Check for 'risbgn' and
'algrk' instructions.
* tests/basic.c (_check_poly1305_cipher): Add large chacha20-poly1305
test vector.
--
Patch adds Poly1305 and stitched ChaCha20-Poly1305 implementation
for zSeries. Stitched implementation interleaves ChaCha20 and Poly1305
processing for higher instruction level parallelism and better
utilization of execution units.
Benchmark on z15 (4504 Mhz):
Before:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
POLY1305 enc | 1.16 ns/B 823.2 MiB/s 5.22 c/B
POLY1305 dec | 1.16 ns/B 823.2 MiB/s 5.22 c/B
POLY1305 auth | 0.736 ns/B 1295 MiB/s 3.32 c/B
After (chacha20-poly1305 ~71% faster, poly1305 ~29% faster):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
POLY1305 enc | 0.677 ns/B 1409 MiB/s 3.05 c/B
POLY1305 dec | 0.655 ns/B 1456 MiB/s 2.95 c/B
POLY1305 auth | 0.569 ns/B 1675 MiB/s 2.56 c/B
GnuPG-bug-id: 5202
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* cipher/Makefile.am: Add 'asm-common-s390x.h' and 'chacha20-s390x.S'.
* cipher/asm-common-s390x.h: New.
* cipher/chacha20-s390x.S: New.
* cipher/chacha20.c (USE_S390X_VX): New.
(CHACHA20_context_t): Change 'use_*' bit-field to unsigned type; Add
'use_s390x'.
(_gcry_chacha20_s390x_vx_blocks8)
(_gcry_chacha20_s390x_vx_blocks4_2_1): New.
(chacha20_do_setkey): Add HW feature detect for s390x/VX.
(chacha20_blocks, do_chacha20_encrypt_stream_tail): Add s390x/VX
code-path.
* configure.ac: Add 'chacha20-s390x.lo'.
--
Patch adds VX vector instruction set accelerated ChaCha20
implementation for zSeries.
Benchmark on z15 (4504 Mhz):
Before:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 2.62 ns/B 364.0 MiB/s 11.80 c/B
STREAM dec | 2.62 ns/B 363.8 MiB/s 11.81 c/B
After (~5x faster):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.505 ns/B 1888 MiB/s 2.28 c/B
STREAM dec | 0.506 ns/B 1887 MiB/s 2.28 c/B
GnuPG-bug-id: 5201
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* configure.ac (gcry_cv_gcc_inline_asm_s390x_vx): New check.
* src/g10lib.h (HWF_S390X_VX): New.
* src/hwf-s390x.c (HWCAP_S390_VXRS): New.
(s390x_features) [HAVE_GCC_INLINE_ASM_S390X_VX]: Add VX feature check.
* src/hwfeatures.c (hwlist): Add "s390x-vx".
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* configure.ac: Add 'rijndael-s390x.lo'.
* cipher/Makefile.am: Add 'rijndael-s390x.c'.
* cipher/rijndael-internal.c (USE_S390X_CRYPTO): New.
(RIJNDAEL_context_s) [USE_S390X_CRYPTO]: New 'km*_func' members.
* cipher/rijndael-s390x.c: New.
* cipher/rijndael.c (_gcry_aes_s390x_setup_acceleration)
(_gcry_aes_s390x_setup_setkey)
(_gcry_aes_s390x_setup_prepare_decryption, _gcry_aes_s390x_encrypt)
(_gcry_aes_s390x_decrypt): New.
(do_setkey) [USE_S390X_CRYPTO]: Add s390x acceleration setup.
--
Patchs adds acceleration for single-block AES and following modes:
- CBC, CBC-MAC, CFB, OFB, CTR, XTS and OCB
Benchmarks (z15, 5.2Ghz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 3.81 ns/B 250.2 MiB/s 19.82 c/B
ECB dec | 4.13 ns/B 231.1 MiB/s 21.46 c/B
CBC enc | 3.69 ns/B 258.5 MiB/s 19.19 c/B
CBC dec | 3.71 ns/B 257.1 MiB/s 19.29 c/B
CFB enc | 3.69 ns/B 258.7 MiB/s 19.17 c/B
CFB dec | 3.56 ns/B 267.8 MiB/s 18.52 c/B
OFB enc | 3.85 ns/B 247.8 MiB/s 20.01 c/B
OFB dec | 3.85 ns/B 247.9 MiB/s 20.01 c/B
CTR enc | 3.65 ns/B 261.6 MiB/s 18.96 c/B
CTR dec | 3.64 ns/B 261.6 MiB/s 18.95 c/B
XTS enc | 3.66 ns/B 260.8 MiB/s 19.02 c/B
XTS dec | 3.75 ns/B 254.2 MiB/s 19.51 c/B
CCM enc | 7.34 ns/B 129.9 MiB/s 38.19 c/B
CCM dec | 7.34 ns/B 129.9 MiB/s 38.19 c/B
CCM auth | 3.70 ns/B 257.6 MiB/s 19.25 c/B
EAX enc | 7.34 ns/B 129.8 MiB/s 38.19 c/B
EAX dec | 7.35 ns/B 129.8 MiB/s 38.20 c/B
EAX auth | 3.70 ns/B 257.8 MiB/s 19.24 c/B
GCM enc | 6.22 ns/B 153.3 MiB/s 32.36 c/B
GCM dec | 6.23 ns/B 153.0 MiB/s 32.42 c/B
GCM auth | 2.59 ns/B 368.9 MiB/s 13.44 c/B
OCB enc | 3.82 ns/B 249.7 MiB/s 19.86 c/B
OCB dec | 3.90 ns/B 244.2 MiB/s 20.31 c/B
OCB auth | 3.88 ns/B 245.5 MiB/s 20.20 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 2.10 ns/B 453.1 MiB/s 10.94 c/B
ECB dec | 2.11 ns/B 453.0 MiB/s 10.95 c/B
CBC enc | 0.182 ns/B 5240 MiB/s 0.946 c/B
CBC dec | 0.044 ns/B 21581 MiB/s 0.230 c/B
CFB enc | 0.206 ns/B 4623 MiB/s 1.07 c/B
CFB dec | 0.140 ns/B 6826 MiB/s 0.727 c/B
OFB enc | 0.183 ns/B 5222 MiB/s 0.950 c/B
OFB dec | 0.182 ns/B 5252 MiB/s 0.944 c/B
CTR enc | 0.059 ns/B 16095 MiB/s 0.308 c/B
CTR dec | 0.059 ns/B 16045 MiB/s 0.309 c/B
XTS enc | 0.043 ns/B 21998 MiB/s 0.225 c/B
XTS dec | 0.043 ns/B 22012 MiB/s 0.225 c/B
CCM enc | 0.239 ns/B 3989 MiB/s 1.24 c/B
CCM dec | 0.239 ns/B 3987 MiB/s 1.24 c/B
CCM auth | 0.180 ns/B 5288 MiB/s 0.938 c/B
EAX enc | 0.242 ns/B 3940 MiB/s 1.26 c/B
EAX dec | 0.243 ns/B 3926 MiB/s 1.26 c/B
EAX auth | 0.183 ns/B 5218 MiB/s 0.950 c/B
GCM enc | 2.64 ns/B 361.6 MiB/s 13.71 c/B
GCM dec | 2.64 ns/B 361.3 MiB/s 13.72 c/B
GCM auth | 2.58 ns/B 370.1 MiB/s 13.40 c/B
OCB enc | 0.186 ns/B 5132 MiB/s 0.966 c/B
OCB dec | 0.176 ns/B 5414 MiB/s 0.916 c/B
OCB auth | 0.149 ns/B 6394 MiB/s 0.776 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (gcry_cv_gcc_inline_asm_s390x)
(HAVE_CPU_ARCH_S390X): Add s390x detection support.
* mpi/config.links: Add setup for s390x links.
* src/Makefile.am: Add 'hwf-s390x.c'.
* src/g10lib.h (HWF_S390X_MSA, HWF_S390X_MSA_4, HWF_S390X_8): New.
* src/hwf_common.h (_gcry_hwf_detect_s390x): New.
* src/hwf-s390x.c: New.
* src/hwfeatures.c: Add "s390x-msa", "s390x-msa-4" and "s390x-msa-8".
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (gcry_cv_gcc_asm_elf_directives): New check.
(HAVE_GCC_ASM_ELF_DIRECTIVES): New 'config.h' macro.
* cipher/asm-common-aarch64.h (ELF): Change feature macro check from
__ELF__ to HAVE_GCC_ASM_ELF_DIRECTIVES.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac [*-apple-darwin*] (USE_POSIX_SPAWN_FOR_TESTS): New.
* tests/random.c [USE_POSIX_SPAWN_FOR_TESTS] (run_all_rng_tests): New.
--
GnuPG-bug-id: 5159
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* acinclude.m4 (GNUPG_SYS_SYMBOL_UNDERSCORE): Use AS_MESSAGE_LOG_FD
instead of AC_FD_CC.
(GNUPG_CHECK_MLOCK): Use AC_LINK_IFELSE instead of AC_TRY_LINK.
Use AC_RUN_IFELSE instead of AC_TRY_RUN.
* configure.ac (AC_ISC_POSIX): Replace by AC_SEARCH_LIBS.
Use AC_USE_SYSTEM_EXTENSIONS instead of AC_GNU_SOURCE.
Use AS_HELP_STRING instead of AC_HELP_STRING.
(AC_TYPE_SIGNAL): Remove.
(AC_DECL_SYS_SIGLIST): Remove.
* m4/Makefile.am (EXTRA_DIST): Update.
* m4/onceonly.m4: Remove.
* m4/socklen.m4: Update from gnulib.
* m4/libtool.m4: Update from libgpg-error.
* m4/gpg-error.m4: Update from libgpg-error.
* m4/noexecstack.m4: Use AS_HELP_STRING instead of AC_HELP_STRING.
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (byte, ushort, us6, u32, u64): Use AC_CHECK_TYPES.
* cipher/poly1305.c: Use HAVE_TYPE_U64.
* src/hmac256.c: HAVE_TYPE_U32.
* src/types.h: Use HAVE_TYPE_BYTE, HAVE_TYPE_USHORT, HAVE_TYPE_U16,
HAVE_TYPE_U32, and HAVE_TYPE_U64.
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (gcry_cv_have_asm_volatile_memory): Check also if
assembly memory barrier with input/output register is supported.
* tests/bench-slope.c (auto_ghz_bench): Change to use base operation
that takes two CPU cycles and unroll loop by 1024 operations.
--
CPU frequency is now correctly detected on AWS Graviton CPU (2.3Ghz).
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
| |
* configure.ac: Do not force jentsupport to "n/a" on non-x86
architectures.
--
GnuPG-bug-id: 4966
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'sm4-aesni-avx2-amd64.S'.
* cipher/sm4-aesni-avx2-amd64.S: New.
* cipher/sm4.c (USE_AESNI_AVX2): New.
(SM4_context) [USE_AESNI_AVX2]: Add 'use_aesni_avx2'.
[USE_AESNI_AVX2] (_gcry_sm4_aesni_avx2_ctr_enc)
(_gcry_sm4_aesni_avx2_cbc_dec, _gcry_sm4_aesni_avx2_cfb_dec)
(_gcry_sm4_aesni_avx2_ocb_enc, _gcry_sm4_aesni_avx2_ocb_dec)
(_gcry_sm4_aesni_avx_ocb_auth): New.
(sm4_setkey): Enable AES-NI/AVX2 if supported by HW.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_AESNI_AVX2]: Add
AES-NI/AVX2 bulk functions.
* configure.ac: Add ''sm4-aesni-avx2-amd64.lo'.
--
This patch adds x86-64/AES-NI/AVX2 bulk encryption/decryption. Bulk
functions process 16 blocks in parallel.
Benchmark on AMD Ryzen 7 3700X:
Before:
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
CBC enc | 8.98 ns/B 106.2 MiB/s 38.62 c/B 4300
CBC dec | 1.55 ns/B 613.7 MiB/s 6.64 c/B 4275
CFB enc | 8.96 ns/B 106.4 MiB/s 38.52 c/B 4300
CFB dec | 1.54 ns/B 617.4 MiB/s 6.60 c/B 4275
CTR enc | 1.57 ns/B 607.8 MiB/s 6.75 c/B 4300
CTR dec | 1.57 ns/B 608.9 MiB/s 6.74 c/B 4300
OCB enc | 1.58 ns/B 603.8 MiB/s 6.75 c/B 4275
OCB dec | 1.57 ns/B 605.7 MiB/s 6.73 c/B 4275
OCB auth | 1.53 ns/B 624.5 MiB/s 6.57 c/B 4300
After (~56% faster than AES-NI/AVX impl.):
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
CBC enc | 8.93 ns/B 106.8 MiB/s 38.61 c/B 4326
CBC dec | 0.984 ns/B 969.5 MiB/s 4.23 c/B 4300
CFB enc | 8.93 ns/B 106.8 MiB/s 38.62 c/B 4325
CFB dec | 0.983 ns/B 970.3 MiB/s 4.23 c/B 4300
CTR enc | 0.998 ns/B 955.1 MiB/s 4.29 c/B 4300
CTR dec | 0.996 ns/B 957.4 MiB/s 4.28 c/B 4300
OCB enc | 1.00 ns/B 951.8 MiB/s 4.31 c/B 4300
OCB dec | 1.00 ns/B 951.8 MiB/s 4.31 c/B 4300
OCB auth | 0.993 ns/B 960.2 MiB/s 4.28 c/B 4304±2
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'sm4-aesni-avx-amd64.S'.
* cipher/sm4-aesni-avx-amd64.S: New.
* cipher/sm4.c (USE_AESNI_AVX, ASM_FUNC_ABI): New.
(SM4_context) [USE_AESNI_AVX]: Add 'use_aesni_avx'.
[USE_AESNI_AVX] (_gcry_sm4_aesni_avx_expand_key)
(_gcry_sm4_aesni_avx_crypt_blk1_8, _gcry_sm4_aesni_avx_ctr_enc)
(_gcry_sm4_aesni_avx_cbc_dec, _gcry_sm4_aesni_avx_cfb_dec)
(_gcry_sm4_aesni_avx_ocb_enc, _gcry_sm4_aesni_avx_ocb_dec)
(_gcry_sm4_aesni_avx_ocb_auth, sm4_aesni_avx_crypt_blk1_8): New.
(sm4_expand_key) [USE_AESNI_AVX]: Use AES-NI/AVX key setup.
(sm4_setkey): Enable AES-NI/AVX if supported by HW.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_AESNI_AVX]: Add
AES-NI/AVX bulk functions.
* configure.ac: Add ''sm4-aesni-avx-amd64.lo'.
--
This patch adds x86-64/AES-NI/AVX bulk encryption/decryption and key
setup for SM4 cipher. Bulk functions process eight blocks in parallel.
Benchmark on AMD Ryzen 7 3700X:
Before:
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
CBC enc | 8.94 ns/B 106.7 MiB/s 38.66 c/B 4325
CBC dec | 4.78 ns/B 199.7 MiB/s 20.42 c/B 4275
CFB enc | 8.95 ns/B 106.5 MiB/s 38.72 c/B 4325
CFB dec | 4.81 ns/B 198.2 MiB/s 20.57 c/B 4275
CTR enc | 4.81 ns/B 198.2 MiB/s 20.69 c/B 4300
CTR dec | 4.80 ns/B 198.8 MiB/s 20.63 c/B 4300
GCM auth | 0.116 ns/B 8232 MiB/s 0.504 c/B 4351
OCB enc | 4.88 ns/B 195.5 MiB/s 20.86 c/B 4275
OCB dec | 4.85 ns/B 196.6 MiB/s 20.86 c/B 4301
OCB auth | 4.80 ns/B 198.9 MiB/s 20.62 c/B 4301
After (~3.0x faster):
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
CBC enc | 8.98 ns/B 106.2 MiB/s 38.62 c/B 4300
CBC dec | 1.55 ns/B 613.7 MiB/s 6.64 c/B 4275
CFB enc | 8.96 ns/B 106.4 MiB/s 38.52 c/B 4300
CFB dec | 1.54 ns/B 617.4 MiB/s 6.60 c/B 4275
CTR enc | 1.57 ns/B 607.8 MiB/s 6.75 c/B 4300
CTR dec | 1.57 ns/B 608.9 MiB/s 6.74 c/B 4300
OCB enc | 1.58 ns/B 603.8 MiB/s 6.75 c/B 4275
OCB dec | 1.57 ns/B 605.7 MiB/s 6.73 c/B 4275
OCB auth | 1.53 ns/B 624.5 MiB/s 6.57 c/B 4300
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
sm4 avx fix
sm4 avx fix
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add sm4.c.
* cipher/cipher.c (cipher_list, cipher_list_algo301): Add
_gcry_cipher_spec_sm4.
* cipher/mac-cmac.c (map_mac_algo_to_cipher): Add cmac SM4.
(_gcry_mac_type_spec_cmac_sm4): Add cmac SM4.
* cipher/mac-internal.h: Declare spec_cmac_sm4.
* cipher/mac.c (mac_list, mac_list_algo201): Add cmac SM4.
* cipher/sm4.c: New.
* configure.ac (available_ciphers): Add sm4.
* doc/gcrypt.texi: Add SM4 document.
* src/cipher.h: Add declarations for SM4 and cmac SM4.
* src/gcrypt.h.in (gcry_cipher_algos): Add algorithm ID for SM4.
--
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
[jk: add missing mapping in mac-cmac.c:map_mac_algo_to_cipher]
[jk: add GCRY_MAC_CMAC_SM4 to gcrypt.texi]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (try_asm_modules): Update description,
"MPI" => "MPI and cipher".
(gcry_cv_gcc_arm_platform_as_ok, gcry_cv_gcc_aarch64_platform_as_ok)
(gcry_cv_gcc_inline_asm_ssse3, gcry_cv_gcc_inline_asm_pclmul)
(gcry_cv_gcc_inline_asm_shaext, gcry_cv_gcc_inline_asm_sse41)
(gcry_cv_gcc_inline_asm_avx, gcry_cv_gcc_inline_asm_avx2)
(gcry_cv_gcc_inline_asm_bmi2, gcry_cv_gcc_amd64_platform_as_ok)
(gcry_cv_gcc_platform_as_ok_for_intel_syntax)
(gcry_cv_cc_arm_arch_is_v6, gcry_cv_gcc_inline_asm_neon)
(gcry_cv_gcc_inline_asm_aarch32_crypto)
(gcry_cv_gcc_inline_asm_aarch64_neon)
(gcry_cv_gcc_inline_asm_aarch64_crypto)
(gcry_cv_cc_ppc_altivec, gcry_cv_gcc_inline_asm_ppc_altivec)
(gcry_cv_gcc_inline_asm_ppc_arch_3_00): Check for "try_asm_modules".
* mpi/config.links: Set "mpi_cpu_arch" to "disabled"
with --disable-asm.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/crc-ppc.c (CRC_VEC_U64_LOAD, CRC_VEC_U64_LOAD_LE)
(CRC_VEC_U64_LOAD_BE): Remove vec_vsx_ld usage.
(asm_vec_u64_load, asm_vec_u64_load_le): New.
* cipher/sha512-ppc.c (vec_vshasigma_u64): Use '__asm__' instead of
'asm' for assembly block.
(vec_u64_load, vec_u64_store): New.
(_gcry_sha512_transform_ppc8): Use vec_u64_load/store instead of
vec_vsx_ld/vec_vsx_st.
* configure.ac (gcy_cv_cc_ppc_altivec)
(gcy_cv_cc_ppc_altivec_cflags): Add check for vec_vsx_ld with
'unsigned int *' pointer type.
--
GCC 7.5 and clang 8.0 do not support vec_vsx_ld with 'unsigned long long *'
pointer type. Switch code to use inline assembly instead. As vec_vsx_ld
is still used with 'unsigned int *' pointers, add new check for this in
configure.ac.
GnuPG-bug-id: 4906
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
| |
* configure.ac (LIBGCRYPT_CONFIG_LIBS): Remove DL_LIBS.
* src/libgcrypt.c.in: Distinguish static link use case.
* tests/Makefile.am: Fix use of -lgpg-error.
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac: Add 'rijndael-ppc9le.lo'.
* cipher/Makefile.am: Add 'rijndael-ppc9le.c', 'rijndael-ppc-common.h'
and 'rijndael-ppc-functions.h'.
* cipher/rijndael-internal.h (USE_PPC_CRYPTO_WITH_PPC9LE): New.
(RIJNDAEL_context_s): Add 'use_ppc9le_crypto'.
* cipher/rijndael.c (_gcry_aes_ppc9le_encrypt)
(_gcry_aes_ppc9le_decrypt, _gcry_aes_ppc9le_cfb_enc)
(_gcry_aes_ppc9le_cfb_dec, _gcry_aes_ppc9le_ctr_enc)
(_gcry_aes_ppc9le_cbc_enc, _gcry_aes_ppc9le_cbc_dec)
(_gcry_aes_ppc9le_ocb_crypt, _gcry_aes_ppc9le_ocb_auth)
(_gcry_aes_ppc9le_xts_crypt): New.
(do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth, _gcry_aes_xts_crypt)
[USE_PPC_CRYPTO_WITH_PPC9LE]: New.
* cipher/rijndael-ppc.c: Split common code to headers
'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
* cipher/rijndael-ppc-common.h: Split from 'rijndael-ppc.c'.
(asm_add_uint64, asm_sra_int64, asm_swap_uint64_halfs): New.
* cipher/rijndael-ppc-functions.h: Split from 'rijndael-ppc.c'.
(CFB_ENC_FUNC, CBC_ENC_FUNC): Unroll loop by 2.
(XTS_CRYPT_FUNC, GEN_TWEAK): Tweak generation without vperm
instruction.
* cipher/rijndael-ppc9le.c: New.
--
Provide POWER9 little-endian optimized variant of PPC vcrypto AES
implementation. This implementation uses 'lxvb16x' and 'stxvb16x'
instructions to load/store vectors directly in big-endian order.
Benchmark on POWER9 (~3.8Ghz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 918.7 MiB/s 3.94 c/B
CBC dec | 0.222 ns/B 4292 MiB/s 0.844 c/B
CFB enc | 1.04 ns/B 916.9 MiB/s 3.95 c/B
CFB dec | 0.224 ns/B 4252 MiB/s 0.852 c/B
CTR enc | 0.226 ns/B 4218 MiB/s 0.859 c/B
CTR dec | 0.225 ns/B 4233 MiB/s 0.856 c/B
XTS enc | 0.500 ns/B 1907 MiB/s 1.90 c/B
XTS dec | 0.494 ns/B 1932 MiB/s 1.88 c/B
OCB enc | 0.288 ns/B 3312 MiB/s 1.09 c/B
OCB dec | 0.292 ns/B 3266 MiB/s 1.11 c/B
OCB auth | 0.267 ns/B 3567 MiB/s 1.02 c/B
After (ctr & ocb & cbc-dec & cfb-dec ~15% and xts ~8% faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 914.2 MiB/s 3.96 c/B
CBC dec | 0.191 ns/B 4984 MiB/s 0.727 c/B
CFB enc | 1.03 ns/B 930.0 MiB/s 3.90 c/B
CFB dec | 0.194 ns/B 4906 MiB/s 0.739 c/B
CTR enc | 0.196 ns/B 4868 MiB/s 0.744 c/B
CTR dec | 0.197 ns/B 4834 MiB/s 0.750 c/B
XTS enc | 0.460 ns/B 2075 MiB/s 1.75 c/B
XTS dec | 0.455 ns/B 2097 MiB/s 1.73 c/B
OCB enc | 0.250 ns/B 3812 MiB/s 0.951 c/B
OCB dec | 0.253 ns/B 3764 MiB/s 0.963 c/B
OCB auth | 0.232 ns/B 4106 MiB/s 0.883 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|