| Commit message | Author | Age | Files | Lines |
| |
* src/types.h: Use macros defined by configure script.
* src/hmac256.c: Fix for HAVE_U32.
* cipher/poly1305.c: Fix for HAVE_U64.
--
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
| |
* cipher/chacha20.c (chacha20_do_setkey) [USE_PPC_VEC]: Enable
P10 assembly for HWF_PPC_ARCH_3_00 if ENABLE_FORCE_SOFT_HWFEATURES is
defined.
* cipher/poly1305.c (poly1305_init) [POLY1305_USE_PPC_VEC]: Likewise.
* cipher/rijndael.c (do_setkey) [USE_PPC_CRYPTO_WITH_PPC9LE]: Likewise.
---
This change allows testing P10 implementations with P9 and with QEMU-PPC.
GnuPG-bug-id: 6006
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
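As an illustration of the selection logic this change describes, the sketch below gates a P10 code path on a feature mask and, when ENABLE_FORCE_SOFT_HWFEATURES is defined, also accepts the ISA 3.00 bit. The names sketch_ctx and sketch_init, the HWF_PPC_ARCH_3_10 bit and both bit values are assumptions made for this example; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; the real HWF_PPC_* values are defined by
   libgcrypt and are not shown in this log.  */
#define HWF_PPC_ARCH_3_00 (1u << 0)   /* POWER9 level */
#define HWF_PPC_ARCH_3_10 (1u << 1)   /* POWER10 level */

struct sketch_ctx
{
  int use_p10;   /* nonzero: take the P10 assembly path */
};

/* Decide at key setup time which implementation to use.  */
static void
sketch_init (struct sketch_ctx *ctx, unsigned int features)
{
  assert (ctx != NULL);
  ctx->use_p10 = (features & HWF_PPC_ARCH_3_10) != 0;
#ifdef ENABLE_FORCE_SOFT_HWFEATURES
  /* Testing aid: also accept ISA 3.00 so that P9 or QEMU can be used
     to exercise the P10 code.  */
  if (features & HWF_PPC_ARCH_3_00)
    ctx->use_p10 = 1;
#endif
}
```

Without ENABLE_FORCE_SOFT_HWFEATURES the 3.00 bit alone leaves use_p10 at zero, which matches the default behaviour implied by the commit message.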
| |
* configure.ac: Added chacha20 and poly1305 assembly implementations.
* cipher/chacha20-p10le-8x.s: (New) - support 8 blocks (512 bytes)
unrolling.
* cipher/poly1305-p10le.s: (New) - support 4 blocks (128 bytes)
unrolling.
* cipher/Makefile.am: Added new chacha20 and poly1305 files.
* cipher/chacha20.c: Added PPC p10 le support for 8x chacha20.
* cipher/poly1305.c: Added PPC p10 le support for 4x poly1305.
* cipher/poly1305-internal.h: Added PPC p10 le support for poly1305.
---
GnuPG-bug-id: 6006
Signed-off-by: Danny Tsen <dtsen@us.ibm.com>
[jk: cosmetic changes to C code]
[jk: fix building on ppc64be]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* LICENSES: Add 3-clause BSD license for poly1305-amd64-avx512.S.
* cipher/Makefile.am: Add 'poly1305-amd64-avx512.S'.
* cipher/poly1305-amd64-avx512.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_AVX512): New.
(poly1305_context_s): Add 'use_avx512'.
* cipher/poly1305.c (ASM_FUNC_ABI, ASM_FUNC_WRAPPER_ATTR): New.
[POLY1305_USE_AVX512] (_gcry_poly1305_amd64_avx512_blocks)
(poly1305_amd64_avx512_blocks): New.
(poly1305_init): Use AVX512 if HW feature available (set use_avx512).
[USE_MPI_64BIT] (poly1305_blocks): Rename to ...
[USE_MPI_64BIT] (poly1305_blocks_generic): ... this.
[USE_MPI_64BIT] (poly1305_blocks): New.
--
Patch adds AMD64 AVX512-FMA52 implementation for Poly1305.
Benchmark on Intel Core i3-1115G4 (tigerlake):
Before:
               | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
POLY1305       |    0.306 ns/B     3117 MiB/s     1.25 c/B      4090
After (5.0x faster):
               | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
POLY1305       |    0.061 ns/B    15699 MiB/s    0.249 c/B    4095±3
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/poly1305.c [HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS]
(ADD_1305_32): Reduce number of register operands.
--
Ubuntu 21.10 arm-linux-gnueabihf-gcc gave the following error with -O3:
 poly1305.c: In function '_gcry_poly1305_update_burn':
 cipher/poly1305.c:425:7: error: 'asm' operand has impossible constraints
   425 |       ADD_1305_32(h4, h3, h2, h1, h0, m4, m3, m2, m1, m0);
       |       ^
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/poly1305.c [__aarch64__] (ADD_1305_64): Check for
HAVE_CPU_ARCH_ARM.
[__x86_64__] (ADD_1305_64): Check for HAVE_CPU_ARCH_X86.
[__powerpc__] (ADD_1305_64): Check for HAVE_CPU_ARCH_PPC.
[__i386__] (ADD_1305_32): Check for HAVE_CPU_ARCH_X86.
--
Reported-by: Horst Wente <horst.wente@posteo.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
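A minimal sketch of the guard pattern, assuming configure defines HAVE_CPU_ARCH_X86 only when it actually detected an x86 target: an inline-assembly variant is considered only if the compiler macro and the configure macro agree, and a plain C carry chain is available otherwise. ADD_1305_HAVE_ASM and add_1305_64 are hypothetical names used for this example; the real ADD_1305_64 is a macro whose body is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Consider an inline-asm variant only when the compiler's target macro
   and the configure-time macro agree (HAVE_CPU_ARCH_X86 is assumed to
   come from config.h).  */
#if defined (__x86_64__) && defined (HAVE_CPU_ARCH_X86)
# define ADD_1305_HAVE_ASM 1
#else
# define ADD_1305_HAVE_ASM 0
#endif

/* Portable fallback: (a[2]:a[1]:a[0]) += (b[2]:b[1]:b[0]).  */
static void
add_1305_64 (uint64_t a[3], const uint64_t b[3])
{
  uint64_t carry;

  assert (a != NULL && b != NULL);
  a[0] += b[0];
  carry = a[0] < b[0];     /* wrapped around: carry into limb 1 */
  a[1] += carry;
  carry = a[1] < carry;    /* set only if limb 1 wrapped to zero */
  a[1] += b[1];
  carry += a[1] < b[1];    /* at most one of the two carries is set */
  a[2] += b[2] + carry;
}
```

Keeping the C version around unconditionally means a mismatch between the compiler target and the configured architecture degrades to slower code rather than to a build failure.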
| |
* cipher/poly1305.c [__i386__]: Limit i386 variant of ADD_1305_32 to
GCC-5 or newer.
--
Reported-by: Horst Wente <horst.wente@posteo.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/Makefile.am: Add 'poly1305-s390x.S' and
'asm-poly1305-s390x.h'.
* cipher/asm-poly1305-s390x.h: New
* cipher/chacha20-s390x.S (_gcry_chacha20_poly1305_s390x_vx_blocks8)
(_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New, stitched
chacha20-poly1305 implementation.
* cipher/chacha20.c (USE_S390X_VX_POLY1305): New.
(_gcry_chacha20_poly1305_s390x_vx_blocks8)
(_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New prototypes.
(_gcry_chacha20_poly1305_encrypt, _gcry_chacha20_poly1305_decrypt): Add
s390x/VX stitched chacha20-poly1305 code-path.
* cipher/poly1305-s390x.S: New.
* cipher/poly1305.c (USE_S390X_ASM, HAVE_ASM_POLY1305_BLOCKS): New.
[USE_S390X_ASM] (_gcry_poly1305_s390x_blocks1, poly1305_blocks): New.
* configure.ac (gcry_cv_gcc_inline_asm_s390x): Check for 'risbgn' and
'algrk' instructions.
* tests/basic.c (_check_poly1305_cipher): Add large chacha20-poly1305
test vector.
--
Patch adds Poly1305 and stitched ChaCha20-Poly1305 implementation
for zSeries. Stitched implementation interleaves ChaCha20 and Poly1305
processing for higher instruction level parallelism and better
utilization of execution units.
Benchmark on z15 (4504 Mhz):
Before:
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305 enc   |     1.16 ns/B    823.2 MiB/s     5.22 c/B
POLY1305 dec   |     1.16 ns/B    823.2 MiB/s     5.22 c/B
POLY1305 auth  |    0.736 ns/B     1295 MiB/s     3.32 c/B
After (chacha20-poly1305 ~71% faster, poly1305 ~29% faster):
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305 enc   |    0.677 ns/B     1409 MiB/s     3.05 c/B
POLY1305 dec   |    0.655 ns/B     1456 MiB/s     2.95 c/B
POLY1305 auth  |    0.569 ns/B     1675 MiB/s     2.56 c/B
GnuPG-bug-id: 5202
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* configure.ac (byte, ushort, u16, u32, u64): Use AC_CHECK_TYPES.
* cipher/poly1305.c: Use HAVE_TYPE_U64.
* src/hmac256.c: Use HAVE_TYPE_U32.
* src/types.h: Use HAVE_TYPE_BYTE, HAVE_TYPE_USHORT, HAVE_TYPE_U16,
HAVE_TYPE_U32, and HAVE_TYPE_U64.
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
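To illustrate what the HAVE_TYPE_* macros make possible, the sketch below declares each integer type only when configure has not reported that the platform already provides it. This is an assumption about the general pattern, not the contents of src/types.h; the <stdint.h> fallbacks and the check_type_widths helper are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: AC_CHECK_TYPES puts HAVE_TYPE_U32 / HAVE_TYPE_U64 into
   config.h when a system header already declares the type, so the
   type is declared here only when the macro is absent.  */
#ifndef HAVE_TYPE_U32
typedef uint32_t u32;
#endif
#ifndef HAVE_TYPE_U64
typedef uint64_t u64;
#endif

/* Sanity check that both types have the expected width.  */
static void
check_type_widths (void)
{
  assert (sizeof (u32) == 4);
  assert (sizeof (u64) == 8);
}
```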
| |
* cipher/poly1305.c [USE_MPI_64BIT && __powerpc__] (ADD_1305_64): New.
--
Benchmark on POWER8 (ppc64le, ~3.8Ghz):
Before:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.547 ns/B     1742 MiB/s     2.08 c/B
After (~8% faster):
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.502 ns/B     1901 MiB/s     1.91 c/B
Benchmark on POWER9 (ppc64le, ~3.8Ghz):
Before:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.493 ns/B     1934 MiB/s     1.87 c/B
After (~7% faster):
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.459 ns/B     2077 MiB/s     1.74 c/B
GnuPG-bug-id: 4460
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/cipher-ccm.c (do_cbc_mac): Replace buffer setting loop with memset call.
* cipher/cipher-gcm.c (do_ghash_buf): Ditto.
* cipher/poly1305.c (poly1305_final): Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
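The kind of change described here can be sketched as follows. The loops that were replaced are not shown in this log, so the loop in the comment, the BLOCKSIZE value and the pad_block helper are assumptions used only to show the before and after shape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCKSIZE 16   /* hypothetical block size for the illustration */

/* Zero the unused tail of a partially filled block.  */
static void
pad_block (unsigned char block[BLOCKSIZE], size_t used)
{
  assert (used <= BLOCKSIZE);
  /* Before, a byte-wise loop of this form:
       for (; used < BLOCKSIZE; used++)
         block[used] = 0;
     After, a single library call that the compiler can inline or
     vectorize:  */
  memset (block + used, 0, BLOCKSIZE - used);
}
```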
| |
* cipher/asm-poly1305-amd64.h: New.
* cipher/Makefile.am: Add 'asm-poly1305-amd64.h'.
* cipher/chacha20-amd64-avx2.S (QUATERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New.
* cipher/chacha20-amd64-ssse3.S (QUATERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1): New.
* cipher/chacha20.c (_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1)
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New prototypes.
(chacha20_encrypt_stream): Split tail to...
(do_chacha20_encrypt_stream_tail): ... new function.
(_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New.
* cipher/cipher-internal.h (_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New prototypes.
* cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt): Call
'_gcry_chacha20_poly1305_encrypt' if cipher is ChaCha20.
(_gcry_cipher_poly1305_decrypt): Call
'_gcry_chacha20_poly1305_decrypt' if cipher is ChaCha20.
* cipher/poly1305-internal.h (_gcry_poly1305_update_burn): New
prototype.
* cipher/poly1305.c (poly1305_blocks): Make static.
(_gcry_poly1305_update): Split main function body to ...
(_gcry_poly1305_update_burn): ... new function.
--
Benchmark on Intel Skylake (i5-6500, 3200 Mhz):
Before, 8-way AVX2:
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
STREAM enc     |    0.378 ns/B     2526 MiB/s     1.21 c/B
STREAM dec     |    0.373 ns/B     2560 MiB/s     1.19 c/B
POLY1305 enc   |    0.685 ns/B     1392 MiB/s     2.19 c/B
POLY1305 dec   |    0.686 ns/B     1390 MiB/s     2.20 c/B
POLY1305 auth  |    0.315 ns/B     3031 MiB/s     1.01 c/B
After, 8-way AVX2 (~36% faster):
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305 enc   |    0.503 ns/B     1896 MiB/s     1.61 c/B
POLY1305 dec   |    0.485 ns/B     1965 MiB/s     1.55 c/B
Benchmark on Intel Haswell (i7-4790K, 3998 Mhz):
Before, 8-way AVX2:
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
STREAM enc     |    0.318 ns/B     2999 MiB/s     1.27 c/B
STREAM dec     |    0.317 ns/B     3004 MiB/s     1.27 c/B
POLY1305 enc   |    0.586 ns/B     1627 MiB/s     2.34 c/B
POLY1305 dec   |    0.586 ns/B     1627 MiB/s     2.34 c/B
POLY1305 auth  |    0.271 ns/B     3524 MiB/s     1.08 c/B
After, 8-way AVX2 (~30% faster):
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305 enc   |    0.452 ns/B     2108 MiB/s     1.81 c/B
POLY1305 dec   |    0.440 ns/B     2167 MiB/s     1.76 c/B
Before, 4-way SSSE3:
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
STREAM enc     |    0.627 ns/B     1521 MiB/s     2.51 c/B
STREAM dec     |    0.626 ns/B     1523 MiB/s     2.50 c/B
POLY1305 enc   |    0.895 ns/B     1065 MiB/s     3.58 c/B
POLY1305 dec   |    0.896 ns/B     1064 MiB/s     3.58 c/B
POLY1305 auth  |    0.271 ns/B     3521 MiB/s     1.08 c/B
After, 4-way SSSE3 (~20% faster):
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305 enc   |    0.733 ns/B     1301 MiB/s     2.93 c/B
POLY1305 dec   |    0.726 ns/B     1314 MiB/s     2.90 c/B
Before, 1-way SSSE3:
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305 enc   |     1.56 ns/B    609.6 MiB/s     6.25 c/B
POLY1305 dec   |     1.56 ns/B    609.4 MiB/s     6.26 c/B
After, 1-way SSSE3 (~18% faster):
CHACHA20       | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305 enc   |     1.31 ns/B    725.4 MiB/s     5.26 c/B
POLY1305 dec   |     1.31 ns/B    727.3 MiB/s     5.24 c/B
For comparison to other libraries (on Intel i7-4790K, 3998 Mhz):
bench-slope-openssl: OpenSSL 1.1.1 11 Sep 2018
Cipher:
chacha20       | nanosecs/byte  mebibytes/sec  cycles/byte
STREAM enc     |    0.301 ns/B   3166.4 MiB/s     1.20 c/B
STREAM dec     |    0.300 ns/B   3174.7 MiB/s     1.20 c/B
POLY1305 enc   |    0.463 ns/B   2060.6 MiB/s     1.85 c/B
POLY1305 dec   |    0.462 ns/B   2063.8 MiB/s     1.85 c/B
POLY1305 auth  |    0.162 ns/B   5899.3 MiB/s    0.646 c/B
bench-slope-nettle: Nettle 3.4
Cipher:
chacha         | nanosecs/byte  mebibytes/sec  cycles/byte
STREAM enc     |     1.65 ns/B    578.2 MiB/s     6.59 c/B
STREAM dec     |     1.65 ns/B    578.2 MiB/s     6.59 c/B
POLY1305 enc   |     2.05 ns/B    464.8 MiB/s     8.20 c/B
POLY1305 dec   |     2.05 ns/B    464.7 MiB/s     8.20 c/B
POLY1305 auth  |    0.404 ns/B   2359.1 MiB/s     1.62 c/B
bench-slope-botan: Botan 2.6.0
Cipher:
ChaCha         | nanosecs/byte  mebibytes/sec  cycles/byte
STREAM enc/dec |    0.855 ns/B   1116.0 MiB/s     3.42 c/B
POLY1305 enc   |     1.60 ns/B    595.4 MiB/s     6.40 c/B
POLY1305 dec   |     1.60 ns/B    595.8 MiB/s     6.40 c/B
POLY1305 auth  |    0.752 ns/B   1268.3 MiB/s     3.01 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/poly1305.c (MUL_MOD_1305_64): cast zero constant to 64-bits.
--
This patch fixes "value size does not match register size specified
by the constraint and modifier [-Wasm-operand-widths]" warnings when
building with clang/aarch64.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/Makefile.am: Include '../mpi' for 'longlong.h'; Remove
'poly1305-sse2-amd64.S', 'poly1305-avx2-amd64.S' and
'poly1305-armv7-neon.S'.
* cipher/poly1305-armv7-neon.S: Remove.
* cipher/poly1305-avx2-amd64.S: Remove.
* cipher/poly1305-sse2-amd64.S: Remove.
* cipher/poly1305-internal.h (POLY1305_BLOCKSIZE)
(POLY1305_STATE): New.
(POLY1305_SYSV_FUNC_ABI, POLY1305_REF_BLOCKSIZE)
(POLY1305_REF_STATESIZE, POLY1305_REF_ALIGNMENT)
(POLY1305_USE_SSE2, POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE)
(POLY1305_SSE2_ALIGNMENT, POLY1305_USE_AVX2, POLY1305_AVX2_BLOCKSIZE)
(POLY1305_AVX2_STATESIZE, POLY1305_AVX2_ALIGNMENT)
(POLY1305_USE_NEON, POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE)
(POLY1305_NEON_ALIGNMENT, POLY1305_LARGEST_BLOCKSIZE)
(POLY1305_LARGEST_STATESIZE, POLY1305_LARGEST_ALIGNMENT)
(POLY1305_STATE_BLOCKSIZE, POLY1305_STATE_STATESIZE)
(POLY1305_STATE_ALIGNMENT, OPS_FUNC_ABI, poly1305_key_s)
(poly1305_ops_s): Remove.
(poly1305_context_s): Rewrite.
* cipher/poly1305.c (_gcry_poly1305_amd64_sse2_init_ext)
(_gcry_poly1305_amd64_sse2_finish_ext)
(_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops)
(poly1305_init_ext_ref32, poly1305_blocks_ref32)
(poly1305_finish_ext_ref32, poly1305_default_ops)
(_gcry_poly1305_amd64_avx2_init_ext)
(_gcry_poly1305_amd64_avx2_finish_ext)
(_gcry_poly1305_amd64_avx2_blocks)
(poly1305_amd64_avx2_ops, poly1305_get_state): Remove.
(poly1305_init): Rewrite.
(USE_MPI_64BIT, USE_MPI_32BIT): New.
[USE_MPI_64BIT] (ADD_1305_64, MUL_MOD_1305_64, poly1305_blocks)
(poly1305_final): New implementation using 64-bit limbs.
[USE_MPI_32BIT] (UMUL_ADD_32, ADD_1305_32, MUL_MOD_1305_32)
(poly1305_blocks): New implementation using 32-bit limbs.
(_gcry_poly1305_update, _gcry_poly1305_finish)
(_gcry_poly1305_init): Adapt to new implementation.
* configure.ac: Remove 'poly1305-sse2-amd64.lo',
'poly1305-avx2-amd64.lo' and 'poly1305-armv7-neon.lo'.
--
Intel Core i7-4790K CPU @ 4.00GHz (x86_64):
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.284 ns/B   3358.6 MiB/s     1.14 c/B
Intel Core i7-4790K CPU @ 4.00GHz (i386):
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.888 ns/B   1073.9 MiB/s     3.55 c/B
Cortex-A53 @ 1152Mhz (armv7):
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |     4.40 ns/B    216.7 MiB/s     5.07 c/B
Cortex-A53 @ 1152Mhz (aarch64):
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |     2.60 ns/B    367.0 MiB/s     2.99 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/poly1305.c (poly1305_default_ops): Move to the top. Add
prototypes and compile only if USE_SSE2 is not defined.
(poly1305_init_ext_ref32): Compile only if USE_SSE2 is not defined.
(poly1305_blocks_ref32): Ditto.
(poly1305_finish_ext_ref32): Ditto.
Signed-off-by: Werner Koch <wk@gnupg.org>
| |
* configure.ac (available_digests_64): Merge with available_digests.
(available_kdfs_64): Merge with available_kdfs.
<64 bit datatype test>: Bail out if no such type is available.
* src/types.h: Emit #error if no u64 can be defined.
(PROPERLY_ALIGNED_TYPE): Always add u64 type.
* cipher/bithelp.h: Remove all code paths which handle the
case of !HAVE_U64_TYPEDEF.
* cipher/bufhelp.h: Ditto.
* cipher/cipher-ccm.c: Ditto.
* cipher/cipher-gcm.c: Ditto.
* cipher/cipher-internal.h: Ditto.
* cipher/cipher.c: Ditto.
* cipher/hash-common.h: Ditto.
* cipher/md.c: Ditto.
* cipher/poly1305.c: Ditto.
* cipher/scrypt.c: Ditto.
* cipher/tiger.c: Ditto.
* src/g10lib.h: Ditto.
* tests/basic.c: Ditto.
* tests/bench-slope.c: Ditto.
* tests/benchmark.c: Ditto.
--
Given that SHA-2 and some other algorithms require a 64-bit type, it
no longer makes sense to conditionally compile parts of the code when
the platform does not provide such a type.
GnuPG-bug-id: 1815.
Signed-off-by: Werner Koch <wk@gnupg.org>
| |
* cipher/poly1305-avx2-amd64.S: Enable when
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
(ELF): New macro to mask lines with ELF specific commands.
* cipher/poly1305-sse2-amd64.S: Enable when
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
(ELF): New macro to mask lines with ELF specific commands.
* cipher/poly1305-internal.h (POLY1305_SYSV_FUNC_ABI): New.
(POLY1305_USE_SSE2, POLY1305_USE_AVX2): Enable when
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
(OPS_FUNC_ABI): New.
(poly1305_ops_t): Use OPS_FUNC_ABI.
* cipher/poly1305.c (_gcry_poly1305_amd64_sse2_init_ext)
(_gcry_poly1305_amd64_sse2_finish_ext)
(_gcry_poly1305_amd64_sse2_blocks, _gcry_poly1305_amd64_avx2_init_ext)
(_gcry_poly1305_amd64_avx2_finish_ext)
(_gcry_poly1305_amd64_avx2_blocks, _gcry_poly1305_armv7_neon_init_ext)
(_gcry_poly1305_armv7_neon_finish_ext)
(_gcry_poly1305_armv7_neon_blocks, poly1305_init_ext_ref32)
(poly1305_blocks_ref32, poly1305_finish_ext_ref32)
(poly1305_init_ext_ref8, poly1305_blocks_ref8)
(poly1305_finish_ext_ref8): Use OPS_FUNC_ABI.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/Makefile.am: Add 'poly1305-armv7-neon.S'.
* cipher/poly1305-armv7-neon.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_NEON)
(POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE)
(POLY1305_NEON_ALIGNMENT): New.
* cipher/poly1305.c [POLY1305_USE_NEON]
(_gcry_poly1305_armv7_neon_init_ext)
(_gcry_poly1305_armv7_neon_finish_ext)
(_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation
if HWF_ARM_NEON set.
* configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'.
--
Add Andrew Moon's public domain NEON implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmark on Cortex-A8 (--cpu-mhz 1008):
Old:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    12.34 ns/B    77.27 MiB/s    12.44 c/B
New:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |     2.12 ns/B    450.7 MiB/s     2.13 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'.
* cipher/poly1305-avx2-amd64.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_AVX2)
(POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE)
(POLY1305_AVX2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed.
* cipher/poly1305.c [POLY1305_USE_AVX2]
(_gcry_poly1305_amd64_avx2_init_ext)
(_gcry_poly1305_amd64_avx2_finish_ext)
(_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if
AVX2 supported by CPU.
* configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'.
--
Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmarks on Intel i5-4570 (haswell):
Old:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.448 ns/B   2129.5 MiB/s     1.43 c/B
New:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.205 ns/B   4643.5 MiB/s    0.657 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'.
* cipher/poly1305-internal.h (POLY1305_USE_SSE2)
(POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE)
(POLY1305_SSE2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed.
* cipher/poly1305-sse2-amd64.S: New.
* cipher/poly1305.c [POLY1305_USE_SSE2]
(_gcry_poly1305_amd64_sse2_init_ext)
(_gcry_poly1305_amd64_sse2_finish_ext)
(_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_SSE2]: Use SSE2 version.
* configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'.
--
Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
Benchmarks on Intel i5-4570 (haswell):
Old:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.844 ns/B   1130.2 MiB/s     2.70 c/B
New:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.448 ns/B   2129.5 MiB/s     1.43 c/B
Benchmarks on Intel i5-2450M (sandy-bridge):
Old:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |     1.25 ns/B    763.0 MiB/s     3.12 c/B
New:
               | nanosecs/byte  mebibytes/sec  cycles/byte
POLY1305       |    0.605 ns/B   1575.9 MiB/s     1.51 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
* cipher/Makefile.am: Add 'mac-poly1305.c', 'poly1305.c' and
'poly1305-internal.h'.
* cipher/mac-internal.h (poly1305mac_context_s): New.
(gcry_mac_handle): Add 'u.poly1305mac'.
(_gcry_mac_type_spec_poly1305mac): New.
* cipher/mac-poly1305.c: New.
* cipher/mac.c (mac_list): Add Poly1305.
* cipher/poly1305-internal.h: New.
* cipher/poly1305.c: New.
* src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305'.
* tests/basic.c (check_mac): Add Poly1305 test vectors; Allow
overriding lengths of data and key buffers.
* tests/bench-slope.c (mac_bench): Increase max algo number from 500 to
600.
* tests/benchmark.c (mac_bench): Ditto.
--
Patch adds Bernstein's Poly1305 message authentication code to libgcrypt.
Implementation is based on Andrew Moon's public domain implementation
from: https://github.com/floodyberry/poly1305-opt
The algorithm added by this patch is the plain Poly1305 without AES and
takes a 32-byte key that must not be reused.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
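For readers unfamiliar with the algorithm, the following is a minimal one-shot sketch of plain Poly1305 using five 26-bit limbs, in the style of the public domain reference code mentioned above. It is not the code added by this patch; the names le32 and poly1305_mac are hypothetical, and the sketch makes no attempt to wipe its stack or to be constant-time beyond what the arithmetic gives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* One-shot Poly1305.  KEY holds r (16 bytes) followed by s (16 bytes)
   and must authenticate one message only.  */
static void poly1305_mac(unsigned char tag[16], const unsigned char *msg,
                         size_t len, const unsigned char key[32])
{
    const uint32_t M = 0x3ffffff;
    uint32_t r[5], g[5], h[5] = { 0, 0, 0, 0, 0 }, c, mask;
    uint64_t d[5], f;
    unsigned char blk[16];
    int i, j;

    assert(tag != NULL && key != NULL && (msg != NULL || len == 0));
    /* Clamp r and split it into 26-bit limbs.  */
    r[0] = le32(key + 0) & 0x3ffffff;
    r[1] = (le32(key + 3) >> 2) & 0x3ffff03;
    r[2] = (le32(key + 6) >> 4) & 0x3ffc0ff;
    r[3] = (le32(key + 9) >> 6) & 0x3f03fff;
    r[4] = (le32(key + 12) >> 8) & 0x00fffff;

    while (len > 0) {
        size_t n = len < 16 ? len : 16;
        /* Full blocks get the 2^128 bit; a short block gets 0x01 instead. */
        uint32_t hibit = n == 16 ? (uint32_t)1 << 24 : 0;

        memset(blk, 0, sizeof blk);
        memcpy(blk, msg, n);
        if (n < 16)
            blk[n] = 1;
        msg += n;
        len -= n;
        /* h += block */
        h[0] += le32(blk + 0) & M;
        h[1] += (le32(blk + 3) >> 2) & M;
        h[2] += (le32(blk + 6) >> 4) & M;
        h[3] += (le32(blk + 9) >> 6) & M;
        h[4] += (le32(blk + 12) >> 8) | hibit;

        /* h *= r; limbs wrapping past 2^130 are folded back times 5.  */
        for (i = 0; i < 5; i++)
            for (d[i] = 0, j = 0; j < 5; j++)
                d[i] += (uint64_t)h[j] * (j <= i ? r[i - j] : 5 * r[5 + i - j]);

        /* Partial carry propagation back into 26-bit limbs.  */
        for (f = 0, i = 0; i < 5; i++) {
            d[i] += f;
            h[i] = (uint32_t)(d[i] & M);
            f = d[i] >> 26;
        }
        f = f * 5 + h[0];
        h[0] = (uint32_t)(f & M);
        h[1] += (uint32_t)(f >> 26);
    }

    /* Full carry propagation.  */
    c = h[1] >> 26;
    h[1] &= M;
    for (i = 2; i < 5; i++) {
        h[i] += c;
        c = h[i] >> 26;
        h[i] &= M;
    }
    h[0] += c * 5;
    c = h[0] >> 26;
    h[0] &= M;
    h[1] += c;

    /* g = h + 5 - 2^130; a carry out of the top limb means h >= p.  */
    for (c = 5, i = 0; i < 5; i++) {
        g[i] = h[i] + c;
        c = g[i] >> 26;
        g[i] &= M;
    }
    mask = (uint32_t)0 - c;   /* all ones if h >= p, else zero */
    for (i = 0; i < 5; i++)
        h[i] = (h[i] & ~mask) | (g[i] & mask);

    /* Pack into four 32-bit words, add s modulo 2^128, store little-endian. */
    h[0] |= h[1] << 26;
    h[1] = (h[1] >> 6) | (h[2] << 20);
    h[2] = (h[2] >> 12) | (h[3] << 14);
    h[3] = (h[3] >> 18) | (h[4] << 8);
    for (f = 0, i = 0; i < 4; i++) {
        f += (uint64_t)h[i] + le32(key + 16 + 4 * i);
        for (j = 0; j < 4; j++)
            tag[4 * i + j] = (unsigned char)(f >> (8 * j));
        f >>= 32;
    }
}
```

A call takes the form poly1305_mac (tag, msg, len, key) with a 16-byte output buffer. The result can be checked against the Poly1305 test vector in RFC 8439, section 2.5.2.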
|