summaryrefslogtreecommitdiff
path: root/cipher/poly1305.c
Commit message (Collapse)AuthorAgeFilesLines
* Cleanup for type definitions of byte, ushort, u32, and u64.NIIBE Yutaka2022-07-211-1/+1
| | | | | | | | | | * src/types.h: Use macros defined by configure script. * src/hmac256.c: Fix for HAVE_U32. * cipher/poly1305.c: Fix for HAVE_U64. -- Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* ppc: enable P10 assembly with ENABLE_FORCE_SOFT_HWFEATURES on arch-3.00Jussi Kivilinna2022-06-121-2/+10
| | | | | | | | | | | | | | * cipher/chacha20.c (chacha20_do_setkey) [USE_PPC_VEC]: Enable P10 assembly for HWF_PPC_ARCH_3_00 if ENABLE_FORCE_SOFT_HWFEATURES is defined. * cipher/poly1305.c (poly1305_init) [POLY1305_USE_PPC_VEC]: Likewise. * cipher/rijndael.c (do_setkey) [USE_PPC_CRYPTO_WITH_PPC9LE]: Likewise. --- This change allows testing P10 implementations with P9 and with QEMU-PPC. GnuPG-bug-id: 6006 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Chacha20/poly1305 - Optimized chacha20/poly1305 for P10 operationDanny Tsen2022-06-121-2/+31
| | | | | | | | | | | | | | | | | | | * configure.ac: Added chacha20 and poly1305 assembly implementations. * cipher/chacha20-p10le-8x.s: (New) - support 8 blocks (512 bytes) unrolling. * cipher/poly1305-p10le.s: (New) - support 4 blocks (128 bytes) unrolling. * cipher/Makefile.am: Added new chacha20 and poly1305 files. * cipher/chacha20.c: Added PPC p10 le support for 8x chacha20. * cipher/poly1305.c: Added PPC p10 le support for 4x poly1305. * cipher/poly1305-internal.h: Added PPC p10 le support for poly1305. --- GnuPG-bug-id: 6006 Signed-off-by: Danny Tsen <dtsen@us.ibm.com> [jk: cosmetic changes to C code] [jk: fix building on ppc64be] Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* poly1305: add AVX512 implementationJussi Kivilinna2022-04-061-2/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * LICENSES: Add 3-clause BSD license for poly1305-amd64-avx512.S. * cipher/Makefile.am: Add 'poly1305-amd64-avx512.S'. * cipher/poly1305-amd64-avx512.S: New. * cipher/poly1305-internal.h (POLY1305_USE_AVX512): New. (poly1305_context_s): Add 'use_avx512'. * cipher/poly1305.c (ASM_FUNC_ABI, ASM_FUNC_WRAPPER_ATTR): New. [POLY1305_USE_AVX512] (_gcry_poly1305_amd64_avx512_blocks) (poly1305_amd64_avx512_blocks): New. (poly1305_init): Use AVX512 is HW feature available (set use_avx512). [USE_MPI_64BIT] (poly1305_blocks): Rename to ... [USE_MPI_64BIT] (poly1305_blocks_generic): ... this. [USE_MPI_64BIT] (poly1305_blocks): New. -- Patch adds AMD64 AVX512-FMA52 implementation for Poly1305. Benchmark on Intel Core i3-1115G4 (tigerlake): Before: | nanosecs/byte mebibytes/sec cycles/byte auto Mhz POLY1305 | 0.306 ns/B 3117 MiB/s 1.25 c/B 4090 After (5.0x faster): | nanosecs/byte mebibytes/sec cycles/byte auto Mhz POLY1305 | 0.061 ns/B 15699 MiB/s 0.249 c/B 4095±3 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* poly1305: fix building with 'arm-linux-gnueabihf-gcc-11 -O3'Jussi Kivilinna2021-10-251-5/+27
| | | | | | | | | | | | | | | * cipher/poly1305.c [HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS] (ADD_1305_32): Reduce number of register operands. -- Ubuntu 21.10 arm-linux-gnueabihf-gcc gave following error with -O3: poly1305.c: In function '_gcry_poly1305_update_burn': cipher/poly1305.c:425:7: error: 'asm' operand has impossible constraints 425 | ADD_1305_32(h4, h3, h2, h1, h0, m4, m3, m2, m1, m0); | ^ Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* poly1305: make --disable-asm work on x86, aarch64 and ppcJussi Kivilinna2021-03-031-4/+4
| | | | | | | | | | | | * cipher/poly1305.c [__aarch64__] (ADD_1305_64): Check for HAVE_CPU_ARCH_ARM. [__x86_64__] (ADD_1305_64): Check for HAVE_CPU_ARCH_X86. [__powerpc__] (ADD_1305_64): Check for HAVE_CPU_ARCH_PPC. [__i386__] (ADD_1305_32): Check for HAVE_CPU_ARCH_X86. -- Reported-by: Horst Wente <horst.wente@posteo.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* poly1305: fix compiling on i386 gcc-4.7Jussi Kivilinna2021-03-031-1/+2
| | | | | | | | | * cipher/poly1305.c [__i386__]: Limit i386 variant of ADD_1305_32 to GCC-5 or newer. -- Reported-by: Horst Wente <horst.wente@posteo.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add s390x/zSeries implementation of Poly1305cipher-s390x-optimizationsJussi Kivilinna2020-12-301-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Add 'poly1305-s390x.S' and 'asm-poly1305-s390x.h'. * cipher/asm-poly1305-s390x.h: New * cipher/chacha20-s390x.S (_gcry_chacha20_poly1305_s390x_vx_blocks8) (_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New, stitched chacha20-poly1305 implementation. * cipher/chacha20.c (USE_S390X_VX_POLY1305): New. (_gcry_chacha20_poly1305_s390x_vx_blocks8) (_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New prototypes. (_gcry_chacha20_poly1305_encrypt, _gcry_chacha20_poly1305_decrypt): Add s390x/VX stitched chacha20-poly1305 code-path. * cipher/poly1305-s390x.S: New. * cipher/poly1305.c (USE_S390X_ASM, HAVE_ASM_POLY1305_BLOCKS): New. [USE_S390X_ASM] (_gcry_poly1305_s390x_blocks1, poly1305_blocks): New. * configure.ac (gcry_cv_gcc_inline_asm_s390x): Check for 'risbgn' and 'algrk' instructions. * tests/basic.c (_check_poly1305_cipher): Add large chacha20-poly1305 test vector. -- Patch adds Poly1305 and stitched ChaCha20-Poly1305 implementation for zSeries. Stitched implementation interleaves ChaCha20 and Poly1305 processing for higher instruction level parallelism and better utilization of execution units. Benchmark on z15 (4504 Mhz): Before: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte POLY1305 enc | 1.16 ns/B 823.2 MiB/s 5.22 c/B POLY1305 dec | 1.16 ns/B 823.2 MiB/s 5.22 c/B POLY1305 auth | 0.736 ns/B 1295 MiB/s 3.32 c/B After (chacha20-poly1305 ~71% faster, poly1305 ~29% faster): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte POLY1305 enc | 0.677 ns/B 1409 MiB/s 3.05 c/B POLY1305 dec | 0.655 ns/B 1456 MiB/s 2.95 c/B POLY1305 auth | 0.569 ns/B 1675 MiB/s 2.56 c/B GnuPG-bug-id: 5202 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* build: Use modern Autoconf check for type.NIIBE Yutaka2020-11-181-1/+1
| | | | | | | | | | * configure.ac (byte, ushort, us6, u32, u64): Use AC_CHECK_TYPES. * cipher/poly1305.c: Use HAVE_TYPE_U64. * src/hmac256.c: HAVE_TYPE_U32. * src/types.h: Use HAVE_TYPE_BYTE, HAVE_TYPE_USHORT, HAVE_TYPE_U16, HAVE_TYPE_U32, and HAVE_TYPE_U64. Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* poly1305: add fast addition macro for ppc64Jussi Kivilinna2019-09-061-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | * cipher/poly1305.c [USE_MPI_64BIT && __powerpc__] (ADD_1305_64): New. -- Benchmark on POWER8 (ppc64le, ~3.8Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.547 ns/B 1742 MiB/s 2.08 c/B After (~8% faster): | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.502 ns/B 1901 MiB/s 1.91 c/B Benchmark on POWER9 (ppc64le, ~3.8Ghz): Before: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.493 ns/B 1934 MiB/s 1.87 c/B After (~7% faster): | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.459 ns/B 2077 MiB/s 1.74 c/B GnuPG-bug-id: 4460 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Use memset instead of setting buffers byte by byteJussi Kivilinna2019-03-231-4/+12
| | | | | | | | | * cipher/cipher-ccm.c (do_cbc_mac): Replace buffer setting loop with memset call. * cipher/cipher-gcm.c (do_ghash_buf): Ditto. * cipher/poly1305.c (poly1305_final): Ditto. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add stitched ChaCha20-Poly1305 SSSE3 and AVX2 implementationsJussi Kivilinna2019-01-271-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/asm-poly1305-amd64.h: New. * cipher/Makefile.am: Add 'asm-poly1305-amd64.h'. * cipher/chacha20-amd64-avx2.S (QUATERROUND2): Add interleave operators. (_gcry_chacha20_poly1305_amd64_avx2_blocks8): New. * cipher/chacha20-amd64-ssse3.S (QUATERROUND2): Add interleave operators. (_gcry_chacha20_poly1305_amd64_ssse3_blocks4) (_gcry_chacha20_poly1305_amd64_ssse3_blocks1): New. * cipher/chacha20.c (_gcry_chacha20_poly1305_amd64_ssse3_blocks4) (_gcry_chacha20_poly1305_amd64_ssse3_blocks1) (_gcry_chacha20_poly1305_amd64_avx2_blocks8): New prototypes. (chacha20_encrypt_stream): Split tail to... (do_chacha20_encrypt_stream_tail): ... new function. (_gcry_chacha20_poly1305_encrypt) (_gcry_chacha20_poly1305_decrypt): New. * cipher/cipher-internal.h (_gcry_chacha20_poly1305_encrypt) (_gcry_chacha20_poly1305_decrypt): New prototypes. * cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt): Call '_gcry_chacha20_poly1305_encrypt' if cipher is ChaCha20. (_gcry_cipher_poly1305_decrypt): Call '_gcry_chacha20_poly1305_decrypt' if cipher is ChaCha20. * cipher/poly1305-internal.h (_gcry_cipher_poly1305_update_burn): New prototype. * cipher/poly1305.c (poly1305_blocks): Make static. (_gcry_poly1305_update): Split main function body to ... (_gcry_poly1305_update_burn): ... new function. -- Benchmark on Intel Skylake (i5-6500, 3200 Mhz): Before, 8-way AVX2: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.378 ns/B 2526 MiB/s 1.21 c/B STREAM dec | 0.373 ns/B 2560 MiB/s 1.19 c/B POLY1305 enc | 0.685 ns/B 1392 MiB/s 2.19 c/B POLY1305 dec | 0.686 ns/B 1390 MiB/s 2.20 c/B POLY1305 auth | 0.315 ns/B 3031 MiB/s 1.01 c/B After, 8-way AVX2 (~36% faster): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte POLY1305 enc | 0.503 ns/B 1896 MiB/s 1.61 c/B POLY1305 dec | 0.485 ns/B 1965 MiB/s 1.55 c/B Benchmark on Intel Haswell (i7-4790K, 3998 Mhz): Before, 8-way AVX2: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.318 ns/B 2999 MiB/s 1.27 c/B STREAM dec | 0.317 ns/B 3004 MiB/s 1.27 c/B POLY1305 enc | 0.586 ns/B 1627 MiB/s 2.34 c/B POLY1305 dec | 0.586 ns/B 1627 MiB/s 2.34 c/B POLY1305 auth | 0.271 ns/B 3524 MiB/s 1.08 c/B After, 8-way AVX2 (~30% faster): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte POLY1305 enc | 0.452 ns/B 2108 MiB/s 1.81 c/B POLY1305 dec | 0.440 ns/B 2167 MiB/s 1.76 c/B Before, 4-way SSSE3: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.627 ns/B 1521 MiB/s 2.51 c/B STREAM dec | 0.626 ns/B 1523 MiB/s 2.50 c/B POLY1305 enc | 0.895 ns/B 1065 MiB/s 3.58 c/B POLY1305 dec | 0.896 ns/B 1064 MiB/s 3.58 c/B POLY1305 auth | 0.271 ns/B 3521 MiB/s 1.08 c/B After, 4-way SSSE3 (~20% faster): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte POLY1305 enc | 0.733 ns/B 1301 MiB/s 2.93 c/B POLY1305 dec | 0.726 ns/B 1314 MiB/s 2.90 c/B Before, 1-way SSSE3: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte POLY1305 enc | 1.56 ns/B 609.6 MiB/s 6.25 c/B POLY1305 dec | 1.56 ns/B 609.4 MiB/s 6.26 c/B After, 1-way SSSE3 (~18% faster): CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte POLY1305 enc | 1.31 ns/B 725.4 MiB/s 5.26 c/B POLY1305 dec | 1.31 ns/B 727.3 MiB/s 5.24 c/B For comparison to other libraries (on Intel i7-4790K, 3998 Mhz): bench-slope-openssl: OpenSSL 1.1.1 11 Sep 2018 Cipher: chacha20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 0.301 ns/B 3166.4 MiB/s 1.20 c/B STREAM dec | 0.300 ns/B 3174.7 MiB/s 1.20 c/B POLY1305 enc | 0.463 ns/B 2060.6 MiB/s 1.85 c/B POLY1305 dec | 0.462 ns/B 2063.8 MiB/s 1.85 c/B POLY1305 auth | 0.162 ns/B 5899.3 MiB/s 0.646 c/B bench-slope-nettle: Nettle 3.4 Cipher: chacha | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 1.65 ns/B 578.2 MiB/s 6.59 c/B STREAM dec | 1.65 ns/B 578.2 MiB/s 6.59 c/B POLY1305 enc | 2.05 ns/B 464.8 MiB/s 8.20 c/B POLY1305 dec | 2.05 ns/B 464.7 MiB/s 8.20 c/B POLY1305 auth | 0.404 ns/B 2359.1 MiB/s 1.62 c/B bench-slope-botan: Botan 2.6.0 Cipher: ChaCha | nanosecs/byte mebibytes/sec cycles/byte STREAM enc/dec | 0.855 ns/B 1116.0 MiB/s 3.42 c/B POLY1305 enc | 1.60 ns/B 595.4 MiB/s 6.40 c/B POLY1305 dec | 1.60 ns/B 595.8 MiB/s 6.40 c/B POLY1305 auth | 0.752 ns/B 1268.3 MiB/s 3.01 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* poly1305: silence compiler warning on clang/aarch64Jussi Kivilinna2018-03-281-1/+1
| | | | | | | | | | | * cipher/poly1305.c (MUL_MOD_1305_64): cast zero constant to 64-bits. -- This patch fixes "value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]" warnings when building with clang/aarch64. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* New Poly1305 implementationsJussi Kivilinna2018-01-091-341/+365
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Include '../mpi' for 'longlong.h'; Remove 'poly1305-sse2-amd64.S', 'poly1305-avx2-amd64.S' and 'poly1305-armv7-neon.S'. * cipher/poly1305-armv7-neon.S: Remove. * cipher/poly1305-avx2-amd64.S: Remove. * cipher/poly1305-sse2-amd64.S: Remove. * cipher/poly1305-internal.h (POLY1305_BLOCKSIZE) (POLY1305_STATE): New. (POLY1305_SYSV_FUNC_ABI, POLY1305_REF_BLOCKSIZE) (POLY1305_REF_STATESIZE, POLY1305_REF_ALIGNMENT) (POLY1305_USE_SSE2, POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE) (POLY1305_SSE2_ALIGNMENT, POLY1305_USE_AVX2, POLY1305_AVX2_BLOCKSIZE) (POLY1305_AVX2_STATESIZE, POLY1305_AVX2_ALIGNMENT) (POLY1305_USE_NEON, POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE) (POLY1305_NEON_ALIGNMENT, POLY1305_LARGEST_BLOCKSIZE) (POLY1305_LARGEST_STATESIZE, POLY1305_LARGEST_ALIGNMENT) (POLY1305_STATE_BLOCKSIZE, POLY1305_STATE_STATESIZE) (POLY1305_STATE_ALIGNMENT, OPS_FUNC_ABI, poly1305_key_s) (poly1305_ops_s): Remove. (poly1305_context_s): Rewrite. * cipher/poly1305.c (_gcry_poly1305_amd64_sse2_init_ext) (_gcry_poly1305_amd64_sse2_finish_ext) (_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops) (poly1305_init_ext_ref32, poly1305_blocks_ref32) (poly1305_finish_ext_ref32, poly1305_default_ops) (_gcry_poly1305_amd64_avx2_init_ext) (_gcry_poly1305_amd64_avx2_finish_ext) (_gcry_poly1305_amd64_avx2_blocks) (poly1305_amd64_avx2_ops, poly1305_get_state): Remove. (poly1305_init): Rewrite. (USE_MPI_64BIT, USE_MPI_32BIT): New. [USE_MPI_64BIT] (ADD_1305_64, MUL_MOD_1305_64, poly1305_blocks) (poly1305_final): New implementation using 64-bit limbs. [USE_MPI_32BIT] (UMUL_ADD_32, ADD_1305_32, MUL_MOD_1305_32) (poly1305_blocks): New implementation using 32-bit limbs. (_gcry_poly1305_update, _gcry_poly1305_finish) (_gcry_poly1305_init): Adapt to new implementation. * configure.ac: Remove 'poly1305-sse2-amd64.lo', 'poly1305-avx2-amd64.lo' and 'poly1305-armv7-neon.lo'. -- Intel Core i7-4790K CPU @ 4.00GHz (x86_64): | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.284 ns/B 3358.6 MiB/s 1.14 c/B Intel Core i7-4790K CPU @ 4.00GHz (i386): | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.888 ns/B 1073.9 MiB/s 3.55 c/B Cortex-A53 @ 1152Mhz (armv7): | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 4.40 ns/B 216.7 MiB/s 5.07 c/B Cortex-A53 @ 1152Mhz (aarch64): | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 2.60 ns/B 367.0 MiB/s 2.99 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher: Fix compiler warnings.Werner Koch2017-05-231-8/+24
| | | | | | | | | | * cipher/poly1305.c (poly1305_default_ops): Move to the top. Add prototypes and compile only if USE_SSE2 is not defined. (poly1305_init_ext_ref32): Compile only if USE_SSE2 is not defined. (poly1305_blocks_ref32): Ditto. (poly1305_finish_ext_ref32): Ditto. Signed-off-by: Werner Koch <wk@gnupg.org>
* Always require a 64 bit integer typeWerner Koch2016-03-181-214/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * configure.ac (available_digests_64): Merge with available_digests. (available_kdfs_64): Merge with available_kdfs. <64 bit datatype test>: Bail out if no such type is available. * src/types.h: Emit #error if no u64 can be defined. (PROPERLY_ALIGNED_TYPE): Always add u64 type. * cipher/bithelp.h: Remove all code paths which handle the case of !HAVE_U64_TYPEDEF. * cipher/bufhelp.h: Ditto. * cipher/cipher-ccm.c: Ditto. * cipher/cipher-gcm.c: Ditto. * cipher/cipher-internal.h: Ditto. * cipher/cipher.c: Ditto. * cipher/hash-common.h: Ditto. * cipher/md.c: Ditto. * cipher/poly1305.c: Ditto. * cipher/scrypt.c: Ditto. * cipher/tiger.c: Ditto. * src/g10lib.h: Ditto. * tests/basic.c: Ditto. * tests/bench-slope.c: Ditto. * tests/benchmark.c: Ditto. -- Given that SHA-2 and some other algorithms require a 64 bit type it does not make anymore sense to conditionally compile some part when the platform does not provide such a type. GnuPG-bug-id: 1815. Signed-off-by: Werner Koch <wk@gnupg.org>
* Enable AMD64 Poly1305 implementations on WIN64Jussi Kivilinna2015-05-141-15/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/poly1305-avx2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/poly1305-sse2-amd64.S: Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (ELF): New macro to mask lines with ELF specific commands. * cipher/poly1305-internal.h (POLY1305_SYSV_FUNC_ABI): New. (POLY1305_USE_SSE2, POLY1305_USE_AVX2): Enable when HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined. (OPS_FUNC_ABI): New. (poly1305_ops_t): Use OPS_FUNC_ABI. * cipher/poly1305.c (_gcry_poly1305_amd64_sse2_init_ext) (_gcry_poly1305_amd64_sse2_finish_ext) (_gcry_poly1305_amd64_sse2_blocks, _gcry_poly1305_amd64_avx2_init_ext) (_gcry_poly1305_amd64_avx2_finish_ext) (_gcry_poly1305_amd64_avx2_blocks, _gcry_poly1305_armv7_neon_init_ext) (_gcry_poly1305_armv7_neon_finish_ext) (_gcry_poly1305_armv7_neon_blocks, poly1305_init_ext_ref32) (poly1305_blocks_ref32, poly1305_finish_ext_ref32) (poly1305_init_ext_ref8, poly1305_blocks_ref8) (poly1305_finish_ext_ref8): Use OPS_FUNC_ABI. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add ARM/NEON implementation of Poly1305Jussi Kivilinna2014-11-021-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Add 'poly1305-armv7-neon.S'. * cipher/poly1305-armv7-neon.S: New. * cipher/poly1305-internal.h (POLY1305_USE_NEON) (POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE) (POLY1305_NEON_ALIGNMENT): New. * cipher/poly1305.c [POLY1305_USE_NEON] (_gcry_poly1305_armv7_neon_init_ext) (_gcry_poly1305_armv7_neon_finish_ext) (_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New. (_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation if HWF_ARM_NEON set. * configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'. -- Add Andrew Moon's public domain NEON implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmark on Cortex-A8 (--cpu-mhz 1008): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 12.34 ns/B 77.27 MiB/s 12.44 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 2.12 ns/B 450.7 MiB/s 2.13 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* poly1305: add AMD64/AVX2 optimized implementationJussi Kivilinna2014-05-161-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'. * cipher/poly1305-avx2-amd64.S: New. * cipher/poly1305-internal.h (POLY1305_USE_AVX2) (POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE) (POLY1305_AVX2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed. * cipher/poly1305.c [POLY1305_USE_AVX2] (_gcry_poly1305_amd64_avx2_init_ext) (_gcry_poly1305_amd64_avx2_finish_ext) (_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New. (_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if AVX2 supported by CPU. * configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'. -- Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.205 ns/B 4643.5 MiB/s 0.657 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* poly1305: add AMD64/SSE2 optimized implementationJussi Kivilinna2014-05-121-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'. * cipher/poly1305-internal.h (POLY1305_USE_SSE2) (POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE) (POLY1305_SSE2_ALIGNMENT): New. (POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE) (POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed. * cipher/poly1305-sse2-amd64.S: New. * cipher/poly1305.c [POLY1305_USE_SSE2] (_gcry_poly1305_amd64_sse2_init_ext) (_gcry_poly1305_amd64_sse2_finish_ext) (_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New. (_gcry_polu1305_init) [POLY1305_USE_SSE2]: Use SSE2 version. * configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'. -- Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original source is available at: https://github.com/floodyberry/poly1305-opt Benchmarks on Intel i5-4570 (haswell): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.844 ns/B 1130.2 MiB/s 2.70 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.448 ns/B 2129.5 MiB/s 1.43 c/B Benchmarks on Intel i5-2450M (sandy-bridge): Old: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 1.25 ns/B 763.0 MiB/s 3.12 c/B New: | nanosecs/byte mebibytes/sec cycles/byte POLY1305 | 0.605 ns/B 1575.9 MiB/s 1.51 c/B Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add Poly1305 MACJussi Kivilinna2014-05-121-0/+766
* cipher/Makefile.am: Add 'mac-poly1305.c', 'poly1305.c' and 'poly1305-internal.h'. * cipher/mac-internal.h (poly1305mac_context_s): New. (gcry_mac_handle): Add 'u.poly1305mac'. (_gcry_mac_type_spec_poly1305mac): New. * cipher/mac-poly1305.c: New. * cipher/mac.c (mac_list): Add Poly1305. * cipher/poly1305-internal.h: New. * cipher/poly1305.c: New. * src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305'. * tests/basic.c (check_mac): Add Poly1035 test vectors; Allow overriding lengths of data and key buffers. * tests/bench-slope.c (mac_bench): Increase max algo number from 500 to 600. * tests/benchmark.c (mac_bench): Ditto. -- Patch adds Bernstein's Poly1305 message authentication code to libgcrypt. Implementation is based on Andrew Moon's public domain implementation from: https://github.com/floodyberry/poly1305-opt The algorithm added by this patch is the plain Poly1305 without AES and takes 32-bit key that must not be reused. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>