| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* LICENSES: Add 'cipher/keccak-amd64-avx512.S'.
* configure.ac: Add 'keccak-amd64-avx512.lo'.
* cipher/Makefile.am: Add 'keccak-amd64-avx512.S'.
* cipher/keccak-amd64-avx512.S: New.
* cipher/keccak.c (USE_64BIT_AVX512, ASM_FUNC_ABI): New.
[USE_64BIT_AVX512] (_gcry_keccak_f1600_state_permute64_avx512)
(_gcry_keccak_absorb_blocks_avx512, keccak_f1600_state_permute64_avx512)
(keccak_absorb_lanes64_avx512, keccak_avx512_64_ops): New.
(keccak_init) [USE_64BIT_AVX512]: Enable x86-64 AVX512 implementation
if supported by HW features.
--
Benchmark on Intel Core i3-1115G4 (tigerlake):
Before (BMI2 instructions):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SHA3-224 | 1.77 ns/B 540.3 MiB/s 7.22 c/B 4088
SHA3-256 | 1.86 ns/B 514.0 MiB/s 7.59 c/B 4089
SHA3-384 | 2.43 ns/B 393.1 MiB/s 9.92 c/B 4089
SHA3-512 | 3.49 ns/B 273.2 MiB/s 14.27 c/B 4088
SHAKE128 | 1.52 ns/B 629.1 MiB/s 6.20 c/B 4089
SHAKE256 | 1.86 ns/B 511.6 MiB/s 7.62 c/B 4089
After (~33% faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SHA3-224 | 1.32 ns/B 721.8 MiB/s 5.40 c/B 4089
SHA3-256 | 1.40 ns/B 681.7 MiB/s 5.72 c/B 4089
SHA3-384 | 1.83 ns/B 522.5 MiB/s 7.46 c/B 4089
SHA3-512 | 2.63 ns/B 362.1 MiB/s 10.77 c/B 4088
SHAKE128 | 1.13 ns/B 840.4 MiB/s 4.64 c/B 4089
SHAKE256 | 1.40 ns/B 682.1 MiB/s 5.72 c/B 4089
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* LICENSES: Add 3-clause BSD license for poly1305-amd64-avx512.S.
* cipher/Makefile.am: Add 'poly1305-amd64-avx512.S'.
* cipher/poly1305-amd64-avx512.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_AVX512): New.
(poly1305_context_s): Add 'use_avx512'.
* cipher/poly1305.c (ASM_FUNC_ABI, ASM_FUNC_WRAPPER_ATTR): New.
[POLY1305_USE_AVX512] (_gcry_poly1305_amd64_avx512_blocks)
(poly1305_amd64_avx512_blocks): New.
(poly1305_init): Use AVX512 is HW feature available (set use_avx512).
[USE_MPI_64BIT] (poly1305_blocks): Rename to ...
[USE_MPI_64BIT] (poly1305_blocks_generic): ... this.
[USE_MPI_64BIT] (poly1305_blocks): New.
--
Patch adds AMD64 AVX512-FMA52 implementation for Poly1305.
Benchmark on Intel Core i3-1115G4 (tigerlake):
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
POLY1305 | 0.306 ns/B 3117 MiB/s 1.25 c/B 4090
After (5.0x faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
POLY1305 | 0.061 ns/B 15699 MiB/s 0.249 c/B 4095±3
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* LICENSES: Add 'cipher/sha512-avx512-amd64.S'.
* cipher/Makefile.am: Add 'sha512-avx512-amd64.S'.
* cipher/sha512-avx512-amd64.S: New.
* cipher/sha512.c (USE_AVX512): New.
(do_sha512_transform_amd64_ssse3, do_sha512_transform_amd64_avx)
(do_sha512_transform_amd64_avx2): Add ASM_EXTRA_STACK to return value
only if assembly routine returned non-zero value.
[USE_AVX512] (_gcry_sha512_transform_amd64_avx512)
(do_sha512_transform_amd64_avx512): New.
(sha512_init_common) [USE_AVX512]: Use AVX512 implementation if HW
feature supported.
---
Benchmark on Intel Core i3-1115G4 (tigerlake):
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SHA512 | 1.51 ns/B 631.6 MiB/s 6.17 c/B 4089
After (~29% faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SHA512 | 1.16 ns/B 819.0 MiB/s 4.76 c/B 4090
GnuPG-bug-id: T4460
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
--
In the files of the implementation (*.h, *.c), it says:
License: see LICENSE file in root directory
I think that user may look the LICENSES file instead easily.
GnuPG-bug-id 5523
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'cipher-gcm-ppc.c'.
* cipher/cipher-gcm-ppc.c: New.
* cipher/cipher-gcm.c [GCM_USE_PPC_VPMSUM] (_gcry_ghash_setup_ppc_vpmsum)
(_gcry_ghash_ppc_vpmsum, ghash_setup_ppc_vpsum, ghash_ppc_vpmsum): New.
(setupM) [GCM_USE_PPC_VPMSUM]: Select ppc-vpmsum implementation if
HW feature "ppc-vcrypto" is available.
* cipher/cipher-internal.h (GCM_USE_PPC_VPMSUM): New.
(gcry_cipher_handle): Move 'ghash_fn' at end of 'gcm' block to align
'gcm_table' to 16 bytes.
* configure.ac: Add 'cipher-gcm-ppc.lo'.
* tests/basic.c (_check_gcm_cipher): New AES256 test vector.
* AUTHORS: Add 'CRYPTOGAMS'.
* LICENSES: Add original license to 3-clause-BSD section.
--
https://dev.gnupg.org/D501:
10-20X speed.
However this Power 9 machine is faster than the last Power 9 benchmarks
on the optimized versions, so while better than the last patch, it is
not all due to the code.
Before:
GCM enc | 4.23 ns/B 225.3 MiB/s - c/B
GCM dec | 3.58 ns/B 266.2 MiB/s - c/B
GCM auth | 3.34 ns/B 285.3 MiB/s - c/B
After:
GCM enc | 0.370 ns/B 2578 MiB/s - c/B
GCM dec | 0.371 ns/B 2571 MiB/s - c/B
GCM auth | 0.159 ns/B 6003 MiB/s - c/B
Signed-off-by: Shawn Landden <shawn@git.icu>
[jk: coding style fixes, Makefile.am integration, patch from Differential
to git, commit changelog, fixed few compiler warnings]
GnuPG-bug-id: 5040
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* LICENSES: Add 'sha512-ssse3-i386.c'.
* configure.ac: Add 'sha512-ssse3-i386.lo'.
* cipher/Makefile.am: Add 'sha512-ssse3-i386.c'.
* cipher/sha512-ssse3-i386.c: New.
* cipher/sha512.c (USE_SSSE3_I386, _gcry_sha512_transform_i386_ssse3)
(do_sha512_transform_i386_ssse3): New.
(_gcry_sha512_transform_arm) [USE_SSSE3_I386]: Use i386/SSSE3 transform
function if supported by CPU.
--
Benchmark on AMD Ryzen 7 3700X:
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SHA512 | 12.58 ns/B 75.79 MiB/s 55.06 c/B 4375
After (~4.5x faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SHA512 | 2.78 ns/B 343.3 MiB/s 12.09 c/B 4351
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* random/rndjent.c: Change license to the one used by jitterentropy.h.
(jent_init_statistic): New.
(jent_bit_count): New.
(jent_statistic_copy_stat): new.
(jent_calc_statistic): New.
--
New code taken from Stephan's jitterentropy-stat.c. This does now
build with CONFIG_CRYPTO_CPU_JITTERENTROPY_STAT defined; not sure
whether this is already useful. Changed the license due to the new
code.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* random/jitterentropy-base-user.h: New.
* random/jitterentropy-base.c: New.
* random/jitterentropy.h: New.
--
Signed-off-by: Werner Koch <wk@gnupg.org>
- Tabs and trailing white spaces removed from original source.
- Source received by mail dated Fri, 27 Jan 2017 17:52:38 +0100 from
Stephan
|
|
|
|
|
|
|
| |
--
Add license and copyright statement for cipher/arcfour-amd64.S (public
domain) and cipher/cipher-ocb.c (OCB license 1)
|
|
|
|
|
|
|
|
| |
* LICENSES: Remove 'Simple permissive' and 'IETF permissive' licenses
for 'cipher/crc.c' as result of rewrite of CRC implementations.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* LICENSES: Add 'cipher/sha256-avx-amd64.S' and
'cipher/sha256-avx2-bmi2-amd64.S'.
* cipher/Makefile.am: Add 'sha256-avx-amd64.S' and
'sha256-avx2-bmi2-amd64.S'.
* cipher/sha256-avx-amd64.S: New.
* cipher/sha256-avx2-bmi2-amd64.S: New.
* cipher/sha256-ssse3-amd64.S: Use 'lea' instead of 'add' in few
places for tiny speed improvement.
* cipher/sha256.c (USE_AVX, USE_AVX2): New.
(SHA256_CONTEXT) [USE_AVX, USE_AVX2]: Add 'use_avx' and 'use_avx2'.
(sha256_init, sha224_init) [USE_AVX, USE_AVX2]: Initialize above
new context members.
[USE_AVX] (_gcry_sha256_transform_amd64_avx): New.
[USE_AVX2] (_gcry_sha256_transform_amd64_avx2): New.
(transform) [USE_AVX2]: Use AVX2 assembly if enabled.
(transform) [USE_AVX]: Use AVX assembly if enabled.
* configure.ac: Add 'sha256-avx-amd64.lo' and
'sha256-avx2-bmi2-amd64.lo'.
--
Patch adds fast AVX and AVX2/BMI2 implementations of SHA-256 by Intel
Corporation. The assembly source is licensed under 3-clause BSD license,
thus compatible with LGPL2.1+. Original source can be accessed at:
http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs
Implementation is described in white paper
"Fast SHA - 256 Implementations on Intel® Architecture Processors"
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html
Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's
faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much
slower than RORQ, so therefore AVX implementation is (for now) limited
to Intel CPUs.
Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional
HWF flag.
Benchmarks:
cpu C-lang SSSE3 AVX/AVX2 C vs AVX/AVX2
vs SSSE3
Intel i5-4570 13.86 c/B 10.27 c/B 8.70 c/B 1.59x 1.18x
Intel i5-2450M 17.25 c/B 12.36 c/B 10.31 c/B 1.67x 1.19x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* LICENSES: New.
* Makefile.am (EXTRA_DIST): Add LICENSES.
* AUTHORS: Add list of copyright holders.
* README: Reference AUTHORS.
Signed-off-by: Werner Koch <wk@gnupg.org>
|