path: root/LICENSES
Commit message | Author | Age | Files | Lines
* sha3: Add x86-64 AVX512 accelerated implementation | Jussi Kivilinna | 2022-07-25 | 1 | -0/+1
  * LICENSES: Add 'cipher/keccak-amd64-avx512.S'.
  * configure.ac: Add 'keccak-amd64-avx512.lo'.
  * cipher/Makefile.am: Add 'keccak-amd64-avx512.S'.
  * cipher/keccak-amd64-avx512.S: New.
  * cipher/keccak.c (USE_64BIT_AVX512, ASM_FUNC_ABI): New.
  [USE_64BIT_AVX512] (_gcry_keccak_f1600_state_permute64_avx512)
  (_gcry_keccak_absorb_blocks_avx512, keccak_f1600_state_permute64_avx512)
  (keccak_absorb_lanes64_avx512, keccak_avx512_64_ops): New.
  (keccak_init) [USE_64BIT_AVX512]: Enable x86-64 AVX512 implementation
  if supported by HW features.
  --

  Benchmark on Intel Core i3-1115G4 (tigerlake):

  Before (BMI2 instructions):
                  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   SHA3-224       |      1.77 ns/B     540.3 MiB/s      7.22 c/B      4088
   SHA3-256       |      1.86 ns/B     514.0 MiB/s      7.59 c/B      4089
   SHA3-384       |      2.43 ns/B     393.1 MiB/s      9.92 c/B      4089
   SHA3-512       |      3.49 ns/B     273.2 MiB/s     14.27 c/B      4088
   SHAKE128       |      1.52 ns/B     629.1 MiB/s      6.20 c/B      4089
   SHAKE256       |      1.86 ns/B     511.6 MiB/s      7.62 c/B      4089

  After (~33% faster):
                  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   SHA3-224       |      1.32 ns/B     721.8 MiB/s      5.40 c/B      4089
   SHA3-256       |      1.40 ns/B     681.7 MiB/s      5.72 c/B      4089
   SHA3-384       |      1.83 ns/B     522.5 MiB/s      7.46 c/B      4089
   SHA3-512       |      2.63 ns/B     362.1 MiB/s     10.77 c/B      4088
   SHAKE128       |      1.13 ns/B     840.4 MiB/s      4.64 c/B      4089
   SHAKE256       |      1.40 ns/B     682.1 MiB/s      5.72 c/B      4089

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
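The benchmark columns in these commit messages are related by simple arithmetic, and the quoted "~33% faster" figure can be re-derived from the ns/B column. The following is a minimal C sketch of those relations; the function names are hypothetical and are not part of libgcrypt or its benchmark tool. Because the published ns/B values are rounded, results agree with the tables only approximately.

```c
/* Hypothetical helpers relating the benchmark columns; not libgcrypt API. */

/* Throughput in MiB/s from a per-byte cost given in nanoseconds. */
static double mib_per_sec(double ns_per_byte)
{
    return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte from ns/B and the measured clock frequency in MHz. */
static double cycles_per_byte(double ns_per_byte, double mhz)
{
    return ns_per_byte * mhz / 1000.0;
}

/* Relative throughput gain, e.g. about 0.33 for "33% faster". */
static double speedup_fraction(double ns_before, double ns_after)
{
    return ns_before / ns_after - 1.0;
}
```

For the SHA3-224 row, `speedup_fraction(1.77, 1.32)` comes out at roughly 0.34, in line with the figure quoted in the commit message once rounding of the inputs is taken into account.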
* poly1305: add AVX512 implementation | Jussi Kivilinna | 2022-04-06 | 1 | -0/+30
  * LICENSES: Add 3-clause BSD license for poly1305-amd64-avx512.S.
  * cipher/Makefile.am: Add 'poly1305-amd64-avx512.S'.
  * cipher/poly1305-amd64-avx512.S: New.
  * cipher/poly1305-internal.h (POLY1305_USE_AVX512): New.
  (poly1305_context_s): Add 'use_avx512'.
  * cipher/poly1305.c (ASM_FUNC_ABI, ASM_FUNC_WRAPPER_ATTR): New.
  [POLY1305_USE_AVX512] (_gcry_poly1305_amd64_avx512_blocks)
  (poly1305_amd64_avx512_blocks): New.
  (poly1305_init): Use AVX512 if HW feature available (set use_avx512).
  [USE_MPI_64BIT] (poly1305_blocks): Rename to ...
  [USE_MPI_64BIT] (poly1305_blocks_generic): ... this.
  [USE_MPI_64BIT] (poly1305_blocks): New.
  --

  Patch adds AMD64 AVX512-FMA52 implementation for Poly1305.

  Benchmark on Intel Core i3-1115G4 (tigerlake):

  Before:
                  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   POLY1305       |     0.306 ns/B      3117 MiB/s      1.25 c/B      4090

  After (5.0x faster):
                  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   POLY1305       |     0.061 ns/B     15699 MiB/s     0.249 c/B    4095±3

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* SHA512: Add AVX512 implementation | Jussi Kivilinna | 2022-03-10 | 1 | -0/+1
  * LICENSES: Add 'cipher/sha512-avx512-amd64.S'.
  * cipher/Makefile.am: Add 'sha512-avx512-amd64.S'.
  * cipher/sha512-avx512-amd64.S: New.
  * cipher/sha512.c (USE_AVX512): New.
  (do_sha512_transform_amd64_ssse3, do_sha512_transform_amd64_avx)
  (do_sha512_transform_amd64_avx2): Add ASM_EXTRA_STACK to return value
  only if assembly routine returned non-zero value.
  [USE_AVX512] (_gcry_sha512_transform_amd64_avx512)
  (do_sha512_transform_amd64_avx512): New.
  (sha512_init_common) [USE_AVX512]: Use AVX512 implementation if HW
  feature supported.
  ---

  Benchmark on Intel Core i3-1115G4 (tigerlake):

  Before:
                  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   SHA512         |      1.51 ns/B     631.6 MiB/s      6.17 c/B      4089

  After (~29% faster):
                  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   SHA512         |      1.16 ns/B     819.0 MiB/s      4.76 c/B      4090

  GnuPG-bug-id: T4460
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* doc: Update LICENSES for jitterentropy 3.3.0. | NIIBE Yutaka | 2021-11-17 | 1 | -36/+43
  --

  In the files of the implementation (*.h, *.c), it says:

      License: see LICENSE file in root directory

  I think users may find it easier to look at the LICENSES file instead.

  GnuPG-bug-id 5523
  Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* VPMSUMD acceleration for GCM mode on PPC | Shawn Landden | 2021-03-07 | 1 | -1/+42
  * cipher/Makefile.am: Add 'cipher-gcm-ppc.c'.
  * cipher/cipher-gcm-ppc.c: New.
  * cipher/cipher-gcm.c [GCM_USE_PPC_VPMSUM] (_gcry_ghash_setup_ppc_vpmsum)
  (_gcry_ghash_ppc_vpmsum, ghash_setup_ppc_vpsum, ghash_ppc_vpmsum): New.
  (setupM) [GCM_USE_PPC_VPMSUM]: Select ppc-vpmsum implementation if
  HW feature "ppc-vcrypto" is available.
  * cipher/cipher-internal.h (GCM_USE_PPC_VPMSUM): New.
  (gcry_cipher_handle): Move 'ghash_fn' at end of 'gcm' block to align
  'gcm_table' to 16 bytes.
  * configure.ac: Add 'cipher-gcm-ppc.lo'.
  * tests/basic.c (_check_gcm_cipher): New AES256 test vector.
  * AUTHORS: Add 'CRYPTOGAMS'.
  * LICENSES: Add original license to 3-clause-BSD section.
  --

  https://dev.gnupg.org/D501:

  10-20X speed.

  However this Power 9 machine is faster than the last Power 9 benchmarks
  on the optimized versions, so while better than the last patch, it is
  not all due to the code.

  Before:
   GCM enc        |      4.23 ns/B     225.3 MiB/s         - c/B
   GCM dec        |      3.58 ns/B     266.2 MiB/s         - c/B
   GCM auth       |      3.34 ns/B     285.3 MiB/s         - c/B

  After:
   GCM enc        |     0.370 ns/B      2578 MiB/s         - c/B
   GCM dec        |     0.371 ns/B      2571 MiB/s         - c/B
   GCM auth       |     0.159 ns/B      6003 MiB/s         - c/B

  Signed-off-by: Shawn Landden <shawn@git.icu>
  [jk: coding style fixes, Makefile.am integration, patch from Differential
  to git, commit changelog, fixed a few compiler warnings]
  GnuPG-bug-id: 5040
  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add i386/SSSE3 implementation of SHA512 | Jussi Kivilinna | 2019-11-05 | 1 | -0/+1
  * LICENSES: Add 'sha512-ssse3-i386.c'.
  * configure.ac: Add 'sha512-ssse3-i386.lo'.
  * cipher/Makefile.am: Add 'sha512-ssse3-i386.c'.
  * cipher/sha512-ssse3-i386.c: New.
  * cipher/sha512.c (USE_SSSE3_I386, _gcry_sha512_transform_i386_ssse3)
  (do_sha512_transform_i386_ssse3): New.
  (_gcry_sha512_transform_arm) [USE_SSSE3_I386]: Use i386/SSSE3 transform
  function if supported by CPU.
  --

  Benchmark on AMD Ryzen 7 3700X:

  Before:
                  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   SHA512         |     12.58 ns/B     75.79 MiB/s     55.06 c/B      4375

  After (~4.5x faster):
                  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   SHA512         |      2.78 ns/B     343.3 MiB/s     12.09 c/B      4351

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* random: Allow building rndjent.c with stats collecting enabled. | Werner Koch | 2017-06-16 | 1 | -1/+1
  * random/rndjent.c: Change license to the one used by jitterentropy.h.
  (jent_init_statistic): New.
  (jent_bit_count): New.
  (jent_statistic_copy_stat): New.
  (jent_calc_statistic): New.
  --

  New code taken from Stephan's jitterentropy-stat.c. This now builds
  with CONFIG_CRYPTO_CPU_JITTERENTROPY_STAT defined; not sure whether
  this is already useful. Changed the license due to the new code.

  Signed-off-by: Werner Koch <wk@gnupg.org>
* random: Add original Jitter RNG implementation | Stephan Mueller | 2017-06-13 | 1 | -0/+45
  * random/jitterentropy-base-user.h: New.
  * random/jitterentropy-base.c: New.
  * random/jitterentropy.h: New.
  --
  Signed-off-by: Werner Koch <wk@gnupg.org>

  - Tabs and trailing white spaces removed from original source.
  - Source received by mail dated Fri, 27 Jan 2017 17:52:38 +0100 from Stephan
* Document more non LGPL-licensed code. | Andreas Metzler | 2016-02-12 | 1 | -0/+107
  --
  Add license and copyright statement for cipher/arcfour-amd64.S (public
  domain) and cipher/cipher-ocb.c (OCB license 1)
* Update license information for CRC | Jussi Kivilinna | 2015-11-18 | 1 | -50/+0
  * LICENSES: Remove 'Simple permissive' and 'IETF permissive' licenses
  for 'cipher/crc.c' as result of rewrite of CRC implementations.
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add AVX and AVX2/BMI implementations for SHA-256 | Jussi Kivilinna | 2013-12-18 | 1 | -0/+2
  * LICENSES: Add 'cipher/sha256-avx-amd64.S' and
  'cipher/sha256-avx2-bmi2-amd64.S'.
  * cipher/Makefile.am: Add 'sha256-avx-amd64.S' and
  'sha256-avx2-bmi2-amd64.S'.
  * cipher/sha256-avx-amd64.S: New.
  * cipher/sha256-avx2-bmi2-amd64.S: New.
  * cipher/sha256-ssse3-amd64.S: Use 'lea' instead of 'add' in a few
  places for tiny speed improvement.
  * cipher/sha256.c (USE_AVX, USE_AVX2): New.
  (SHA256_CONTEXT) [USE_AVX, USE_AVX2]: Add 'use_avx' and 'use_avx2'.
  (sha256_init, sha224_init) [USE_AVX, USE_AVX2]: Initialize above
  new context members.
  [USE_AVX] (_gcry_sha256_transform_amd64_avx): New.
  [USE_AVX2] (_gcry_sha256_transform_amd64_avx2): New.
  (transform) [USE_AVX2]: Use AVX2 assembly if enabled.
  (transform) [USE_AVX]: Use AVX assembly if enabled.
  * configure.ac: Add 'sha256-avx-amd64.lo' and
  'sha256-avx2-bmi2-amd64.lo'.
  --

  Patch adds fast AVX and AVX2/BMI2 implementations of SHA-256 by Intel
  Corporation. The assembly source is licensed under 3-clause BSD license,
  thus compatible with LGPL2.1+. Original source can be accessed at:
   http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs

  Implementation is described in white paper
   "Fast SHA-256 Implementations on Intel® Architecture Processors"
   http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html

  Note: AVX implementation uses SHLD instruction to emulate RORQ, since
        it's faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD
        is much slower than RORQ, so the AVX implementation is (for now)
        limited to Intel CPUs.
  Note: AVX2 implementation also uses BMI2 instruction rorx, thus
        additional HWF flag.

  Benchmarks:

   cpu              C-lang      SSSE3       AVX/AVX2    C vs AVX/AVX2  vs SSSE3
   Intel i5-4570    13.86 c/B   10.27 c/B    8.70 c/B   1.59x          1.18x
   Intel i5-2450M   17.25 c/B   12.36 c/B   10.31 c/B   1.67x          1.19x

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
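The note about using SHLD to emulate a rotate-right relies on a simple identity: a double-precision left shift with the same value as both operands is a rotate left, and a rotate right by n equals a rotate left by (width - n). The following is a minimal C sketch of that identity, written for 64-bit values to match the RORQ mnemonic in the commit message. The function names are hypothetical, and the operand widths and register choices in the actual assembly are not shown in the commit message, so this should be read as an illustration rather than a transcription of the Intel code.

```c
#include <stdint.h>

/* Models the x86 SHLD double-precision shift for 64-bit operands:
   shift 'hi' left by n bits and fill the vacated low bits with the
   top n bits of 'lo'.  Only valid for 0 < n < 64. */
static uint64_t shld64(uint64_t hi, uint64_t lo, unsigned int n)
{
    return (hi << n) | (lo >> (64 - n));
}

/* Rotate right by n, expressed as SHLD of x with itself by (64 - n),
   i.e. a rotate left by (64 - n).  Only valid for 0 < n < 64. */
static uint64_t ror64_via_shld(uint64_t x, unsigned int n)
{
    return shld64(x, x, 64 - n);
}

/* Plain rotate right, used as a reference.  Only valid for 0 < n < 64. */
static uint64_t ror64(uint64_t x, unsigned int n)
{
    return (x >> n) | (x << (64 - n));
}
```

For example, `ror64_via_shld(0x0123456789abcdefULL, 8)` and `ror64(0x0123456789abcdefULL, 8)` both yield `0xef0123456789abcd`. Which form is faster depends on the microarchitecture, which is the trade-off the commit message describes.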
* Update license information | Werner Koch | 2013-12-13 | 1 | -0/+134
  * LICENSES: New.
  * Makefile.am (EXTRA_DIST): Add LICENSES.
  * AUTHORS: Add list of copyright holders.
  * README: Reference AUTHORS.

  Signed-off-by: Werner Koch <wk@gnupg.org>