| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/asm-common-amd64.h (spec_stop_avx512): Clear ymm16
before and after vpopcntb.
* cipher/camellia-gfni-avx512-amd64.S (clear_zmm16_zmm31): Clear
YMM16-YMM31 registers instead of XMM16-XMM31.
* cipher/chacha20-amd64-avx512.S (clear_zmm16_zmm31): Likewise.
* cipher/keccak-amd64-avx512.S (clear_regs): Likewise.
(clear_avx512_4regs): Clear all 4 registers with XOR.
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
(_gcry_polyval_intel_pclmul): Clear YMM16-YMM19 registers instead of
ZMM16-ZMM19.
* cipher/poly1305-amd64-avx512.S (POLY1305_BLOCKS): Clear YMM16-YMM31
registers after vector processing instead of XMM16-XMM31.
* cipher/sha512-avx512-amd64.S
(_gcry_sha512_transform_amd64_avx512): Likewise.
--
Clear zmm16-zmm31 registers with 256bit XOR instead of 128bit
as this is better for AMD Zen4. Also clear xmm16 register
after vpopcnt in avx512 spec-stop so we do not leave any zmm
register state which might end up unnecessarily using CPU
resources.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher-gcm-intel-pclmul.c: Use xmm registers for AVX512
spec stop.
* cipher/asm-common-amd64.h (spec_stop_avx512): New.
* cipher/blake2b-amd64-avx512.S: Use spec_stop_avx512.
* cipher/blake2s-amd64-avx512.S: Likewise.
* cipher/camellia-gfni-avx512-amd64.S: Likewise.
* cipher/chacha20-avx512-amd64.S: Likewise.
* cipher/keccak-amd64-avx512.S: Likewise.
* cipher/poly1305-amd64-avx512.S: Likewise.
* cipher/sha512-avx512-amd64.S: Likewise.
* cipher/sm4-gfni-avx512-amd64.S: Likewise.
---
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* LICENSES: Add 'cipher/keccak-amd64-avx512.S'.
* configure.ac: Add 'keccak-amd64-avx512.lo'.
* cipher/Makefile.am: Add 'keccak-amd64-avx512.S'.
* cipher/keccak-amd64-avx512.S: New.
* cipher/keccak.c (USE_64BIT_AVX512, ASM_FUNC_ABI): New.
[USE_64BIT_AVX512] (_gcry_keccak_f1600_state_permute64_avx512)
(_gcry_keccak_absorb_blocks_avx512, keccak_f1600_state_permute64_avx512)
(keccak_absorb_lanes64_avx512, keccak_avx512_64_ops): New.
(keccak_init) [USE_64BIT_AVX512]: Enable x86-64 AVX512 implementation
if supported by HW features.
--
Benchmark on Intel Core i3-1115G4 (tigerlake):
Before (BMI2 instructions):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SHA3-224 | 1.77 ns/B 540.3 MiB/s 7.22 c/B 4088
SHA3-256 | 1.86 ns/B 514.0 MiB/s 7.59 c/B 4089
SHA3-384 | 2.43 ns/B 393.1 MiB/s 9.92 c/B 4089
SHA3-512 | 3.49 ns/B 273.2 MiB/s 14.27 c/B 4088
SHAKE128 | 1.52 ns/B 629.1 MiB/s 6.20 c/B 4089
SHAKE256 | 1.86 ns/B 511.6 MiB/s 7.62 c/B 4089
After (~33% faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
SHA3-224 | 1.32 ns/B 721.8 MiB/s 5.40 c/B 4089
SHA3-256 | 1.40 ns/B 681.7 MiB/s 5.72 c/B 4089
SHA3-384 | 1.83 ns/B 522.5 MiB/s 7.46 c/B 4089
SHA3-512 | 2.63 ns/B 362.1 MiB/s 10.77 c/B 4088
SHAKE128 | 1.13 ns/B 840.4 MiB/s 4.64 c/B 4089
SHAKE256 | 1.40 ns/B 682.1 MiB/s 5.72 c/B 4089
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|