path: root/cipher/keccak-amd64-avx512.S
Commit message | Author | Age | Files | Lines
* avx512: tweak zmm16-zmm31 register clearing | Jussi Kivilinna | 2023-01-17 | 1 | -5/+5
    * cipher/asm-common-amd64.h (spec_stop_avx512): Clear ymm16
    before and after vpopcntb.
    * cipher/camellia-gfni-avx512-amd64.S (clear_zmm16_zmm31): Clear
    YMM16-YMM31 registers instead of XMM16-XMM31.
    * cipher/chacha20-amd64-avx512.S (clear_zmm16_zmm31): Likewise.
    * cipher/keccak-amd64-avx512.S (clear_regs): Likewise.
    (clear_avx512_4regs): Clear all 4 registers with XOR.
    * cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
    (_gcry_polyval_intel_pclmul): Clear YMM16-YMM19 registers instead of
    ZMM16-ZMM19.
    * cipher/poly1305-amd64-avx512.S (POLY1305_BLOCKS): Clear YMM16-YMM31
    registers after vector processing instead of XMM16-XMM31.
    * cipher/sha512-avx512-amd64.S
    (_gcry_sha512_transform_amd64_avx512): Likewise.
    --

    Clear zmm16-zmm31 registers with 256bit XOR instead of 128bit
    as this is better for AMD Zen4. Also clear xmm16 register after
    vpopcnt in avx512 spec-stop so we do not leave any zmm register
    state which might end up unnecessarily using CPU resources.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* avx512: tweak AVX512 spec stop, use common macro in assembly | Jussi Kivilinna | 2022-12-12 | 1 | -0/+4
    * cipher/cipher-gcm-intel-pclmul.c: Use xmm registers for AVX512
    spec stop.
    * cipher/asm-common-amd64.h (spec_stop_avx512): New.
    * cipher/blake2b-amd64-avx512.S: Use spec_stop_avx512.
    * cipher/blake2s-amd64-avx512.S: Likewise.
    * cipher/camellia-gfni-avx512-amd64.S: Likewise.
    * cipher/chacha20-avx512-amd64.S: Likewise.
    * cipher/keccak-amd64-avx512.S: Likewise.
    * cipher/poly1305-amd64-avx512.S: Likewise.
    * cipher/sha512-avx512-amd64.S: Likewise.
    * cipher/sm4-gfni-avx512-amd64.S: Likewise.
    ---

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* sha3: Add x86-64 AVX512 accelerated implementation | Jussi Kivilinna | 2022-07-25 | 1 | -0/+583
    * LICENSES: Add 'cipher/keccak-amd64-avx512.S'.
    * configure.ac: Add 'keccak-amd64-avx512.lo'.
    * cipher/Makefile.am: Add 'keccak-amd64-avx512.S'.
    * cipher/keccak-amd64-avx512.S: New.
    * cipher/keccak.c (USE_64BIT_AVX512, ASM_FUNC_ABI): New.
    [USE_64BIT_AVX512] (_gcry_keccak_f1600_state_permute64_avx512)
    (_gcry_keccak_absorb_blocks_avx512, keccak_f1600_state_permute64_avx512)
    (keccak_absorb_lanes64_avx512, keccak_avx512_64_ops): New.
    (keccak_init) [USE_64BIT_AVX512]: Enable x86-64 AVX512 implementation
    if supported by HW features.
    --

    Benchmark on Intel Core i3-1115G4 (tigerlake):

    Before (BMI2 instructions):
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     SHA3-224   |      1.77 ns/B     540.3 MiB/s      7.22 c/B      4088
     SHA3-256   |      1.86 ns/B     514.0 MiB/s      7.59 c/B      4089
     SHA3-384   |      2.43 ns/B     393.1 MiB/s      9.92 c/B      4089
     SHA3-512   |      3.49 ns/B     273.2 MiB/s     14.27 c/B      4088
     SHAKE128   |      1.52 ns/B     629.1 MiB/s      6.20 c/B      4089
     SHAKE256   |      1.86 ns/B     511.6 MiB/s      7.62 c/B      4089

    After (~33% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     SHA3-224   |      1.32 ns/B     721.8 MiB/s      5.40 c/B      4089
     SHA3-256   |      1.40 ns/B     681.7 MiB/s      5.72 c/B      4089
     SHA3-384   |      1.83 ns/B     522.5 MiB/s      7.46 c/B      4089
     SHA3-512   |      2.63 ns/B     362.1 MiB/s     10.77 c/B      4088
     SHAKE128   |      1.13 ns/B     840.4 MiB/s      4.64 c/B      4089
     SHAKE256   |      1.40 ns/B     682.1 MiB/s      5.72 c/B      4089

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
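This commit adds AVX512 versions of the Keccak-f[1600] permutation and of block absorption. The sketch below shows in portable scalar C what those routines compute, following FIPS 202. It is a minimal illustration, not libgcrypt's code. The names `keccak_f1600`, `xor_byte` and `sha3_oneshot` are hypothetical. The rate values (for example 136 bytes for SHA3-256, i.e. 200 - 2 × 32) are the standard FIPS 202 parameters and are not stated in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 64-bit lane left by n bits (0 < n < 64). */
#define ROTL64(v, n) (((v) << (n)) | ((v) >> (64 - (n))))

/* iota round constants for the 24 rounds of Keccak-f[1600] */
static const uint64_t RC[24] = {
  0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808AULL,
  0x8000000080008000ULL, 0x000000000000808BULL, 0x0000000080000001ULL,
  0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008AULL,
  0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000AULL,
  0x000000008000808BULL, 0x800000000000008BULL, 0x8000000000008089ULL,
  0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
  0x000000000000800AULL, 0x800000008000000AULL, 0x8000000080008081ULL,
  0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
};

/* rho rotation offsets and pi destination lanes, listed along the pi
   cycle that starts at lane 1 (lanes are indexed x + 5*y) */
static const int ROTC[24] = { 1,  3,  6, 10, 15, 21, 28, 36, 45, 55,  2, 14,
                             27, 41, 56,  8, 25, 43, 62, 18, 39, 61, 20, 44 };
static const int PILN[24] = {10,  7, 11, 17, 18,  3,  5, 16,  8, 21, 24,  4,
                             15, 23, 19, 13, 12,  2, 20, 14, 22,  9,  6,  1 };

/* Hypothetical scalar Keccak-f[1600]: 24 rounds over 25 64-bit lanes. */
static void keccak_f1600(uint64_t st[25])
{
  uint64_t bc[5], t;
  for (int r = 0; r < 24; r++) {
    /* theta: XOR each lane with the parities of two neighbouring columns */
    for (int i = 0; i < 5; i++)
      bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
    for (int i = 0; i < 5; i++) {
      t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
      for (int j = 0; j < 25; j += 5)
        st[j + i] ^= t;
    }
    /* rho + pi: rotate each lane and move it to its new position */
    t = st[1];
    for (int i = 0; i < 24; i++) {
      int j = PILN[i];
      bc[0] = st[j];
      st[j] = ROTL64(t, ROTC[i]);
      t = bc[0];
    }
    /* chi: the only non-linear step, applied row by row */
    for (int j = 0; j < 25; j += 5) {
      for (int i = 0; i < 5; i++)
        bc[i] = st[j + i];
      for (int i = 0; i < 5; i++)
        st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
    }
    /* iota: XOR the round constant into lane (0,0) */
    st[0] ^= RC[r];
  }
}

/* XOR one byte into the state at byte offset pos (lanes are little-endian). */
static void xor_byte(uint64_t st[25], size_t pos, uint8_t b)
{
  st[pos / 8] ^= (uint64_t)b << (8 * (pos % 8));
}

/* Hypothetical one-shot SHA3: absorb full `rate`-byte blocks, pad the tail
   with 0x06 ... 0x80 (SHA3 domain separation), then squeeze one block. */
static void sha3_oneshot(const uint8_t *in, size_t len, size_t rate,
                         uint8_t *out, size_t outlen)
{
  uint64_t st[25] = {0};
  size_t i;
  assert(rate > 0 && rate < 200 && outlen <= rate);
  while (len >= rate) {              /* absorb phase, one permutation per block */
    for (i = 0; i < rate; i++)
      xor_byte(st, i, in[i]);
    keccak_f1600(st);
    in += rate;
    len -= rate;
  }
  for (i = 0; i < len; i++)          /* final partial block plus padding */
    xor_byte(st, i, in[i]);
  xor_byte(st, len, 0x06);
  xor_byte(st, rate - 1, 0x80);
  keccak_f1600(st);
  for (i = 0; i < outlen; i++)       /* squeeze: read bytes back out */
    out[i] = (uint8_t)(st[i / 8] >> (8 * (i % 8)));
}
```

For example, `sha3_oneshot((const uint8_t *)"abc", 3, 136, d, 32)` produces the SHA3-256 digest of "abc". The rate also helps explain the shape of the benchmark. Each permutation consumes one rate-sized block, so cycles/byte is roughly the cost of one permutation divided by the rate. Multiplying the "After" figures by each rate gives roughly 778 cycles for SHA3-256 (5.72 × 136), 780 for SHAKE128 (4.64 × 168) and 775 for SHA3-512 (10.77 × 72). The "Before" run gives roughly 1030 to 1040 cycles. This is our own arithmetic on the table, not a claim made in the commit.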