| Commit message (Collapse) | Author | Age | Files | Lines |
* cipher/crc-ppc.c (CRC_VEC_U64_LOAD, CRC_VEC_U64_LOAD_LE)
(CRC_VEC_U64_LOAD_BE): Remove vec_vsx_ld usage.
(asm_vec_u64_load, asm_vec_u64_load_le): New.
* cipher/sha512-ppc.c (vec_vshasigma_u64): Use '__asm__' instead of
'asm' for assembly block.
(vec_u64_load, vec_u64_store): New.
(_gcry_sha512_transform_ppc8): Use vec_u64_load/store instead of
vec_vsx_ld/vec_vsx_st.
* configure.ac (gcry_cv_cc_ppc_altivec)
(gcry_cv_cc_ppc_altivec_cflags): Add check for vec_vsx_ld with
'unsigned int *' pointer type.
--
GCC 7.5 and clang 8.0 do not support vec_vsx_ld with the
'unsigned long long *' pointer type. Switch the code to use inline assembly
instead. As vec_vsx_ld is still used with 'unsigned int *' pointers, add a
new check for this in configure.ac.
GnuPG-bug-id: 4906
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
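The replacement of vec_vsx_ld for 64-bit lanes can be pictured roughly as follows. This is a minimal sketch, not the code from the commit: the names sketch_u64x2 and sketch_u64x2_load are hypothetical, only the zero-offset case is shown, and the memcpy fallback for non-PowerPC builds is an addition so that the example compiles anywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-lane container so the result can be inspected
   without relying on vector types outside PowerPC. */
typedef struct { uint64_t lane[2]; } sketch_u64x2;

/* Load two 64-bit lanes from PTR (hypothetical helper name). */
sketch_u64x2
sketch_u64x2_load (const void *ptr)
{
  sketch_u64x2 out;

  assert (ptr != NULL);
#if defined(__powerpc64__) && defined(__VSX__)
  {
    __vector unsigned long long v;

    /* Indexed load with the RA field written as literal 0, so only
       the pointer register contributes to the address. */
    __asm__ ("lxvd2x %x0,0,%1"
             : "=wa" (v)
             : "r" ((uintptr_t) ptr)
             : "memory");
# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    /* lxvd2x places the doublewords in big-endian element order;
       swap them to get the native little-endian lane order. */
    __asm__ ("xxswapd %x0,%x1" : "=wa" (v) : "wa" (v));
# endif
    memcpy (&out, &v, sizeof out);
  }
#else
  /* Assumption: plain copy as a stand-in on other architectures. */
  memcpy (&out, ptr, sizeof out);
#endif
  return out;
}
```

Because the load is written as inline assembly, it does not depend on which pointer types the compiler's vec_vsx_ld overloads accept.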
|
* cipher/crc-ppc.c (CRC_VEC_U64_LOAD_BE): Move implementation to...
(asm_vec_u64_load_be): ...here; Add "r0" to clobber list for load
instruction when offset is not zero; Add zero offset path.
--
Register r0 must not be used as the RA input of vector load/store
instructions: in that position r0 is not read as a register but is
interpreted as the value '0'.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
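The r0 rule can be illustrated with a small model of how an indexed load/store forms its effective address. This is only a sketch for illustration: sketch_indexed_ea and SKETCH_NUM_GPRS are hypothetical names, and the register file is a plain array rather than real hardware state.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_GPRS 32

/* Model of the effective address of an indexed load/store
   "op XT,RA,RB": an RA field of 0 means the constant 0,
   not the contents of register r0. */
uint64_t
sketch_indexed_ea (const uint64_t gpr[SKETCH_NUM_GPRS],
                   unsigned int ra, unsigned int rb)
{
  uint64_t base;

  assert (ra < SKETCH_NUM_GPRS && rb < SKETCH_NUM_GPRS);
  base = (ra == 0) ? 0 : gpr[ra];
  return base + gpr[rb];
}
```

With a pointer 0x1000 in r4 and an offset 16 in r5, the model yields 0x1010. If the same offset had been allocated to r0, the result would be 0x1000 and the offset would be silently dropped. Listing "r0" as clobbered keeps the compiler from assigning r0 to the operands of that assembly statement; GCC's "b" (base register) constraint is another way to express the same restriction.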
|
* cipher/Makefile.am: Add 'crc-ppc.c'.
* cipher/crc-armv8-ce.c: Remove 'USE_INTEL_PCLMUL' comment.
* cipher/crc-ppc.c: New.
* cipher/crc.c (USE_PPC_VPMSUM): New.
(CRC_CONTEXT): Add 'use_vpmsum'.
(_gcry_crc32_ppc8_vpmsum, _gcry_crc24rfc2440_ppc8_vpmsum): New.
(crc32_init, crc24rfc2440_init): Add HWF check for 'use_vpmsum'.
(crc32_write, crc24rfc2440_write): Add 'use_vpmsum' code-path.
* configure.ac: Add 'vpmsumd' instruction to PowerPC VSX inline
assembly check; Add 'crc-ppc.lo'.
--
Benchmark on POWER8 (ppc64le, ~3.8 GHz):
Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32          |     0.978 ns/B     975.0 MiB/s      3.72 c/B
 CRC24RFC2440   |     0.974 ns/B     978.8 MiB/s      3.70 c/B
After (~22x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32          |     0.044 ns/B     21878 MiB/s     0.166 c/B
 CRC24RFC2440   |     0.043 ns/B     22077 MiB/s     0.164 c/B
Benchmark on POWER9 (ppc64le, ~3.8 GHz):
Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32          |      1.01 ns/B     943.7 MiB/s      3.84 c/B
 CRC24RFC2440   |     0.993 ns/B     960.6 MiB/s      3.77 c/B
After (~20x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32          |     0.046 ns/B     20675 MiB/s     0.175 c/B
 CRC24RFC2440   |     0.048 ns/B     19691 MiB/s     0.184 c/B
GnuPG-bug-id: 4460
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
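The init/write changes listed above follow a common dispatch pattern: a hardware-feature check at init time sets a flag in the context, and each write consults that flag. The sketch below shows the pattern only. All sketch_* names and the feature bit are hypothetical, and the "vpmsum" routine here simply calls the portable bit-at-a-time CRC-32 so that the example runs on any machine; it is not the vectorized code from crc-ppc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware-feature bit, not libgcrypt's real HWF value. */
#define SKETCH_HWF_PPC_VPMSUM 1u

typedef struct
{
  uint32_t crc;             /* running CRC state */
  unsigned int use_vpmsum;  /* decided once at init */
} sketch_crc_ctx;

/* Portable bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t
sketch_crc32_generic (uint32_t crc, const unsigned char *buf, size_t len)
{
  while (len--)
    {
      int bit;

      crc ^= *buf++;
      for (bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  return crc;
}

/* Stand-in for an accelerated routine; delegates to the generic
   code so the sketch stays portable. */
uint32_t
sketch_crc32_vpmsum (uint32_t crc, const unsigned char *buf, size_t len)
{
  return sketch_crc32_generic (crc, buf, len);
}

void
sketch_crc32_init (sketch_crc_ctx *ctx, unsigned int hw_features)
{
  assert (ctx != NULL);
  ctx->crc = 0xFFFFFFFFu;
  ctx->use_vpmsum = (hw_features & SKETCH_HWF_PPC_VPMSUM) != 0;
}

void
sketch_crc32_write (sketch_crc_ctx *ctx, const void *buf, size_t len)
{
  if (ctx->use_vpmsum)
    ctx->crc = sketch_crc32_vpmsum (ctx->crc, buf, len);
  else
    ctx->crc = sketch_crc32_generic (ctx->crc, buf, len);
}

uint32_t
sketch_crc32_final (const sketch_crc_ctx *ctx)
{
  return ctx->crc ^ 0xFFFFFFFFu;
}
```

Both paths must produce identical results for the same input, which is why a known check value (0xCBF43926 for the ASCII string "123456789") is a convenient way to compare an accelerated path against the generic one.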
|