| Commit message (Collapse) | Author | Age | Files | Lines |
|
* cipher/rijndael-ppc.c (_gcry_aes_sbox4_ppc8): Remove.
(bcast_u32_to_vec, u32_from_vec): New.
(_gcry_aes_ppc8_setkey): Use vectors for round key calculation
variables.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
* configure.ac (gcry_cv_clang_attribute_ppc_target): New.
* cipher/chacha20-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
* cipher/rijndael-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(PPC_OPT_ATTR): New.
* cipher/rijndael-ppc9le.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(PPC_OPT_ATTR): New.
* cipher/sha256-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
* cipher/sha512-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
(FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
(ror64): Remove unused function.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
* cipher/rijndael-ppc-functions.h: Add PPC_OPT_ATTR attribute
macro for all functions.
* cipher/rijndael-ppc.c (FUNC_ATTR_OPT, PPC_OPT_ATTR): New.
(_gcry_aes_ppc8_setkey, _gcry_aes_ppc8_prepare_decryption): Add
PPC_OPT_ATTR attribute macro.
* cipher/rijndael-ppc9le.c (FUNC_ATTR_OPT, PPC_OPT_ATTR): New.
--
This change ensures that the PPC-accelerated AES code is compiled
with the proper optimization level and the right target setting.
Benchmark on POWER9:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 0.305 ns/B 3129 MiB/s 0.701 c/B
ECB dec | 0.305 ns/B 3127 MiB/s 0.701 c/B
CBC enc | 1.66 ns/B 575.3 MiB/s 3.81 c/B
CBC dec | 0.318 ns/B 2997 MiB/s 0.732 c/B
CFB enc | 1.66 ns/B 574.7 MiB/s 3.82 c/B
CFB dec | 0.319 ns/B 2987 MiB/s 0.734 c/B
OFB enc | 2.15 ns/B 443.4 MiB/s 4.95 c/B
OFB dec | 2.15 ns/B 443.3 MiB/s 4.95 c/B
CTR enc | 0.328 ns/B 2907 MiB/s 0.754 c/B
CTR dec | 0.328 ns/B 2906 MiB/s 0.755 c/B
XTS enc | 0.516 ns/B 1849 MiB/s 1.19 c/B
XTS dec | 0.515 ns/B 1850 MiB/s 1.19 c/B
CCM enc | 1.98 ns/B 480.6 MiB/s 4.56 c/B
CCM dec | 1.98 ns/B 480.5 MiB/s 4.56 c/B
CCM auth | 1.66 ns/B 574.9 MiB/s 3.82 c/B
EAX enc | 1.99 ns/B 480.2 MiB/s 4.57 c/B
EAX dec | 1.99 ns/B 480.2 MiB/s 4.57 c/B
EAX auth | 1.66 ns/B 575.2 MiB/s 3.81 c/B
GCM enc | 0.552 ns/B 1727 MiB/s 1.27 c/B
GCM dec | 0.552 ns/B 1728 MiB/s 1.27 c/B
GCM auth | 0.225 ns/B 4240 MiB/s 0.517 c/B
OCB enc | 0.381 ns/B 2504 MiB/s 0.876 c/B
OCB dec | 0.385 ns/B 2477 MiB/s 0.886 c/B
OCB auth | 0.356 ns/B 2682 MiB/s 0.818 c/B
SIV enc | 1.98 ns/B 480.9 MiB/s 4.56 c/B
SIV dec | 2.11 ns/B 452.9 MiB/s 4.84 c/B
SIV auth | 1.66 ns/B 575.4 MiB/s 3.81 c/B
GCM-SIV enc | 0.726 ns/B 1314 MiB/s 1.67 c/B
GCM-SIV dec | 0.843 ns/B 1131 MiB/s 1.94 c/B
GCM-SIV auth | 0.377 ns/B 2527 MiB/s 0.868 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
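The attribute macros named in this commit can be pictured roughly as follows. This is only a sketch: the HAVE_GCC_ATTRIBUTE_OPTIMIZE and HAVE_GCC_ATTRIBUTE_PPC_TARGET names, the target strings and the demo_xor_fold function are assumptions made for illustration and are not taken from the commit. When none of the feature macros is defined, the attributes expand to nothing and the function compiles on any platform.

```c
#include <stddef.h>

/* The HAVE_* macros stand in for configure results; they are assumptions
   of this sketch and are simply undefined when it is built standalone. */
#ifdef HAVE_GCC_ATTRIBUTE_OPTIMIZE
# define FUNC_ATTR_OPT __attribute__((optimize("-O2")))
#else
# define FUNC_ATTR_OPT
#endif

#if defined(__clang__) && defined(HAVE_CLANG_ATTRIBUTE_PPC_TARGET)
# define PPC_OPT_ATTR __attribute__((target("arch=pwr8"))) FUNC_ATTR_OPT
#elif defined(HAVE_GCC_ATTRIBUTE_PPC_TARGET)
# define PPC_OPT_ATTR __attribute__((target("cpu=power8"))) FUNC_ATTR_OPT
#else
# define PPC_OPT_ATTR FUNC_ATTR_OPT
#endif

/* Hypothetical placeholder standing in for an accelerated routine:
   the attribute macro is attached per function, not per file. */
PPC_OPT_ATTR unsigned int
demo_xor_fold (const unsigned char *buf, size_t len)
{
  unsigned int acc = 0;
  size_t i;

  for (i = 0; i < len; i++)
    acc ^= buf[i];
  return acc;
}
```

Attaching the attribute to each function keeps the optimization level and target independent of the CFLAGS used for the rest of the build.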
|
* cipher/rijndael-ppc-functions.h (CTR32LE_ENC_FUNC): New.
* cipher/rijndael-ppc.c (_gcry_aes_ppc8_ctr32le_enc): New.
* cipher/rijndael-ppc9le.c (_gcry_aes_ppc9le_ctr32le_enc): New.
* cipher/rijndael.c (_gcry_aes_ppc8_ctr32le_enc)
(_gcry_aes_ppc9le_ctr32le_enc): New.
(do_setkey): Set up _gcry_aes_ppc8_ctr32le_enc for POWER8 and
_gcry_aes_ppc9le_ctr32le_enc for POWER9.
--
Benchmark on POWER9:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
GCM-SIV enc | 1.42 ns/B 672.2 MiB/s 3.26 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
GCM-SIV enc | 0.725 ns/B 1316 MiB/s 1.67 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
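For readers unfamiliar with the CTR32LE counter used by GCM-SIV, the following sketch shows the counter update in portable C, assuming the counter occupies the first four bytes of the block in little-endian order and wraps modulo 2^32 without carrying into the remaining bytes. The function name ctr32le_inc is hypothetical; the commit's functions process many blocks at once with vector instructions.

```c
#include <stdint.h>

/* Increment the 32-bit little-endian counter held in bytes 0..3 of a
   16-byte block.  The counter wraps modulo 2^32; bytes 4..15 are left
   untouched. */
void
ctr32le_inc (uint8_t block[16])
{
  uint32_t ctr = (uint32_t)block[0]
                 | ((uint32_t)block[1] << 8)
                 | ((uint32_t)block[2] << 16)
                 | ((uint32_t)block[3] << 24);

  ctr++;

  block[0] = (uint8_t)(ctr & 0xff);
  block[1] = (uint8_t)((ctr >> 8) & 0xff);
  block[2] = (uint8_t)((ctr >> 16) & 0xff);
  block[3] = (uint8_t)((ctr >> 24) & 0xff);
}
```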
|
* cipher/rijndael-ppc-functions.h (ECB_CRYPT_FUNC): New.
* cipher/rijndael-ppc.c (_gcry_aes_ppc8_ecb_crypt): New.
* cipher/rijndael-ppc9le.c (_gcry_aes_ppc9le_ecb_crypt): New.
* cipher/rijndael.c (_gcry_aes_ppc8_ecb_crypt)
(_gcry_aes_ppc9le_ecb_crypt): New.
(do_setkey): Set up _gcry_aes_ppc8_ecb_crypt for POWER8 and
_gcry_aes_ppc9le_ecb_crypt for POWER9.
--
Benchmark on POWER9:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 0.875 ns/B 1090 MiB/s 2.01 c/B
ECB dec | 1.06 ns/B 899.8 MiB/s 2.44 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 0.305 ns/B 3126 MiB/s 0.702 c/B
ECB dec | 0.305 ns/B 3126 MiB/s 0.702 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
* cipher/rijndael-ppc-common.h (asm_sbox_be): New.
* cipher/rijndael-ppc.c (_gcry_aes_sbox4_ppc8): Use 'asm_sbox_be'
instead of 'vec_sbox_be' since this intrinsic has a different
prototype definition on GCC and Clang ('vector uchar' vs 'vector
ulong long').
* cipher/sha256-ppc.c (vec_ror_u32): Remove unused function.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
* cipher/rijndael-armv8-ce.c (_gcry_aes_armv8_ce_setkey): New key
schedule with simplified structure and less stack usage.
* cipher/rijndael-internal.h (RIJNDAEL_context_s): Add
'keyschedule32b'.
(keyschenc32b): New.
* cipher/rijndael-ppc-common.h (vec_u32): New.
* cipher/rijndael-ppc.c (vec_bswap32_const): Remove.
(_gcry_aes_sbox4_ppc8): Optimize for fewer instructions emitted.
(keysched_idx): New.
(_gcry_aes_ppc8_setkey): New key schedule with simplified structure.
* cipher/rijndael-tables.h (rcon): Remove.
* cipher/rijndael.c (sbox4): New.
(do_setkey): New key schedule with simplified structure and less
stack usage.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
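A key schedule that needs no 'rcon' table can be sketched as below for AES-128. This is an illustration under assumptions, not the commit's code: it assumes sbox4 applies the S-box to each byte of a 32-bit word and that the round constant is derived on the fly by doubling in GF(2^8). The S-box is computed from its definition to keep the sketch short, which is slow and not constant-time; aes128_expand_key and the helper names are hypothetical.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo the AES polynomial 0x11b. */
static uint8_t
gf_mul (uint8_t a, uint8_t b)
{
  uint8_t r = 0;

  while (b)
    {
      if (b & 1)
        r ^= a;
      a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
      b >>= 1;
    }
  return r;
}

static uint8_t
rotl8 (uint8_t x, int n)
{
  return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* AES S-box from its definition: inverse in GF(2^8), then affine map. */
static uint8_t
sbox (uint8_t x)
{
  uint8_t inv = 1;
  int i;

  for (i = 0; i < 254; i++)
    inv = gf_mul (inv, x);      /* x^254 is the inverse; 0 maps to 0 */
  return (uint8_t)(inv ^ rotl8 (inv, 1) ^ rotl8 (inv, 2)
                   ^ rotl8 (inv, 3) ^ rotl8 (inv, 4) ^ 0x63);
}

/* Apply the S-box to each of the four bytes of a word. */
static uint32_t
sbox4 (uint32_t w)
{
  return ((uint32_t)sbox ((uint8_t)(w >> 24)) << 24)
         | ((uint32_t)sbox ((uint8_t)(w >> 16)) << 16)
         | ((uint32_t)sbox ((uint8_t)(w >> 8)) << 8)
         | (uint32_t)sbox ((uint8_t)w);
}

/* Expand a 128-bit key into 44 big-endian round-key words. */
void
aes128_expand_key (const uint8_t key[16], uint32_t w[44])
{
  uint32_t rcon = 1;
  int i;

  for (i = 0; i < 4; i++)
    w[i] = ((uint32_t)key[4 * i] << 24) | ((uint32_t)key[4 * i + 1] << 16)
           | ((uint32_t)key[4 * i + 2] << 8) | (uint32_t)key[4 * i + 3];

  for (i = 4; i < 44; i++)
    {
      uint32_t t = w[i - 1];

      if (i % 4 == 0)
        {
          /* RotWord, SubWord, then xor the round constant. */
          t = sbox4 ((t << 8) | (t >> 24)) ^ (rcon << 24);
          /* Next round constant: doubling in GF(2^8), no table needed. */
          rcon = ((rcon << 1) ^ ((rcon & 0x80) ? 0x1b : 0)) & 0xff;
        }
      w[i] = w[i - 4] ^ t;
    }
}
```

The sketch can be checked against the AES-128 key expansion example in FIPS 197, Appendix A.1.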
|
* cipher/rijndael-ppc-common.h (asm_aligned_ld, asm_aligned_st): Use
zero offset instruction variant when input offset is constant zero.
* cipher/rijndael-ppc.c (asm_load_be_noswap)
(asm_store_be_noswap): Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
* configure.ac: Add 'rijndael-ppc9le.lo'.
* cipher/Makefile.am: Add 'rijndael-ppc9le.c', 'rijndael-ppc-common.h'
and 'rijndael-ppc-functions.h'.
* cipher/rijndael-internal.h (USE_PPC_CRYPTO_WITH_PPC9LE): New.
(RIJNDAEL_context_s): Add 'use_ppc9le_crypto'.
* cipher/rijndael.c (_gcry_aes_ppc9le_encrypt)
(_gcry_aes_ppc9le_decrypt, _gcry_aes_ppc9le_cfb_enc)
(_gcry_aes_ppc9le_cfb_dec, _gcry_aes_ppc9le_ctr_enc)
(_gcry_aes_ppc9le_cbc_enc, _gcry_aes_ppc9le_cbc_dec)
(_gcry_aes_ppc9le_ocb_crypt, _gcry_aes_ppc9le_ocb_auth)
(_gcry_aes_ppc9le_xts_crypt): New.
(do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth, _gcry_aes_xts_crypt)
[USE_PPC_CRYPTO_WITH_PPC9LE]: New.
* cipher/rijndael-ppc.c: Split common code to headers
'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
* cipher/rijndael-ppc-common.h: Split from 'rijndael-ppc.c'.
(asm_add_uint64, asm_sra_int64, asm_swap_uint64_halfs): New.
* cipher/rijndael-ppc-functions.h: Split from 'rijndael-ppc.c'.
(CFB_ENC_FUNC, CBC_ENC_FUNC): Unroll loop by 2.
(XTS_CRYPT_FUNC, GEN_TWEAK): Tweak generation without vperm
instruction.
* cipher/rijndael-ppc9le.c: New.
--
Provide POWER9 little-endian optimized variant of PPC vcrypto AES
implementation. This implementation uses 'lxvb16x' and 'stxvb16x'
instructions to load/store vectors directly in big-endian order.
Benchmark on POWER9 (~3.8GHz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 918.7 MiB/s 3.94 c/B
CBC dec | 0.222 ns/B 4292 MiB/s 0.844 c/B
CFB enc | 1.04 ns/B 916.9 MiB/s 3.95 c/B
CFB dec | 0.224 ns/B 4252 MiB/s 0.852 c/B
CTR enc | 0.226 ns/B 4218 MiB/s 0.859 c/B
CTR dec | 0.225 ns/B 4233 MiB/s 0.856 c/B
XTS enc | 0.500 ns/B 1907 MiB/s 1.90 c/B
XTS dec | 0.494 ns/B 1932 MiB/s 1.88 c/B
OCB enc | 0.288 ns/B 3312 MiB/s 1.09 c/B
OCB dec | 0.292 ns/B 3266 MiB/s 1.11 c/B
OCB auth | 0.267 ns/B 3567 MiB/s 1.02 c/B
After (ctr & ocb & cbc-dec & cfb-dec ~15% and xts ~8% faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 914.2 MiB/s 3.96 c/B
CBC dec | 0.191 ns/B 4984 MiB/s 0.727 c/B
CFB enc | 1.03 ns/B 930.0 MiB/s 3.90 c/B
CFB dec | 0.194 ns/B 4906 MiB/s 0.739 c/B
CTR enc | 0.196 ns/B 4868 MiB/s 0.744 c/B
CTR dec | 0.197 ns/B 4834 MiB/s 0.750 c/B
XTS enc | 0.460 ns/B 2075 MiB/s 1.75 c/B
XTS dec | 0.455 ns/B 2097 MiB/s 1.73 c/B
OCB enc | 0.250 ns/B 3812 MiB/s 0.951 c/B
OCB dec | 0.253 ns/B 3764 MiB/s 0.963 c/B
OCB auth | 0.232 ns/B 4106 MiB/s 0.883 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
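The XTS tweak update mentioned above (GEN_TWEAK) multiplies the 128-bit tweak by x in GF(2^128). A scalar sketch of one way to do this with only a 64-bit add/shift, a sign-derived mask and the two halves of the tweak is shown below. It is an assumption about the general approach, not a transcription of the vector code; xts_tweak and xts_gen_tweak are hypothetical names.

```c
#include <stdint.h>

/* Tweak held as two 64-bit halves: lo carries bits 0..63, hi bits 64..127. */
typedef struct
{
  uint64_t lo;
  uint64_t hi;
} xts_tweak;

/* Multiply the tweak by x in GF(2^128), reducing with the XTS polynomial
   (feedback constant 0x87). */
void
xts_gen_tweak (xts_tweak *t)
{
  /* All ones when bit 127 is set, zero otherwise.  This is the value an
     arithmetic right shift of the high half by 63 would produce. */
  uint64_t mask = (uint64_t)0 - (t->hi >> 63);
  /* Bit 63 of the low half moves into bit 0 of the high half. */
  uint64_t carry = t->lo >> 63;

  t->hi = (t->hi << 1) | carry;
  t->lo = (t->lo << 1) ^ (mask & 0x87);
}
```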
|
* cipher/rijndael-ppc.c (ALIGNED_LOAD, ALIGNED_STORE, VEC_LOAD_BE)
(VEC_STORE_BE): Rewrite.
(VEC_BE_SWAP, VEC_LOAD_BE_NOSWAP, VEC_STORE_BE_NOSWAP): New.
(PRELOAD_ROUND_KEYS, AES_ENCRYPT, AES_DECRYPT): Adjust to new
input parameters for vector load macros.
(ROUND_KEY_VARIABLES_ALL, PRELOAD_ROUND_KEYS_ALL)
(AES_ENCRYPT_ALL): New.
(vec_bswap32_const_neg): New.
(vec_aligned_ld, vec_aligned_st, vec_load_be_const): Rename to...
(asm_aligned_ld, asm_aligned_st, asm_load_be_const): ...these.
(asm_be_swap, asm_vperm1, asm_load_be_noswap)
(asm_store_be_noswap): New.
(vec_add_uint128): Rename to...
(asm_add_uint128): ...this.
(asm_xor, asm_cipher_be, asm_cipherlast_be, asm_ncipher_be)
(asm_ncipherlast_be): New inline assembly functions with volatile
keyword to allow manual instruction ordering.
(_gcry_aes_ppc8_setkey, aes_ppc8_prepare_decryption)
(_gcry_aes_ppc8_encrypt, _gcry_aes_ppc8_decrypt)
(_gcry_aes_ppc8_cfb_enc, _gcry_aes_ppc8_cbc_enc)
(_gcry_aes_ppc8_ocb_auth): Update to use new and rewritten helper macros.
(_gcry_aes_ppc8_cfb_dec, _gcry_aes_ppc8_cbc_dec)
(_gcry_aes_ppc8_ctr_enc, _gcry_aes_ppc8_ocb_crypt)
(_gcry_aes_ppc8_xts_crypt): Update to use new and rewritten helper
macros; tune 8-block parallel paths with manual instruction ordering.
--
Benchmarks on POWER8 (ppc64le, ~3.8GHz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.06 ns/B 902.2 MiB/s 4.02 c/B
CBC dec | 0.208 ns/B 4585 MiB/s 0.790 c/B
CFB enc | 1.06 ns/B 900.4 MiB/s 4.02 c/B
CFB dec | 0.208 ns/B 4588 MiB/s 0.790 c/B
CTR enc | 0.238 ns/B 4007 MiB/s 0.904 c/B
CTR dec | 0.238 ns/B 4009 MiB/s 0.904 c/B
XTS enc | 0.492 ns/B 1937 MiB/s 1.87 c/B
XTS dec | 0.488 ns/B 1955 MiB/s 1.85 c/B
OCB enc | 0.243 ns/B 3928 MiB/s 0.922 c/B
OCB dec | 0.247 ns/B 3858 MiB/s 0.939 c/B
OCB auth | 0.213 ns/B 4482 MiB/s 0.809 c/B
After (cbc-dec & cfb-dec & xts & ocb ~6% faster, ctr ~11% faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.06 ns/B 902.1 MiB/s 4.02 c/B
CBC dec | 0.196 ns/B 4877 MiB/s 0.743 c/B
CFB enc | 1.06 ns/B 902.2 MiB/s 4.02 c/B
CFB dec | 0.195 ns/B 4889 MiB/s 0.741 c/B
CTR enc | 0.214 ns/B 4448 MiB/s 0.815 c/B
CTR dec | 0.214 ns/B 4452 MiB/s 0.814 c/B
XTS enc | 0.461 ns/B 2067 MiB/s 1.75 c/B
XTS dec | 0.456 ns/B 2092 MiB/s 1.73 c/B
OCB enc | 0.227 ns/B 4200 MiB/s 0.863 c/B
OCB dec | 0.234 ns/B 4072 MiB/s 0.890 c/B
OCB auth | 0.207 ns/B 4604 MiB/s 0.787 c/B
Benchmarks on POWER9 (ppc64le, ~3.8GHz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 918.7 MiB/s 3.94 c/B
CBC dec | 0.240 ns/B 3982 MiB/s 0.910 c/B
CFB enc | 1.04 ns/B 917.6 MiB/s 3.95 c/B
CFB dec | 0.241 ns/B 3963 MiB/s 0.914 c/B
CTR enc | 0.249 ns/B 3835 MiB/s 0.945 c/B
CTR dec | 0.252 ns/B 3787 MiB/s 0.957 c/B
XTS enc | 0.505 ns/B 1889 MiB/s 1.92 c/B
XTS dec | 0.495 ns/B 1926 MiB/s 1.88 c/B
OCB enc | 0.303 ns/B 3152 MiB/s 1.15 c/B
OCB dec | 0.305 ns/B 3129 MiB/s 1.16 c/B
OCB auth | 0.265 ns/B 3595 MiB/s 1.01 c/B
After (cbc-dec & cfb-dec ~6% faster, ctr ~11% faster, ocb ~4% faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 917.3 MiB/s 3.95 c/B
CBC dec | 0.225 ns/B 4234 MiB/s 0.856 c/B
CFB enc | 1.04 ns/B 917.8 MiB/s 3.95 c/B
CFB dec | 0.226 ns/B 4214 MiB/s 0.860 c/B
CTR enc | 0.221 ns/B 4306 MiB/s 0.842 c/B
CTR dec | 0.223 ns/B 4271 MiB/s 0.848 c/B
XTS enc | 0.503 ns/B 1897 MiB/s 1.91 c/B
XTS dec | 0.495 ns/B 1928 MiB/s 1.88 c/B
OCB enc | 0.288 ns/B 3309 MiB/s 1.10 c/B
OCB dec | 0.292 ns/B 3266 MiB/s 1.11 c/B
OCB auth | 0.267 ns/B 3570 MiB/s 1.02 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
* cipher/rijndael-ppc.c (vec_aligned_ld, vec_load_be, vec_aligned_st)
(vec_store_be): Add "r0" to clobber list for load/store instructions.
--
Register r0 must not be used as the RA input of vector load/store
instructions, because in that position r0 is not read as a register
but as the constant value '0'.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
* cipher/rijndael-ppc.c (vec_add_uint128, _gcry_aes_ppc8_cfb_enc)
(_gcry_aes_ppc8_cfb_dec, _gcry_aes_ppc8_cbc_enc)
(_gcry_aes_ppc8_cbc_dec, _gcry_aes_ppc8_ctr_enc)
(_gcry_aes_ppc8_xts_crypt): New.
* cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_cfb_enc)
(_gcry_aes_ppc8_cfb_dec, _gcry_aes_ppc8_cbc_enc)
(_gcry_aes_ppc8_cbc_dec, _gcry_aes_ppc8_ctr_enc)
(_gcry_aes_ppc8_xts_crypt): New.
(do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_enc)
(_gcry_aes_cbc_dec, _gcry_aes_ctr_enc)
(_gcry_aes_xts_crypt) [USE_PPC_CRYPTO]: Enable PowerPC AES
CFB/CBC/CTR/XTS bulk implementations.
* configure.ac (gcry_cv_gcc_inline_asm_ppc_altivec): Add 'vadduwm'
instruction.
--
Benchmark on POWER8 ~3.8GHz:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 2.13 ns/B 447.2 MiB/s 8.10 c/B
CBC dec | 1.13 ns/B 843.4 MiB/s 4.30 c/B
CFB enc | 2.20 ns/B 433.9 MiB/s 8.35 c/B
CFB dec | 2.22 ns/B 429.7 MiB/s 8.43 c/B
CTR enc | 2.18 ns/B 438.2 MiB/s 8.27 c/B
CTR dec | 2.18 ns/B 437.4 MiB/s 8.28 c/B
XTS enc | 2.31 ns/B 412.8 MiB/s 8.78 c/B
XTS dec | 2.30 ns/B 414.3 MiB/s 8.75 c/B
CCM enc | 4.33 ns/B 220.1 MiB/s 16.47 c/B
CCM dec | 4.34 ns/B 219.9 MiB/s 16.48 c/B
CCM auth | 2.16 ns/B 440.6 MiB/s 8.22 c/B
EAX enc | 4.34 ns/B 219.8 MiB/s 16.49 c/B
EAX dec | 4.34 ns/B 219.8 MiB/s 16.49 c/B
EAX auth | 2.16 ns/B 440.5 MiB/s 8.23 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.06 ns/B 903.1 MiB/s 4.01 c/B
CBC dec | 0.211 ns/B 4511 MiB/s 0.803 c/B
CFB enc | 1.06 ns/B 896.7 MiB/s 4.04 c/B
CFB dec | 0.209 ns/B 4563 MiB/s 0.794 c/B
CTR enc | 0.237 ns/B 4026 MiB/s 0.900 c/B
CTR dec | 0.237 ns/B 4029 MiB/s 0.900 c/B
XTS enc | 0.496 ns/B 1922 MiB/s 1.89 c/B
XTS dec | 0.496 ns/B 1924 MiB/s 1.88 c/B
CCM enc | 1.29 ns/B 737.7 MiB/s 4.91 c/B
CCM dec | 1.29 ns/B 737.8 MiB/s 4.91 c/B
CCM auth | 1.06 ns/B 903.3 MiB/s 4.01 c/B
EAX enc | 1.29 ns/B 737.7 MiB/s 4.91 c/B
EAX dec | 1.29 ns/B 737.2 MiB/s 4.92 c/B
GnuPG-bug-id: 4529
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
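The CTR path needs a 128-bit big-endian counter increment, while 'vadduwm' only adds 32-bit words modulo 2^32. One way to build the wide addition from word-sized modular adds is to propagate the carry by hand, as in the scalar sketch below. The function name add_uint128_words and the word layout are assumptions for illustration; the actual vec_add_uint128 operates on vector registers.

```c
#include <stdint.h>

/* Add a 32-bit value to a 128-bit big-endian counter stored as four
   32-bit words, v[0] being the most significant.  Each word addition is
   modulo 2^32; a wrap-around is detected and carried into the next word. */
void
add_uint128_words (uint32_t v[4], uint32_t addend)
{
  uint32_t carry = addend;
  int i;

  for (i = 3; i >= 0 && carry; i--)
    {
      uint32_t sum = v[i] + carry;  /* modulo 2^32 */

      carry = (sum < carry);        /* 1 if the addition wrapped */
      v[i] = sum;
    }
}
```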
|
* cipher/rijndael-ppc.c (_gcry_aes_ppc8_ocb_auth): New.
* cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_ocb_auth): New
prototype.
(do_setkey, _gcry_aes_ocb_auth) [USE_PPC_CRYPTO]: Add PowerPC AES
ocb_auth.
--
Benchmark on POWER8 ~3.8GHz:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 0.250 ns/B 3818 MiB/s 0.949 c/B
OCB dec | 0.250 ns/B 3820 MiB/s 0.949 c/B
OCB auth | 2.31 ns/B 412.5 MiB/s 8.79 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 0.252 ns/B 3779 MiB/s 0.959 c/B
OCB dec | 0.245 ns/B 3891 MiB/s 0.931 c/B
OCB auth | 0.223 ns/B 4283 MiB/s 0.846 c/B
GnuPG-bug-id: 4529
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
* cipher/rijndael-ppc.c (ROUND_KEY_VARIABLES, PRELOAD_ROUND_KEYS)
(AES_ENCRYPT, AES_DECRYPT): New.
(_gcry_aes_ppc8_prepare_decryption): Rename to...
(aes_ppc8_prepare_decryption): ... this.
(_gcry_aes_ppc8_prepare_decryption): New.
(aes_ppc8_encrypt_altivec, aes_ppc8_decrypt_altivec): Remove.
(_gcry_aes_ppc8_encrypt): Use AES_ENCRYPT macro.
(_gcry_aes_ppc8_decrypt): Use AES_DECRYPT macro.
(_gcry_aes_ppc8_ocb_crypt): Uncomment; Optimizations for OCB offset
calculations, etc; Use new load/store and encryption/decryption macros.
* cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_ocb_crypt): New
prototype.
(do_setkey, _gcry_aes_ocb_crypt) [USE_PPC_CRYPTO]: Add PowerPC AES OCB
encryption/decryption.
--
Benchmark on POWER8 ~3.8GHz:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 2.33 ns/B 410.1 MiB/s 8.84 c/B
OCB dec | 2.34 ns/B 407.2 MiB/s 8.90 c/B
OCB auth | 2.32 ns/B 411.1 MiB/s 8.82 c/B
After:
OCB enc | 0.250 ns/B 3818 MiB/s 0.949 c/B
OCB dec | 0.250 ns/B 3820 MiB/s 0.949 c/B
OCB auth | 2.31 ns/B 412.5 MiB/s 8.79 c/B
GnuPG-bug-id: 4529
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
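As background for the OCB offset calculations mentioned in this commit: in OCB the offset for block i is the previous offset xored with a precomputed value selected by the number of trailing zero bits of i. The scalar sketch below shows that rule only. The names ocb_ntz and ocb_next_offset and the layout of the L table are hypothetical, and the commit's optimized code is not reproduced here.

```c
#include <stdint.h>

/* Number of trailing zero bits of a block index.  Block indices start
   at 1; for 0 this sketch simply returns 0. */
unsigned int
ocb_ntz (uint64_t i)
{
  unsigned int n = 0;

  while (i && (i & 1) == 0)
    {
      i >>= 1;
      n++;
    }
  return n;
}

/* Offset_i = Offset_{i-1} xor L[ntz(i)], where L is a table of
   precomputed 16-byte values (assumed to be supplied by the caller). */
void
ocb_next_offset (uint8_t offset[16], const uint8_t (*L)[16], uint64_t i)
{
  const uint8_t *l = L[ocb_ntz (i)];
  int k;

  for (k = 0; k < 16; k++)
    offset[k] ^= l[k];
}
```

Because half of all block indices are odd, L[0] is used for every second block, which is one reason the offset chain lends itself to being unrolled over several blocks.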
|
* cipher/Makefile.am: Add 'rijndael-ppc.c'.
* cipher/rijndael-internal.h (USE_PPC_CRYPTO): New.
(RIJNDAEL_context): Add 'use_ppc_crypto'.
* cipher/rijndael-ppc.c (backwards, swap_if_le): Remove.
(u128_t, ALWAYS_INLINE, NO_INLINE, NO_INSTRUMENT_FUNCTION)
(ASM_FUNC_ATTR, ASM_FUNC_ATTR_INLINE, ASM_FUNC_ATTR_NOINLINE)
(ALIGNED_LOAD, ALIGNED_STORE, VEC_LOAD_BE, VEC_STORE_BE)
(vec_bswap32_const, vec_aligned_ld, vec_load_be_const)
(vec_load_be, vec_aligned_st, vec_store_be, _gcry_aes_sbox4_ppc8)
(_gcry_aes_ppc8_setkey, _gcry_aes_ppc8_prepare_decryption)
(aes_ppc8_encrypt_altivec, aes_ppc8_decrypt_altivec): New.
(_gcry_aes_ppc8_encrypt, _gcry_aes_ppc8_decrypt): Rewrite.
(_gcry_aes_ppc8_ocb_crypt): Comment out.
* cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_setkey)
(_gcry_aes_ppc8_prepare_decryption, _gcry_aes_ppc8_encrypt)
(_gcry_aes_ppc8_decrypt): New prototypes.
(do_setkey) [USE_PPC_CRYPTO]: Add setup for PowerPC AES.
(prepare_decryption) [USE_PPC_CRYPTO]: Ditto.
* configure.ac: Add 'rijndael-ppc.lo'.
(gcry_cv_ppc_altivec, gcry_cv_cc_ppc_altivec_cflags)
(gcry_cv_gcc_inline_asm_ppc_altivec)
(gcry_cv_gcc_inline_asm_ppc_arch_3_00): New checks.
--
Benchmark on POWER8 ~3.8GHz:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 7.27 ns/B 131.2 MiB/s 27.61 c/B
ECB dec | 7.70 ns/B 123.8 MiB/s 29.28 c/B
CBC enc | 6.38 ns/B 149.5 MiB/s 24.24 c/B
CBC dec | 6.17 ns/B 154.5 MiB/s 23.45 c/B
CFB enc | 6.45 ns/B 147.9 MiB/s 24.51 c/B
CFB dec | 6.20 ns/B 153.8 MiB/s 23.57 c/B
OFB enc | 7.36 ns/B 129.6 MiB/s 27.96 c/B
OFB dec | 7.36 ns/B 129.6 MiB/s 27.96 c/B
CTR enc | 6.22 ns/B 153.2 MiB/s 23.65 c/B
CTR dec | 6.22 ns/B 153.3 MiB/s 23.65 c/B
XTS enc | 6.67 ns/B 142.9 MiB/s 25.36 c/B
XTS dec | 6.70 ns/B 142.3 MiB/s 25.46 c/B
CCM enc | 12.61 ns/B 75.60 MiB/s 47.93 c/B
CCM dec | 12.62 ns/B 75.56 MiB/s 47.96 c/B
CCM auth | 6.41 ns/B 148.8 MiB/s 24.36 c/B
EAX enc | 12.62 ns/B 75.55 MiB/s 47.96 c/B
EAX dec | 12.62 ns/B 75.55 MiB/s 47.97 c/B
EAX auth | 6.39 ns/B 149.2 MiB/s 24.30 c/B
GCM enc | 9.81 ns/B 97.24 MiB/s 37.27 c/B
GCM dec | 9.81 ns/B 97.20 MiB/s 37.28 c/B
GCM auth | 3.59 ns/B 265.8 MiB/s 13.63 c/B
OCB enc | 6.39 ns/B 149.3 MiB/s 24.27 c/B
OCB dec | 6.38 ns/B 149.5 MiB/s 24.25 c/B
OCB auth | 6.35 ns/B 150.2 MiB/s 24.13 c/B
After:
ECB enc | 1.29 ns/B 737.7 MiB/s 4.91 c/B
ECB dec | 1.34 ns/B 711.1 MiB/s 5.10 c/B
CBC enc | 2.13 ns/B 448.5 MiB/s 8.08 c/B
CBC dec | 1.05 ns/B 908.0 MiB/s 3.99 c/B
CFB enc | 2.17 ns/B 439.9 MiB/s 8.24 c/B
CFB dec | 2.22 ns/B 429.8 MiB/s 8.43 c/B
OFB enc | 1.49 ns/B 640.1 MiB/s 5.66 c/B
OFB dec | 1.49 ns/B 640.1 MiB/s 5.66 c/B
CTR enc | 2.21 ns/B 432.5 MiB/s 8.38 c/B
CTR dec | 2.20 ns/B 432.5 MiB/s 8.38 c/B
XTS enc | 2.32 ns/B 410.6 MiB/s 8.83 c/B
XTS dec | 2.33 ns/B 409.7 MiB/s 8.85 c/B
CCM enc | 4.36 ns/B 218.7 MiB/s 16.57 c/B
CCM dec | 4.36 ns/B 218.8 MiB/s 16.56 c/B
CCM auth | 2.17 ns/B 440.4 MiB/s 8.23 c/B
EAX enc | 4.37 ns/B 218.3 MiB/s 16.60 c/B
EAX dec | 4.36 ns/B 218.7 MiB/s 16.57 c/B
EAX auth | 2.16 ns/B 440.7 MiB/s 8.22 c/B
GCM enc | 5.78 ns/B 165.0 MiB/s 21.96 c/B
GCM dec | 5.78 ns/B 165.0 MiB/s 21.96 c/B
GCM auth | 3.59 ns/B 265.9 MiB/s 13.63 c/B
OCB enc | 2.33 ns/B 410.1 MiB/s 8.84 c/B
OCB dec | 2.34 ns/B 407.2 MiB/s 8.90 c/B
OCB auth | 2.32 ns/B 411.1 MiB/s 8.82 c/B
GnuPG-bug-id: 4529
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
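The three benchmark columns are related by simple arithmetic, which the sketch below spells out. The function names are hypothetical, and the clock frequency is taken as the approximate value quoted in the message. Because the printed nanosecs/byte figures are rounded, recomputing from them only reproduces the other columns approximately (for example, 1.29 ns/B gives about 739 MiB/s where the table prints 737.7).

```c
/* Throughput in MiB/s for a given cost in nanoseconds per byte
   (1 MiB = 2^20 bytes, 1 s = 10^9 ns). */
double
ns_per_byte_to_mib_per_sec (double ns_per_byte)
{
  return 1e9 / ns_per_byte / 1048576.0;
}

/* Cost in CPU cycles per byte: nanoseconds per byte times the clock
   frequency in GHz (cycles per nanosecond). */
double
ns_per_byte_to_cycles_per_byte (double ns_per_byte, double clock_ghz)
{
  return ns_per_byte * clock_ghz;
}
```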
|
* cipher/rijndael-ppc.c: New implementation of single-block mode, and
implementation of OCB mode.
--
GnuPG-bug-id: 4529
[jk: split rijndael-ppc.c from patch 'rijndael/ppc: reimplement
single-block mode, and implement OCB block cipher' for basis
of new PowerPC vector crypto implementation of AES:
https://lists.gnupg.org/pipermail/gcrypt-devel/2019-July/004765.html]
[jk: coding-style fixes]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|