path: root/cipher/rijndael-ppc.c
Commit message | Author | Age | Files | Lines
* rijndael-ppc: use vector registers for key schedule calculations | Jussi Kivilinna | 2023-03-06 | 1 | -29/+39
    * cipher/rijndael-ppc.c (_gcry_aes_sbox4_ppc8): Remove.
    (bcast_u32_to_vec, u32_from_vec): New.
    (_gcry_aes_ppc8_setkey): Use vectors for round key calculation
    variables.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* ppc: add support for clang target attribute | Jussi Kivilinna | 2023-02-26 | 1 | -1/+3
    * configure.ac (gcry_cv_clang_attribute_ppc_target): New.
    * cipher/chacha20-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
    (FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
    * cipher/rijndael-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
    (FPC_OPT_ATTR): New.
    * cipher/rijndael-ppc9le.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
    (FPC_OPT_ATTR): New.
    * cipher/sha256-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
    (FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
    * cipher/sha512-ppc.c [HAVE_CLANG_ATTRIBUTE_PPC_TARGET]
    (FUNC_ATTR_TARGET_P8, FUNC_ATTR_TARGET_P9): New.
    (ror64): Remove unused function.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* aes-ppc: use target and optimize attributes for P8 and P9 | Jussi Kivilinna | 2023-02-26 | 1 | -2/+15
    * cipher/rijndael-ppc-functions.h: Add PPC_OPT_ATTR attribute
    macro for all functions.
    * cipher/rijndael-ppc.c (FUNC_ATTR_OPT, PPC_OPT_ATTR): New.
    (_gcry_aes_ppc8_setkey, _gcry_aes_ppc8_prepare_decryption): Add
    PPC_OPT_ATTR attribute macro.
    * cipher/rijndael-ppc9le.c (FUNC_ATTR_OPT, PPC_OPT_ATTR): New.
    --

    This change makes sure that PPC accelerated AES gets compiled with
    proper optimization level and right target setting.

    Benchmark on POWER9:

    AES          | nanosecs/byte  mebibytes/sec  cycles/byte
    ECB enc      | 0.305 ns/B     3129 MiB/s     0.701 c/B
    ECB dec      | 0.305 ns/B     3127 MiB/s     0.701 c/B
    CBC enc      | 1.66 ns/B      575.3 MiB/s    3.81 c/B
    CBC dec      | 0.318 ns/B     2997 MiB/s     0.732 c/B
    CFB enc      | 1.66 ns/B      574.7 MiB/s    3.82 c/B
    CFB dec      | 0.319 ns/B     2987 MiB/s     0.734 c/B
    OFB enc      | 2.15 ns/B      443.4 MiB/s    4.95 c/B
    OFB dec      | 2.15 ns/B      443.3 MiB/s    4.95 c/B
    CTR enc      | 0.328 ns/B     2907 MiB/s     0.754 c/B
    CTR dec      | 0.328 ns/B     2906 MiB/s     0.755 c/B
    XTS enc      | 0.516 ns/B     1849 MiB/s     1.19 c/B
    XTS dec      | 0.515 ns/B     1850 MiB/s     1.19 c/B
    CCM enc      | 1.98 ns/B      480.6 MiB/s    4.56 c/B
    CCM dec      | 1.98 ns/B      480.5 MiB/s    4.56 c/B
    CCM auth     | 1.66 ns/B      574.9 MiB/s    3.82 c/B
    EAX enc      | 1.99 ns/B      480.2 MiB/s    4.57 c/B
    EAX dec      | 1.99 ns/B      480.2 MiB/s    4.57 c/B
    EAX auth     | 1.66 ns/B      575.2 MiB/s    3.81 c/B
    GCM enc      | 0.552 ns/B     1727 MiB/s     1.27 c/B
    GCM dec      | 0.552 ns/B     1728 MiB/s     1.27 c/B
    GCM auth     | 0.225 ns/B     4240 MiB/s     0.517 c/B
    OCB enc      | 0.381 ns/B     2504 MiB/s     0.876 c/B
    OCB dec      | 0.385 ns/B     2477 MiB/s     0.886 c/B
    OCB auth     | 0.356 ns/B     2682 MiB/s     0.818 c/B
    SIV enc      | 1.98 ns/B      480.9 MiB/s    4.56 c/B
    SIV dec      | 2.11 ns/B      452.9 MiB/s    4.84 c/B
    SIV auth     | 1.66 ns/B      575.4 MiB/s    3.81 c/B
    GCM-SIV enc  | 0.726 ns/B     1314 MiB/s     1.67 c/B
    GCM-SIV dec  | 0.843 ns/B     1131 MiB/s     1.94 c/B
    GCM-SIV auth | 0.377 ns/B     2527 MiB/s     0.868 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* aes-ppc: add CTR32LE bulk acceleration | Jussi Kivilinna | 2023-02-26 | 1 | -0/+1
    * cipher/rijndael-ppc-functions.h (CTR32LE_ENC_FUNC): New.
    * cipher/rijndael-ppc.c (_gcry_aes_ppc8_ctr32le_enc): New.
    * cipher/rijndael-ppc9le.c (_gcry_aes_ppc9le_ctr32le_enc): New.
    * cipher/rijndael.c (_gcry_aes_ppc8_ctr32le_enc)
    (_gcry_aes_ppc9le_ctr32le_enc): New.
    (do_setkey): Setup _gcry_aes_ppc8_ctr32le_enc for POWER8 and
    _gcry_aes_ppc9le_ctr32le_enc for POWER9.
    --

    Benchmark on POWER9:

    Before:
    AES         | nanosecs/byte  mebibytes/sec  cycles/byte
    GCM-SIV enc | 1.42 ns/B      672.2 MiB/s    3.26 c/B

    After:
    AES         | nanosecs/byte  mebibytes/sec  cycles/byte
    GCM-SIV enc | 0.725 ns/B     1316 MiB/s     1.67 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
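For readers unfamiliar with the counter layout behind the CTR32LE name: GCM-SIV (RFC 8452) increments only the first four bytes of the counter block, read as a little-endian 32-bit integer that wraps modulo 2^32, and leaves the other twelve bytes alone. The sketch below illustrates just that increment in Python. The function name `ctr32le_next` is hypothetical and the code is not taken from libgcrypt.

```python
def ctr32le_next(block: bytes) -> bytes:
    """Return the counter block that follows `block` in a CTR32LE-style
    counter sequence (illustrative helper, not a libgcrypt function)."""
    if len(block) != 16:
        raise ValueError("counter block must be 16 bytes")
    # Only bytes 0..3 form the counter, interpreted as little-endian.
    counter = int.from_bytes(block[:4], "little")
    # Wrap at 2^32; the carry never propagates into bytes 4..15.
    counter = (counter + 1) & 0xFFFFFFFF
    return counter.to_bytes(4, "little") + block[4:]


if __name__ == "__main__":
    start = bytes.fromhex("feffffff") + bytes(range(12))
    nxt = ctr32le_next(start)
    print(nxt.hex())
    print(ctr32le_next(nxt).hex())
```

A bulk implementation can prepare several consecutive counter blocks before encrypting them together, which is what makes this mode a good fit for the 8-block parallel code paths mentioned elsewhere in this log.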
* aes-ppc: add ECB bulk acceleration for benchmarking purposes | Jussi Kivilinna | 2023-02-26 | 1 | -0/+1
    * cipher/rijndael-ppc-functions.h (ECB_CRYPT_FUNC): New.
    * cipher/rijndael-ppc.c (_gcry_aes_ppc8_ecb_crypt): New.
    * cipher/rijndael-ppc9le.c (_gcry_aes_ppc9le_ecb_crypt): New.
    * cipher/rijndael.c (_gcry_aes_ppc8_ecb_crypt)
    (_gcry_aes_ppc9le_ecb_crypt): New.
    (do_setkey): Set up _gcry_aes_ppc8_ecb_crypt for POWER8 and
    _gcry_aes_ppc9le_ecb_crypt for POWER9.
    --

    Benchmark on POWER9:

    Before:
    AES     | nanosecs/byte  mebibytes/sec  cycles/byte
    ECB enc | 0.875 ns/B     1090 MiB/s     2.01 c/B
    ECB dec | 1.06 ns/B      899.8 MiB/s    2.44 c/B

    After:
    AES     | nanosecs/byte  mebibytes/sec  cycles/byte
    ECB enc | 0.305 ns/B     3126 MiB/s     0.702 c/B
    ECB dec | 0.305 ns/B     3126 MiB/s     0.702 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Fix compiler warnings seen with clang-powerpc64le target | Jussi Kivilinna | 2023-01-04 | 1 | -2/+2
    * cipher/rijndael-ppc-common.h (asm_sbox_be): New.
    * cipher/rijndael-ppc.c (_gcry_aes_sbox4_ppc8): Use 'asm_sbox_be'
    instead of 'vec_sbox_be' since this intrinsic has a different
    prototype definition on GCC and Clang ('vector uchar' vs 'vector
    ulong long').
    * cipher/sha256-ppc.c (vec_ror_u32): Remove unused function.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Simplify AES key schedule implementation | Jussi Kivilinna | 2022-07-31 | 1 | -108/+52
    * cipher/rijndael-armv8-ce.c (_gcry_aes_armv8_ce_setkey): New key
    schedule with simplified structure and less stack usage.
    * cipher/rijndael-internal.h (RIJNDAEL_context_s): Add
    'keyschedule32b'.
    (keyschenc32b): New.
    * cipher/rijndael-ppc-common.h (vec_u32): New.
    * cipher/rijndael-ppc.c (vec_bswap32_const): Remove.
    (_gcry_aes_sbox4_ppc8): Optimize for less instructions emitted.
    (keysched_idx): New.
    (_gcry_aes_ppc8_setkey): New key schedule with simplified structure.
    * cipher/rijndael-tables.h (rcon): Remove.
    * cipher/rijndael.c (sbox4): New.
    (do_setkey): New key schedule with simplified structure and less
    stack usage.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
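As background for this entry, which removes the `rcon` table and adds an `sbox4` helper: the textbook FIPS-197 key expansion can be written with the round constant derived on the fly and the S-box applied to all four bytes of a word at once. The Python sketch below shows that textbook algorithm for AES-128 only. It is an illustration under those assumptions, not libgcrypt's implementation; the names `build_sbox`, `sub_word` and `expand_key_128` are hypothetical.

```python
def build_sbox():
    """Generate the AES S-box arithmetically instead of using a table."""
    def rotl8(x, s):
        return ((x << s) | (x >> (8 - s))) & 0xFF

    sbox = [0] * 256
    p = q = 1
    while True:
        # p <- p * 3 in GF(2^8); q <- q / 3, so q stays the inverse of p.
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF
        q ^= (q << 1) & 0xFF
        q ^= (q << 2) & 0xFF
        q ^= (q << 4) & 0xFF
        if q & 0x80:
            q ^= 0x09
        # Affine transformation of the inverse gives the S-box value.
        sbox[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63  # zero has no inverse and is handled separately
    return sbox


SBOX = build_sbox()


def sub_word(w):
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    return (SBOX[w >> 24] << 24 | SBOX[(w >> 16) & 0xFF] << 16
            | SBOX[(w >> 8) & 0xFF] << 8 | SBOX[w & 0xFF])


def expand_key_128(key: bytes):
    """Return the 44 round-key words of AES-128 (FIPS-197 KeyExpansion)."""
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    rcon = 1  # round constant, doubled in GF(2^8) instead of table lookup
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            t = ((t << 8) | (t >> 24)) & 0xFFFFFFFF  # RotWord
            t = sub_word(t) ^ (rcon << 24)
            rcon = (rcon << 1) ^ (0x11B if rcon & 0x80 else 0)
        w.append(w[i - 4] ^ t)
    return w


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 A.1
    print(" ".join(f"{x:08x}" for x in expand_key_128(key)[40:]))
```

Computing the round constant in the loop is the kind of change that makes a separate `rcon` table unnecessary; how the actual C code derives it is not stated in this log.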
* rijndael-aes: use zero offset vector load/store when possible | Jussi Kivilinna | 2020-02-02 | 1 | -8/+24
    * cipher/rijndael-ppc-common.h (asm_aligned_ld, asm_aligned_st): Use
    zero offset instruction variant when input offset is constant zero.
    * cipher/rijndael-ppc.c (asm_load_be_noswap)
    (asm_store_be_noswap): Likewise.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add POWER9 little-endian variant of PPC AES implementation | Jussi Kivilinna | 2020-02-02 | 1 | -2249/+17
    * configure.ac: Add 'rijndael-ppc9le.lo'.
    * cipher/Makefile.am: Add 'rijndael-ppc9le.c',
    'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
    * cipher/rijndael-internal.h (USE_PPC_CRYPTO_WITH_PPC9LE): New.
    (RIJNDAEL_context_s): Add 'use_ppc9le_crypto'.
    * cipher/rijndael.c (_gcry_aes_ppc9le_encrypt)
    (_gcry_aes_ppc9le_decrypt, _gcry_aes_ppc9le_cfb_enc)
    (_gcry_aes_ppc9le_cfb_dec, _gcry_aes_ppc9le_ctr_enc)
    (_gcry_aes_ppc9le_cbc_enc, _gcry_aes_ppc9le_cbc_dec)
    (_gcry_aes_ppc9le_ocb_crypt, _gcry_aes_ppc9le_ocb_auth)
    (_gcry_aes_ppc9le_xts_crypt): New.
    (do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
    (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
    (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth, _gcry_aes_xts_crypt)
    [USE_PPC_CRYPTO_WITH_PPC9LE]: New.
    * cipher/rijndael-ppc.c: Split common code to headers
    'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
    * cipher/rijndael-ppc-common.h: Split from 'rijndael-ppc.c'.
    (asm_add_uint64, asm_sra_int64, asm_swap_uint64_halfs): New.
    * cipher/rijndael-ppc-functions.h: Split from 'rijndael-ppc.c'.
    (CFB_ENC_FUNC, CBC_ENC_FUNC): Unroll loop by 2.
    (XTS_CRYPT_FUNC, GEN_TWEAK): Tweak generation without vperm
    instruction.
    * cipher/rijndael-ppc9le.c: New.
    --

    Provide POWER9 little-endian optimized variant of PPC vcrypto AES
    implementation. This implementation uses 'lxvb16x' and 'stxvb16x'
    instructions to load/store vectors directly in big-endian order.

    Benchmark on POWER9 (~3.8GHz):

    Before:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    CBC enc  | 1.04 ns/B      918.7 MiB/s    3.94 c/B
    CBC dec  | 0.222 ns/B     4292 MiB/s     0.844 c/B
    CFB enc  | 1.04 ns/B      916.9 MiB/s    3.95 c/B
    CFB dec  | 0.224 ns/B     4252 MiB/s     0.852 c/B
    CTR enc  | 0.226 ns/B     4218 MiB/s     0.859 c/B
    CTR dec  | 0.225 ns/B     4233 MiB/s     0.856 c/B
    XTS enc  | 0.500 ns/B     1907 MiB/s     1.90 c/B
    XTS dec  | 0.494 ns/B     1932 MiB/s     1.88 c/B
    OCB enc  | 0.288 ns/B     3312 MiB/s     1.09 c/B
    OCB dec  | 0.292 ns/B     3266 MiB/s     1.11 c/B
    OCB auth | 0.267 ns/B     3567 MiB/s     1.02 c/B

    After (ctr & ocb & cbc-dec & cfb-dec ~15% and xts ~8% faster):
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    CBC enc  | 1.04 ns/B      914.2 MiB/s    3.96 c/B
    CBC dec  | 0.191 ns/B     4984 MiB/s     0.727 c/B
    CFB enc  | 1.03 ns/B      930.0 MiB/s    3.90 c/B
    CFB dec  | 0.194 ns/B     4906 MiB/s     0.739 c/B
    CTR enc  | 0.196 ns/B     4868 MiB/s     0.744 c/B
    CTR dec  | 0.197 ns/B     4834 MiB/s     0.750 c/B
    XTS enc  | 0.460 ns/B     2075 MiB/s     1.75 c/B
    XTS dec  | 0.455 ns/B     2097 MiB/s     1.73 c/B
    OCB enc  | 0.250 ns/B     3812 MiB/s     0.951 c/B
    OCB dec  | 0.253 ns/B     3764 MiB/s     0.963 c/B
    OCB auth | 0.232 ns/B     4106 MiB/s     0.883 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
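The "~15%" and "~8%" figures in the POWER9 little-endian entry can be checked against the ns/B columns. The commit message does not say whether "faster" means higher throughput or less time per byte, so the Python sketch below computes both; the helper names are hypothetical, and the inputs are the CBC dec and XTS enc rows of that benchmark.

```python
def throughput_gain_pct(before_ns_per_byte, after_ns_per_byte):
    """Percent increase in bytes processed per unit of time."""
    return (before_ns_per_byte / after_ns_per_byte - 1.0) * 100.0


def time_saved_pct(before_ns_per_byte, after_ns_per_byte):
    """Percent reduction in time spent per byte."""
    return (1.0 - after_ns_per_byte / before_ns_per_byte) * 100.0


if __name__ == "__main__":
    rows = {"CBC dec": (0.222, 0.191), "XTS enc": (0.500, 0.460)}
    for name, (before, after) in rows.items():
        print(f"{name}: throughput +{throughput_gain_pct(before, after):.1f}%, "
              f"time -{time_saved_pct(before, after):.1f}%")
```

For CBC dec the two measures come out at roughly 16% and 14%, and for XTS enc at roughly 9% and 8%, so either reading is consistent with the rounded figures quoted in the commit message.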
* rijndael-ppc: performance improvements | Jussi Kivilinna | 2019-12-23 | 1 | -727/+1112
    * cipher/rijndael-ppc.c (ALIGNED_LOAD, ALIGNED_STORE, VEC_LOAD_BE)
    (VEC_STORE_BE): Rewrite.
    (VEC_BE_SWAP, VEC_LOAD_BE_NOSWAP, VEC_STORE_BE_NOSWAP): New.
    (PRELOAD_ROUND_KEYS, AES_ENCRYPT, AES_DECRYPT): Adjust to new
    input parameters for vector load macros.
    (ROUND_KEY_VARIABLES_ALL, PRELOAD_ROUND_KEYS_ALL)
    (AES_ENCRYPT_ALL): New.
    (vec_bswap32_const_neg): New.
    (vec_aligned_ld, vec_aligned_st, vec_load_be_const): Rename to...
    (asm_aligned_ls, asm_aligned_st, asm_load_be_const): ...these.
    (asm_be_swap, asm_vperm1, asm_load_be_noswap)
    (asm_store_be_noswap): New.
    (vec_add_uint128): Rename to...
    (asm_add_uint128): ...this.
    (asm_xor, asm_cipher_be, asm_cipherlast_be, asm_ncipher_be)
    (asm_ncipherlast_be): New inline assembly functions with volatile
    keyword to allow manual instruction ordering.
    (_gcry_aes_ppc8_setkey, aes_ppc8_prepare_decryption)
    (_gcry_aes_ppc8_encrypt, _gcry_aes_ppc8_decrypt)
    (_gcry_aes_ppc8_cfb_enc, _gcry_aes_ppc8_cbc_enc)
    (_gcry_aes_ppc8_ocb_auth): Update to use new&rewritten helper
    macros.
    (_gcry_aes_ppc8_cfb_dec, _gcry_aes_ppc8_cbc_dec)
    (_gcry_aes_ppc8_ctr_enc, _gcry_aes_ppc8_ocb_crypt)
    (_gcry_aes_ppc8_xts_crypt): Update to use new&rewritten helper
    macros; Tune 8-block parallel paths with manual instruction
    ordering.
    --

    Benchmarks on POWER8 (ppc64le, ~3.8GHz):

    Before:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    CBC enc  | 1.06 ns/B      902.2 MiB/s    4.02 c/B
    CBC dec  | 0.208 ns/B     4585 MiB/s     0.790 c/B
    CFB enc  | 1.06 ns/B      900.4 MiB/s    4.02 c/B
    CFB dec  | 0.208 ns/B     4588 MiB/s     0.790 c/B
    CTR enc  | 0.238 ns/B     4007 MiB/s     0.904 c/B
    CTR dec  | 0.238 ns/B     4009 MiB/s     0.904 c/B
    XTS enc  | 0.492 ns/B     1937 MiB/s     1.87 c/B
    XTS dec  | 0.488 ns/B     1955 MiB/s     1.85 c/B
    OCB enc  | 0.243 ns/B     3928 MiB/s     0.922 c/B
    OCB dec  | 0.247 ns/B     3858 MiB/s     0.939 c/B
    OCB auth | 0.213 ns/B     4482 MiB/s     0.809 c/B

    After (cbc-dec & cfb-dec & xts & ocb ~6% faster, ctr ~11% faster):
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    CBC enc  | 1.06 ns/B      902.1 MiB/s    4.02 c/B
    CBC dec  | 0.196 ns/B     4877 MiB/s     0.743 c/B
    CFB enc  | 1.06 ns/B      902.2 MiB/s    4.02 c/B
    CFB dec  | 0.195 ns/B     4889 MiB/s     0.741 c/B
    CTR enc  | 0.214 ns/B     4448 MiB/s     0.815 c/B
    CTR dec  | 0.214 ns/B     4452 MiB/s     0.814 c/B
    XTS enc  | 0.461 ns/B     2067 MiB/s     1.75 c/B
    XTS dec  | 0.456 ns/B     2092 MiB/s     1.73 c/B
    OCB enc  | 0.227 ns/B     4200 MiB/s     0.863 c/B
    OCB dec  | 0.234 ns/B     4072 MiB/s     0.890 c/B
    OCB auth | 0.207 ns/B     4604 MiB/s     0.787 c/B

    Benchmarks on POWER9 (ppc64le, ~3.8GHz):

    Before:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    CBC enc  | 1.04 ns/B      918.7 MiB/s    3.94 c/B
    CBC dec  | 0.240 ns/B     3982 MiB/s     0.910 c/B
    CFB enc  | 1.04 ns/B      917.6 MiB/s    3.95 c/B
    CFB dec  | 0.241 ns/B     3963 MiB/s     0.914 c/B
    CTR enc  | 0.249 ns/B     3835 MiB/s     0.945 c/B
    CTR dec  | 0.252 ns/B     3787 MiB/s     0.957 c/B
    XTS enc  | 0.505 ns/B     1889 MiB/s     1.92 c/B
    XTS dec  | 0.495 ns/B     1926 MiB/s     1.88 c/B
    OCB enc  | 0.303 ns/B     3152 MiB/s     1.15 c/B
    OCB dec  | 0.305 ns/B     3129 MiB/s     1.16 c/B
    OCB auth | 0.265 ns/B     3595 MiB/s     1.01 c/B

    After (cbc-dec & cfb-dec ~6% faster, ctr ~11% faster, ocb ~4% faster):
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    CBC enc  | 1.04 ns/B      917.3 MiB/s    3.95 c/B
    CBC dec  | 0.225 ns/B     4234 MiB/s     0.856 c/B
    CFB enc  | 1.04 ns/B      917.8 MiB/s    3.95 c/B
    CFB dec  | 0.226 ns/B     4214 MiB/s     0.860 c/B
    CTR enc  | 0.221 ns/B     4306 MiB/s     0.842 c/B
    CTR dec  | 0.223 ns/B     4271 MiB/s     0.848 c/B
    XTS enc  | 0.503 ns/B     1897 MiB/s     1.91 c/B
    XTS dec  | 0.495 ns/B     1928 MiB/s     1.88 c/B
    OCB enc  | 0.288 ns/B     3309 MiB/s     1.10 c/B
    OCB dec  | 0.292 ns/B     3266 MiB/s     1.11 c/B
    OCB auth | 0.267 ns/B     3570 MiB/s     1.02 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
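The three benchmark columns in these tables are related by simple arithmetic: MiB/s is 10^9 divided by ns/B and by 2^20, and cycles/byte is ns/B multiplied by the clock in GHz. A short Python sketch, taking the "~3.8GHz" from the benchmark heading as the clock and using hypothetical helper names, reproduces the POWER8 "After" CBC dec row (0.196 ns/B, 4877 MiB/s, 0.743 c/B) to within rounding:

```python
MIB = 1 << 20  # bytes per mebibyte


def mib_per_sec(ns_per_byte):
    """Throughput in MiB/s for a given cost in nanoseconds per byte."""
    return 1e9 / ns_per_byte / MIB


def cycles_per_byte(ns_per_byte, clock_ghz):
    """Cycles per byte; one GHz is one cycle per nanosecond."""
    return ns_per_byte * clock_ghz


if __name__ == "__main__":
    ns = 0.196        # POWER8 "After", CBC dec
    clock_ghz = 3.8   # assumed from the "~3.8GHz" in the heading
    print(f"{mib_per_sec(ns):.0f} MiB/s, {cycles_per_byte(ns, clock_ghz):.3f} c/B")
```

The small differences from the published row are expected, because the table's ns/B value is itself rounded to three digits and the clock is only given approximately.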
* rijndael-ppc: fix bad register used for vector load/store assembly | Jussi Kivilinna | 2019-12-23 | 1 | -4/+4
    * cipher/rijndael-ppc.c (vec_aligned_ld, vec_load_be, vec_aligned_st)
    (vec_store_be): Add "r0" to clobber list for load/store instructions.
    --

    Register r0 must not be used for RA input for vector load/store
    instructions as r0 is not read as register but as value '0'.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
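The r0 rule described in this entry comes from how the Power ISA forms the effective address of indexed loads and stores: the base operand is written (RA|0), meaning that register number 0 in the RA slot contributes the constant zero rather than the contents of r0. The Python model below is a simplified sketch of that address calculation only. The function name is hypothetical, and real vector loads may additionally align the address.

```python
MASK64 = (1 << 64) - 1


def effective_address(gpr, ra, rb):
    """Model of EA = (RA|0) + (RB) for an indexed load/store.

    gpr is a list of 32 general-purpose register values; ra and rb are
    register numbers. RA = 0 selects the constant 0, not gpr[0].
    """
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & MASK64


if __name__ == "__main__":
    gpr = [0] * 32
    gpr[0] = 0x1000   # pointer that happened to be allocated to r0
    gpr[9] = 0x1000   # the same pointer held in r9
    gpr[10] = 0x20    # byte offset
    print(hex(effective_address(gpr, 0, 10)))  # base silently dropped
    print(hex(effective_address(gpr, 9, 10)))  # intended address
```

If the compiler had placed the base pointer in r0, the access would go to the bare offset instead of the intended address, which is why the fix keeps r0 out of the registers available to those operands.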
* rijndael-ppc: add bulk modes for CBC, CFB, CTR and XTS | Jussi Kivilinna | 2019-08-26 | 1 | -0/+1010
    * cipher/rijndael-ppc.c (vec_add_uint128, _gcry_aes_ppc8_cfb_enc)
    (_gcry_aes_ppc8_cfb_dec, _gcry_aes_ppc8_cbc_enc)
    (_gcry_aes_ppc8_cbc_dec, _gcry_aes_ppc8_ctr_enc)
    (_gcry_aes_ppc8_xts_crypt): New.
    * cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_cfb_enc)
    (_gcry_aes_ppc8_cfb_dec, _gcry_aes_ppc8_cbc_enc)
    (_gcry_aes_ppc8_cbc_dec, _gcry_aes_ppc8_ctr_enc)
    (_gcry_aes_ppc8_xts_crypt): New.
    (do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_enc)
    (_gcry_aes_cbc_dec, _gcry_aes_ctr_enc)
    (_gcry_aes_xts_crypto) [USE_PPC_CRYPTO]: Enable PowerPC AES
    CFB/CBC/CTR/XTS bulk implementations.
    * configure.ac (gcry_cv_gcc_inline_asm_ppc_altivec): Add 'vadduwm'
    instruction.
    --

    Benchmark on POWER8 ~3.8GHz:

    Before:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    CBC enc  | 2.13 ns/B      447.2 MiB/s    8.10 c/B
    CBC dec  | 1.13 ns/B      843.4 MiB/s    4.30 c/B
    CFB enc  | 2.20 ns/B      433.9 MiB/s    8.35 c/B
    CFB dec  | 2.22 ns/B      429.7 MiB/s    8.43 c/B
    CTR enc  | 2.18 ns/B      438.2 MiB/s    8.27 c/B
    CTR dec  | 2.18 ns/B      437.4 MiB/s    8.28 c/B
    XTS enc  | 2.31 ns/B      412.8 MiB/s    8.78 c/B
    XTS dec  | 2.30 ns/B      414.3 MiB/s    8.75 c/B
    CCM enc  | 4.33 ns/B      220.1 MiB/s    16.47 c/B
    CCM dec  | 4.34 ns/B      219.9 MiB/s    16.48 c/B
    CCM auth | 2.16 ns/B      440.6 MiB/s    8.22 c/B
    EAX enc  | 4.34 ns/B      219.8 MiB/s    16.49 c/B
    EAX dec  | 4.34 ns/B      219.8 MiB/s    16.49 c/B
    EAX auth | 2.16 ns/B      440.5 MiB/s    8.23 c/B

    After:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    CBC enc  | 1.06 ns/B      903.1 MiB/s    4.01 c/B
    CBC dec  | 0.211 ns/B     4511 MiB/s     0.803 c/B
    CFB enc  | 1.06 ns/B      896.7 MiB/s    4.04 c/B
    CFB dec  | 0.209 ns/B     4563 MiB/s     0.794 c/B
    CTR enc  | 0.237 ns/B     4026 MiB/s     0.900 c/B
    CTR dec  | 0.237 ns/B     4029 MiB/s     0.900 c/B
    XTS enc  | 0.496 ns/B     1922 MiB/s     1.89 c/B
    XTS dec  | 0.496 ns/B     1924 MiB/s     1.88 c/B
    CCM enc  | 1.29 ns/B      737.7 MiB/s    4.91 c/B
    CCM dec  | 1.29 ns/B      737.8 MiB/s    4.91 c/B
    CCM auth | 1.06 ns/B      903.3 MiB/s    4.01 c/B
    EAX enc  | 1.29 ns/B      737.7 MiB/s    4.91 c/B
    EAX dec  | 1.29 ns/B      737.2 MiB/s    4.92 c/B

    GnuPG-bug-id: 4529
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael-ppc: add bulk mode for ocb_auth | Jussi Kivilinna | 2019-08-26 | 1 | -0/+209
    * cipher/rijndael-ppc.c (_gcry_aes_ppc8_ocb_auth): New.
    * cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_ocb_auth): New
    prototype.
    (do_setkey, _gcry_aes_ocb_auth) [USE_PPC_CRYPTO]: Add PowerPC AES
    ocb_auth.
    --

    Benchmark on POWER8 ~3.8GHz:

    Before:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    OCB enc  | 0.250 ns/B     3818 MiB/s     0.949 c/B
    OCB dec  | 0.250 ns/B     3820 MiB/s     0.949 c/B
    OCB auth | 2.31 ns/B      412.5 MiB/s    8.79 c/B

    After:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    OCB enc  | 0.252 ns/B     3779 MiB/s     0.959 c/B
    OCB dec  | 0.245 ns/B     3891 MiB/s     0.931 c/B
    OCB auth | 0.223 ns/B     4283 MiB/s     0.846 c/B

    GnuPG-bug-id: 4529
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael-ppc: enable PowerPC AES-OCB implementation | Jussi Kivilinna | 2019-08-26 | 1 | -529/+462
    * cipher/rijndael-ppc.c (ROUND_KEY_VARIABLES, PRELOAD_ROUND_KEYS)
    (AES_ENCRYPT, AES_DECRYPT): New.
    (_gcry_aes_ppc8_prepare_decryption): Rename to...
    (aes_ppc8_prepare_decryption): ... this.
    (_gcry_aes_ppc8_prepare_decryption): New.
    (aes_ppc8_encrypt_altivec, aes_ppc8_decrypt_altivec): Remove.
    (_gcry_aes_ppc8_encrypt): Use AES_ENCRYPT macro.
    (_gcry_aes_ppc8_decrypt): Use AES_DECRYPT macro.
    (_gcry_aes_ppc8_ocb_crypt): Uncomment; Optimizations for OCB offset
    calculations, etc; Use new load/store and encryption/decryption
    macros.
    * cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_ocb_crypt): New
    prototype.
    (do_setkey, _gcry_aes_ocb_crypt) [USE_PPC_CRYPTO]: Add PowerPC AES
    OCB encryption/decryption.
    --

    Benchmark on POWER8 ~3.8GHz:

    Before:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    OCB enc  | 2.33 ns/B      410.1 MiB/s    8.84 c/B
    OCB dec  | 2.34 ns/B      407.2 MiB/s    8.90 c/B
    OCB auth | 2.32 ns/B      411.1 MiB/s    8.82 c/B

    After:
    OCB enc  | 0.250 ns/B     3818 MiB/s     0.949 c/B
    OCB dec  | 0.250 ns/B     3820 MiB/s     0.949 c/B
    OCB auth | 2.31 ns/B      412.5 MiB/s    8.79 c/B

    GnuPG-bug-id: 4529
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael-ppc: add key setup and enable single block PowerPC AES | Jussi Kivilinna | 2019-08-26 | 1 | -89/+351
    * cipher/Makefile.am: Add 'rijndael-ppc.c'.
    * cipher/rijndael-internal.h (USE_PPC_CRYPTO): New.
    (RIJNDAEL_context): Add 'use_ppc_crypto'.
    * cipher/rijndael-ppc.c (backwards, swap_if_le): Remove.
    (u128_t, ALWAYS_INLINE, NO_INLINE, NO_INSTRUMENT_FUNCTION)
    (ASM_FUNC_ATTR, ASM_FUNC_ATTR_INLINE, ASM_FUNC_ATTR_NOINLINE)
    (ALIGNED_LOAD, ALIGNED_STORE, VEC_LOAD_BE, VEC_STORE_BE)
    (vec_bswap32_const, vec_aligned_ld, vec_load_be_const)
    (vec_load_be, vec_aligned_st, vec_store_be, _gcry_aes_sbox4_ppc8)
    (_gcry_aes_ppc8_setkey, _gcry_aes_ppc8_prepare_decryption)
    (aes_ppc8_encrypt_altivec, aes_ppc8_decrypt_altivec): New.
    (_gcry_aes_ppc8_encrypt, _gcry_aes_ppc8_decrypt): Rewrite.
    (_gcry_aes_ppc8_ocb_crypt): Comment out.
    * cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_setkey)
    (_gcry_aes_ppc8_prepare_decryption, _gcry_aes_ppc8_encrypt)
    (_gcry_aes_ppc8_decrypt): New prototypes.
    (do_setkey) [USE_PPC_CRYPTO]: Add setup for PowerPC AES.
    (prepare_decryption) [USE_PPC_CRYPTO]: Ditto.
    * configure.ac: Add 'rijndael-ppc.lo'.
    (gcry_cv_ppc_altivec, gcry_cv_cc_ppc_altivec_cflags)
    (gcry_cv_gcc_inline_asm_ppc_altivec)
    (gcry_cv_gcc_inline_asm_ppc_arch_3_00): New checks.
    --

    Benchmark on POWER8 ~3.8GHz:

    Before:
    AES      | nanosecs/byte  mebibytes/sec  cycles/byte
    ECB enc  | 7.27 ns/B      131.2 MiB/s    27.61 c/B
    ECB dec  | 7.70 ns/B      123.8 MiB/s    29.28 c/B
    CBC enc  | 6.38 ns/B      149.5 MiB/s    24.24 c/B
    CBC dec  | 6.17 ns/B      154.5 MiB/s    23.45 c/B
    CFB enc  | 6.45 ns/B      147.9 MiB/s    24.51 c/B
    CFB dec  | 6.20 ns/B      153.8 MiB/s    23.57 c/B
    OFB enc  | 7.36 ns/B      129.6 MiB/s    27.96 c/B
    OFB dec  | 7.36 ns/B      129.6 MiB/s    27.96 c/B
    CTR enc  | 6.22 ns/B      153.2 MiB/s    23.65 c/B
    CTR dec  | 6.22 ns/B      153.3 MiB/s    23.65 c/B
    XTS enc  | 6.67 ns/B      142.9 MiB/s    25.36 c/B
    XTS dec  | 6.70 ns/B      142.3 MiB/s    25.46 c/B
    CCM enc  | 12.61 ns/B     75.60 MiB/s    47.93 c/B
    CCM dec  | 12.62 ns/B     75.56 MiB/s    47.96 c/B
    CCM auth | 6.41 ns/B      148.8 MiB/s    24.36 c/B
    EAX enc  | 12.62 ns/B     75.55 MiB/s    47.96 c/B
    EAX dec  | 12.62 ns/B     75.55 MiB/s    47.97 c/B
    EAX auth | 6.39 ns/B      149.2 MiB/s    24.30 c/B
    GCM enc  | 9.81 ns/B      97.24 MiB/s    37.27 c/B
    GCM dec  | 9.81 ns/B      97.20 MiB/s    37.28 c/B
    GCM auth | 3.59 ns/B      265.8 MiB/s    13.63 c/B
    OCB enc  | 6.39 ns/B      149.3 MiB/s    24.27 c/B
    OCB dec  | 6.38 ns/B      149.5 MiB/s    24.25 c/B
    OCB auth | 6.35 ns/B      150.2 MiB/s    24.13 c/B

    After:
    ECB enc  | 1.29 ns/B      737.7 MiB/s    4.91 c/B
    ECB dec  | 1.34 ns/B      711.1 MiB/s    5.10 c/B
    CBC enc  | 2.13 ns/B      448.5 MiB/s    8.08 c/B
    CBC dec  | 1.05 ns/B      908.0 MiB/s    3.99 c/B
    CFB enc  | 2.17 ns/B      439.9 MiB/s    8.24 c/B
    CFB dec  | 2.22 ns/B      429.8 MiB/s    8.43 c/B
    OFB enc  | 1.49 ns/B      640.1 MiB/s    5.66 c/B
    OFB dec  | 1.49 ns/B      640.1 MiB/s    5.66 c/B
    CTR enc  | 2.21 ns/B      432.5 MiB/s    8.38 c/B
    CTR dec  | 2.20 ns/B      432.5 MiB/s    8.38 c/B
    XTS enc  | 2.32 ns/B      410.6 MiB/s    8.83 c/B
    XTS dec  | 2.33 ns/B      409.7 MiB/s    8.85 c/B
    CCM enc  | 4.36 ns/B      218.7 MiB/s    16.57 c/B
    CCM dec  | 4.36 ns/B      218.8 MiB/s    16.56 c/B
    CCM auth | 2.17 ns/B      440.4 MiB/s    8.23 c/B
    EAX enc  | 4.37 ns/B      218.3 MiB/s    16.60 c/B
    EAX dec  | 4.36 ns/B      218.7 MiB/s    16.57 c/B
    EAX auth | 2.16 ns/B      440.7 MiB/s    8.22 c/B
    GCM enc  | 5.78 ns/B      165.0 MiB/s    21.96 c/B
    GCM dec  | 5.78 ns/B      165.0 MiB/s    21.96 c/B
    GCM auth | 3.59 ns/B      265.9 MiB/s    13.63 c/B
    OCB enc  | 2.33 ns/B      410.1 MiB/s    8.84 c/B
    OCB dec  | 2.34 ns/B      407.2 MiB/s    8.90 c/B
    OCB auth | 2.32 ns/B      411.1 MiB/s    8.82 c/B

    GnuPG-bug-id: 4529
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael/ppc: implement single-block mode, and implement OCB block cipher | Shawn Landden | 2019-08-26 | 1 | -0/+676
    * cipher/rijndael-ppc.c: New implementation of single-block mode,
    and implementation of OCB mode.
    --

    GnuPG-bug-id: 4529
    [jk: split rijndael-ppc.c from patch 'rijndael/ppc: reimplement
    single-block mode, and implement OCB block cipher' for basis of
    new PowerPC vector crypto implementation of AES:
    https://lists.gnupg.org/pipermail/gcrypt-devel/2019-July/004765.html]
    [jk: coding-style fixes]
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>