path: root/cipher/rijndael-internal.h
Commit message | Author | Age | Files | Lines
* Simplify AES key schedule implementation | Jussi Kivilinna | 2022-07-31 | 1 | -5/+7

    * cipher/rijndael-armv8-ce.c (_gcry_aes_armv8_ce_setkey): New key
    schedule with simplified structure and less stack usage.
    * cipher/rijndael-internal.h (RIJNDAEL_context_s): Add 'keyschedule32b'.
    (keyschenc32b): New.
    * cipher/rijndael-ppc-common.h (vec_u32): New.
    * cipher/rijndael-ppc.c (vec_bswap32_const): Remove.
    (_gcry_aes_sbox4_ppc8): Optimize for less instructions emitted.
    (keysched_idx): New.
    (_gcry_aes_ppc8_setkey): New key schedule with simplified structure.
    * cipher/rijndael-tables.h (rcon): Remove.
    * cipher/rijndael.c (sbox4): New.
    (do_setkey): New key schedule with simplified structure and less
    stack usage.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
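The commit above removes the 'rcon' table and reworks the key schedule. The following is a minimal textbook sketch of AES-128 key expansion that derives the round constant on the fly instead of reading it from a table. It is not the libgcrypt code: the function names (gf_xtime, gf_mul, aes_sbox, sub_word, aes128_expand_key) are hypothetical, and the S-box is computed arithmetically only to keep the example short.

```c
#include <stdint.h>

/* Multiply by x in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1. */
uint8_t gf_xtime(uint8_t a)
{
  return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Shift-and-add multiplication in GF(2^8). */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
  uint8_t r = 0;
  while (b)
    {
      if (b & 1)
        r ^= a;
      a = gf_xtime(a);
      b >>= 1;
    }
  return r;
}

/* AES S-box computed as affine(x^254); x^254 is the inverse, and 0 maps to 0.
   Hypothetical helper: real implementations use a table or hardware. */
uint8_t aes_sbox(uint8_t x)
{
  uint8_t inv = 1, s;
  int i;
  for (i = 0; i < 254; i++)
    inv = gf_mul(inv, x);
  s = inv;
  for (i = 0; i < 4; i++)
    {
      inv = (uint8_t)((inv << 1) | (inv >> 7)); /* rotate left by one */
      s ^= inv;
    }
  return s ^ 0x63;
}

/* Apply the S-box to each byte of a 32-bit word. */
uint32_t sub_word(uint32_t w)
{
  return ((uint32_t)aes_sbox((uint8_t)(w >> 24)) << 24)
       | ((uint32_t)aes_sbox((uint8_t)(w >> 16)) << 16)
       | ((uint32_t)aes_sbox((uint8_t)(w >> 8)) << 8)
       | (uint32_t)aes_sbox((uint8_t)w);
}

/* Expand a 128-bit key into 44 big-endian round-key words (FIPS-197 layout). */
void aes128_expand_key(const uint8_t key[16], uint32_t w[44])
{
  uint8_t rcon = 1; /* round constant, doubled in GF(2^8) each round */
  int i;
  for (i = 0; i < 4; i++)
    w[i] = ((uint32_t)key[4 * i] << 24) | ((uint32_t)key[4 * i + 1] << 16)
         | ((uint32_t)key[4 * i + 2] << 8) | (uint32_t)key[4 * i + 3];
  for (i = 4; i < 44; i++)
    {
      uint32_t t = w[i - 1];
      if (i % 4 == 0)
        {
          t = sub_word((t << 8) | (t >> 24)) ^ ((uint32_t)rcon << 24);
          rcon = gf_xtime(rcon);
        }
      w[i] = w[i - 4] ^ t;
    }
}
```

Doubling the constant with gf_xtime trades a ten-entry table for one shift and a conditional XOR per round, which is one plausible reason a separate 'rcon' table becomes unnecessary.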
* rijndael: add x86_64 VAES/AVX2 accelerated implementation | Jussi Kivilinna | 2021-02-28 | 1 | -0/+10

    * cipher/Makefile.am: Add 'rijndael-vaes.c' and
    'rijndael-vaes-avx2-amd64.S'.
    * cipher/rijndael-internal.h (USE_VAES): New.
    * cipher/rijndael-vaes-avx2-amd64.S: New.
    * cipher/rijndael-vaes.c: New.
    * cipher/rijndael.c (_gcry_aes_vaes_cfb_dec, _gcry_aes_vaes_cbc_dec)
    (_gcry_aes_vaes_ctr_enc, _gcry_aes_vaes_ocb_crypt)
    (_gcry_aes_vaes_xts_crypt): New.
    (do_setkey) [USE_VAES]: Add detection for VAES.
    (selftest_ctr_128, selftest_cbc_128, selftest_cfb_128) [USE_VAES]:
    Increase number of selftest blocks.
    * configure.ac: Add 'rijndael-vaes.lo' and
    'rijndael-vaes-avx2-amd64.lo'.
    --

    Patch adds VAES/AVX2 accelerated implementation for CBC-decryption,
    CFB-decryption, CTR-encryption, OCB-en/decryption and
    XTS-en/decryption.

    Benchmarks on AMD Ryzen 5800X:

    Before:
     AES      |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     CBC dec  |     0.067 ns/B     14314 MiB/s     0.323 c/B      4850
     CFB dec  |     0.067 ns/B     14322 MiB/s     0.323 c/B      4850
     CTR enc  |     0.066 ns/B     14429 MiB/s     0.321 c/B      4850
     CTR dec  |     0.066 ns/B     14433 MiB/s     0.320 c/B      4850
     XTS enc  |     0.087 ns/B     10910 MiB/s     0.424 c/B      4850
     XTS dec  |     0.088 ns/B     10856 MiB/s     0.426 c/B      4850
     OCB enc  |     0.070 ns/B     13633 MiB/s     0.339 c/B      4850
     OCB dec  |     0.069 ns/B     13911 MiB/s     0.332 c/B      4850

    After (XTS ~1.7x faster, others ~1.9x faster):
     AES      |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     CBC dec  |     0.034 ns/B     28159 MiB/s     0.164 c/B      4850
     CFB dec  |     0.034 ns/B     27955 MiB/s     0.165 c/B      4850
     CTR enc  |     0.034 ns/B     28214 MiB/s     0.164 c/B      4850
     CTR dec  |     0.034 ns/B     28146 MiB/s     0.164 c/B      4850
     XTS enc  |     0.051 ns/B     18539 MiB/s     0.249 c/B      4850
     XTS dec  |     0.051 ns/B     18655 MiB/s     0.248 c/B      4850
     GCM auth |     0.088 ns/B     10817 MiB/s     0.428 c/B      4850
     OCB enc  |     0.037 ns/B     25824 MiB/s     0.179 c/B      4850
     OCB dec  |     0.038 ns/B     25359 MiB/s     0.182 c/B      4850

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
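The three benchmark columns in the commit above are related by simple arithmetic, and the "~1.7x" and "~1.9x" figures are ratios of the ns/B column. A small sketch of those conversions follows; the helper names are hypothetical and are not part of libgcrypt's benchmark tool. Because the printed ns/B values are rounded to three decimals, results recomputed from them only approximate the printed MiB/s and c/B figures.

```c
/* Speedup factor between two nanoseconds-per-byte measurements. */
double bench_speedup(double before_ns_per_byte, double after_ns_per_byte)
{
  return before_ns_per_byte / after_ns_per_byte;
}

/* Throughput in MiB/s for a given cost in nanoseconds per byte. */
double bench_mib_per_sec(double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte for a given ns/byte cost and clock frequency in MHz. */
double bench_cycles_per_byte(double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}
```

For example, bench_speedup(0.087, 0.051) for "XTS enc" gives roughly 1.7, and bench_speedup(0.067, 0.034) for "CBC dec" gives roughly 1.97, in line with the summary in the commit message.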
* rijndael: remove unused use_xxx flags | Jussi Kivilinna | 2021-01-26 | 1 | -18/+2

    * cipher/rijndael-internal.h (RIJNDAEL_context_s): Remove unused
    'use_padlock', 'use_aesni', 'use_ssse3', 'use_arm_ce',
    'use_ppc_crypto' and 'use_ppc9le_crypto'.
    * cipher/rijndael.c (do_setkey): Do not setup 'use_padlock',
    'use_aesni', 'use_ssse3', 'use_arm_ce', 'use_ppc_crypto' and
    'use_ppc9le_crypto'.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add s390x/zSeries acceleration for AES | Jussi Kivilinna | 2020-12-18 | 1 | -0/+15

    * configure.ac: Add 'rijndael-s390x.lo'.
    * cipher/Makefile.am: Add 'rijndael-s390x.c'.
    * cipher/rijndael-internal.c (USE_S390X_CRYPTO): New.
    (RIJNDAEL_context_s) [USE_S390X_CRYPTO]: New 'km*_func' members.
    * cipher/rijndael-s390x.c: New.
    * cipher/rijndael.c (_gcry_aes_s390x_setup_acceleration)
    (_gcry_aes_s390x_setup_setkey)
    (_gcry_aes_s390x_setup_prepare_decryption, _gcry_aes_s390x_encrypt)
    (_gcry_aes_s390x_decrypt): New.
    (do_setkey) [USE_S390X_CRYPTO]: Add s390x acceleration setup.
    --

    Patch adds acceleration for single-block AES and the following modes:
     - CBC, CBC-MAC, CFB, OFB, CTR, XTS and OCB

    Benchmarks (z15, 5.2Ghz):

    Before:
     AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
     ECB enc  |      3.81 ns/B     250.2 MiB/s     19.82 c/B
     ECB dec  |      4.13 ns/B     231.1 MiB/s     21.46 c/B
     CBC enc  |      3.69 ns/B     258.5 MiB/s     19.19 c/B
     CBC dec  |      3.71 ns/B     257.1 MiB/s     19.29 c/B
     CFB enc  |      3.69 ns/B     258.7 MiB/s     19.17 c/B
     CFB dec  |      3.56 ns/B     267.8 MiB/s     18.52 c/B
     OFB enc  |      3.85 ns/B     247.8 MiB/s     20.01 c/B
     OFB dec  |      3.85 ns/B     247.9 MiB/s     20.01 c/B
     CTR enc  |      3.65 ns/B     261.6 MiB/s     18.96 c/B
     CTR dec  |      3.64 ns/B     261.6 MiB/s     18.95 c/B
     XTS enc  |      3.66 ns/B     260.8 MiB/s     19.02 c/B
     XTS dec  |      3.75 ns/B     254.2 MiB/s     19.51 c/B
     CCM enc  |      7.34 ns/B     129.9 MiB/s     38.19 c/B
     CCM dec  |      7.34 ns/B     129.9 MiB/s     38.19 c/B
     CCM auth |      3.70 ns/B     257.6 MiB/s     19.25 c/B
     EAX enc  |      7.34 ns/B     129.8 MiB/s     38.19 c/B
     EAX dec  |      7.35 ns/B     129.8 MiB/s     38.20 c/B
     EAX auth |      3.70 ns/B     257.8 MiB/s     19.24 c/B
     GCM enc  |      6.22 ns/B     153.3 MiB/s     32.36 c/B
     GCM dec  |      6.23 ns/B     153.0 MiB/s     32.42 c/B
     GCM auth |      2.59 ns/B     368.9 MiB/s     13.44 c/B
     OCB enc  |      3.82 ns/B     249.7 MiB/s     19.86 c/B
     OCB dec  |      3.90 ns/B     244.2 MiB/s     20.31 c/B
     OCB auth |      3.88 ns/B     245.5 MiB/s     20.20 c/B

    After:
     AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
     ECB enc  |      2.10 ns/B     453.1 MiB/s     10.94 c/B
     ECB dec  |      2.11 ns/B     453.0 MiB/s     10.95 c/B
     CBC enc  |     0.182 ns/B      5240 MiB/s     0.946 c/B
     CBC dec  |     0.044 ns/B     21581 MiB/s     0.230 c/B
     CFB enc  |     0.206 ns/B      4623 MiB/s      1.07 c/B
     CFB dec  |     0.140 ns/B      6826 MiB/s     0.727 c/B
     OFB enc  |     0.183 ns/B      5222 MiB/s     0.950 c/B
     OFB dec  |     0.182 ns/B      5252 MiB/s     0.944 c/B
     CTR enc  |     0.059 ns/B     16095 MiB/s     0.308 c/B
     CTR dec  |     0.059 ns/B     16045 MiB/s     0.309 c/B
     XTS enc  |     0.043 ns/B     21998 MiB/s     0.225 c/B
     XTS dec  |     0.043 ns/B     22012 MiB/s     0.225 c/B
     CCM enc  |     0.239 ns/B      3989 MiB/s      1.24 c/B
     CCM dec  |     0.239 ns/B      3987 MiB/s      1.24 c/B
     CCM auth |     0.180 ns/B      5288 MiB/s     0.938 c/B
     EAX enc  |     0.242 ns/B      3940 MiB/s      1.26 c/B
     EAX dec  |     0.243 ns/B      3926 MiB/s      1.26 c/B
     EAX auth |     0.183 ns/B      5218 MiB/s     0.950 c/B
     GCM enc  |      2.64 ns/B     361.6 MiB/s     13.71 c/B
     GCM dec  |      2.64 ns/B     361.3 MiB/s     13.72 c/B
     GCM auth |      2.58 ns/B     370.1 MiB/s     13.40 c/B
     OCB enc  |     0.186 ns/B      5132 MiB/s     0.966 c/B
     OCB dec  |     0.176 ns/B      5414 MiB/s     0.916 c/B
     OCB auth |     0.149 ns/B      6394 MiB/s     0.776 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael: clean-up prepare_decryption function | Jussi Kivilinna | 2020-09-27 | 1 | -0/+2

    * cipher/rijndael-internal.h (rijndael_prepare_decfn_t): New.
    (RIJNDAEL_context_s): New member 'prepare_decryption'.
    * cipher/rijndael-padlock.c (_gcry_aes_padlock_prepare_decryption): New.
    * cipher/rijndael.c (_gcry_aes_padlock_prepare_decryption): New.
    (do_setkey): Setup 'ctx->prepare_decryption' for each acceleration
    type.
    (prepare_decryption): Remove calls to other prepare decryption
    functions.
    (check_decryption_preparation): Call 'ctx->prepare_decryption'
    instead of 'prepare_decryption'.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
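The commit above replaces a central dispatch in 'prepare_decryption' with a per-context function pointer chosen at key setup. A minimal sketch of that pattern is shown here. All names prefixed with demo_ are hypothetical, and the 'decryption_prepared' flag and call counter are assumptions added for illustration, not fields confirmed by the commit message.

```c
#include <stddef.h>

struct demo_ctx;

/* Per-backend hook type, analogous in spirit to 'rijndael_prepare_decfn_t'. */
typedef void (*demo_prepare_decfn_t)(struct demo_ctx *ctx);

struct demo_ctx
{
  int decryption_prepared;                 /* assumed lazy-initialisation flag */
  int prepare_calls;                       /* illustration only: counts hook calls */
  demo_prepare_decfn_t prepare_decryption; /* selected once at key setup */
};

/* Stand-in for a generic (non-accelerated) decryption key preparation. */
void demo_generic_prepare(struct demo_ctx *ctx)
{
  ctx->prepare_calls++;
}

/* Key setup picks the backend hook; a real setup would test HW features. */
void demo_setkey(struct demo_ctx *ctx)
{
  ctx->decryption_prepared = 0;
  ctx->prepare_calls = 0;
  ctx->prepare_decryption = demo_generic_prepare;
}

/* Callers go through the context pointer instead of a backend switch. */
void demo_check_decryption_preparation(struct demo_ctx *ctx)
{
  if (!ctx->decryption_prepared)
    {
      ctx->prepare_decryption(ctx);
      ctx->decryption_prepared = 1;
    }
}
```

With this layout, adding a new accelerated backend only requires assigning a different hook during key setup; the check function itself does not change.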
* Add POWER9 little-endian variant of PPC AES implementation | Jussi Kivilinna | 2020-02-02 | 1 | -1/+9

    * configure.ac: Add 'rijndael-ppc9le.lo'.
    * cipher/Makefile.am: Add 'rijndael-ppc9le.c', 'rijndael-ppc-common.h'
    and 'rijndael-ppc-functions.h'.
    * cipher/rijndael-internal.h (USE_PPC_CRYPTO_WITH_PPC9LE): New.
    (RIJNDAEL_context_s): Add 'use_ppc9le_crypto'.
    * cipher/rijndael.c (_gcry_aes_ppc9le_encrypt)
    (_gcry_aes_ppc9le_decrypt, _gcry_aes_ppc9le_cfb_enc)
    (_gcry_aes_ppc9le_cfb_dec, _gcry_aes_ppc9le_ctr_enc)
    (_gcry_aes_ppc9le_cbc_enc, _gcry_aes_ppc9le_cbc_dec)
    (_gcry_aes_ppc9le_ocb_crypt, _gcry_aes_ppc9le_ocb_auth)
    (_gcry_aes_ppc9le_xts_crypt): New.
    (do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
    (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
    (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth, _gcry_aes_xts_crypt)
    [USE_PPC_CRYPTO_WITH_PPC9LE]: New.
    * cipher/rijndael-ppc.c: Split common code to headers
    'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
    * cipher/rijndael-ppc-common.h: Split from 'rijndael-ppc.c'.
    (asm_add_uint64, asm_sra_int64, asm_swap_uint64_halfs): New.
    * cipher/rijndael-ppc-functions.h: Split from 'rijndael-ppc.c'.
    (CFB_ENC_FUNC, CBC_ENC_FUNC): Unroll loop by 2.
    (XTS_CRYPT_FUNC, GEN_TWEAK): Tweak generation without vperm
    instruction.
    * cipher/rijndael-ppc9le.c: New.
    --

    Provide POWER9 little-endian optimized variant of PPC vcrypto AES
    implementation. This implementation uses 'lxvb16x' and 'stxvb16x'
    instructions to load/store vectors directly in big-endian order.

    Benchmark on POWER9 (~3.8Ghz):

    Before:
     AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
     CBC enc  |      1.04 ns/B     918.7 MiB/s      3.94 c/B
     CBC dec  |     0.222 ns/B      4292 MiB/s     0.844 c/B
     CFB enc  |      1.04 ns/B     916.9 MiB/s      3.95 c/B
     CFB dec  |     0.224 ns/B      4252 MiB/s     0.852 c/B
     CTR enc  |     0.226 ns/B      4218 MiB/s     0.859 c/B
     CTR dec  |     0.225 ns/B      4233 MiB/s     0.856 c/B
     XTS enc  |     0.500 ns/B      1907 MiB/s      1.90 c/B
     XTS dec  |     0.494 ns/B      1932 MiB/s      1.88 c/B
     OCB enc  |     0.288 ns/B      3312 MiB/s      1.09 c/B
     OCB dec  |     0.292 ns/B      3266 MiB/s      1.11 c/B
     OCB auth |     0.267 ns/B      3567 MiB/s      1.02 c/B

    After (ctr & ocb & cbc-dec & cfb-dec ~15% and xts ~8% faster):
     AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
     CBC enc  |      1.04 ns/B     914.2 MiB/s      3.96 c/B
     CBC dec  |     0.191 ns/B      4984 MiB/s     0.727 c/B
     CFB enc  |      1.03 ns/B     930.0 MiB/s      3.90 c/B
     CFB dec  |     0.194 ns/B      4906 MiB/s     0.739 c/B
     CTR enc  |     0.196 ns/B      4868 MiB/s     0.744 c/B
     CTR dec  |     0.197 ns/B      4834 MiB/s     0.750 c/B
     XTS enc  |     0.460 ns/B      2075 MiB/s      1.75 c/B
     XTS dec  |     0.455 ns/B      2097 MiB/s      1.73 c/B
     OCB enc  |     0.250 ns/B      3812 MiB/s     0.951 c/B
     OCB dec  |     0.253 ns/B      3764 MiB/s     0.963 c/B
     OCB auth |     0.232 ns/B      4106 MiB/s     0.883 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael-ppc: add key setup and enable single block PowerPC AES | Jussi Kivilinna | 2019-08-26 | 1 | -1/+16

    * cipher/Makefile.am: Add 'rijndael-ppc.c'.
    * cipher/rijndael-internal.h (USE_PPC_CRYPTO): New.
    (RIJNDAEL_context): Add 'use_ppc_crypto'.
    * cipher/rijndael-ppc.c (backwards, swap_if_le): Remove.
    (u128_t, ALWAYS_INLINE, NO_INLINE, NO_INSTRUMENT_FUNCTION)
    (ASM_FUNC_ATTR, ASM_FUNC_ATTR_INLINE, ASM_FUNC_ATTR_NOINLINE)
    (ALIGNED_LOAD, ALIGNED_STORE, VEC_LOAD_BE, VEC_STORE_BE)
    (vec_bswap32_const, vec_aligned_ld, vec_load_be_const)
    (vec_load_be, vec_aligned_st, vec_store_be, _gcry_aes_sbox4_ppc8)
    (_gcry_aes_ppc8_setkey, _gcry_aes_ppc8_prepare_decryption)
    (aes_ppc8_encrypt_altivec, aes_ppc8_decrypt_altivec): New.
    (_gcry_aes_ppc8_encrypt, _gcry_aes_ppc8_decrypt): Rewrite.
    (_gcry_aes_ppc8_ocb_crypt): Comment out.
    * cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_setkey)
    (_gcry_aes_ppc8_prepare_decryption, _gcry_aes_ppc8_encrypt)
    (_gcry_aes_ppc8_decrypt): New prototypes.
    (do_setkey) [USE_PPC_CRYPTO]: Add setup for PowerPC AES.
    (prepare_decryption) [USE_PPC_CRYPTO]: Ditto.
    * configure.ac: Add 'rijndael-ppc.lo'.
    (gcry_cv_ppc_altivec, gcry_cv_cc_ppc_altivec_cflags)
    (gcry_cv_gcc_inline_asm_ppc_altivec)
    (gcry_cv_gcc_inline_asm_ppc_arch_3_00): New checks.
    --

    Benchmark on POWER8 ~3.8Ghz:

    Before:
     AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
     ECB enc  |      7.27 ns/B     131.2 MiB/s     27.61 c/B
     ECB dec  |      7.70 ns/B     123.8 MiB/s     29.28 c/B
     CBC enc  |      6.38 ns/B     149.5 MiB/s     24.24 c/B
     CBC dec  |      6.17 ns/B     154.5 MiB/s     23.45 c/B
     CFB enc  |      6.45 ns/B     147.9 MiB/s     24.51 c/B
     CFB dec  |      6.20 ns/B     153.8 MiB/s     23.57 c/B
     OFB enc  |      7.36 ns/B     129.6 MiB/s     27.96 c/B
     OFB dec  |      7.36 ns/B     129.6 MiB/s     27.96 c/B
     CTR enc  |      6.22 ns/B     153.2 MiB/s     23.65 c/B
     CTR dec  |      6.22 ns/B     153.3 MiB/s     23.65 c/B
     XTS enc  |      6.67 ns/B     142.9 MiB/s     25.36 c/B
     XTS dec  |      6.70 ns/B     142.3 MiB/s     25.46 c/B
     CCM enc  |     12.61 ns/B     75.60 MiB/s     47.93 c/B
     CCM dec  |     12.62 ns/B     75.56 MiB/s     47.96 c/B
     CCM auth |      6.41 ns/B     148.8 MiB/s     24.36 c/B
     EAX enc  |     12.62 ns/B     75.55 MiB/s     47.96 c/B
     EAX dec  |     12.62 ns/B     75.55 MiB/s     47.97 c/B
     EAX auth |      6.39 ns/B     149.2 MiB/s     24.30 c/B
     GCM enc  |      9.81 ns/B     97.24 MiB/s     37.27 c/B
     GCM dec  |      9.81 ns/B     97.20 MiB/s     37.28 c/B
     GCM auth |      3.59 ns/B     265.8 MiB/s     13.63 c/B
     OCB enc  |      6.39 ns/B     149.3 MiB/s     24.27 c/B
     OCB dec  |      6.38 ns/B     149.5 MiB/s     24.25 c/B
     OCB auth |      6.35 ns/B     150.2 MiB/s     24.13 c/B

    After:
     ECB enc  |      1.29 ns/B     737.7 MiB/s      4.91 c/B
     ECB dec  |      1.34 ns/B     711.1 MiB/s      5.10 c/B
     CBC enc  |      2.13 ns/B     448.5 MiB/s      8.08 c/B
     CBC dec  |      1.05 ns/B     908.0 MiB/s      3.99 c/B
     CFB enc  |      2.17 ns/B     439.9 MiB/s      8.24 c/B
     CFB dec  |      2.22 ns/B     429.8 MiB/s      8.43 c/B
     OFB enc  |      1.49 ns/B     640.1 MiB/s      5.66 c/B
     OFB dec  |      1.49 ns/B     640.1 MiB/s      5.66 c/B
     CTR enc  |      2.21 ns/B     432.5 MiB/s      8.38 c/B
     CTR dec  |      2.20 ns/B     432.5 MiB/s      8.38 c/B
     XTS enc  |      2.32 ns/B     410.6 MiB/s      8.83 c/B
     XTS dec  |      2.33 ns/B     409.7 MiB/s      8.85 c/B
     CCM enc  |      4.36 ns/B     218.7 MiB/s     16.57 c/B
     CCM dec  |      4.36 ns/B     218.8 MiB/s     16.56 c/B
     CCM auth |      2.17 ns/B     440.4 MiB/s      8.23 c/B
     EAX enc  |      4.37 ns/B     218.3 MiB/s     16.60 c/B
     EAX dec  |      4.36 ns/B     218.7 MiB/s     16.57 c/B
     EAX auth |      2.16 ns/B     440.7 MiB/s      8.22 c/B
     GCM enc  |      5.78 ns/B     165.0 MiB/s     21.96 c/B
     GCM dec  |      5.78 ns/B     165.0 MiB/s     21.96 c/B
     GCM auth |      3.59 ns/B     265.9 MiB/s     13.63 c/B
     OCB enc  |      2.33 ns/B     410.1 MiB/s      8.84 c/B
     OCB dec  |      2.34 ns/B     407.2 MiB/s      8.90 c/B
     OCB auth |      2.32 ns/B     411.1 MiB/s      8.82 c/B

    GnuPG-bug-id: 4529
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* AES: move look-up tables to .data section and unshare between processes | Jussi Kivilinna | 2019-06-05 | 1 | -1/+3

    * cipher/rijndael-internal.h (ATTR_ALIGNED_64): New.
    * cipher/rijndael-tables.h (encT): Move to 'enc_tables' structure.
    (enc_tables): New structure for encryption table with counters before
    and after.
    (encT): New macro.
    (dec_tables): Add counters before and after encryption table; Move
    from .rodata to .data section.
    (do_encrypt): Change 'encT' to 'enc_tables.T'.
    (do_decrypt): Change '&dec_tables' to 'dec_tables.T'.
    * cipher/cipher-gcm.c (prefetch_table): Make inline; Handle input
    with length not multiple of 256.
    (prefetch_enc, prefetch_dec): Modify pre- and post-table counters to
    unshare look-up table pages between processes.
    --

    GnuPG-bug-id: 4541
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
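The commit above places counters before and after each look-up table and modifies them during prefetch, so that the pages holding the table are written to and therefore stop being shared between processes. A rough sketch of that idea follows. The structure layout, the demo_ names and the 64-byte stride are assumptions for illustration only; the actual libgcrypt definitions may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Writable table wrapped by two counters (hypothetical layout). */
struct demo_tables
{
  volatile uint32_t counter_head; /* lives on the page where the table starts */
  uint32_t T[256];                /* the look-up table itself */
  volatile uint32_t counter_tail; /* lives on the page where the table ends */
};

/* Touch the table before use.  Writing the counters dirties the first and
   last page of the table, which under copy-on-write gives this process
   private copies; reading one byte per 64-byte stride pulls the table into
   cache.  Returns a checksum so the reads are not optimised away. */
uint32_t demo_prefetch(struct demo_tables *t)
{
  const volatile uint8_t *p = (const volatile uint8_t *)t->T;
  uint32_t sink = 0;
  size_t i;

  t->counter_head++;
  t->counter_tail++;

  for (i = 0; i < sizeof t->T; i += 64)
    sink += p[i];
  sink += p[sizeof t->T - 1];
  return sink;
}
```

The table has to live in writable memory for this to work, which matches the commit's note about moving the tables from .rodata to .data.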
* Optimizations for AES-NI OCB | Jussi Kivilinna | 2018-11-20 | 1 | -0/+2

    * cipher/cipher-internal.h (gcry_cipher_handle): New pre-computed OCB
    values L0L1 and L0L1L0; Swap dimensions for OCB L table.
    * cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Setup L0L1 and
    L0L1L0 values.
    (ocb_crypt): Process input in 24KiB chunks for better cache locality
    for checksumming.
    * cipher/rijndael-aesni.c (ALWAYS_INLINE): New macro for always
    inlining functions, change all functions with 'inline' to use
    ALWAYS_INLINE.
    (NO_INLINE): New macro.
    (aesni_prepare_2_6_variable, aesni_prepare_7_15_variable): Rename to...
    (aesni_prepare_2_7_variable, aesni_prepare_8_15_variable): ...these
    and adjust accordingly (xmm7 moved from *_7_15 to *_2_7).
    (aesni_prepare_2_6, aesni_prepare_7_15): Rename to...
    (aesni_prepare_2_7, aesni_prepare_8_15): ...these and adjust
    accordingly.
    (aesni_cleanup_2_6, aesni_cleanup_7_15): Rename to...
    (aesni_cleanup_2_7, aesni_cleanup_8_15): ...these and adjust
    accordingly.
    (aesni_ocb_checksum): New.
    (aesni_ocb_enc, aesni_ocb_dec): Calculate OCB offsets in parallel
    with help of pre-computed offsets L0+L1 and L0+L1+L0; Do checksum
    calculation as separate pass instead of inline; Use NO_INLINE.
    (_gcry_aes_aesni_ocb_auth): Calculate OCB offsets in parallel with
    help of pre-computed offsets L0+L1 and L0+L1+L0.
    * cipher/rijndael-internal.h (RIJNDAEL_context_s) [USE_AESNI]: Add
    'use_avx2' and 'use_avx'.
    * cipher/rijndael.c (do_setkey) [USE_AESNI]: Set 'use_avx2' if Intel
    AVX2 HW feature is available and 'use_avx' if Intel AVX HW feature is
    available.
    * tests/basic.c (do_check_ocb_cipher): New test vector; increase size
    of temporary buffers for new test vector.
    (check_ocb_cipher_largebuf_split): Make test plaintext non-uniform
    for better checksum testing.
    (check_ocb_cipher_checksum): New.
    (check_ocb_cipher_largebuf): Call check_ocb_cipher_checksum.
    (check_ocb_cipher): New expected tags for check_ocb_cipher_largebuf
    test runs.
    --

    Benchmark on Haswell i7-4970k @ 4.0Ghz:

    Before:
     AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
     OCB enc  |     0.175 ns/B      5436 MiB/s     0.702 c/B
     OCB dec  |     0.184 ns/B      5184 MiB/s     0.736 c/B
     OCB auth |     0.156 ns/B      6097 MiB/s     0.626 c/B

    After (enc +2% faster, dec +7% faster):
     OCB enc  |     0.172 ns/B      5547 MiB/s     0.688 c/B
     OCB dec  |     0.171 ns/B      5582 MiB/s     0.683 c/B
     OCB auth |     0.156 ns/B      6097 MiB/s     0.626 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
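The commit above computes OCB offsets "in parallel" using pre-computed combinations of L values. In OCB each block offset is the previous offset XORed with L[ntz(i)], which forms a serial dependency chain. The sketch below illustrates how pre-combined values remove that chain for three consecutive blocks. It assumes that "+" in the commit message denotes XOR and that the base offset belongs to a block index that is a multiple of 4, so the next three indices have ntz values 0, 1, 0. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } blk_t;

/* XOR two 128-bit blocks. */
blk_t blk_xor(blk_t x, blk_t y)
{
  blk_t r;
  int i;
  for (i = 0; i < 16; i++)
    r.b[i] = x.b[i] ^ y.b[i];
  return r;
}

/* Serial form: each offset depends on the previous one. */
void offsets_sequential(blk_t base, const blk_t L[2], blk_t out[3])
{
  out[0] = blk_xor(base, L[0]);   /* ntz(4k+1) = 0 */
  out[1] = blk_xor(out[0], L[1]); /* ntz(4k+2) = 1 */
  out[2] = blk_xor(out[1], L[0]); /* ntz(4k+3) = 0 */
}

/* Parallel form: every offset depends only on the base offset, so the
   three XORs have no dependency on each other. */
void offsets_parallel(blk_t base, blk_t L0, blk_t L0L1, blk_t L0L1L0,
                      blk_t out[3])
{
  out[0] = blk_xor(base, L0);
  out[1] = blk_xor(base, L0L1);
  out[2] = blk_xor(base, L0L1L0);
}
```

Both forms yield identical offsets; the parallel form simply lets a superscalar CPU issue the XORs independently, at the cost of storing the pre-combined values once per nonce setup.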
* Add ARMv8/AArch64 Crypto Extension implementation of AES | Jussi Kivilinna | 2016-09-05 | 1 | -0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Add 'rijndael-armv-aarch64-ce.S'. * cipher/rijndael-armv8-aarch64-ce.S: New. * cipher/rijndael-internal.h (USE_ARM_CE): Enable for ARMv8/AArch64. * configure.ac: Add 'rijndael-armv-aarch64-ce.lo' and 'rijndael-armv8-ce.lo' for ARMv8/AArch64. -- Improvement vs AArch64 assembly on Cortex-A53: AES-128 AES-192 AES-256 CBC enc: 13.19x 13.53x 13.76x CBC dec: 20.53x 21.91x 22.60x CFB enc: 14.29x 14.50x 14.63x CFB dec: 20.42x 21.69x 22.50x CTR: 18.29x 19.61x 20.53x OCB enc: 15.21x 16.32x 17.12x OCB dec: 14.95x 16.11x 16.88x OCB auth: 16.73x 17.93x 18.66x Benchmark on Cortex-A53 (1152 Mhz): Before: AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 21.86 ns/B 43.62 MiB/s 25.19 c/B ECB dec | 22.68 ns/B 42.05 MiB/s 26.13 c/B CBC enc | 18.66 ns/B 51.10 MiB/s 21.50 c/B CBC dec | 18.72 ns/B 50.95 MiB/s 21.56 c/B CFB enc | 18.61 ns/B 51.25 MiB/s 21.44 c/B CFB dec | 18.61 ns/B 51.25 MiB/s 21.44 c/B OFB enc | 22.84 ns/B 41.75 MiB/s 26.31 c/B OFB dec | 22.84 ns/B 41.75 MiB/s 26.31 c/B CTR enc | 18.89 ns/B 50.50 MiB/s 21.76 c/B CTR dec | 18.89 ns/B 50.50 MiB/s 21.76 c/B CCM enc | 37.55 ns/B 25.40 MiB/s 43.25 c/B CCM dec | 37.55 ns/B 25.40 MiB/s 43.25 c/B CCM auth | 18.77 ns/B 50.80 MiB/s 21.63 c/B GCM enc | 20.18 ns/B 47.25 MiB/s 23.25 c/B GCM dec | 20.18 ns/B 47.25 MiB/s 23.25 c/B GCM auth | 1.30 ns/B 732.5 MiB/s 1.50 c/B OCB enc | 19.67 ns/B 48.48 MiB/s 22.66 c/B OCB dec | 19.73 ns/B 48.34 MiB/s 22.72 c/B OCB auth | 19.46 ns/B 49.00 MiB/s 22.42 c/B = AES192 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 25.39 ns/B 37.56 MiB/s 29.25 c/B ECB dec | 26.15 ns/B 36.47 MiB/s 30.13 c/B CBC enc | 22.08 ns/B 43.19 MiB/s 25.44 c/B CBC dec | 22.25 ns/B 
42.87 MiB/s 25.63 c/B CFB enc | 22.03 ns/B 43.30 MiB/s 25.38 c/B CFB dec | 22.03 ns/B 43.29 MiB/s 25.38 c/B OFB enc | 26.26 ns/B 36.32 MiB/s 30.25 c/B OFB dec | 26.26 ns/B 36.32 MiB/s 30.25 c/B CTR enc | 22.30 ns/B 42.76 MiB/s 25.69 c/B CTR dec | 22.30 ns/B 42.76 MiB/s 25.69 c/B CCM enc | 44.38 ns/B 21.49 MiB/s 51.13 c/B CCM dec | 44.38 ns/B 21.49 MiB/s 51.13 c/B CCM auth | 22.20 ns/B 42.97 MiB/s 25.57 c/B GCM enc | 23.60 ns/B 40.41 MiB/s 27.19 c/B GCM dec | 23.60 ns/B 40.41 MiB/s 27.19 c/B GCM auth | 1.30 ns/B 732.4 MiB/s 1.50 c/B OCB enc | 23.09 ns/B 41.31 MiB/s 26.60 c/B OCB dec | 23.21 ns/B 41.09 MiB/s 26.74 c/B OCB auth | 22.88 ns/B 41.68 MiB/s 26.36 c/B = AES256 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 28.76 ns/B 33.17 MiB/s 33.13 c/B ECB dec | 29.46 ns/B 32.37 MiB/s 33.94 c/B CBC enc | 25.45 ns/B 37.48 MiB/s 29.31 c/B CBC dec | 25.50 ns/B 37.40 MiB/s 29.38 c/B CFB enc | 25.39 ns/B 37.56 MiB/s 29.25 c/B CFB dec | 25.39 ns/B 37.56 MiB/s 29.25 c/B OFB enc | 29.62 ns/B 32.19 MiB/s 34.13 c/B OFB dec | 29.62 ns/B 32.19 MiB/s 34.13 c/B CTR enc | 25.67 ns/B 37.15 MiB/s 29.57 c/B CTR dec | 25.67 ns/B 37.15 MiB/s 29.57 c/B CCM enc | 51.11 ns/B 18.66 MiB/s 58.88 c/B CCM dec | 51.11 ns/B 18.66 MiB/s 58.88 c/B CCM auth | 25.56 ns/B 37.32 MiB/s 29.44 c/B GCM enc | 26.96 ns/B 35.37 MiB/s 31.06 c/B GCM dec | 26.98 ns/B 35.35 MiB/s 31.08 c/B GCM auth | 1.30 ns/B 733.4 MiB/s 1.50 c/B OCB enc | 26.45 ns/B 36.05 MiB/s 30.47 c/B OCB dec | 26.53 ns/B 35.95 MiB/s 30.56 c/B OCB auth | 26.24 ns/B 36.34 MiB/s 30.23 c/B = After: Cipher: AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 4.83 ns/B 197.5 MiB/s 5.56 c/B ECB dec | 4.99 ns/B 191.1 MiB/s 5.75 c/B CBC enc | 1.41 ns/B 675.5 MiB/s 1.63 c/B CBC dec | 0.911 ns/B 1046.9 MiB/s 1.05 c/B CFB enc | 1.30 ns/B 732.2 MiB/s 1.50 c/B CFB dec | 0.911 ns/B 1046.7 MiB/s 1.05 c/B OFB enc | 5.81 ns/B 164.3 MiB/s 6.69 c/B OFB dec | 5.81 ns/B 164.3 MiB/s 6.69 c/B CTR enc | 1.03 ns/B 924.0 MiB/s 1.19 c/B CTR dec | 1.03 ns/B 924.1 
MiB/s 1.19 c/B CCM enc | 2.50 ns/B 381.8 MiB/s 2.88 c/B CCM dec | 2.50 ns/B 381.7 MiB/s 2.88 c/B CCM auth | 1.57 ns/B 606.1 MiB/s 1.81 c/B GCM enc | 2.33 ns/B 408.5 MiB/s 2.69 c/B GCM dec | 2.34 ns/B 408.4 MiB/s 2.69 c/B GCM auth | 1.30 ns/B 732.1 MiB/s 1.50 c/B OCB enc | 1.29 ns/B 736.6 MiB/s 1.49 c/B OCB dec | 1.32 ns/B 724.4 MiB/s 1.52 c/B OCB auth | 1.16 ns/B 819.6 MiB/s 1.34 c/B = AES192 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 5.48 ns/B 174.0 MiB/s 6.31 c/B ECB dec | 5.64 ns/B 169.0 MiB/s 6.50 c/B CBC enc | 1.63 ns/B 585.8 MiB/s 1.88 c/B CBC dec | 1.02 ns/B 935.8 MiB/s 1.17 c/B CFB enc | 1.52 ns/B 627.7 MiB/s 1.75 c/B CFB dec | 1.02 ns/B 935.9 MiB/s 1.17 c/B OFB enc | 6.46 ns/B 147.7 MiB/s 7.44 c/B OFB dec | 6.46 ns/B 147.7 MiB/s 7.44 c/B CTR enc | 1.14 ns/B 836.1 MiB/s 1.31 c/B CTR dec | 1.14 ns/B 835.9 MiB/s 1.31 c/B CCM enc | 2.83 ns/B 337.6 MiB/s 3.25 c/B CCM dec | 2.82 ns/B 338.0 MiB/s 3.25 c/B CCM auth | 1.79 ns/B 532.7 MiB/s 2.06 c/B GCM enc | 2.44 ns/B 390.3 MiB/s 2.82 c/B GCM dec | 2.44 ns/B 390.2 MiB/s 2.82 c/B GCM auth | 1.30 ns/B 731.9 MiB/s 1.50 c/B OCB enc | 1.41 ns/B 674.7 MiB/s 1.63 c/B OCB dec | 1.44 ns/B 662.0 MiB/s 1.66 c/B OCB auth | 1.28 ns/B 746.1 MiB/s 1.47 c/B = AES256 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 6.13 ns/B 155.5 MiB/s 7.06 c/B ECB dec | 6.29 ns/B 151.5 MiB/s 7.25 c/B CBC enc | 1.85 ns/B 516.8 MiB/s 2.13 c/B CBC dec | 1.13 ns/B 845.6 MiB/s 1.30 c/B CFB enc | 1.74 ns/B 549.5 MiB/s 2.00 c/B CFB dec | 1.13 ns/B 846.1 MiB/s 1.30 c/B OFB enc | 7.11 ns/B 134.2 MiB/s 8.19 c/B OFB dec | 7.11 ns/B 134.2 MiB/s 8.19 c/B CTR enc | 1.25 ns/B 763.5 MiB/s 1.44 c/B CTR dec | 1.25 ns/B 763.4 MiB/s 1.44 c/B CCM enc | 3.15 ns/B 302.9 MiB/s 3.63 c/B CCM dec | 3.15 ns/B 302.9 MiB/s 3.63 c/B CCM auth | 2.01 ns/B 474.2 MiB/s 2.32 c/B GCM enc | 2.55 ns/B 374.2 MiB/s 2.94 c/B GCM dec | 2.55 ns/B 373.7 MiB/s 2.94 c/B GCM auth | 1.30 ns/B 732.2 MiB/s 1.50 c/B OCB enc | 1.54 ns/B 617.6 MiB/s 1.78 c/B OCB dec | 1.57 ns/B 606.8 
MiB/s 1.81 c/B OCB auth | 1.40 ns/B 679.8 MiB/s 1.62 c/B = Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add AArch64 assembly implementation of AES | Jussi Kivilinna | 2016-09-04 | 1 | -0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Add 'rijndael-aarch64.S'. * cipher/rijndael-aarch64.S: New. * cipher/rijndael-internal.h: Enable USE_ARM_ASM if __AARCH64EL__ and HAVE_COMPATIBLE_GCC_AARCH64_PLATFORM_AS defined. * configure.ac (gcry_cv_gcc_aarch64_platform_as_ok): New check. [host=aarch64]: Add 'rijndael-aarch64.lo'. -- Patch adds ARMv8/Aarch64 implementation of AES. Benchmark on Cortex-A53 (1536 Mhz): Before: AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 19.37 ns/B 49.22 MiB/s 29.76 c/B ECB dec | 19.85 ns/B 48.03 MiB/s 30.50 c/B CBC enc | 16.84 ns/B 56.62 MiB/s 25.87 c/B CBC dec | 16.81 ns/B 56.74 MiB/s 25.82 c/B CFB enc | 16.80 ns/B 56.75 MiB/s 25.81 c/B CFB dec | 16.81 ns/B 56.75 MiB/s 25.81 c/B OFB enc | 20.02 ns/B 47.64 MiB/s 30.75 c/B OFB dec | 20.02 ns/B 47.64 MiB/s 30.75 c/B CTR enc | 17.06 ns/B 55.91 MiB/s 26.20 c/B CTR dec | 17.06 ns/B 55.92 MiB/s 26.20 c/B CCM enc | 33.94 ns/B 28.10 MiB/s 52.13 c/B CCM dec | 33.94 ns/B 28.10 MiB/s 52.14 c/B CCM auth | 16.97 ns/B 56.18 MiB/s 26.07 c/B GCM enc | 28.70 ns/B 33.23 MiB/s 44.09 c/B GCM dec | 28.70 ns/B 33.23 MiB/s 44.09 c/B GCM auth | 11.66 ns/B 81.81 MiB/s 17.90 c/B OCB enc | 17.66 ns/B 53.99 MiB/s 27.13 c/B OCB dec | 17.61 ns/B 54.16 MiB/s 27.05 c/B OCB auth | 17.44 ns/B 54.69 MiB/s 26.78 c/B = AES192 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 21.82 ns/B 43.71 MiB/s 33.51 c/B ECB dec | 22.55 ns/B 42.30 MiB/s 34.63 c/B CBC enc | 19.33 ns/B 49.33 MiB/s 29.70 c/B CBC dec | 19.50 ns/B 48.91 MiB/s 29.95 c/B CFB enc | 19.29 ns/B 49.44 MiB/s 29.63 c/B CFB dec | 19.28 ns/B 49.46 MiB/s 29.61 c/B OFB enc | 22.49 ns/B 42.40 MiB/s 34.55 c/B OFB dec | 22.50 ns/B 42.38 MiB/s 34.56 c/B CTR enc | 19.53 ns/B 48.83 MiB/s 30.00 
c/B CTR dec | 19.54 ns/B 48.80 MiB/s 30.02 c/B CCM enc | 38.91 ns/B 24.51 MiB/s 59.77 c/B CCM dec | 38.90 ns/B 24.51 MiB/s 59.76 c/B CCM auth | 19.45 ns/B 49.02 MiB/s 29.88 c/B GCM enc | 31.13 ns/B 30.63 MiB/s 47.82 c/B GCM dec | 31.14 ns/B 30.63 MiB/s 47.82 c/B GCM auth | 11.66 ns/B 81.80 MiB/s 17.91 c/B OCB enc | 20.15 ns/B 47.33 MiB/s 30.95 c/B OCB dec | 20.30 ns/B 46.98 MiB/s 31.18 c/B OCB auth | 19.92 ns/B 47.88 MiB/s 30.59 c/B = AES256 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 24.33 ns/B 39.19 MiB/s 37.38 c/B ECB dec | 25.23 ns/B 37.80 MiB/s 38.76 c/B CBC enc | 21.82 ns/B 43.71 MiB/s 33.51 c/B CBC dec | 22.18 ns/B 42.99 MiB/s 34.07 c/B CFB enc | 21.77 ns/B 43.80 MiB/s 33.44 c/B CFB dec | 21.77 ns/B 43.81 MiB/s 33.44 c/B OFB enc | 24.99 ns/B 38.16 MiB/s 38.39 c/B OFB dec | 24.99 ns/B 38.17 MiB/s 38.38 c/B CTR enc | 22.02 ns/B 43.32 MiB/s 33.82 c/B CTR dec | 22.02 ns/B 43.31 MiB/s 33.82 c/B CCM enc | 43.86 ns/B 21.74 MiB/s 67.38 c/B CCM dec | 43.87 ns/B 21.74 MiB/s 67.39 c/B CCM auth | 21.94 ns/B 43.48 MiB/s 33.69 c/B GCM enc | 33.66 ns/B 28.33 MiB/s 51.71 c/B GCM dec | 33.66 ns/B 28.33 MiB/s 51.70 c/B GCM auth | 11.69 ns/B 81.59 MiB/s 17.95 c/B OCB enc | 22.90 ns/B 41.65 MiB/s 35.17 c/B OCB dec | 23.25 ns/B 41.02 MiB/s 35.71 c/B OCB auth | 22.69 ns/B 42.03 MiB/s 34.85 c/B = After (~1.2x faster): AES | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 16.40 ns/B 58.16 MiB/s 25.19 c/B ECB dec | 17.01 ns/B 56.07 MiB/s 26.13 c/B CBC enc | 13.99 ns/B 68.15 MiB/s 21.49 c/B CBC dec | 14.04 ns/B 67.94 MiB/s 21.56 c/B CFB enc | 13.96 ns/B 68.32 MiB/s 21.44 c/B CFB dec | 13.95 ns/B 68.34 MiB/s 21.43 c/B OFB enc | 17.14 ns/B 55.65 MiB/s 26.32 c/B OFB dec | 17.13 ns/B 55.67 MiB/s 26.31 c/B CTR enc | 14.17 ns/B 67.31 MiB/s 21.76 c/B CTR dec | 14.17 ns/B 67.29 MiB/s 21.77 c/B CCM enc | 28.16 ns/B 33.86 MiB/s 43.26 c/B CCM dec | 28.16 ns/B 33.87 MiB/s 43.26 c/B CCM auth | 14.08 ns/B 67.71 MiB/s 21.63 c/B GCM enc | 25.82 ns/B 36.94 MiB/s 39.66 c/B GCM dec | 25.82 
ns/B 36.94 MiB/s 39.65 c/B GCM auth | 11.67 ns/B 81.74 MiB/s 17.92 c/B OCB enc | 14.78 ns/B 64.55 MiB/s 22.69 c/B OCB dec | 14.80 ns/B 64.43 MiB/s 22.74 c/B OCB auth | 14.59 ns/B 65.36 MiB/s 22.41 c/B = AES192 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 19.05 ns/B 50.07 MiB/s 29.25 c/B ECB dec | 19.62 ns/B 48.62 MiB/s 30.13 c/B CBC enc | 16.56 ns/B 57.59 MiB/s 25.44 c/B CBC dec | 16.69 ns/B 57.14 MiB/s 25.64 c/B CFB enc | 16.52 ns/B 57.71 MiB/s 25.38 c/B CFB dec | 16.52 ns/B 57.73 MiB/s 25.37 c/B OFB enc | 19.70 ns/B 48.41 MiB/s 30.26 c/B OFB dec | 19.69 ns/B 48.43 MiB/s 30.24 c/B CTR enc | 16.73 ns/B 57.00 MiB/s 25.70 c/B CTR dec | 16.73 ns/B 57.01 MiB/s 25.70 c/B CCM enc | 33.29 ns/B 28.65 MiB/s 51.13 c/B CCM dec | 33.29 ns/B 28.65 MiB/s 51.13 c/B CCM auth | 16.65 ns/B 57.29 MiB/s 25.57 c/B GCM enc | 28.39 ns/B 33.60 MiB/s 43.60 c/B GCM dec | 28.39 ns/B 33.59 MiB/s 43.60 c/B GCM auth | 11.64 ns/B 81.92 MiB/s 17.88 c/B OCB enc | 17.33 ns/B 55.03 MiB/s 26.62 c/B OCB dec | 17.40 ns/B 54.82 MiB/s 26.72 c/B OCB auth | 17.16 ns/B 55.59 MiB/s 26.35 c/B = AES256 | nanosecs/byte mebibytes/sec cycles/byte ECB enc | 21.56 ns/B 44.23 MiB/s 33.12 c/B ECB dec | 22.09 ns/B 43.17 MiB/s 33.93 c/B CBC enc | 19.09 ns/B 49.97 MiB/s 29.31 c/B CBC dec | 19.13 ns/B 49.86 MiB/s 29.38 c/B CFB enc | 19.04 ns/B 50.09 MiB/s 29.24 c/B CFB dec | 19.04 ns/B 50.08 MiB/s 29.25 c/B OFB enc | 22.22 ns/B 42.93 MiB/s 34.13 c/B OFB dec | 22.22 ns/B 42.92 MiB/s 34.13 c/B CTR enc | 19.25 ns/B 49.53 MiB/s 29.57 c/B CTR dec | 19.25 ns/B 49.55 MiB/s 29.57 c/B CCM enc | 38.33 ns/B 24.88 MiB/s 58.88 c/B CCM dec | 38.34 ns/B 24.88 MiB/s 58.88 c/B CCM auth | 19.17 ns/B 49.76 MiB/s 29.44 c/B GCM enc | 30.91 ns/B 30.86 MiB/s 47.47 c/B GCM dec | 30.91 ns/B 30.85 MiB/s 47.48 c/B GCM auth | 11.71 ns/B 81.47 MiB/s 17.98 c/B OCB enc | 19.85 ns/B 48.04 MiB/s 30.49 c/B OCB dec | 19.89 ns/B 47.95 MiB/s 30.55 c/B OCB auth | 19.67 ns/B 48.48 MiB/s 30.22 c/B = Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add ARMv8/AArch32 Crypto Extension implementation of AES | Jussi Kivilinna | 2016-07-14 | 1 | -0/+14
  * cipher/Makefile.am: Add 'rijndael-armv8-ce.c' and
  'rijndael-armv8-aarch32-ce.S'.
  * cipher/rijndael-armv8-aarch32-ce.S: New.
  * cipher/rijndael-armv8-ce.c: New.
  * cipher/rijndael-internal.h (USE_ARM_CE): New.
  (RIJNDAEL_context_s): Add 'use_arm_ce'.
  * cipher/rijndael.c [USE_ARM_CE] (_gcry_aes_armv8_ce_setkey)
  (_gcry_aes_armv8_ce_prepare_decryption)
  (_gcry_aes_armv8_ce_encrypt, _gcry_aes_armv8_ce_decrypt)
  (_gcry_aes_armv8_ce_cfb_enc, _gcry_aes_armv8_ce_cbc_enc)
  (_gcry_aes_armv8_ce_ctr_enc, _gcry_aes_armv8_ce_cfb_dec)
  (_gcry_aes_armv8_ce_cbc_dec, _gcry_aes_armv8_ce_ocb_crypt)
  (_gcry_aes_armv8_ce_ocb_auth): New.
  (do_setkey) [USE_ARM_CE]: Add ARM CE/AES HW feature check and key
  setup for ARM CE.
  (prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
  (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
  (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth) [USE_ARM_CE]: Add ARM CE
  support.
  * configure.ac: Add 'rijndael-armv8-ce.lo' and
  'rijndael-armv8-aarch32-ce.lo'.
  --

  Improvement vs ARM assembly on Cortex-A53:

              AES-128  AES-192  AES-256
  CBC enc:      14.8x    12.8x    11.4x
  CBC dec:      21.4x    20.5x    19.4x
  CFB enc:      16.2x    13.6x    11.6x
  CFB dec:      21.6x    20.5x    19.4x
  CTR:          19.1x    18.6x    17.8x
  OCB enc:      16.0x    16.2x    16.1x
  OCB dec:      15.6x    15.9x    15.8x
  OCB auth:     18.3x    18.4x    18.0x

  Benchmark on Cortex-A53 (1152 MHz):

  Before:
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  | 24.42 ns/B   39.06 MiB/s  28.13 c/B
  ECB dec  | 25.07 ns/B   38.05 MiB/s  28.88 c/B
  CBC enc  | 21.05 ns/B   45.30 MiB/s  24.25 c/B
  CBC dec  | 21.16 ns/B   45.07 MiB/s  24.38 c/B
  CFB enc  | 21.05 ns/B   45.31 MiB/s  24.25 c/B
  CFB dec  | 21.38 ns/B   44.61 MiB/s  24.62 c/B
  OFB enc  | 26.15 ns/B   36.47 MiB/s  30.13 c/B
  OFB dec  | 26.15 ns/B   36.47 MiB/s  30.13 c/B
  CTR enc  | 21.17 ns/B   45.06 MiB/s  24.38 c/B
  CTR dec  | 21.16 ns/B   45.06 MiB/s  24.38 c/B
  CCM enc  | 42.32 ns/B   22.53 MiB/s  48.75 c/B
  CCM dec  | 42.32 ns/B   22.53 MiB/s  48.75 c/B
  CCM auth | 21.17 ns/B   45.06 MiB/s  24.38 c/B
  GCM enc  | 22.08 ns/B   43.19 MiB/s  25.44 c/B
  GCM dec  | 22.08 ns/B   43.18 MiB/s  25.44 c/B
  GCM auth | 0.923 ns/B  1032.8 MiB/s   1.06 c/B
  OCB enc  | 26.20 ns/B   36.40 MiB/s  30.18 c/B
  OCB dec  | 25.97 ns/B   36.73 MiB/s  29.91 c/B
  OCB auth | 24.52 ns/B   38.90 MiB/s  28.24 c/B
  =
  AES192   | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  | 27.83 ns/B   34.26 MiB/s  32.06 c/B
  ECB dec  | 28.54 ns/B   33.42 MiB/s  32.88 c/B
  CBC enc  | 24.47 ns/B   38.97 MiB/s  28.19 c/B
  CBC dec  | 25.27 ns/B   37.74 MiB/s  29.11 c/B
  CFB enc  | 25.08 ns/B   38.02 MiB/s  28.89 c/B
  CFB dec  | 25.31 ns/B   37.68 MiB/s  29.16 c/B
  OFB enc  | 29.57 ns/B   32.25 MiB/s  34.06 c/B
  OFB dec  | 29.57 ns/B   32.25 MiB/s  34.06 c/B
  CTR enc  | 25.24 ns/B   37.78 MiB/s  29.08 c/B
  CTR dec  | 25.24 ns/B   37.79 MiB/s  29.08 c/B
  CCM enc  | 49.81 ns/B   19.15 MiB/s  57.38 c/B
  CCM dec  | 49.80 ns/B   19.15 MiB/s  57.37 c/B
  CCM auth | 24.58 ns/B   38.80 MiB/s  28.32 c/B
  GCM enc  | 26.15 ns/B   36.47 MiB/s  30.13 c/B
  GCM dec  | 26.11 ns/B   36.52 MiB/s  30.08 c/B
  GCM auth | 0.923 ns/B  1033.0 MiB/s   1.06 c/B
  OCB enc  | 29.59 ns/B   32.23 MiB/s  34.09 c/B
  OCB dec  | 29.42 ns/B   32.42 MiB/s  33.89 c/B
  OCB auth | 27.92 ns/B   34.16 MiB/s  32.16 c/B
  =
  AES256   | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  | 31.20 ns/B   30.57 MiB/s  35.94 c/B
  ECB dec  | 31.80 ns/B   29.99 MiB/s  36.63 c/B
  CBC enc  | 27.83 ns/B   34.27 MiB/s  32.06 c/B
  CBC dec  | 27.87 ns/B   34.21 MiB/s  32.11 c/B
  CFB enc  | 27.88 ns/B   34.20 MiB/s  32.12 c/B
  CFB dec  | 28.16 ns/B   33.87 MiB/s  32.44 c/B
  OFB enc  | 32.93 ns/B   28.96 MiB/s  37.94 c/B
  OFB dec  | 32.93 ns/B   28.96 MiB/s  37.94 c/B
  CTR enc  | 27.95 ns/B   34.13 MiB/s  32.19 c/B
  CTR dec  | 27.95 ns/B   34.12 MiB/s  32.20 c/B
  CCM enc  | 55.88 ns/B   17.07 MiB/s  64.38 c/B
  CCM dec  | 55.88 ns/B   17.07 MiB/s  64.38 c/B
  CCM auth | 27.95 ns/B   34.12 MiB/s  32.20 c/B
  GCM enc  | 28.86 ns/B   33.05 MiB/s  33.25 c/B
  GCM dec  | 28.87 ns/B   33.04 MiB/s  33.25 c/B
  GCM auth | 0.923 ns/B  1033.0 MiB/s   1.06 c/B
  OCB enc  | 32.96 ns/B   28.94 MiB/s  37.97 c/B
  OCB dec  | 32.73 ns/B   29.14 MiB/s  37.70 c/B
  OCB auth | 31.29 ns/B   30.48 MiB/s  36.04 c/B

  After:
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  5.10 ns/B   187.0 MiB/s   5.88 c/B
  ECB dec  |  5.27 ns/B   181.0 MiB/s   6.07 c/B
  CBC enc  |  1.41 ns/B   675.8 MiB/s   1.63 c/B
  CBC dec  | 0.992 ns/B   961.7 MiB/s   1.14 c/B
  CFB enc  |  1.30 ns/B   732.4 MiB/s   1.50 c/B
  CFB dec  | 0.991 ns/B   962.7 MiB/s   1.14 c/B
  OFB enc  |  7.05 ns/B   135.2 MiB/s   8.13 c/B
  OFB dec  |  7.05 ns/B   135.2 MiB/s   8.13 c/B
  CTR enc  |  1.11 ns/B   856.9 MiB/s   1.28 c/B
  CTR dec  |  1.11 ns/B   857.0 MiB/s   1.28 c/B
  CCM enc  |  2.58 ns/B   369.8 MiB/s   2.97 c/B
  CCM dec  |  2.58 ns/B   369.5 MiB/s   2.97 c/B
  CCM auth |  1.58 ns/B   605.2 MiB/s   1.82 c/B
  GCM enc  |  2.04 ns/B   467.9 MiB/s   2.35 c/B
  GCM dec  |  2.04 ns/B   466.6 MiB/s   2.35 c/B
  GCM auth | 0.923 ns/B  1033.0 MiB/s   1.06 c/B
  OCB enc  |  1.64 ns/B   579.8 MiB/s   1.89 c/B
  OCB dec  |  1.66 ns/B   574.5 MiB/s   1.91 c/B
  OCB auth |  1.33 ns/B   715.5 MiB/s   1.54 c/B
  =
  AES192   | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  5.64 ns/B   169.0 MiB/s   6.50 c/B
  ECB dec  |  5.81 ns/B   164.3 MiB/s   6.69 c/B
  CBC enc  |  1.90 ns/B   502.1 MiB/s   2.19 c/B
  CBC dec  |  1.24 ns/B   771.7 MiB/s   1.42 c/B
  CFB enc  |  1.84 ns/B   517.1 MiB/s   2.12 c/B
  CFB dec  |  1.23 ns/B   772.5 MiB/s   1.42 c/B
  OFB enc  |  7.60 ns/B   125.5 MiB/s   8.75 c/B
  OFB dec  |  7.60 ns/B   125.6 MiB/s   8.75 c/B
  CTR enc  |  1.36 ns/B   702.7 MiB/s   1.56 c/B
  CTR dec  |  1.36 ns/B   702.5 MiB/s   1.56 c/B
  CCM enc  |  3.31 ns/B   287.8 MiB/s   3.82 c/B
  CCM dec  |  3.31 ns/B   288.0 MiB/s   3.81 c/B
  CCM auth |  2.06 ns/B   462.1 MiB/s   2.38 c/B
  GCM enc  |  2.28 ns/B   418.4 MiB/s   2.63 c/B
  GCM dec  |  2.28 ns/B   418.0 MiB/s   2.63 c/B
  GCM auth | 0.923 ns/B  1032.8 MiB/s   1.06 c/B
  OCB enc  |  1.83 ns/B   520.1 MiB/s   2.11 c/B
  OCB dec  |  1.84 ns/B   517.8 MiB/s   2.12 c/B
  OCB auth |  1.52 ns/B   626.1 MiB/s   1.75 c/B
  =
  AES256   | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  5.86 ns/B   162.7 MiB/s   6.75 c/B
  ECB dec  |  6.02 ns/B   158.3 MiB/s   6.94 c/B
  CBC enc  |  2.44 ns/B   390.5 MiB/s   2.81 c/B
  CBC dec  |  1.45 ns/B   656.4 MiB/s   1.67 c/B
  CFB enc  |  2.39 ns/B   399.5 MiB/s   2.75 c/B
  CFB dec  |  1.45 ns/B   656.8 MiB/s   1.67 c/B
  OFB enc  |  7.81 ns/B   122.1 MiB/s   9.00 c/B
  OFB dec  |  7.81 ns/B   122.1 MiB/s   9.00 c/B
  CTR enc  |  1.57 ns/B   605.8 MiB/s   1.81 c/B
  CTR dec  |  1.57 ns/B   605.9 MiB/s   1.81 c/B
  CCM enc  |  4.07 ns/B   234.3 MiB/s   4.69 c/B
  CCM dec  |  4.07 ns/B   234.1 MiB/s   4.69 c/B
  CCM auth |  2.61 ns/B   365.7 MiB/s   3.00 c/B
  GCM enc  |  2.50 ns/B   381.9 MiB/s   2.88 c/B
  GCM dec  |  2.49 ns/B   382.3 MiB/s   2.87 c/B
  GCM auth | 0.926 ns/B  1029.7 MiB/s   1.07 c/B
  OCB enc  |  2.05 ns/B   465.6 MiB/s   2.36 c/B
  OCB dec  |  2.06 ns/B   462.0 MiB/s   2.38 c/B
  OCB auth |  1.74 ns/B   548.4 MiB/s   2.00 c/B

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Enable AMD64 AES implementation for WIN64 (Jussi Kivilinna, 2015-05-02, 1 file, -1/+2)

  * cipher/rijndael-amd64.S: Enable when
  HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
  (ELF): New macro to mask lines with ELF specific commands.
  * cipher/rijndael-internal.h (USE_AMD64_ASM): Enable when
  HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
  (do_encrypt, do_decrypt)
  [USE_AMD64_ASM && !HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS]: Use
  assembly block to call AMD64 assembly encrypt/decrypt function.
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Enable AES/AES-NI, AES/SSSE3 and GCM/PCLMUL implementations on WIN64 (Jussi Kivilinna, 2015-05-01, 1 file, -5/+4)

  * cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
  ( _gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector
  registers before use and restore after.
  * cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency
  on !defined(__WIN64__).
  * cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable,
  aesni_prepare, aesni_prepare_2_6, aesni_cleanup)
  ( aesni_cleanup_2_6): New.
  [!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New.
  (_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc)
  (_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec)
  (_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use
  'aesni_prepare_2_6'.
  * cipher/rijndael-internal.h (USE_SSSE3): Enable if
  HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or
  HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS.
  (USE_AESNI): Remove dependency on !defined(__WIN64__).
  * cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]
  (vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New.
  [!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New.
  (vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use
  'vpaes_ssse3_prepare'.
  (_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use
  'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'.
  [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to
  exclude '.type' and '.size' markers from assembly code, as they are
  not supported on WIN64/COFF objects.
  * configure.ac (gcry_cv_gcc_attribute_ms_abi)
  (gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi)
  (gcry_cv_gcc_default_abi_is_sysv_abi)
  (gcry_cv_gcc_win64_platform_as_ok): New checks.
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Disable GCM and AES-NI assembly implementations for WIN64 (Jussi Kivilinna, 2015-05-01, 1 file, -1/+3)

  * cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Do not enable when
  __WIN64__ defined.
  * cipher/rijndael-internal.h (USE_AESNI): Ditto.
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add Intel SSSE3 based vector permutation AES implementation (Jussi Kivilinna, 2014-12-27, 1 file, -0/+9)

  * cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'.
  * cipher/rijndael-internal.h (USE_SSSE3): New.
  (RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'.
  * cipher/rijndael-ssse3-amd64.c: New.
  * cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey)
  (_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt)
  (_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc)
  (_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc)
  (_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New.
  (do_setkey): Add HWF check for SSSE3 and setup for SSSE3
  implementation.
  (prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
  (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add
  selection for SSSE3 implementation.
  * configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'.
  --

  This patch adds "AES with vector permutations" implementation by
  Mike Hamburg. Public-domain source-code is available at:
  http://crypto.stanford.edu/vpaes/

  Benchmark on Intel Core2 T8100 (2.1 GHz, no turbo):

  Old (AMD64 asm):
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  8.79 ns/B   108.5 MiB/s  18.46 c/B
  ECB dec  |  9.07 ns/B   105.1 MiB/s  19.05 c/B
  CBC enc  |  7.77 ns/B   122.7 MiB/s  16.33 c/B
  CBC dec  |  7.74 ns/B   123.2 MiB/s  16.26 c/B
  CFB enc  |  7.88 ns/B   121.0 MiB/s  16.54 c/B
  CFB dec  |  7.56 ns/B   126.1 MiB/s  15.88 c/B
  OFB enc  |  9.02 ns/B   105.8 MiB/s  18.94 c/B
  OFB dec  |  9.07 ns/B   105.1 MiB/s  19.05 c/B
  CTR enc  |  7.80 ns/B   122.2 MiB/s  16.38 c/B
  CTR dec  |  7.81 ns/B   122.2 MiB/s  16.39 c/B

  New (ssse3):
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  5.77 ns/B   165.2 MiB/s  12.13 c/B
  ECB dec  |  7.13 ns/B   133.7 MiB/s  14.98 c/B
  CBC enc  |  5.27 ns/B   181.0 MiB/s  11.06 c/B
  CBC dec  |  6.39 ns/B   149.3 MiB/s  13.42 c/B
  CFB enc  |  5.27 ns/B   180.9 MiB/s  11.07 c/B
  CFB dec  |  5.28 ns/B   180.7 MiB/s  11.08 c/B
  OFB enc  |  6.11 ns/B   156.1 MiB/s  12.83 c/B
  OFB dec  |  6.13 ns/B   155.5 MiB/s  12.88 c/B
  CTR enc  |  5.26 ns/B   181.5 MiB/s  11.04 c/B
  CTR dec  |  5.24 ns/B   182.0 MiB/s  11.00 c/B

  Benchmark on Intel i5-2450M (2.5 GHz, no turbo, aes-ni disabled):

  Old (AMD64 asm):
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  8.06 ns/B   118.3 MiB/s  20.15 c/B
  ECB dec  |  8.21 ns/B   116.1 MiB/s  20.53 c/B
  CBC enc  |  7.88 ns/B   121.1 MiB/s  19.69 c/B
  CBC dec  |  7.57 ns/B   126.0 MiB/s  18.92 c/B
  CFB enc  |  7.87 ns/B   121.2 MiB/s  19.67 c/B
  CFB dec  |  7.56 ns/B   126.2 MiB/s  18.89 c/B
  OFB enc  |  8.27 ns/B   115.3 MiB/s  20.67 c/B
  OFB dec  |  8.28 ns/B   115.1 MiB/s  20.71 c/B
  CTR enc  |  8.02 ns/B   119.0 MiB/s  20.04 c/B
  CTR dec  |  8.02 ns/B   118.9 MiB/s  20.05 c/B

  New (ssse3):
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  4.03 ns/B   236.6 MiB/s  10.07 c/B
  ECB dec  |  5.28 ns/B   180.8 MiB/s  13.19 c/B
  CBC enc  |  3.77 ns/B   252.7 MiB/s   9.43 c/B
  CBC dec  |  4.69 ns/B   203.3 MiB/s  11.73 c/B
  CFB enc  |  3.75 ns/B   254.3 MiB/s   9.37 c/B
  CFB dec  |  3.69 ns/B   258.6 MiB/s   9.22 c/B
  OFB enc  |  4.17 ns/B   228.7 MiB/s  10.43 c/B
  OFB dec  |  4.17 ns/B   228.7 MiB/s  10.42 c/B
  CTR enc  |  3.72 ns/B   256.5 MiB/s   9.30 c/B
  CTR dec  |  3.72 ns/B   256.1 MiB/s   9.31 c/B

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* rijndael: fix compiler warnings on ARM (Jussi Kivilinna, 2014-12-25, 1 file, -3/+7)

  * cipher/rijndael-internal.h (RIJNDAEL_context_s): Add u32 variants of
  keyschedule arrays to unions u1 and u2.
  (keyschedenc32, keyscheddec32): New.
  * cipher/rijndael.c (u32_a_t): Remove.
  (do_setkey): Add and use tkk[].data32, k_u32, tk_u32 and W_u32; Remove
  casting byte arrays to u32_a_t.
  (prepare_decryption, do_encrypt_fn, do_decrypt_fn): Use keyschedenc32
  and keyscheddec32; Remove casting byte arrays to u32_a_t.
  --

  Patch fixes 'cast increases required alignment' compiler warnings that
  GCC was showing:

  rijndael.c: In function 'do_setkey':
  rijndael.c:310:13: warning: cast increases required alignment of target type [-Wcast-align]
      *((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
      ^
  rijndael.c:310:34: warning: cast increases required alignment of target type [-Wcast-align]
      *((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
  [removed the rest]

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
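
The union approach named in the ChangeLog entry above can be illustrated with a minimal sketch. It is not the libgcrypt code: `sketch_context`, `SKETCH_MAXROUNDS` and `sketch_copy_round_key` are hypothetical names, and the array sizes are assumptions. The point is only that a union member of 32-bit type gives word access to the key schedule without casting a byte pointer, which is what triggers -Wcast-align.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAXROUNDS 14   /* assumed value, for illustration only */

/* Hypothetical, simplified context: the same storage is visible both
   as bytes and as 32-bit words, so the compiler knows its alignment. */
typedef struct
{
  union
  {
    unsigned char keyschedule[SKETCH_MAXROUNDS + 1][4][4];
    uint32_t keyschedule32[SKETCH_MAXROUNDS + 1][4];
  } u1;
} sketch_context;

/* Store one 16-byte round key as four words.  No byte-pointer-to-u32
   cast is needed, so there is nothing for -Wcast-align to flag. */
static void
sketch_copy_round_key (sketch_context *ctx, int round, const uint32_t src[4])
{
  int j;

  assert (round >= 0 && round <= SKETCH_MAXROUNDS);
  for (j = 0; j < 4; j++)
    ctx->u1.keyschedule32[round][j] = src[j];
}
```

The byte view (`u1.keyschedule`) stays available for code that works on individual bytes, while word-oriented code uses the `uint32_t` view of the same memory.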
* rijndael: use more compact look-up tables and add table prefetching (Jussi Kivilinna, 2014-12-23, 1 file, -0/+3)

  * cipher/rijndael-internal.h (rijndael_prefetchfn_t): New.
  (RIJNDAEL_context): Add 'prefetch_enc_fn' and 'prefetch_dec_fn'.
  * cipher/rijndael-tables.h (S, T1, T2, T3, T4, T5, T6, T7, T8, S5, U1)
  (U2, U3, U4): Remove.
  (encT, dec_tables, decT, inv_sbox): Add.
  * cipher/rijndael.c (_gcry_aes_amd64_encrypt_block)
  (_gcry_aes_amd64_decrypt_block, _gcry_aes_arm_encrypt_block)
  (_gcry_aes_arm_decrypt_block): Add parameter for passing table pointer
  to assembly implementation.
  (prefetch_table, prefetch_enc, prefetch_dec): New.
  (do_setkey): Setup context prefetch functions depending on selected
  rijndael implementation; Use new tables for key setup.
  (prepare_decryption): Use new tables for decryption key setup.
  (do_encrypt_aligned): Rename to...
  (do_encrypt_fn): ... to this, change to use new compact tables,
  make it handle unaligned input and unroll rounds loop by two.
  (do_encrypt): Remove handling of unaligned input/output; pass table
  pointer to assembly implementations.
  (rijndael_encrypt, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
  (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec): Prefetch encryption tables
  before encryption.
  (do_decrypt_aligned): Rename to...
  (do_decrypt_fn): ... to this, change to use new compact tables,
  make it handle unaligned input and unroll rounds loop by two.
  (do_decrypt): Remove handling of unaligned input/output; pass table
  pointer to assembly implementations.
  (rijndael_decrypt, _gcry_aes_cbc_dec): Prefetch decryption tables
  before decryption.
  * cipher/rijndael-amd64.S: Use 1+1.25 KiB tables for
  encryption+decryption; remove tables from assembly file.
  * cipher/rijndael-arm.S: Ditto.
  --

  Patch replaces 4+4.25 KiB look-up tables in generic implementation and
  8+8 KiB look-up tables in AMD64 implementation and 2+2 KiB look-up tables in
  ARM implementation with 1+1.25 KiB look-up tables, and adds prefetching of
  look-up tables.

  AMD64 assembly is slower than before because of additional rotation
  instructions. The generic C implementation is now better optimized and
  actually faster than before.

  Benchmark results on Intel i5-4570 (turbo off) (64-bit, AMD64 assembly):
  tests/bench-slope --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes

  Old:
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  3.10 ns/B   307.5 MiB/s   9.92 c/B
  ECB dec  |  3.15 ns/B   302.5 MiB/s  10.09 c/B
  CBC enc  |  3.46 ns/B   275.5 MiB/s  11.08 c/B
  CBC dec  |  3.19 ns/B   299.2 MiB/s  10.20 c/B
  CFB enc  |  3.48 ns/B   274.4 MiB/s  11.12 c/B
  CFB dec  |  3.23 ns/B   294.8 MiB/s  10.35 c/B
  OFB enc  |  3.29 ns/B   290.2 MiB/s  10.52 c/B
  OFB dec  |  3.31 ns/B   288.3 MiB/s  10.58 c/B
  CTR enc  |  3.64 ns/B   261.7 MiB/s  11.66 c/B
  CTR dec  |  3.65 ns/B   261.6 MiB/s  11.67 c/B

  New:
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  4.21 ns/B   226.7 MiB/s  13.46 c/B
  ECB dec  |  4.27 ns/B   223.2 MiB/s  13.67 c/B
  CBC enc  |  4.15 ns/B   229.8 MiB/s  13.28 c/B
  CBC dec  |  3.85 ns/B   247.8 MiB/s  12.31 c/B
  CFB enc  |  4.16 ns/B   229.1 MiB/s  13.32 c/B
  CFB dec  |  3.88 ns/B   245.9 MiB/s  12.41 c/B
  OFB enc  |  4.38 ns/B   217.8 MiB/s  14.01 c/B
  OFB dec  |  4.36 ns/B   218.6 MiB/s  13.96 c/B
  CTR enc  |  4.30 ns/B   221.6 MiB/s  13.77 c/B
  CTR dec  |  4.30 ns/B   221.7 MiB/s  13.76 c/B

  Benchmark on Intel i5-4570 (turbo off) (32-bit mingw, generic C):
  tests/bench-slope.exe --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes

  Old:
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  6.03 ns/B   158.2 MiB/s  19.29 c/B
  ECB dec  |  5.81 ns/B   164.1 MiB/s  18.60 c/B
  CBC enc  |  6.22 ns/B   153.4 MiB/s  19.90 c/B
  CBC dec  |  5.91 ns/B   161.3 MiB/s  18.92 c/B
  CFB enc  |  6.25 ns/B   152.7 MiB/s  19.99 c/B
  CFB dec  |  6.24 ns/B   152.8 MiB/s  19.97 c/B
  OFB enc  |  6.33 ns/B   150.6 MiB/s  20.27 c/B
  OFB dec  |  6.33 ns/B   150.7 MiB/s  20.25 c/B
  CTR enc  |  6.28 ns/B   152.0 MiB/s  20.08 c/B
  CTR dec  |  6.28 ns/B   151.7 MiB/s  20.11 c/B

  New:
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  |  5.02 ns/B   190.0 MiB/s  16.06 c/B
  ECB dec  |  5.33 ns/B   178.8 MiB/s  17.07 c/B
  CBC enc  |  4.64 ns/B   205.4 MiB/s  14.86 c/B
  CBC dec  |  4.95 ns/B   192.7 MiB/s  15.84 c/B
  CFB enc  |  4.75 ns/B   200.7 MiB/s  15.20 c/B
  CFB dec  |  4.74 ns/B   201.1 MiB/s  15.18 c/B
  OFB enc  |  5.29 ns/B   180.3 MiB/s  16.93 c/B
  OFB dec  |  5.29 ns/B   180.3 MiB/s  16.93 c/B
  CTR enc  |  4.77 ns/B   200.0 MiB/s  15.26 c/B
  CTR dec  |  4.77 ns/B   199.8 MiB/s  15.27 c/B

  Benchmark on Cortex-A8 (ARM assembly):
  tests/bench-slope --cpu-mhz 1008 cipher aes

  Old:
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  | 21.84 ns/B   43.66 MiB/s  22.02 c/B
  ECB dec  | 22.35 ns/B   42.67 MiB/s  22.53 c/B
  CBC enc  | 22.97 ns/B   41.53 MiB/s  23.15 c/B
  CBC dec  | 23.48 ns/B   40.61 MiB/s  23.67 c/B
  CFB enc  | 22.72 ns/B   41.97 MiB/s  22.90 c/B
  CFB dec  | 23.41 ns/B   40.74 MiB/s  23.59 c/B
  OFB enc  | 23.65 ns/B   40.32 MiB/s  23.84 c/B
  OFB dec  | 23.67 ns/B   40.29 MiB/s  23.86 c/B
  CTR enc  | 23.24 ns/B   41.03 MiB/s  23.43 c/B
  CTR dec  | 23.23 ns/B   41.05 MiB/s  23.42 c/B

  New:
  AES      | nanosecs/byte  mebibytes/sec  cycles/byte
  ECB enc  | 26.03 ns/B   36.64 MiB/s  26.24 c/B
  ECB dec  | 26.97 ns/B   35.36 MiB/s  27.18 c/B
  CBC enc  | 23.21 ns/B   41.09 MiB/s  23.39 c/B
  CBC dec  | 23.36 ns/B   40.83 MiB/s  23.54 c/B
  CFB enc  | 23.02 ns/B   41.42 MiB/s  23.21 c/B
  CFB dec  | 23.67 ns/B   40.28 MiB/s  23.86 c/B
  OFB enc  | 27.86 ns/B   34.24 MiB/s  28.08 c/B
  OFB dec  | 27.87 ns/B   34.21 MiB/s  28.10 c/B
  CTR enc  | 23.47 ns/B   40.63 MiB/s  23.66 c/B
  CTR dec  | 23.49 ns/B   40.61 MiB/s  23.67 c/B

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
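
The table prefetching mentioned in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the code from rijndael.c: the name `sketch_prefetch_table`, the 32-byte stride and the returned counter are all hypothetical (the counter exists only so the sketch can be checked). The idea is to read one byte per assumed cache line so the whole look-up table is resident before the data-dependent table accesses start.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache-line stride; the real value is platform dependent and
   is not stated in the commit message. */
#define SKETCH_LINE_SIZE 32

/* Touch one byte in every assumed cache line of TAB, plus the last
   byte.  The volatile qualifier keeps the compiler from dropping the
   otherwise unused reads.  Returns the number of reads performed. */
static size_t
sketch_prefetch_table (const volatile unsigned char *tab, size_t len)
{
  size_t i;
  size_t touched = 0;

  assert (tab != NULL && len > 0);
  for (i = 0; i < len; i += SKETCH_LINE_SIZE)
    {
      (void) tab[i];
      touched++;
    }
  (void) tab[len - 1];
  return touched + 1;
}
```

With the 1 KiB encryption table size quoted in the commit, such a loop would issue only a few dozen reads per call, which is presumably why the smaller tables make prefetching practical.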
* rijndael: refactor to reduce number of #ifdefs and branches (Jussi Kivilinna, 2014-12-01, 1 file, -2/+9)

  * cipher/rijndael-aesni.c (_gcry_aes_aesni_encrypt)
  (_gcry_aes_aesni_decrypt): Make return stack burn depth.
  * cipher/rijndael-amd64.S (_gcry_aes_amd64_encrypt_block)
  (_gcry_aes_amd64_decrypt_block): Ditto.
  * cipher/rijndael-arm.S (_gcry_aes_arm_encrypt_block)
  (_gcry_aes_arm_decrypt_block): Ditto.
  * cipher/rijndael-internal.h (RIJNDAEL_context_s)
  (rijndael_cryptfn_t): New.
  (RIJNDAEL_context): New members 'encrypt_fn' and 'decrypt_fn'.
  * cipher/rijndael.c (_gcry_aes_amd64_encrypt_block)
  (_gcry_aes_amd64_decrypt_block, _gcry_aes_aesni_encrypt)
  (_gcry_aes_aesni_decrypt, _gcry_aes_arm_encrypt_block)
  (_gcry_aes_arm_decrypt_block): Change prototypes.
  (do_padlock_encrypt, do_padlock_decrypt): New.
  (do_setkey): Separate key-length to rounds conversion from
  HW features check; Add selection for ctx->encrypt_fn and
  ctx->decrypt_fn.
  (do_encrypt_aligned, do_decrypt_aligned): Move inside
  '[!USE_AMD64_ASM && !USE_ARM_ASM]'; Move USE_AMD64_ASM and
  USE_ARM_ASM to...
  (do_encrypt, do_decrypt): ...here; Return stack depth; Remove second
  temporary buffer from non-aligned input/output case.
  (do_padlock): Move decrypt_flag to last argument; Return stack depth.
  (rijndael_encrypt): Remove #ifdefs, just call ctx->encrypt_fn.
  (_gcry_aes_cfb_enc, _gcry_aes_cbc_enc): Remove USE_PADLOCK; Call
  ctx->encrypt_fn in place of do_encrypt/do_encrypt_aligned.
  (_gcry_aes_ctr_enc): Call ctx->encrypt_fn in place of
  do_encrypt_aligned; Make tmp buffer 16-byte aligned and wipe buffer
  after use.
  (rijndael_decrypt): Remove #ifdefs, just call ctx->decrypt_fn.
  (_gcry_aes_cfb_dec): Remove USE_PADLOCK; Call ctx->decrypt_fn in place
  of do_decrypt/do_decrypt_aligned.
  (_gcry_aes_cbc_dec): Ditto; Make savebuf buffer 16-byte aligned.
  --

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
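
The dispatch scheme this commit describes (select 'encrypt_fn' and 'decrypt_fn' once at key setup, then call through the pointer instead of branching on #ifdefs) might look like the sketch below. All `sketch_*` names are hypothetical, the two block functions are placeholders that only copy the block rather than perform AES, and the returned "burn depth" values are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCKSIZE 16

typedef struct sketch_context_s sketch_context;

/* Block function type: processes one block and returns the stack
   depth (in bytes) that the caller should wipe afterwards. */
typedef unsigned int (*sketch_cryptfn_t) (const sketch_context *ctx,
                                          unsigned char *dst,
                                          const unsigned char *src);

struct sketch_context_s
{
  sketch_cryptfn_t encrypt_fn;
  sketch_cryptfn_t decrypt_fn;
};

/* Placeholder for a generic C implementation (NOT real AES). */
static unsigned int
sketch_generic_crypt (const sketch_context *ctx, unsigned char *dst,
                      const unsigned char *src)
{
  (void) ctx;
  memmove (dst, src, SKETCH_BLOCKSIZE);
  return 64;   /* made-up burn depth */
}

/* Placeholder for a hardware-accelerated implementation (NOT real AES). */
static unsigned int
sketch_hw_crypt (const sketch_context *ctx, unsigned char *dst,
                 const unsigned char *src)
{
  (void) ctx;
  memmove (dst, src, SKETCH_BLOCKSIZE);
  return 0;    /* assume nothing sensitive was left on the stack */
}

/* Key setup picks the implementation once, based on a feature flag. */
static void
sketch_setkey (sketch_context *ctx, int have_hw)
{
  assert (ctx != NULL);
  ctx->encrypt_fn = have_hw ? sketch_hw_crypt : sketch_generic_crypt;
  ctx->decrypt_fn = ctx->encrypt_fn;
}

/* The public entry point needs no #ifdefs: it just calls the pointer. */
static unsigned int
sketch_encrypt (const sketch_context *ctx, unsigned char *dst,
                const unsigned char *src)
{
  return ctx->encrypt_fn (ctx, dst, src);
}
```

The design choice is that the hardware feature check runs once per key setup instead of once per block, and bulk-mode helpers can share the same pointer.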
* rijndael: split AES-NI functions to separate file (Jussi Kivilinna, 2014-12-01, 1 file, -0/+118)

  * cipher/Makefile.in: Add 'rijndael-aesni.c'.
  * cipher/rijndael-aesni.c: New.
  * cipher/rijndael-internal.h: New.
  * cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16)
  (USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context)
  (keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'.
  (u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6)
  (aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4)
  (do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move
  to 'rijndael-aesni.c'.
  (prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc)
  (_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt)
  (_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions
  in 'rijndael-aesni.c'.
  * configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'.
  --

  Clean-up rijndael.c before new hardware acceleration support gets
  added.

  Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>