| |
* cipher/rijndael-armv8-ce.c (_gcry_aes_armv8_ce_setkey): New key
schedule with simplified structure and less stack usage.
* cipher/rijndael-internal.h (RIJNDAEL_context_s): Add
'keyschedule32b'.
(keyschenc32b): New.
* cipher/rijndael-ppc-common.h (vec_u32): New.
* cipher/rijndael-ppc.c (vec_bswap32_const): Remove.
(_gcry_aes_sbox4_ppc8): Optimize to emit fewer instructions.
(keysched_idx): New.
(_gcry_aes_ppc8_setkey): New key schedule with simplified structure.
* cipher/rijndael-tables.h (rcon): Remove.
* cipher/rijndael.c (sbox4): New.
(do_setkey): New key schedule with simplified structure and less
stack usage.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
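The entry above removes the 'rcon' table from 'rijndael-tables.h'. As a minimal sketch of how AES round constants can be produced without a lookup table, the helper below derives them by repeated multiplication by x in GF(2^8). The names `xtime` and `make_rcon` are hypothetical and are not taken from the libgcrypt sources; this illustrates the arithmetic only and is not a claim about how the new key schedule is actually written.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by x in GF(2^8) modulo the AES polynomial
 * x^8 + x^4 + x^3 + x + 1 (0x11b). */
static uint8_t xtime(uint8_t a)
{
  return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Fill out[0..count-1] with the first 'count' AES round constants.
 * Hypothetical helper for illustration only. */
static void make_rcon(uint8_t *out, unsigned int count)
{
  uint8_t rc = 0x01;
  unsigned int i;

  assert(count <= 10); /* AES-128 uses the most round constants: 10 */

  for (i = 0; i < count; i++)
    {
      out[i] = rc;
      rc = xtime(rc);
    }
}
```

Calling `make_rcon(buf, 10)` yields 01 02 04 08 10 20 40 80 1b 36 (hex), the constants used by the AES-128 key expansion.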
| |
* cipher/Makefile.am: Add 'rijndael-vaes.c' and
'rijndael-vaes-avx2-amd64.S'.
* cipher/rijndael-internal.h (USE_VAES): New.
* cipher/rijndael-vaes-avx2-amd64.S: New.
* cipher/rijndael-vaes.c: New.
* cipher/rijndael.c (_gcry_aes_vaes_cfb_dec, _gcry_aes_vaes_cbc_dec)
(_gcry_aes_vaes_ctr_enc, _gcry_aes_vaes_ocb_crypt)
(_gcry_aes_vaes_xts_crypt): New.
(do_setkey) [USE_VAES]: Add detection for VAES.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128)
[USE_VAES]: Increase number of selftest blocks.
* configure.ac: Add 'rijndael-vaes.lo' and
'rijndael-vaes-avx2-amd64.lo'.
--
Patch adds VAES/AVX2 accelerated implementation for CBC-decryption,
CFB-decryption, CTR-encryption, OCB-en/decryption and XTS-en/decryption.
Benchmarks on AMD Ryzen 5800X:
Before:
AES      | nanosecs/byte   mebibytes/sec   cycles/byte   auto Mhz
CBC dec  |    0.067 ns/B     14314 MiB/s     0.323 c/B       4850
CFB dec  |    0.067 ns/B     14322 MiB/s     0.323 c/B       4850
CTR enc  |    0.066 ns/B     14429 MiB/s     0.321 c/B       4850
CTR dec  |    0.066 ns/B     14433 MiB/s     0.320 c/B       4850
XTS enc  |    0.087 ns/B     10910 MiB/s     0.424 c/B       4850
XTS dec  |    0.088 ns/B     10856 MiB/s     0.426 c/B       4850
OCB enc  |    0.070 ns/B     13633 MiB/s     0.339 c/B       4850
OCB dec  |    0.069 ns/B     13911 MiB/s     0.332 c/B       4850
After (XTS ~1.7x faster, others ~1.9x faster):
AES      | nanosecs/byte   mebibytes/sec   cycles/byte   auto Mhz
CBC dec  |    0.034 ns/B     28159 MiB/s     0.164 c/B       4850
CFB dec  |    0.034 ns/B     27955 MiB/s     0.165 c/B       4850
CTR enc  |    0.034 ns/B     28214 MiB/s     0.164 c/B       4850
CTR dec  |    0.034 ns/B     28146 MiB/s     0.164 c/B       4850
XTS enc  |    0.051 ns/B     18539 MiB/s     0.249 c/B       4850
XTS dec  |    0.051 ns/B     18655 MiB/s     0.248 c/B       4850
GCM auth |    0.088 ns/B     10817 MiB/s     0.428 c/B       4850
OCB enc  |    0.037 ns/B     25824 MiB/s     0.179 c/B       4850
OCB dec  |    0.038 ns/B     25359 MiB/s     0.182 c/B       4850
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
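The three benchmark columns are related by simple arithmetic, and the quoted speedups follow from the cycles/byte figures. The sketch below shows those relations; the function names are hypothetical. Because the printed ns/B values are rounded to three decimals, results recomputed from them only approximate the printed MiB/s and c/B columns.

```c
#include <assert.h>

/* Throughput in MiB/s from time per byte in nanoseconds. */
static double mib_per_sec(double ns_per_byte)
{
  assert(ns_per_byte > 0.0);
  return 1e9 / (ns_per_byte * 1048576.0);
}

/* Cycles per byte from time per byte and clock frequency in MHz. */
static double cycles_per_byte(double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* Speedup factor between two cost figures (lower cost is better). */
static double speedup(double cost_before, double cost_after)
{
  assert(cost_after > 0.0);
  return cost_before / cost_after;
}
```

For example, `speedup(0.424, 0.249)` for XTS enc gives roughly 1.70, and `speedup(0.323, 0.164)` for CBC dec gives roughly 1.97, consistent with the "~1.7x" and "~1.9x" summary in the message.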
| |
* cipher/rijndael-internal.h (RIJNDAEL_context_s): Remove unused
'use_padlock', 'use_aesni', 'use_ssse3', 'use_arm_ce', 'use_ppc_crypto'
and 'use_ppc9le_crypto'.
* cipher/rijndael.c (do_setkey): Do not setup 'use_padlock',
'use_aesni', 'use_ssse3', 'use_arm_ce', 'use_ppc_crypto' and
'use_ppc9le_crypto'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* configure.ac: Add 'rijndael-s390x.lo'.
* cipher/Makefile.am: Add 'rijndael-s390x.c'.
* cipher/rijndael-internal.h (USE_S390X_CRYPTO): New.
(RIJNDAEL_context_s) [USE_S390X_CRYPTO]: New 'km*_func' members.
* cipher/rijndael-s390x.c: New.
* cipher/rijndael.c (_gcry_aes_s390x_setup_acceleration)
(_gcry_aes_s390x_setup_setkey)
(_gcry_aes_s390x_setup_prepare_decryption, _gcry_aes_s390x_encrypt)
(_gcry_aes_s390x_decrypt): New.
(do_setkey) [USE_S390X_CRYPTO]: Add s390x acceleration setup.
--
Patch adds acceleration for single-block AES and the following modes:
- CBC, CBC-MAC, CFB, OFB, CTR, XTS and OCB
Benchmarks (z15, 5.2Ghz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 3.81 ns/B 250.2 MiB/s 19.82 c/B
ECB dec | 4.13 ns/B 231.1 MiB/s 21.46 c/B
CBC enc | 3.69 ns/B 258.5 MiB/s 19.19 c/B
CBC dec | 3.71 ns/B 257.1 MiB/s 19.29 c/B
CFB enc | 3.69 ns/B 258.7 MiB/s 19.17 c/B
CFB dec | 3.56 ns/B 267.8 MiB/s 18.52 c/B
OFB enc | 3.85 ns/B 247.8 MiB/s 20.01 c/B
OFB dec | 3.85 ns/B 247.9 MiB/s 20.01 c/B
CTR enc | 3.65 ns/B 261.6 MiB/s 18.96 c/B
CTR dec | 3.64 ns/B 261.6 MiB/s 18.95 c/B
XTS enc | 3.66 ns/B 260.8 MiB/s 19.02 c/B
XTS dec | 3.75 ns/B 254.2 MiB/s 19.51 c/B
CCM enc | 7.34 ns/B 129.9 MiB/s 38.19 c/B
CCM dec | 7.34 ns/B 129.9 MiB/s 38.19 c/B
CCM auth | 3.70 ns/B 257.6 MiB/s 19.25 c/B
EAX enc | 7.34 ns/B 129.8 MiB/s 38.19 c/B
EAX dec | 7.35 ns/B 129.8 MiB/s 38.20 c/B
EAX auth | 3.70 ns/B 257.8 MiB/s 19.24 c/B
GCM enc | 6.22 ns/B 153.3 MiB/s 32.36 c/B
GCM dec | 6.23 ns/B 153.0 MiB/s 32.42 c/B
GCM auth | 2.59 ns/B 368.9 MiB/s 13.44 c/B
OCB enc | 3.82 ns/B 249.7 MiB/s 19.86 c/B
OCB dec | 3.90 ns/B 244.2 MiB/s 20.31 c/B
OCB auth | 3.88 ns/B 245.5 MiB/s 20.20 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 2.10 ns/B 453.1 MiB/s 10.94 c/B
ECB dec | 2.11 ns/B 453.0 MiB/s 10.95 c/B
CBC enc | 0.182 ns/B 5240 MiB/s 0.946 c/B
CBC dec | 0.044 ns/B 21581 MiB/s 0.230 c/B
CFB enc | 0.206 ns/B 4623 MiB/s 1.07 c/B
CFB dec | 0.140 ns/B 6826 MiB/s 0.727 c/B
OFB enc | 0.183 ns/B 5222 MiB/s 0.950 c/B
OFB dec | 0.182 ns/B 5252 MiB/s 0.944 c/B
CTR enc | 0.059 ns/B 16095 MiB/s 0.308 c/B
CTR dec | 0.059 ns/B 16045 MiB/s 0.309 c/B
XTS enc | 0.043 ns/B 21998 MiB/s 0.225 c/B
XTS dec | 0.043 ns/B 22012 MiB/s 0.225 c/B
CCM enc | 0.239 ns/B 3989 MiB/s 1.24 c/B
CCM dec | 0.239 ns/B 3987 MiB/s 1.24 c/B
CCM auth | 0.180 ns/B 5288 MiB/s 0.938 c/B
EAX enc | 0.242 ns/B 3940 MiB/s 1.26 c/B
EAX dec | 0.243 ns/B 3926 MiB/s 1.26 c/B
EAX auth | 0.183 ns/B 5218 MiB/s 0.950 c/B
GCM enc | 2.64 ns/B 361.6 MiB/s 13.71 c/B
GCM dec | 2.64 ns/B 361.3 MiB/s 13.72 c/B
GCM auth | 2.58 ns/B 370.1 MiB/s 13.40 c/B
OCB enc | 0.186 ns/B 5132 MiB/s 0.966 c/B
OCB dec | 0.176 ns/B 5414 MiB/s 0.916 c/B
OCB auth | 0.149 ns/B 6394 MiB/s 0.776 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-internal.h (rijndael_prepare_decfn_t): New.
(RIJNDAEL_context_s): New member 'prepare_decryption'.
* cipher/rijndael-padlock.c (_gcry_aes_padlock_prepare_decryption): New.
* cipher/rijndael.c (_gcry_aes_padlock_prepare_decryption): New.
(do_setkey): Setup 'ctx->prepare_decryption' for each acceleration type.
(prepare_decryption): Remove calls to other prepare decryption functions.
(check_decryption_preparation): Call 'ctx->prepare_decryption' instead
of 'prepare_decryption'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
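The change above replaces per-backend branching in 'prepare_decryption' with a function pointer stored in the context at key setup. A minimal sketch of that dispatch pattern follows. All names here (`struct aes_ctx`, `backend_used`, `setkey`, the two backend functions) are hypothetical stand-ins, not the real libgcrypt definitions, and the backends only record which one ran.

```c
#include <assert.h>
#include <stddef.h>

struct aes_ctx;
typedef void (*prepare_decfn_t)(struct aes_ctx *ctx);

struct aes_ctx
{
  prepare_decfn_t prepare_decryption; /* chosen once at key setup */
  int decryption_prepared;            /* lazily set on first decrypt */
  int backend_used;                   /* 0 = none, 1 = generic, 2 = accel */
};

/* Stand-ins for backend-specific decryption key preparation. */
static void generic_prepare_decryption(struct aes_ctx *ctx)
{
  ctx->backend_used = 1;
}

static void accel_prepare_decryption(struct aes_ctx *ctx)
{
  ctx->backend_used = 2;
}

/* Key setup selects the backend; 'have_accel' models a HW feature check. */
static void setkey(struct aes_ctx *ctx, int have_accel)
{
  ctx->decryption_prepared = 0;
  ctx->backend_used = 0;
  ctx->prepare_decryption = have_accel ? accel_prepare_decryption
                                       : generic_prepare_decryption;
}

/* Callers no longer need to know which backend is active. */
static void check_decryption_preparation(struct aes_ctx *ctx)
{
  assert(ctx->prepare_decryption != NULL);
  if (!ctx->decryption_prepared)
    {
      ctx->prepare_decryption(ctx);
      ctx->decryption_prepared = 1;
    }
}
```

With this layout, adding a new acceleration type only touches key setup; the decryption path keeps a single indirect call.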
| |
* configure.ac: Add 'rijndael-ppc9le.lo'.
* cipher/Makefile.am: Add 'rijndael-ppc9le.c', 'rijndael-ppc-common.h'
and 'rijndael-ppc-functions.h'.
* cipher/rijndael-internal.h (USE_PPC_CRYPTO_WITH_PPC9LE): New.
(RIJNDAEL_context_s): Add 'use_ppc9le_crypto'.
* cipher/rijndael.c (_gcry_aes_ppc9le_encrypt)
(_gcry_aes_ppc9le_decrypt, _gcry_aes_ppc9le_cfb_enc)
(_gcry_aes_ppc9le_cfb_dec, _gcry_aes_ppc9le_ctr_enc)
(_gcry_aes_ppc9le_cbc_enc, _gcry_aes_ppc9le_cbc_dec)
(_gcry_aes_ppc9le_ocb_crypt, _gcry_aes_ppc9le_ocb_auth)
(_gcry_aes_ppc9le_xts_crypt): New.
(do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth, _gcry_aes_xts_crypt)
[USE_PPC_CRYPTO_WITH_PPC9LE]: New.
* cipher/rijndael-ppc.c: Split common code to headers
'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
* cipher/rijndael-ppc-common.h: Split from 'rijndael-ppc.c'.
(asm_add_uint64, asm_sra_int64, asm_swap_uint64_halfs): New.
* cipher/rijndael-ppc-functions.h: Split from 'rijndael-ppc.c'.
(CFB_ENC_FUNC, CBC_ENC_FUNC): Unroll loop by 2.
(XTS_CRYPT_FUNC, GEN_TWEAK): Tweak generation without vperm
instruction.
* cipher/rijndael-ppc9le.c: New.
--
Provide POWER9 little-endian optimized variant of PPC vcrypto AES
implementation. This implementation uses 'lxvb16x' and 'stxvb16x'
instructions to load/store vectors directly in big-endian order.
Benchmark on POWER9 (~3.8Ghz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 918.7 MiB/s 3.94 c/B
CBC dec | 0.222 ns/B 4292 MiB/s 0.844 c/B
CFB enc | 1.04 ns/B 916.9 MiB/s 3.95 c/B
CFB dec | 0.224 ns/B 4252 MiB/s 0.852 c/B
CTR enc | 0.226 ns/B 4218 MiB/s 0.859 c/B
CTR dec | 0.225 ns/B 4233 MiB/s 0.856 c/B
XTS enc | 0.500 ns/B 1907 MiB/s 1.90 c/B
XTS dec | 0.494 ns/B 1932 MiB/s 1.88 c/B
OCB enc | 0.288 ns/B 3312 MiB/s 1.09 c/B
OCB dec | 0.292 ns/B 3266 MiB/s 1.11 c/B
OCB auth | 0.267 ns/B 3567 MiB/s 1.02 c/B
After (ctr & ocb & cbc-dec & cfb-dec ~15% and xts ~8% faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 914.2 MiB/s 3.96 c/B
CBC dec | 0.191 ns/B 4984 MiB/s 0.727 c/B
CFB enc | 1.03 ns/B 930.0 MiB/s 3.90 c/B
CFB dec | 0.194 ns/B 4906 MiB/s 0.739 c/B
CTR enc | 0.196 ns/B 4868 MiB/s 0.744 c/B
CTR dec | 0.197 ns/B 4834 MiB/s 0.750 c/B
XTS enc | 0.460 ns/B 2075 MiB/s 1.75 c/B
XTS dec | 0.455 ns/B 2097 MiB/s 1.73 c/B
OCB enc | 0.250 ns/B 3812 MiB/s 0.951 c/B
OCB dec | 0.253 ns/B 3764 MiB/s 0.963 c/B
OCB auth | 0.232 ns/B 4106 MiB/s 0.883 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
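The entry above mentions XTS tweak generation built from 64-bit add, arithmetic-shift and half-swap helpers instead of a 'vperm' instruction. The underlying operation is a multiplication of the 128-bit tweak by x in GF(2^128) with the reduction constant 0x87. The portable sketch below shows only that arithmetic on two 64-bit halves; `struct tweak128` and `xts_next_tweak` are hypothetical names, and this is not the PowerPC vector code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128-bit XTS tweak as two 64-bit halves (lo holds bits 0..63). */
struct tweak128
{
  uint64_t lo;
  uint64_t hi;
};

/* Advance the tweak to the next block: multiply by x in GF(2^128),
 * reducing with x^128 = x^7 + x^2 + x + 1 (0x87). */
static void xts_next_tweak(struct tweak128 *t)
{
  uint64_t carry_mask;

  assert(t != NULL);

  /* All-ones if the top bit is set, else zero; same effect as an
   * arithmetic right shift of the high half by 63. */
  carry_mask = (uint64_t)0 - (t->hi >> 63);

  t->hi = (t->hi << 1) | (t->lo >> 63);
  t->lo = (t->lo << 1) ^ (carry_mask & 0x87);
}
```

Building the mask from the sign bit avoids a data-dependent branch, which is the same idea that an arithmetic-shift helper serves in vector code.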
| |
* cipher/Makefile.am: Add 'rijndael-ppc.c'.
* cipher/rijndael-internal.h (USE_PPC_CRYPTO): New.
(RIJNDAEL_context): Add 'use_ppc_crypto'.
* cipher/rijndael-ppc.c (backwards, swap_if_le): Remove.
(u128_t, ALWAYS_INLINE, NO_INLINE, NO_INSTRUMENT_FUNCTION)
(ASM_FUNC_ATTR, ASM_FUNC_ATTR_INLINE, ASM_FUNC_ATTR_NOINLINE)
(ALIGNED_LOAD, ALIGNED_STORE, VEC_LOAD_BE, VEC_STORE_BE)
(vec_bswap32_const, vec_aligned_ld, vec_load_be_const)
(vec_load_be, vec_aligned_st, vec_store_be, _gcry_aes_sbox4_ppc8)
(_gcry_aes_ppc8_setkey, _gcry_aes_ppc8_prepare_decryption)
(aes_ppc8_encrypt_altivec, aes_ppc8_decrypt_altivec): New.
(_gcry_aes_ppc8_encrypt, _gcry_aes_ppc8_decrypt): Rewrite.
(_gcry_aes_ppc8_ocb_crypt): Comment out.
* cipher/rijndael.c [USE_PPC_CRYPTO] (_gcry_aes_ppc8_setkey)
(_gcry_aes_ppc8_prepare_decryption, _gcry_aes_ppc8_encrypt)
(_gcry_aes_ppc8_decrypt): New prototypes.
(do_setkey) [USE_PPC_CRYPTO]: Add setup for PowerPC AES.
(prepare_decryption) [USE_PPC_CRYPTO]: Ditto.
* configure.ac: Add 'rijndael-ppc.lo'.
(gcry_cv_ppc_altivec, gcry_cv_cc_ppc_altivec_cflags)
(gcry_cv_gcc_inline_asm_ppc_altivec)
(gcry_cv_gcc_inline_asm_ppc_arch_3_00): New checks.
--
Benchmark on POWER8 ~3.8Ghz:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 7.27 ns/B 131.2 MiB/s 27.61 c/B
ECB dec | 7.70 ns/B 123.8 MiB/s 29.28 c/B
CBC enc | 6.38 ns/B 149.5 MiB/s 24.24 c/B
CBC dec | 6.17 ns/B 154.5 MiB/s 23.45 c/B
CFB enc | 6.45 ns/B 147.9 MiB/s 24.51 c/B
CFB dec | 6.20 ns/B 153.8 MiB/s 23.57 c/B
OFB enc | 7.36 ns/B 129.6 MiB/s 27.96 c/B
OFB dec | 7.36 ns/B 129.6 MiB/s 27.96 c/B
CTR enc | 6.22 ns/B 153.2 MiB/s 23.65 c/B
CTR dec | 6.22 ns/B 153.3 MiB/s 23.65 c/B
XTS enc | 6.67 ns/B 142.9 MiB/s 25.36 c/B
XTS dec | 6.70 ns/B 142.3 MiB/s 25.46 c/B
CCM enc | 12.61 ns/B 75.60 MiB/s 47.93 c/B
CCM dec | 12.62 ns/B 75.56 MiB/s 47.96 c/B
CCM auth | 6.41 ns/B 148.8 MiB/s 24.36 c/B
EAX enc | 12.62 ns/B 75.55 MiB/s 47.96 c/B
EAX dec | 12.62 ns/B 75.55 MiB/s 47.97 c/B
EAX auth | 6.39 ns/B 149.2 MiB/s 24.30 c/B
GCM enc | 9.81 ns/B 97.24 MiB/s 37.27 c/B
GCM dec | 9.81 ns/B 97.20 MiB/s 37.28 c/B
GCM auth | 3.59 ns/B 265.8 MiB/s 13.63 c/B
OCB enc | 6.39 ns/B 149.3 MiB/s 24.27 c/B
OCB dec | 6.38 ns/B 149.5 MiB/s 24.25 c/B
OCB auth | 6.35 ns/B 150.2 MiB/s 24.13 c/B
After:
ECB enc | 1.29 ns/B 737.7 MiB/s 4.91 c/B
ECB dec | 1.34 ns/B 711.1 MiB/s 5.10 c/B
CBC enc | 2.13 ns/B 448.5 MiB/s 8.08 c/B
CBC dec | 1.05 ns/B 908.0 MiB/s 3.99 c/B
CFB enc | 2.17 ns/B 439.9 MiB/s 8.24 c/B
CFB dec | 2.22 ns/B 429.8 MiB/s 8.43 c/B
OFB enc | 1.49 ns/B 640.1 MiB/s 5.66 c/B
OFB dec | 1.49 ns/B 640.1 MiB/s 5.66 c/B
CTR enc | 2.21 ns/B 432.5 MiB/s 8.38 c/B
CTR dec | 2.20 ns/B 432.5 MiB/s 8.38 c/B
XTS enc | 2.32 ns/B 410.6 MiB/s 8.83 c/B
XTS dec | 2.33 ns/B 409.7 MiB/s 8.85 c/B
CCM enc | 4.36 ns/B 218.7 MiB/s 16.57 c/B
CCM dec | 4.36 ns/B 218.8 MiB/s 16.56 c/B
CCM auth | 2.17 ns/B 440.4 MiB/s 8.23 c/B
EAX enc | 4.37 ns/B 218.3 MiB/s 16.60 c/B
EAX dec | 4.36 ns/B 218.7 MiB/s 16.57 c/B
EAX auth | 2.16 ns/B 440.7 MiB/s 8.22 c/B
GCM enc | 5.78 ns/B 165.0 MiB/s 21.96 c/B
GCM dec | 5.78 ns/B 165.0 MiB/s 21.96 c/B
GCM auth | 3.59 ns/B 265.9 MiB/s 13.63 c/B
OCB enc | 2.33 ns/B 410.1 MiB/s 8.84 c/B
OCB dec | 2.34 ns/B 407.2 MiB/s 8.90 c/B
OCB auth | 2.32 ns/B 411.1 MiB/s 8.82 c/B
GnuPG-bug-id: 4529
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-internal.h (ATTR_ALIGNED_64): New.
* cipher/rijndael-tables.h (encT): Move to 'enc_tables' structure.
(enc_tables): New structure for encryption table with counters before
and after.
(encT): New macro.
(dec_tables): Add counters before and after encryption table; Move
from .rodata to .data section.
(do_encrypt): Change 'encT' to 'enc_tables.T'.
(do_decrypt): Change '&dec_tables' to 'dec_tables.T'.
* cipher/cipher-gcm.c (prefetch_table): Make inline; Handle input
with length not multiple of 256.
(prefetch_enc, prefetch_dec): Modify pre- and post-table counters
to unshare look-up table pages between processes.
--
GnuPG-bug-id: 4541
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
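The entry above describes look-up tables wrapped in a structure with counters before and after the table, where the prefetch routine modifies the counters so that table pages are not shared between processes. A minimal sketch of that layout follows, assuming a writable table and a 32-byte prefetch stride. The names `demo_tables` and `demo_prefetch` are hypothetical, the table contents are left zeroed, and the 64-byte alignment attribute from the patch is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Table bracketed by counters; must live in writable storage. */
struct demo_tables
{
  volatile uint32_t counter_head;
  uint32_t T[256];
  volatile uint32_t counter_tail;
};

static struct demo_tables demo_tables;

/* Touch one byte per 32-byte stride, plus the last byte, so lengths
 * that are not a multiple of the stride are still fully covered. */
static inline void prefetch_table(const volatile unsigned char *tab,
                                  size_t len)
{
  size_t i;

  assert(len > 0);
  for (i = 0; i < len; i += 32)
    (void)tab[i];
  (void)tab[len - 1];
}

static void demo_prefetch(void)
{
  /* Writing to the first and last page of the structure triggers
   * copy-on-write, so this process gets private copies of them. */
  demo_tables.counter_head++;
  demo_tables.counter_tail++;

  prefetch_table((const volatile unsigned char *)&demo_tables,
                 sizeof(demo_tables));
}
```

The writes are the reason the table cannot stay in a read-only section; a const table would fault on the counter update or be placed in pages that remain shared.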
| |
* cipher/cipher-internal.h (gcry_cipher_handle): New pre-computed OCB
values L0L1 and L0L1L0; Swap dimensions for OCB L table.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Setup L0L1 and
L0L1L0 values.
(ocb_crypt): Process input in 24KiB chunks for better cache locality
for checksumming.
* cipher/rijndael-aesni.c (ALWAYS_INLINE): New macro for always
inlining functions, change all functions with 'inline' to use
ALWAYS_INLINE.
(NO_INLINE): New macro.
(aesni_prepare_2_6_variable, aesni_prepare_7_15_variable): Rename to...
(aesni_prepare_2_7_variable, aesni_prepare_8_15_variable): ...these and
adjust accordingly (xmm7 moved from *_7_15 to *_2_7).
(aesni_prepare_2_6, aesni_prepare_7_15): Rename to...
(aesni_prepare_2_7, aesni_prepare_8_15): ...these and adjust
accordingly.
(aesni_cleanup_2_6, aesni_cleanup_7_15): Rename to...
(aesni_cleanup_2_7, aesni_cleanup_8_15): ...these and adjust
accordingly.
(aesni_ocb_checksum): New.
(aesni_ocb_enc, aesni_ocb_dec): Calculate OCB offsets in parallel
with help of pre-computed offsets L0+L1 and L0+L1+L0; Do checksum
calculation as separate pass instead of inline; Use NO_INLINE.
(_gcry_aes_aesni_ocb_auth): Calculate OCB offsets in parallel
with help of pre-computed offsets L0+L1 and L0+L1+L0.
* cipher/rijndael-internal.h (RIJNDAEL_context_s) [USE_AESNI]: Add
'use_avx2' and 'use_avx'.
* cipher/rijndael.c (do_setkey) [USE_AESNI]: Set 'use_avx2' if
Intel AVX2 HW feature is available and 'use_avx' if Intel AVX HW
feature is available.
* tests/basic.c (do_check_ocb_cipher): New test vector; increase
size of temporary buffers for new test vector.
(check_ocb_cipher_largebuf_split): Make test plaintext non-uniform
for better checksum testing.
(check_ocb_cipher_checksum): New.
(check_ocb_cipher_largebuf): Call check_ocb_cipher_checksum.
(check_ocb_cipher): New expected tags for check_ocb_cipher_largebuf
test runs.
--
Benchmark on Haswell i7-4790K @ 4.0Ghz:
Before:
AES      | nanosecs/byte   mebibytes/sec   cycles/byte
OCB enc  |    0.175 ns/B      5436 MiB/s     0.702 c/B
OCB dec  |    0.184 ns/B      5184 MiB/s     0.736 c/B
OCB auth |    0.156 ns/B      6097 MiB/s     0.626 c/B
After (enc +2% faster, dec +7% faster):
OCB enc  |    0.172 ns/B      5547 MiB/s     0.688 c/B
OCB dec  |    0.171 ns/B      5582 MiB/s     0.683 c/B
OCB auth |    0.156 ns/B      6097 MiB/s     0.626 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
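The pre-computed L0+L1 and L0+L1+L0 values let several OCB offsets be derived from one base offset independently, instead of chaining each offset off the previous one. The sketch below checks that equivalence for a group of four blocks. It is a toy model: real OCB offsets are 128-bit blocks, while 64-bit integers stand in for them here, "+" means XOR, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of trailing zero bits of a non-zero block index. */
static unsigned int ntz(uint64_t n)
{
  unsigned int c = 0;

  assert(n != 0);
  while ((n & 1) == 0)
    {
      n >>= 1;
      c++;
    }
  return c;
}

/* Reference: Offset_i = Offset_{i-1} xor L[ntz(i)], one after another. */
static void offsets_sequential(uint64_t offset, uint64_t first_block,
                               const uint64_t *L, uint64_t out[4])
{
  unsigned int j;

  for (j = 0; j < 4; j++)
    {
      offset ^= L[ntz(first_block + j)];
      out[j] = offset;
    }
}

/* Same offsets, with the first three computed independently from the
 * base offset. Valid when first_block is 1 modulo 4, so that the ntz
 * pattern of the group is 0, 1, 0, k. */
static void offsets_parallel(uint64_t offset, uint64_t first_block,
                             const uint64_t *L, uint64_t out[4])
{
  uint64_t L0L1 = L[0] ^ L[1];
  uint64_t L0L1L0 = L[0] ^ L0L1;

  assert(first_block % 4 == 1);

  out[0] = offset ^ L[0];
  out[1] = offset ^ L0L1;
  out[2] = offset ^ L0L1L0;
  out[3] = out[2] ^ L[ntz(first_block + 3)];
}
```

Only the last offset of each group depends on the block index, so the first three XORs have no dependency on each other and can be issued in parallel.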
| |
* cipher/Makefile.am: Add 'rijndael-armv8-aarch64-ce.S'.
* cipher/rijndael-armv8-aarch64-ce.S: New.
* cipher/rijndael-internal.h (USE_ARM_CE): Enable for ARMv8/AArch64.
* configure.ac: Add 'rijndael-armv8-aarch64-ce.lo' and
'rijndael-armv8-ce.lo' for ARMv8/AArch64.
--
Improvement vs AArch64 assembly on Cortex-A53:
           AES-128  AES-192  AES-256
CBC enc:    13.19x   13.53x   13.76x
CBC dec:    20.53x   21.91x   22.60x
CFB enc:    14.29x   14.50x   14.63x
CFB dec:    20.42x   21.69x   22.50x
CTR:        18.29x   19.61x   20.53x
OCB enc:    15.21x   16.32x   17.12x
OCB dec:    14.95x   16.11x   16.88x
OCB auth:   16.73x   17.93x   18.66x
Benchmark on Cortex-A53 (1152 Mhz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 21.86 ns/B 43.62 MiB/s 25.19 c/B
ECB dec | 22.68 ns/B 42.05 MiB/s 26.13 c/B
CBC enc | 18.66 ns/B 51.10 MiB/s 21.50 c/B
CBC dec | 18.72 ns/B 50.95 MiB/s 21.56 c/B
CFB enc | 18.61 ns/B 51.25 MiB/s 21.44 c/B
CFB dec | 18.61 ns/B 51.25 MiB/s 21.44 c/B
OFB enc | 22.84 ns/B 41.75 MiB/s 26.31 c/B
OFB dec | 22.84 ns/B 41.75 MiB/s 26.31 c/B
CTR enc | 18.89 ns/B 50.50 MiB/s 21.76 c/B
CTR dec | 18.89 ns/B 50.50 MiB/s 21.76 c/B
CCM enc | 37.55 ns/B 25.40 MiB/s 43.25 c/B
CCM dec | 37.55 ns/B 25.40 MiB/s 43.25 c/B
CCM auth | 18.77 ns/B 50.80 MiB/s 21.63 c/B
GCM enc | 20.18 ns/B 47.25 MiB/s 23.25 c/B
GCM dec | 20.18 ns/B 47.25 MiB/s 23.25 c/B
GCM auth | 1.30 ns/B 732.5 MiB/s 1.50 c/B
OCB enc | 19.67 ns/B 48.48 MiB/s 22.66 c/B
OCB dec | 19.73 ns/B 48.34 MiB/s 22.72 c/B
OCB auth | 19.46 ns/B 49.00 MiB/s 22.42 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 25.39 ns/B 37.56 MiB/s 29.25 c/B
ECB dec | 26.15 ns/B 36.47 MiB/s 30.13 c/B
CBC enc | 22.08 ns/B 43.19 MiB/s 25.44 c/B
CBC dec | 22.25 ns/B 42.87 MiB/s 25.63 c/B
CFB enc | 22.03 ns/B 43.30 MiB/s 25.38 c/B
CFB dec | 22.03 ns/B 43.29 MiB/s 25.38 c/B
OFB enc | 26.26 ns/B 36.32 MiB/s 30.25 c/B
OFB dec | 26.26 ns/B 36.32 MiB/s 30.25 c/B
CTR enc | 22.30 ns/B 42.76 MiB/s 25.69 c/B
CTR dec | 22.30 ns/B 42.76 MiB/s 25.69 c/B
CCM enc | 44.38 ns/B 21.49 MiB/s 51.13 c/B
CCM dec | 44.38 ns/B 21.49 MiB/s 51.13 c/B
CCM auth | 22.20 ns/B 42.97 MiB/s 25.57 c/B
GCM enc | 23.60 ns/B 40.41 MiB/s 27.19 c/B
GCM dec | 23.60 ns/B 40.41 MiB/s 27.19 c/B
GCM auth | 1.30 ns/B 732.4 MiB/s 1.50 c/B
OCB enc | 23.09 ns/B 41.31 MiB/s 26.60 c/B
OCB dec | 23.21 ns/B 41.09 MiB/s 26.74 c/B
OCB auth | 22.88 ns/B 41.68 MiB/s 26.36 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 28.76 ns/B 33.17 MiB/s 33.13 c/B
ECB dec | 29.46 ns/B 32.37 MiB/s 33.94 c/B
CBC enc | 25.45 ns/B 37.48 MiB/s 29.31 c/B
CBC dec | 25.50 ns/B 37.40 MiB/s 29.38 c/B
CFB enc | 25.39 ns/B 37.56 MiB/s 29.25 c/B
CFB dec | 25.39 ns/B 37.56 MiB/s 29.25 c/B
OFB enc | 29.62 ns/B 32.19 MiB/s 34.13 c/B
OFB dec | 29.62 ns/B 32.19 MiB/s 34.13 c/B
CTR enc | 25.67 ns/B 37.15 MiB/s 29.57 c/B
CTR dec | 25.67 ns/B 37.15 MiB/s 29.57 c/B
CCM enc | 51.11 ns/B 18.66 MiB/s 58.88 c/B
CCM dec | 51.11 ns/B 18.66 MiB/s 58.88 c/B
CCM auth | 25.56 ns/B 37.32 MiB/s 29.44 c/B
GCM enc | 26.96 ns/B 35.37 MiB/s 31.06 c/B
GCM dec | 26.98 ns/B 35.35 MiB/s 31.08 c/B
GCM auth | 1.30 ns/B 733.4 MiB/s 1.50 c/B
OCB enc | 26.45 ns/B 36.05 MiB/s 30.47 c/B
OCB dec | 26.53 ns/B 35.95 MiB/s 30.56 c/B
OCB auth | 26.24 ns/B 36.34 MiB/s 30.23 c/B
=
After:
Cipher:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 4.83 ns/B 197.5 MiB/s 5.56 c/B
ECB dec | 4.99 ns/B 191.1 MiB/s 5.75 c/B
CBC enc | 1.41 ns/B 675.5 MiB/s 1.63 c/B
CBC dec | 0.911 ns/B 1046.9 MiB/s 1.05 c/B
CFB enc | 1.30 ns/B 732.2 MiB/s 1.50 c/B
CFB dec | 0.911 ns/B 1046.7 MiB/s 1.05 c/B
OFB enc | 5.81 ns/B 164.3 MiB/s 6.69 c/B
OFB dec | 5.81 ns/B 164.3 MiB/s 6.69 c/B
CTR enc | 1.03 ns/B 924.0 MiB/s 1.19 c/B
CTR dec | 1.03 ns/B 924.1 MiB/s 1.19 c/B
CCM enc | 2.50 ns/B 381.8 MiB/s 2.88 c/B
CCM dec | 2.50 ns/B 381.7 MiB/s 2.88 c/B
CCM auth | 1.57 ns/B 606.1 MiB/s 1.81 c/B
GCM enc | 2.33 ns/B 408.5 MiB/s 2.69 c/B
GCM dec | 2.34 ns/B 408.4 MiB/s 2.69 c/B
GCM auth | 1.30 ns/B 732.1 MiB/s 1.50 c/B
OCB enc | 1.29 ns/B 736.6 MiB/s 1.49 c/B
OCB dec | 1.32 ns/B 724.4 MiB/s 1.52 c/B
OCB auth | 1.16 ns/B 819.6 MiB/s 1.34 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.48 ns/B 174.0 MiB/s 6.31 c/B
ECB dec | 5.64 ns/B 169.0 MiB/s 6.50 c/B
CBC enc | 1.63 ns/B 585.8 MiB/s 1.88 c/B
CBC dec | 1.02 ns/B 935.8 MiB/s 1.17 c/B
CFB enc | 1.52 ns/B 627.7 MiB/s 1.75 c/B
CFB dec | 1.02 ns/B 935.9 MiB/s 1.17 c/B
OFB enc | 6.46 ns/B 147.7 MiB/s 7.44 c/B
OFB dec | 6.46 ns/B 147.7 MiB/s 7.44 c/B
CTR enc | 1.14 ns/B 836.1 MiB/s 1.31 c/B
CTR dec | 1.14 ns/B 835.9 MiB/s 1.31 c/B
CCM enc | 2.83 ns/B 337.6 MiB/s 3.25 c/B
CCM dec | 2.82 ns/B 338.0 MiB/s 3.25 c/B
CCM auth | 1.79 ns/B 532.7 MiB/s 2.06 c/B
GCM enc | 2.44 ns/B 390.3 MiB/s 2.82 c/B
GCM dec | 2.44 ns/B 390.2 MiB/s 2.82 c/B
GCM auth | 1.30 ns/B 731.9 MiB/s 1.50 c/B
OCB enc | 1.41 ns/B 674.7 MiB/s 1.63 c/B
OCB dec | 1.44 ns/B 662.0 MiB/s 1.66 c/B
OCB auth | 1.28 ns/B 746.1 MiB/s 1.47 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 6.13 ns/B 155.5 MiB/s 7.06 c/B
ECB dec | 6.29 ns/B 151.5 MiB/s 7.25 c/B
CBC enc | 1.85 ns/B 516.8 MiB/s 2.13 c/B
CBC dec | 1.13 ns/B 845.6 MiB/s 1.30 c/B
CFB enc | 1.74 ns/B 549.5 MiB/s 2.00 c/B
CFB dec | 1.13 ns/B 846.1 MiB/s 1.30 c/B
OFB enc | 7.11 ns/B 134.2 MiB/s 8.19 c/B
OFB dec | 7.11 ns/B 134.2 MiB/s 8.19 c/B
CTR enc | 1.25 ns/B 763.5 MiB/s 1.44 c/B
CTR dec | 1.25 ns/B 763.4 MiB/s 1.44 c/B
CCM enc | 3.15 ns/B 302.9 MiB/s 3.63 c/B
CCM dec | 3.15 ns/B 302.9 MiB/s 3.63 c/B
CCM auth | 2.01 ns/B 474.2 MiB/s 2.32 c/B
GCM enc | 2.55 ns/B 374.2 MiB/s 2.94 c/B
GCM dec | 2.55 ns/B 373.7 MiB/s 2.94 c/B
GCM auth | 1.30 ns/B 732.2 MiB/s 1.50 c/B
OCB enc | 1.54 ns/B 617.6 MiB/s 1.78 c/B
OCB dec | 1.57 ns/B 606.8 MiB/s 1.81 c/B
OCB auth | 1.40 ns/B 679.8 MiB/s 1.62 c/B
=
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/Makefile.am: Add 'rijndael-aarch64.S'.
* cipher/rijndael-aarch64.S: New.
* cipher/rijndael-internal.h: Enable USE_ARM_ASM if __AARCH64EL__ and
HAVE_COMPATIBLE_GCC_AARCH64_PLATFORM_AS defined.
* configure.ac (gcry_cv_gcc_aarch64_platform_as_ok): New check.
[host=aarch64]: Add 'rijndael-aarch64.lo'.
--
Patch adds ARMv8/AArch64 implementation of AES.
Benchmark on Cortex-A53 (1536 Mhz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 19.37 ns/B 49.22 MiB/s 29.76 c/B
ECB dec | 19.85 ns/B 48.03 MiB/s 30.50 c/B
CBC enc | 16.84 ns/B 56.62 MiB/s 25.87 c/B
CBC dec | 16.81 ns/B 56.74 MiB/s 25.82 c/B
CFB enc | 16.80 ns/B 56.75 MiB/s 25.81 c/B
CFB dec | 16.81 ns/B 56.75 MiB/s 25.81 c/B
OFB enc | 20.02 ns/B 47.64 MiB/s 30.75 c/B
OFB dec | 20.02 ns/B 47.64 MiB/s 30.75 c/B
CTR enc | 17.06 ns/B 55.91 MiB/s 26.20 c/B
CTR dec | 17.06 ns/B 55.92 MiB/s 26.20 c/B
CCM enc | 33.94 ns/B 28.10 MiB/s 52.13 c/B
CCM dec | 33.94 ns/B 28.10 MiB/s 52.14 c/B
CCM auth | 16.97 ns/B 56.18 MiB/s 26.07 c/B
GCM enc | 28.70 ns/B 33.23 MiB/s 44.09 c/B
GCM dec | 28.70 ns/B 33.23 MiB/s 44.09 c/B
GCM auth | 11.66 ns/B 81.81 MiB/s 17.90 c/B
OCB enc | 17.66 ns/B 53.99 MiB/s 27.13 c/B
OCB dec | 17.61 ns/B 54.16 MiB/s 27.05 c/B
OCB auth | 17.44 ns/B 54.69 MiB/s 26.78 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 21.82 ns/B 43.71 MiB/s 33.51 c/B
ECB dec | 22.55 ns/B 42.30 MiB/s 34.63 c/B
CBC enc | 19.33 ns/B 49.33 MiB/s 29.70 c/B
CBC dec | 19.50 ns/B 48.91 MiB/s 29.95 c/B
CFB enc | 19.29 ns/B 49.44 MiB/s 29.63 c/B
CFB dec | 19.28 ns/B 49.46 MiB/s 29.61 c/B
OFB enc | 22.49 ns/B 42.40 MiB/s 34.55 c/B
OFB dec | 22.50 ns/B 42.38 MiB/s 34.56 c/B
CTR enc | 19.53 ns/B 48.83 MiB/s 30.00 c/B
CTR dec | 19.54 ns/B 48.80 MiB/s 30.02 c/B
CCM enc | 38.91 ns/B 24.51 MiB/s 59.77 c/B
CCM dec | 38.90 ns/B 24.51 MiB/s 59.76 c/B
CCM auth | 19.45 ns/B 49.02 MiB/s 29.88 c/B
GCM enc | 31.13 ns/B 30.63 MiB/s 47.82 c/B
GCM dec | 31.14 ns/B 30.63 MiB/s 47.82 c/B
GCM auth | 11.66 ns/B 81.80 MiB/s 17.91 c/B
OCB enc | 20.15 ns/B 47.33 MiB/s 30.95 c/B
OCB dec | 20.30 ns/B 46.98 MiB/s 31.18 c/B
OCB auth | 19.92 ns/B 47.88 MiB/s 30.59 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 24.33 ns/B 39.19 MiB/s 37.38 c/B
ECB dec | 25.23 ns/B 37.80 MiB/s 38.76 c/B
CBC enc | 21.82 ns/B 43.71 MiB/s 33.51 c/B
CBC dec | 22.18 ns/B 42.99 MiB/s 34.07 c/B
CFB enc | 21.77 ns/B 43.80 MiB/s 33.44 c/B
CFB dec | 21.77 ns/B 43.81 MiB/s 33.44 c/B
OFB enc | 24.99 ns/B 38.16 MiB/s 38.39 c/B
OFB dec | 24.99 ns/B 38.17 MiB/s 38.38 c/B
CTR enc | 22.02 ns/B 43.32 MiB/s 33.82 c/B
CTR dec | 22.02 ns/B 43.31 MiB/s 33.82 c/B
CCM enc | 43.86 ns/B 21.74 MiB/s 67.38 c/B
CCM dec | 43.87 ns/B 21.74 MiB/s 67.39 c/B
CCM auth | 21.94 ns/B 43.48 MiB/s 33.69 c/B
GCM enc | 33.66 ns/B 28.33 MiB/s 51.71 c/B
GCM dec | 33.66 ns/B 28.33 MiB/s 51.70 c/B
GCM auth | 11.69 ns/B 81.59 MiB/s 17.95 c/B
OCB enc | 22.90 ns/B 41.65 MiB/s 35.17 c/B
OCB dec | 23.25 ns/B 41.02 MiB/s 35.71 c/B
OCB auth | 22.69 ns/B 42.03 MiB/s 34.85 c/B
=
After (~1.2x faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 16.40 ns/B 58.16 MiB/s 25.19 c/B
ECB dec | 17.01 ns/B 56.07 MiB/s 26.13 c/B
CBC enc | 13.99 ns/B 68.15 MiB/s 21.49 c/B
CBC dec | 14.04 ns/B 67.94 MiB/s 21.56 c/B
CFB enc | 13.96 ns/B 68.32 MiB/s 21.44 c/B
CFB dec | 13.95 ns/B 68.34 MiB/s 21.43 c/B
OFB enc | 17.14 ns/B 55.65 MiB/s 26.32 c/B
OFB dec | 17.13 ns/B 55.67 MiB/s 26.31 c/B
CTR enc | 14.17 ns/B 67.31 MiB/s 21.76 c/B
CTR dec | 14.17 ns/B 67.29 MiB/s 21.77 c/B
CCM enc | 28.16 ns/B 33.86 MiB/s 43.26 c/B
CCM dec | 28.16 ns/B 33.87 MiB/s 43.26 c/B
CCM auth | 14.08 ns/B 67.71 MiB/s 21.63 c/B
GCM enc | 25.82 ns/B 36.94 MiB/s 39.66 c/B
GCM dec | 25.82 ns/B 36.94 MiB/s 39.65 c/B
GCM auth | 11.67 ns/B 81.74 MiB/s 17.92 c/B
OCB enc | 14.78 ns/B 64.55 MiB/s 22.69 c/B
OCB dec | 14.80 ns/B 64.43 MiB/s 22.74 c/B
OCB auth | 14.59 ns/B 65.36 MiB/s 22.41 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 19.05 ns/B 50.07 MiB/s 29.25 c/B
ECB dec | 19.62 ns/B 48.62 MiB/s 30.13 c/B
CBC enc | 16.56 ns/B 57.59 MiB/s 25.44 c/B
CBC dec | 16.69 ns/B 57.14 MiB/s 25.64 c/B
CFB enc | 16.52 ns/B 57.71 MiB/s 25.38 c/B
CFB dec | 16.52 ns/B 57.73 MiB/s 25.37 c/B
OFB enc | 19.70 ns/B 48.41 MiB/s 30.26 c/B
OFB dec | 19.69 ns/B 48.43 MiB/s 30.24 c/B
CTR enc | 16.73 ns/B 57.00 MiB/s 25.70 c/B
CTR dec | 16.73 ns/B 57.01 MiB/s 25.70 c/B
CCM enc | 33.29 ns/B 28.65 MiB/s 51.13 c/B
CCM dec | 33.29 ns/B 28.65 MiB/s 51.13 c/B
CCM auth | 16.65 ns/B 57.29 MiB/s 25.57 c/B
GCM enc | 28.39 ns/B 33.60 MiB/s 43.60 c/B
GCM dec | 28.39 ns/B 33.59 MiB/s 43.60 c/B
GCM auth | 11.64 ns/B 81.92 MiB/s 17.88 c/B
OCB enc | 17.33 ns/B 55.03 MiB/s 26.62 c/B
OCB dec | 17.40 ns/B 54.82 MiB/s 26.72 c/B
OCB auth | 17.16 ns/B 55.59 MiB/s 26.35 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 21.56 ns/B 44.23 MiB/s 33.12 c/B
ECB dec | 22.09 ns/B 43.17 MiB/s 33.93 c/B
CBC enc | 19.09 ns/B 49.97 MiB/s 29.31 c/B
CBC dec | 19.13 ns/B 49.86 MiB/s 29.38 c/B
CFB enc | 19.04 ns/B 50.09 MiB/s 29.24 c/B
CFB dec | 19.04 ns/B 50.08 MiB/s 29.25 c/B
OFB enc | 22.22 ns/B 42.93 MiB/s 34.13 c/B
OFB dec | 22.22 ns/B 42.92 MiB/s 34.13 c/B
CTR enc | 19.25 ns/B 49.53 MiB/s 29.57 c/B
CTR dec | 19.25 ns/B 49.55 MiB/s 29.57 c/B
CCM enc | 38.33 ns/B 24.88 MiB/s 58.88 c/B
CCM dec | 38.34 ns/B 24.88 MiB/s 58.88 c/B
CCM auth | 19.17 ns/B 49.76 MiB/s 29.44 c/B
GCM enc | 30.91 ns/B 30.86 MiB/s 47.47 c/B
GCM dec | 30.91 ns/B 30.85 MiB/s 47.48 c/B
GCM auth | 11.71 ns/B 81.47 MiB/s 17.98 c/B
OCB enc | 19.85 ns/B 48.04 MiB/s 30.49 c/B
OCB dec | 19.89 ns/B 47.95 MiB/s 30.55 c/B
OCB auth | 19.67 ns/B 48.48 MiB/s 30.22 c/B
=
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/Makefile.am: Add 'rijndael-armv8-ce.c' and
'rijndael-armv8-aarch32-ce.S'.
* cipher/rijndael-armv8-aarch32-ce.S: New.
* cipher/rijndael-armv8-ce.c: New.
* cipher/rijndael-internal.h (USE_ARM_CE): New.
(RIJNDAEL_context_s): Add 'use_arm_ce'.
* cipher/rijndael.c [USE_ARM_CE] (_gcry_aes_armv8_ce_setkey)
(_gcry_aes_armv8_ce_prepare_decryption)
(_gcry_aes_armv8_ce_encrypt, _gcry_aes_armv8_ce_decrypt)
(_gcry_aes_armv8_ce_cfb_enc, _gcry_aes_armv8_ce_cbc_enc)
(_gcry_aes_armv8_ce_ctr_enc, _gcry_aes_armv8_ce_cfb_dec)
(_gcry_aes_armv8_ce_cbc_dec, _gcry_aes_armv8_ce_ocb_crypt)
(_gcry_aes_armv8_ce_ocb_auth): New.
(do_setkey) [USE_ARM_CE]: Add ARM CE/AES HW feature check and key
setup for ARM CE.
(prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth) [USE_ARM_CE]: Add
ARM CE support.
* configure.ac: Add 'rijndael-armv8-ce.lo' and
'rijndael-armv8-aarch32-ce.lo'.
--
Improvement vs ARM assembly on Cortex-A53:
           AES-128  AES-192  AES-256
CBC enc:     14.8x    12.8x    11.4x
CBC dec:     21.4x    20.5x    19.4x
CFB enc:     16.2x    13.6x    11.6x
CFB dec:     21.6x    20.5x    19.4x
CTR:         19.1x    18.6x    17.8x
OCB enc:     16.0x    16.2x    16.1x
OCB dec:     15.6x    15.9x    15.8x
OCB auth:    18.3x    18.4x    18.0x
Benchmark on Cortex-A53 (1152 Mhz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 24.42 ns/B 39.06 MiB/s 28.13 c/B
ECB dec | 25.07 ns/B 38.05 MiB/s 28.88 c/B
CBC enc | 21.05 ns/B 45.30 MiB/s 24.25 c/B
CBC dec | 21.16 ns/B 45.07 MiB/s 24.38 c/B
CFB enc | 21.05 ns/B 45.31 MiB/s 24.25 c/B
CFB dec | 21.38 ns/B 44.61 MiB/s 24.62 c/B
OFB enc | 26.15 ns/B 36.47 MiB/s 30.13 c/B
OFB dec | 26.15 ns/B 36.47 MiB/s 30.13 c/B
CTR enc | 21.17 ns/B 45.06 MiB/s 24.38 c/B
CTR dec | 21.16 ns/B 45.06 MiB/s 24.38 c/B
CCM enc | 42.32 ns/B 22.53 MiB/s 48.75 c/B
CCM dec | 42.32 ns/B 22.53 MiB/s 48.75 c/B
CCM auth | 21.17 ns/B 45.06 MiB/s 24.38 c/B
GCM enc | 22.08 ns/B 43.19 MiB/s 25.44 c/B
GCM dec | 22.08 ns/B 43.18 MiB/s 25.44 c/B
GCM auth | 0.923 ns/B 1032.8 MiB/s 1.06 c/B
OCB enc | 26.20 ns/B 36.40 MiB/s 30.18 c/B
OCB dec | 25.97 ns/B 36.73 MiB/s 29.91 c/B
OCB auth | 24.52 ns/B 38.90 MiB/s 28.24 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 27.83 ns/B 34.26 MiB/s 32.06 c/B
ECB dec | 28.54 ns/B 33.42 MiB/s 32.88 c/B
CBC enc | 24.47 ns/B 38.97 MiB/s 28.19 c/B
CBC dec | 25.27 ns/B 37.74 MiB/s 29.11 c/B
CFB enc | 25.08 ns/B 38.02 MiB/s 28.89 c/B
CFB dec | 25.31 ns/B 37.68 MiB/s 29.16 c/B
OFB enc | 29.57 ns/B 32.25 MiB/s 34.06 c/B
OFB dec | 29.57 ns/B 32.25 MiB/s 34.06 c/B
CTR enc | 25.24 ns/B 37.78 MiB/s 29.08 c/B
CTR dec | 25.24 ns/B 37.79 MiB/s 29.08 c/B
CCM enc | 49.81 ns/B 19.15 MiB/s 57.38 c/B
CCM dec | 49.80 ns/B 19.15 MiB/s 57.37 c/B
CCM auth | 24.58 ns/B 38.80 MiB/s 28.32 c/B
GCM enc | 26.15 ns/B 36.47 MiB/s 30.13 c/B
GCM dec | 26.11 ns/B 36.52 MiB/s 30.08 c/B
GCM auth | 0.923 ns/B 1033.0 MiB/s 1.06 c/B
OCB enc | 29.59 ns/B 32.23 MiB/s 34.09 c/B
OCB dec | 29.42 ns/B 32.42 MiB/s 33.89 c/B
OCB auth | 27.92 ns/B 34.16 MiB/s 32.16 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 31.20 ns/B 30.57 MiB/s 35.94 c/B
ECB dec | 31.80 ns/B 29.99 MiB/s 36.63 c/B
CBC enc | 27.83 ns/B 34.27 MiB/s 32.06 c/B
CBC dec | 27.87 ns/B 34.21 MiB/s 32.11 c/B
CFB enc | 27.88 ns/B 34.20 MiB/s 32.12 c/B
CFB dec | 28.16 ns/B 33.87 MiB/s 32.44 c/B
OFB enc | 32.93 ns/B 28.96 MiB/s 37.94 c/B
OFB dec | 32.93 ns/B 28.96 MiB/s 37.94 c/B
CTR enc | 27.95 ns/B 34.13 MiB/s 32.19 c/B
CTR dec | 27.95 ns/B 34.12 MiB/s 32.20 c/B
CCM enc | 55.88 ns/B 17.07 MiB/s 64.38 c/B
CCM dec | 55.88 ns/B 17.07 MiB/s 64.38 c/B
CCM auth | 27.95 ns/B 34.12 MiB/s 32.20 c/B
GCM enc | 28.86 ns/B 33.05 MiB/s 33.25 c/B
GCM dec | 28.87 ns/B 33.04 MiB/s 33.25 c/B
GCM auth | 0.923 ns/B 1033.0 MiB/s 1.06 c/B
OCB enc | 32.96 ns/B 28.94 MiB/s 37.97 c/B
OCB dec | 32.73 ns/B 29.14 MiB/s 37.70 c/B
OCB auth | 31.29 ns/B 30.48 MiB/s 36.04 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.10 ns/B 187.0 MiB/s 5.88 c/B
ECB dec | 5.27 ns/B 181.0 MiB/s 6.07 c/B
CBC enc | 1.41 ns/B 675.8 MiB/s 1.63 c/B
CBC dec | 0.992 ns/B 961.7 MiB/s 1.14 c/B
CFB enc | 1.30 ns/B 732.4 MiB/s 1.50 c/B
CFB dec | 0.991 ns/B 962.7 MiB/s 1.14 c/B
OFB enc | 7.05 ns/B 135.2 MiB/s 8.13 c/B
OFB dec | 7.05 ns/B 135.2 MiB/s 8.13 c/B
CTR enc | 1.11 ns/B 856.9 MiB/s 1.28 c/B
CTR dec | 1.11 ns/B 857.0 MiB/s 1.28 c/B
CCM enc | 2.58 ns/B 369.8 MiB/s 2.97 c/B
CCM dec | 2.58 ns/B 369.5 MiB/s 2.97 c/B
CCM auth | 1.58 ns/B 605.2 MiB/s 1.82 c/B
GCM enc | 2.04 ns/B 467.9 MiB/s 2.35 c/B
GCM dec | 2.04 ns/B 466.6 MiB/s 2.35 c/B
GCM auth | 0.923 ns/B 1033.0 MiB/s 1.06 c/B
OCB enc | 1.64 ns/B 579.8 MiB/s 1.89 c/B
OCB dec | 1.66 ns/B 574.5 MiB/s 1.91 c/B
OCB auth | 1.33 ns/B 715.5 MiB/s 1.54 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.64 ns/B 169.0 MiB/s 6.50 c/B
ECB dec | 5.81 ns/B 164.3 MiB/s 6.69 c/B
CBC enc | 1.90 ns/B 502.1 MiB/s 2.19 c/B
CBC dec | 1.24 ns/B 771.7 MiB/s 1.42 c/B
CFB enc | 1.84 ns/B 517.1 MiB/s 2.12 c/B
CFB dec | 1.23 ns/B 772.5 MiB/s 1.42 c/B
OFB enc | 7.60 ns/B 125.5 MiB/s 8.75 c/B
OFB dec | 7.60 ns/B 125.6 MiB/s 8.75 c/B
CTR enc | 1.36 ns/B 702.7 MiB/s 1.56 c/B
CTR dec | 1.36 ns/B 702.5 MiB/s 1.56 c/B
CCM enc | 3.31 ns/B 287.8 MiB/s 3.82 c/B
CCM dec | 3.31 ns/B 288.0 MiB/s 3.81 c/B
CCM auth | 2.06 ns/B 462.1 MiB/s 2.38 c/B
GCM enc | 2.28 ns/B 418.4 MiB/s 2.63 c/B
GCM dec | 2.28 ns/B 418.0 MiB/s 2.63 c/B
GCM auth | 0.923 ns/B 1032.8 MiB/s 1.06 c/B
OCB enc | 1.83 ns/B 520.1 MiB/s 2.11 c/B
OCB dec | 1.84 ns/B 517.8 MiB/s 2.12 c/B
OCB auth | 1.52 ns/B 626.1 MiB/s 1.75 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.86 ns/B 162.7 MiB/s 6.75 c/B
ECB dec | 6.02 ns/B 158.3 MiB/s 6.94 c/B
CBC enc | 2.44 ns/B 390.5 MiB/s 2.81 c/B
CBC dec | 1.45 ns/B 656.4 MiB/s 1.67 c/B
CFB enc | 2.39 ns/B 399.5 MiB/s 2.75 c/B
CFB dec | 1.45 ns/B 656.8 MiB/s 1.67 c/B
OFB enc | 7.81 ns/B 122.1 MiB/s 9.00 c/B
OFB dec | 7.81 ns/B 122.1 MiB/s 9.00 c/B
CTR enc | 1.57 ns/B 605.8 MiB/s 1.81 c/B
CTR dec | 1.57 ns/B 605.9 MiB/s 1.81 c/B
CCM enc | 4.07 ns/B 234.3 MiB/s 4.69 c/B
CCM dec | 4.07 ns/B 234.1 MiB/s 4.69 c/B
CCM auth | 2.61 ns/B 365.7 MiB/s 3.00 c/B
GCM enc | 2.50 ns/B 381.9 MiB/s 2.88 c/B
GCM dec | 2.49 ns/B 382.3 MiB/s 2.87 c/B
GCM auth | 0.926 ns/B 1029.7 MiB/s 1.07 c/B
OCB enc | 2.05 ns/B 465.6 MiB/s 2.36 c/B
OCB dec | 2.06 ns/B 462.0 MiB/s 2.38 c/B
OCB auth | 1.74 ns/B 548.4 MiB/s 2.00 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-amd64.S: Enable when
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
(ELF): New macro to mask lines with ELF specific commands.
* cipher/rijndael-internal.h (USE_AMD64_ASM): Enable when
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
(do_encrypt, do_decrypt)
[USE_AMD64_ASM && !HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS]: Use
assembly block to call AMD64 assembly encrypt/decrypt function.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
(_gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector
registers before use and restore after.
* cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency
on !defined(__WIN64__).
* cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable,
aesni_prepare, aesni_prepare_2_6, aesni_cleanup)
(aesni_cleanup_2_6): New.
[!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New.
(_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc)
(_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec)
(_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use
'aesni_prepare_2_6'.
* cipher/rijndael-internal.h (USE_SSSE3): Enable if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS.
(USE_AESNI): Remove dependency on !defined(__WIN64__).
* cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]
(vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New.
[!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New.
(vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use
'vpaes_ssse3_prepare'.
(_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use
'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'.
[HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to
exclude '.type' and '.size' markers from assembly code, as they are
not supported on WIN64/COFF objects.
* configure.ac (gcry_cv_gcc_attribute_ms_abi)
(gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi)
(gcry_cv_gcc_default_abi_is_sysv_abi)
(gcry_cv_gcc_win64_platform_as_ok): New checks.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Do not enable when
__WIN64__ defined.
* cipher/rijndael-internal.h (USE_AESNI): Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'.
* cipher/rijndael-internal.h (USE_SSSE3): New.
(RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'.
* cipher/rijndael-ssse3-amd64.c: New.
* cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey)
(_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt)
(_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc)
(_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc)
(_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New.
(do_setkey): Add HWF check for SSSE3 and setup for SSSE3
implementation.
(prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add
selection for SSSE3 implementation.
* configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'.
--
This patch adds "AES with vector permutations" implementation by
Mike Hamburg. Public-domain source-code is available at:
http://crypto.stanford.edu/vpaes/
Benchmark on Intel Core2 T8100 (2.1 GHz, no turbo):
Old (AMD64 asm):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 8.79 ns/B 108.5 MiB/s 18.46 c/B
ECB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B
CBC enc | 7.77 ns/B 122.7 MiB/s 16.33 c/B
CBC dec | 7.74 ns/B 123.2 MiB/s 16.26 c/B
CFB enc | 7.88 ns/B 121.0 MiB/s 16.54 c/B
CFB dec | 7.56 ns/B 126.1 MiB/s 15.88 c/B
OFB enc | 9.02 ns/B 105.8 MiB/s 18.94 c/B
OFB dec | 9.07 ns/B 105.1 MiB/s 19.05 c/B
CTR enc | 7.80 ns/B 122.2 MiB/s 16.38 c/B
CTR dec | 7.81 ns/B 122.2 MiB/s 16.39 c/B
New (ssse3):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.77 ns/B 165.2 MiB/s 12.13 c/B
ECB dec | 7.13 ns/B 133.7 MiB/s 14.98 c/B
CBC enc | 5.27 ns/B 181.0 MiB/s 11.06 c/B
CBC dec | 6.39 ns/B 149.3 MiB/s 13.42 c/B
CFB enc | 5.27 ns/B 180.9 MiB/s 11.07 c/B
CFB dec | 5.28 ns/B 180.7 MiB/s 11.08 c/B
OFB enc | 6.11 ns/B 156.1 MiB/s 12.83 c/B
OFB dec | 6.13 ns/B 155.5 MiB/s 12.88 c/B
CTR enc | 5.26 ns/B 181.5 MiB/s 11.04 c/B
CTR dec | 5.24 ns/B 182.0 MiB/s 11.00 c/B
Benchmark on Intel i5-2450M (2.5 GHz, no turbo, AES-NI disabled):
Old (AMD64 asm):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 8.06 ns/B 118.3 MiB/s 20.15 c/B
ECB dec | 8.21 ns/B 116.1 MiB/s 20.53 c/B
CBC enc | 7.88 ns/B 121.1 MiB/s 19.69 c/B
CBC dec | 7.57 ns/B 126.0 MiB/s 18.92 c/B
CFB enc | 7.87 ns/B 121.2 MiB/s 19.67 c/B
CFB dec | 7.56 ns/B 126.2 MiB/s 18.89 c/B
OFB enc | 8.27 ns/B 115.3 MiB/s 20.67 c/B
OFB dec | 8.28 ns/B 115.1 MiB/s 20.71 c/B
CTR enc | 8.02 ns/B 119.0 MiB/s 20.04 c/B
CTR dec | 8.02 ns/B 118.9 MiB/s 20.05 c/B
New (ssse3):
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 4.03 ns/B 236.6 MiB/s 10.07 c/B
ECB dec | 5.28 ns/B 180.8 MiB/s 13.19 c/B
CBC enc | 3.77 ns/B 252.7 MiB/s 9.43 c/B
CBC dec | 4.69 ns/B 203.3 MiB/s 11.73 c/B
CFB enc | 3.75 ns/B 254.3 MiB/s 9.37 c/B
CFB dec | 3.69 ns/B 258.6 MiB/s 9.22 c/B
OFB enc | 4.17 ns/B 228.7 MiB/s 10.43 c/B
OFB dec | 4.17 ns/B 228.7 MiB/s 10.42 c/B
CTR enc | 3.72 ns/B 256.5 MiB/s 9.30 c/B
CTR dec | 3.72 ns/B 256.1 MiB/s 9.31 c/B
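The three columns in these tables are related by simple arithmetic, which the following sketch reproduces. It is an illustration only, assuming the throughput column is computed from nanosecs/byte with 1 MiB = 2^20 bytes and the cycles column from the stated clock (2100 MHz for the Core2 T8100 run). The function names are hypothetical and are not part of bench-slope.

```c
#include <assert.h>

/* Throughput in MiB/s from a cost in nanoseconds per byte. */
static double
demo_mib_per_sec (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte from nanoseconds per byte and a clock in MHz:
   ns/B * (MHz * 1e6 cycles/s) * (1e-9 s/ns) = ns/B * MHz / 1000. */
static double
demo_cycles_per_byte (double ns_per_byte, double cpu_mhz)
{
  return ns_per_byte * cpu_mhz / 1000.0;
}

/* Speed-up factor of a new measurement over an old one. */
static double
demo_speedup (double old_ns_per_byte, double new_ns_per_byte)
{
  assert (new_ns_per_byte > 0.0);
  return old_ns_per_byte / new_ns_per_byte;
}
```

Applied to the first Core2 row (ECB enc, 8.79 ns/B at 2100 MHz), this gives roughly 108.5 MiB/s and 18.46 c/B, matching the table, and the SSSE3 result of 5.77 ns/B corresponds to a speed-up of about 1.5x.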
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-internal.h (RIJNDAEL_context_s): Add u32 variants of
keyschedule arrays to unions u1 and u2.
(keyschedenc32, keyscheddec32): New.
* cipher/rijndael.c (u32_a_t): Remove.
(do_setkey): Add and use tkk[].data32, k_u32, tk_u32 and W_u32; Remove
casting byte arrays to u32_a_t.
(prepare_decryption, do_encrypt_fn, do_decrypt_fn): Use keyschedenc32
and keyscheddec32; Remove casting byte arrays to u32_a_t.
--
Patch fixes 'cast increases required alignment' compiler warnings that GCC was showing:
rijndael.c: In function 'do_setkey':
rijndael.c:310:13: warning: cast increases required alignment of target type [-Wcast-align]
*((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
^
rijndael.c:310:34: warning: cast increases required alignment of target type [-Wcast-align]
*((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
[removed the rest]
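A minimal sketch of the idea behind this kind of fix, not the actual libgcrypt code: when a byte array and its 32-bit view share a union, the compiler already knows the storage is aligned for word access, so no alignment-increasing pointer cast is needed. The names demo_block_t and demo_copy_columns are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The same 16 bytes are visible as bytes and as 32-bit words.  The union
   takes the alignment of its strictest member, so the word view is
   always properly aligned. */
typedef union
{
  unsigned char bytes[4][4];   /* byte view, as byte-oriented code sees it */
  uint32_t      words[4];      /* word view, replaces (u32 *) casts */
} demo_block_t;

/* Copy a block column by column through the word view. */
static void
demo_copy_columns (demo_block_t *dst, const demo_block_t *src)
{
  int j;

  assert (dst != NULL && src != NULL);
  for (j = 0; j < 4; j++)
    dst->words[j] = src->words[j];   /* no cast of a byte array involved */
}
```

Compared with writing through a cast such as the one quoted in the warning, the union keeps both views available while leaving alignment to the compiler.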
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-internal.h (rijndael_prefetchfn_t): New.
(RIJNDAEL_context): Add 'prefetch_enc_fn' and 'prefetch_dec_fn'.
* cipher/rijndael-tables.h (S, T1, T2, T3, T4, T5, T6, T7, T8, S5, U1)
(U2, U3, U4): Remove.
(encT, dec_tables, decT, inv_sbox): Add.
* cipher/rijndael.c (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block, _gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Add parameter for passing table pointer
to assembly implementation.
(prefetch_table, prefetch_enc, prefetch_dec): New.
(do_setkey): Setup context prefetch functions depending on selected
rijndael implementation; Use new tables for key setup.
(prepare_decryption): Use new tables for decryption key setup.
(do_encrypt_aligned): Rename to...
(do_encrypt_fn): ... to this, change to use new compact tables,
make it handle unaligned input and unroll rounds loop by two.
(do_encrypt): Remove handling of unaligned input/output; pass table
pointer to assembly implementations.
(rijndael_encrypt, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec): Prefetch encryption tables
before encryption.
(do_decrypt_aligned): Rename to...
(do_decrypt_fn): ... to this, change to use new compact tables,
make it handle unaligned input and unroll rounds loop by two.
(do_decrypt): Remove handling of unaligned input/output; pass table
pointer to assembly implementations.
(rijndael_decrypt, _gcry_aes_cbc_dec): Prefetch decryption tables
before decryption.
* cipher/rijndael-amd64.S: Use 1+1.25 KiB tables for
encryption+decryption; remove tables from assembly file.
* cipher/rijndael-arm.S: Ditto.
--
Patch replaces 4+4.25 KiB look-up tables in generic implementation and
8+8 KiB look-up tables in AMD64 implementation and 2+2 KiB look-up tables in
ARM implementation with 1+1.25 KiB look-up tables, and adds prefetching of
look-up tables.
AMD64 assembly is slower than before because of additional rotation
instructions. The generic C implementation is now better optimized and
actually faster than before.
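The trade-off described above can be sketched as follows. This is a hedged illustration, not the code from this patch: one 256-entry table of 32-bit words (1 KiB) stands in for four tables, with the other three recovered by rotation at lookup time, and a prefetch helper touches the table before use. The table contents, the 32-byte cache-line step and all demo_* names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t
demo_rol32 (uint32_t x, unsigned int n)
{
  assert (n > 0 && n < 32);
  return (x << n) | (x >> (32 - n));
}

/* Single 256-entry table (1 KiB).  Filled with arbitrary placeholder
   values here; a real cipher would store its round-function words. */
static uint32_t demo_table[256];

static void
demo_init_table (void)
{
  unsigned int i;

  for (i = 0; i < 256; i++)
    demo_table[i] = (uint32_t)i * 0x9E3779B1u;
}

/* One output column of a table-driven round: four lookups in the same
   table, three of them rotated, instead of lookups in four tables.
   The extra rotations are the cost paid for the smaller footprint. */
static uint32_t
demo_round_column (uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3,
                   uint32_t round_key)
{
  return demo_table[b0]
         ^ demo_rol32 (demo_table[b1], 8)
         ^ demo_rol32 (demo_table[b2], 16)
         ^ demo_rol32 (demo_table[b3], 24)
         ^ round_key;
}

/* Read one byte per assumed 32-byte cache line so the whole table is
   likely to be cached before the data-dependent lookups start. */
static void
demo_prefetch_table (const volatile unsigned char *tab, size_t len)
{
  size_t i;

  if (len == 0)
    return;
  for (i = 0; i < len; i += 32)
    (void)tab[i];
  (void)tab[len - 1];
}
```

A caller would run demo_prefetch_table on the table once per bulk operation, then use demo_round_column for each column of each round.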
Benchmark results on Intel i5-4570 (turbo off) (64-bit, AMD64 assembly):
tests/bench-slope --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes
Old:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 3.10 ns/B 307.5 MiB/s 9.92 c/B
ECB dec | 3.15 ns/B 302.5 MiB/s 10.09 c/B
CBC enc | 3.46 ns/B 275.5 MiB/s 11.08 c/B
CBC dec | 3.19 ns/B 299.2 MiB/s 10.20 c/B
CFB enc | 3.48 ns/B 274.4 MiB/s 11.12 c/B
CFB dec | 3.23 ns/B 294.8 MiB/s 10.35 c/B
OFB enc | 3.29 ns/B 290.2 MiB/s 10.52 c/B
OFB dec | 3.31 ns/B 288.3 MiB/s 10.58 c/B
CTR enc | 3.64 ns/B 261.7 MiB/s 11.66 c/B
CTR dec | 3.65 ns/B 261.6 MiB/s 11.67 c/B
New:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 4.21 ns/B 226.7 MiB/s 13.46 c/B
ECB dec | 4.27 ns/B 223.2 MiB/s 13.67 c/B
CBC enc | 4.15 ns/B 229.8 MiB/s 13.28 c/B
CBC dec | 3.85 ns/B 247.8 MiB/s 12.31 c/B
CFB enc | 4.16 ns/B 229.1 MiB/s 13.32 c/B
CFB dec | 3.88 ns/B 245.9 MiB/s 12.41 c/B
OFB enc | 4.38 ns/B 217.8 MiB/s 14.01 c/B
OFB dec | 4.36 ns/B 218.6 MiB/s 13.96 c/B
CTR enc | 4.30 ns/B 221.6 MiB/s 13.77 c/B
CTR dec | 4.30 ns/B 221.7 MiB/s 13.76 c/B
Benchmark on Intel i5-4570 (turbo off) (32-bit mingw, generic C):
tests/bench-slope.exe --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes
Old:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 6.03 ns/B 158.2 MiB/s 19.29 c/B
ECB dec | 5.81 ns/B 164.1 MiB/s 18.60 c/B
CBC enc | 6.22 ns/B 153.4 MiB/s 19.90 c/B
CBC dec | 5.91 ns/B 161.3 MiB/s 18.92 c/B
CFB enc | 6.25 ns/B 152.7 MiB/s 19.99 c/B
CFB dec | 6.24 ns/B 152.8 MiB/s 19.97 c/B
OFB enc | 6.33 ns/B 150.6 MiB/s 20.27 c/B
OFB dec | 6.33 ns/B 150.7 MiB/s 20.25 c/B
CTR enc | 6.28 ns/B 152.0 MiB/s 20.08 c/B
CTR dec | 6.28 ns/B 151.7 MiB/s 20.11 c/B
New:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.02 ns/B 190.0 MiB/s 16.06 c/B
ECB dec | 5.33 ns/B 178.8 MiB/s 17.07 c/B
CBC enc | 4.64 ns/B 205.4 MiB/s 14.86 c/B
CBC dec | 4.95 ns/B 192.7 MiB/s 15.84 c/B
CFB enc | 4.75 ns/B 200.7 MiB/s 15.20 c/B
CFB dec | 4.74 ns/B 201.1 MiB/s 15.18 c/B
OFB enc | 5.29 ns/B 180.3 MiB/s 16.93 c/B
OFB dec | 5.29 ns/B 180.3 MiB/s 16.93 c/B
CTR enc | 4.77 ns/B 200.0 MiB/s 15.26 c/B
CTR dec | 4.77 ns/B 199.8 MiB/s 15.27 c/B
Benchmark on Cortex-A8 (ARM assembly):
tests/bench-slope --cpu-mhz 1008 cipher aes
Old:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 21.84 ns/B 43.66 MiB/s 22.02 c/B
ECB dec | 22.35 ns/B 42.67 MiB/s 22.53 c/B
CBC enc | 22.97 ns/B 41.53 MiB/s 23.15 c/B
CBC dec | 23.48 ns/B 40.61 MiB/s 23.67 c/B
CFB enc | 22.72 ns/B 41.97 MiB/s 22.90 c/B
CFB dec | 23.41 ns/B 40.74 MiB/s 23.59 c/B
OFB enc | 23.65 ns/B 40.32 MiB/s 23.84 c/B
OFB dec | 23.67 ns/B 40.29 MiB/s 23.86 c/B
CTR enc | 23.24 ns/B 41.03 MiB/s 23.43 c/B
CTR dec | 23.23 ns/B 41.05 MiB/s 23.42 c/B
New:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 26.03 ns/B 36.64 MiB/s 26.24 c/B
ECB dec | 26.97 ns/B 35.36 MiB/s 27.18 c/B
CBC enc | 23.21 ns/B 41.09 MiB/s 23.39 c/B
CBC dec | 23.36 ns/B 40.83 MiB/s 23.54 c/B
CFB enc | 23.02 ns/B 41.42 MiB/s 23.21 c/B
CFB dec | 23.67 ns/B 40.28 MiB/s 23.86 c/B
OFB enc | 27.86 ns/B 34.24 MiB/s 28.08 c/B
OFB dec | 27.87 ns/B 34.21 MiB/s 28.10 c/B
CTR enc | 23.47 ns/B 40.63 MiB/s 23.66 c/B
CTR dec | 23.49 ns/B 40.61 MiB/s 23.67 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-aesni.c (_gcry_aes_aesni_encrypt)
(_gcry_aes_aesni_decrypt): Return stack burn depth.
* cipher/rijndael-amd64.S (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block): Ditto.
* cipher/rijndael-arm.S (_gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Ditto.
* cipher/rijndael-internal.h (RIJNDAEL_context_s)
(rijndael_cryptfn_t): New.
(RIJNDAEL_context): New members 'encrypt_fn' and 'decrypt_fn'.
* cipher/rijndael.c (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block, _gcry_aes_aesni_encrypt)
(_gcry_aes_aesni_decrypt, _gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Change prototypes.
(do_padlock_encrypt, do_padlock_decrypt): New.
(do_setkey): Separate key-length to rounds conversion from
HW features check; Add selection for ctx->encrypt_fn and
ctx->decrypt_fn.
(do_encrypt_aligned, do_decrypt_aligned): Move inside
'[!USE_AMD64_ASM && !USE_ARM_ASM]'; Move USE_AMD64_ASM and
USE_ARM_ASM to...
(do_encrypt, do_decrypt): ...here; Return stack depth; Remove second
temporary buffer from non-aligned input/output case.
(do_padlock): Move decrypt_flag to last argument; Return stack depth.
(rijndael_encrypt): Remove #ifdefs, just call ctx->encrypt_fn.
(_gcry_aes_cfb_enc, _gcry_aes_cbc_enc): Remove USE_PADLOCK; Call
ctx->encrypt_fn in place of do_encrypt/do_encrypt_aligned.
(_gcry_aes_ctr_enc): Call ctx->encrypt_fn in place of
do_encrypt_aligned; Make tmp buffer 16-byte aligned and wipe buffer
after use.
(rijndael_decrypt): Remove #ifdefs, just call ctx->decrypt_fn.
(_gcry_aes_cfb_dec): Remove USE_PADLOCK; Call ctx->decrypt_fn in place
of do_decrypt/do_decrypt_aligned.
(_gcry_aes_cbc_dec): Ditto; Make savebuf buffer 16-byte aligned.
--
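The dispatch scheme listed above can be sketched roughly like this. It is an assumption-laden illustration, not the real RIJNDAEL_context: key setup first maps key length to a round count, then picks block functions once, and every block function returns a stack burn depth for the caller to act on. All demo_* names, the XOR placeholder cipher and the burn-depth values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BLOCKSIZE 16

struct demo_ctx;

/* Block function: processes one block and returns how many bytes of
   stack the caller should wipe afterwards (0 if nothing to wipe). */
typedef unsigned int (*demo_cryptfn_t) (const struct demo_ctx *ctx,
                                        unsigned char *dst,
                                        const unsigned char *src);

typedef struct demo_ctx
{
  int rounds;
  demo_cryptfn_t encrypt_fn;
  demo_cryptfn_t decrypt_fn;
} demo_ctx_t;

/* Placeholder "generic" backend: XOR with a constant, not a cipher. */
static unsigned int
demo_generic_crypt (const struct demo_ctx *ctx, unsigned char *dst,
                    const unsigned char *src)
{
  int i;

  (void)ctx;
  for (i = 0; i < DEMO_BLOCKSIZE; i++)
    dst[i] = src[i] ^ 0x5a;
  return 64;   /* made-up depth for a routine with stack temporaries */
}

/* Placeholder "accelerated" backend: same result, nothing to wipe. */
static unsigned int
demo_accel_crypt (const struct demo_ctx *ctx, unsigned char *dst,
                  const unsigned char *src)
{
  int i;

  (void)ctx;
  for (i = 0; i < DEMO_BLOCKSIZE; i++)
    dst[i] = src[i] ^ 0x5a;
  return 0;
}

static int
demo_setkey (demo_ctx_t *ctx, size_t keylen, int have_accel)
{
  /* Step 1: key length to rounds, independent of hardware features. */
  if (keylen == 16)
    ctx->rounds = 10;
  else if (keylen == 24)
    ctx->rounds = 12;
  else if (keylen == 32)
    ctx->rounds = 14;
  else
    return -1;

  /* Step 2: select the implementation once, at key setup. */
  ctx->encrypt_fn = have_accel ? demo_accel_crypt : demo_generic_crypt;
  ctx->decrypt_fn = have_accel ? demo_accel_crypt : demo_generic_crypt;
  return 0;
}

/* The public entry point needs no #ifdefs: it just calls through. */
static unsigned int
demo_encrypt (const demo_ctx_t *ctx, unsigned char *dst,
              const unsigned char *src)
{
  assert (ctx->encrypt_fn != NULL);
  return ctx->encrypt_fn (ctx, dst, src);
}
```

Selecting the function pointers in key setup keeps the per-block path free of feature checks, and returning the burn depth lets one caller handle stack wiping for every backend.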
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
* cipher/Makefile.in: Add 'rijndael-aesni.c'.
* cipher/rijndael-aesni.c: New.
* cipher/rijndael-internal.h: New.
* cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16)
(USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context)
(keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'.
(u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6)
(aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4)
(do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move
to 'rijndael-aesni.c'.
(prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc)
(_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt)
(_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions
in 'rijndael-aesni.c'.
* configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'.
--
Clean up rijndael.c before new hardware acceleration support gets added.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|