|
* configure.ac: Add 'rijndael-ppc9le.lo'.
* cipher/Makefile.am: Add 'rijndael-ppc9le.c', 'rijndael-ppc-common.h'
and 'rijndael-ppc-functions.h'.
* cipher/rijndael-internal.h (USE_PPC_CRYPTO_WITH_PPC9LE): New.
(RIJNDAEL_context_s): Add 'use_ppc9le_crypto'.
* cipher/rijndael.c (_gcry_aes_ppc9le_encrypt)
(_gcry_aes_ppc9le_decrypt, _gcry_aes_ppc9le_cfb_enc)
(_gcry_aes_ppc9le_cfb_dec, _gcry_aes_ppc9le_ctr_enc)
(_gcry_aes_ppc9le_cbc_enc, _gcry_aes_ppc9le_cbc_dec)
(_gcry_aes_ppc9le_ocb_crypt, _gcry_aes_ppc9le_ocb_auth)
(_gcry_aes_ppc9le_xts_crypt): New.
(do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth, _gcry_aes_xts_crypt)
[USE_PPC_CRYPTO_WITH_PPC9LE]: New.
* cipher/rijndael-ppc.c: Split common code to headers
'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
* cipher/rijndael-ppc-common.h: Split from 'rijndael-ppc.c'.
(asm_add_uint64, asm_sra_int64, asm_swap_uint64_halfs): New.
* cipher/rijndael-ppc-functions.h: Split from 'rijndael-ppc.c'.
(CFB_ENC_FUNC, CBC_ENC_FUNC): Unroll loop by 2.
(XTS_CRYPT_FUNC, GEN_TWEAK): Tweak generation without vperm
instruction.
* cipher/rijndael-ppc9le.c: New.
--
Provide POWER9 little-endian optimized variant of PPC vcrypto AES
implementation. This implementation uses 'lxvb16x' and 'stxvb16x'
instructions to load/store vectors directly in big-endian order.
Benchmark on POWER9 (~3.8Ghz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 918.7 MiB/s 3.94 c/B
CBC dec | 0.222 ns/B 4292 MiB/s 0.844 c/B
CFB enc | 1.04 ns/B 916.9 MiB/s 3.95 c/B
CFB dec | 0.224 ns/B 4252 MiB/s 0.852 c/B
CTR enc | 0.226 ns/B 4218 MiB/s 0.859 c/B
CTR dec | 0.225 ns/B 4233 MiB/s 0.856 c/B
XTS enc | 0.500 ns/B 1907 MiB/s 1.90 c/B
XTS dec | 0.494 ns/B 1932 MiB/s 1.88 c/B
OCB enc | 0.288 ns/B 3312 MiB/s 1.09 c/B
OCB dec | 0.292 ns/B 3266 MiB/s 1.11 c/B
OCB auth | 0.267 ns/B 3567 MiB/s 1.02 c/B
After (ctr & ocb & cbc-dec & cfb-dec ~15% and xts ~8% faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.04 ns/B 914.2 MiB/s 3.96 c/B
CBC dec | 0.191 ns/B 4984 MiB/s 0.727 c/B
CFB enc | 1.03 ns/B 930.0 MiB/s 3.90 c/B
CFB dec | 0.194 ns/B 4906 MiB/s 0.739 c/B
CTR enc | 0.196 ns/B 4868 MiB/s 0.744 c/B
CTR dec | 0.197 ns/B 4834 MiB/s 0.750 c/B
XTS enc | 0.460 ns/B 2075 MiB/s 1.75 c/B
XTS dec | 0.455 ns/B 2097 MiB/s 1.73 c/B
OCB enc | 0.250 ns/B 3812 MiB/s 0.951 c/B
OCB dec | 0.253 ns/B 3764 MiB/s 0.963 c/B
OCB auth | 0.232 ns/B 4106 MiB/s 0.883 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|