path: root/cipher/rijndael-armv8-ce.c
Commit message | Author | Age | Files | Lines
* rijndael: add ECB acceleration (for benchmarking purposes) | Jussi Kivilinna | 2022-10-26 | 1 file | -51/+73

    * cipher/cipher-internal.h (cipher_bulk_ops): Add 'ecb_crypt'.
    * cipher/cipher.c (do_ecb_crypt): Use bulk function if available.
    * cipher/rijndael-aesni.c (do_aesni_enc_vec8): Change asm label
    '.Ldeclast' to '.Lenclast'.
    (_gcry_aes_aesni_ecb_crypt): New.
    * cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_ecb_enc_armv8_ce)
    (_gcry_aes_ecb_dec_armv8_ce): New.
    * cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ecb_enc_armv8_ce)
    (_gcry_aes_ecb_dec_armv8_ce): New.
    * cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce)
    (_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce): Change
    return value from void to size_t.
    (ocb_crypt_fn_t, xts_crypt_fn_t): Remove.
    (_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_xts_crypt): Remove
    indirect function call; Return value from called function (allows
    tail call optimization).
    (_gcry_aes_armv8_ce_ocb_auth): Return value from called function
    (allows tail call optimization).
    (_gcry_aes_ecb_enc_armv8_ce, _gcry_aes_ecb_dec_armv8_ce)
    (_gcry_aes_armv8_ce_ecb_crypt): New.
    * cipher/rijndael-vaes-avx2-amd64.S
    (_gcry_vaes_avx2_ecb_crypt_amd64): New.
    * cipher/rijndael-vaes.c (_gcry_vaes_avx2_ecb_crypt_amd64)
    (_gcry_aes_vaes_ecb_crypt): New.
    * cipher/rijndael.c (_gcry_aes_aesni_ecb_crypt)
    (_gcry_aes_vaes_ecb_crypt, _gcry_aes_armv8_ce_ecb_crypt): New.
    (do_setkey): Setup ECB bulk function for x86 AESNI/VAES and ARM CE.
    --

    Benchmark on AMD Ryzen 9 7900X:

    Before (OCB for reference):
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
            ECB enc |     0.128 ns/B      7460 MiB/s     0.720 c/B      5634±1
            ECB dec |     0.134 ns/B      7103 MiB/s     0.753 c/B      5608
            OCB enc |     0.029 ns/B     32930 MiB/s     0.163 c/B      5625
            OCB dec |     0.029 ns/B     32738 MiB/s     0.164 c/B      5625

    After:
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
            ECB enc |     0.028 ns/B     33761 MiB/s     0.159 c/B      5625
            ECB dec |     0.028 ns/B     33917 MiB/s     0.158 c/B      5625

    GnuPG-bug-id: T6242
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

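The `do_ecb_crypt` change in the commit above is essentially a dispatch: use the bulk ECB routine when `do_setkey` installed one, otherwise fall back to processing one block at a time. The following is a minimal C sketch of that pattern. The `bulk_ops_sketch` and `cipher_sketch` structures, their field names and the XOR "cipher" are hypothetical stand-ins; they do not reproduce libgcrypt's actual `cipher_bulk_ops` layout or function signatures.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKSIZE 16

/* Hypothetical single-block function: process one 16-byte block. */
typedef void (*block_fn_t)(const void *ctx, uint8_t *out, const uint8_t *in);

/* Hypothetical bulk ECB function: processes nblocks at once and returns
   a stack-burn depth (always 0 in this sketch). */
typedef size_t (*ecb_bulk_fn_t)(const void *ctx, uint8_t *out,
                                const uint8_t *in, size_t nblocks,
                                int encrypt);

struct bulk_ops_sketch {
  ecb_bulk_fn_t ecb_crypt;   /* NULL when no accelerated version exists */
};

struct cipher_sketch {
  const void *ctx;
  block_fn_t encrypt;
  block_fn_t decrypt;
  struct bulk_ops_sketch bulk;
};

/* ECB over whole blocks: prefer the bulk routine, else loop per block. */
static void ecb_crypt_sketch(struct cipher_sketch *c, uint8_t *out,
                             const uint8_t *in, size_t nblocks, int encrypt)
{
  if (c->bulk.ecb_crypt)
    {
      c->bulk.ecb_crypt(c->ctx, out, in, nblocks, encrypt);
      return;
    }
  block_fn_t fn = encrypt ? c->encrypt : c->decrypt;
  for (size_t i = 0; i < nblocks; i++)
    fn(c->ctx, out + i * BLOCKSIZE, in + i * BLOCKSIZE);
}

/* Toy "cipher" for demonstration only: XOR every byte with a 1-byte key. */
static void toy_block(const void *ctx, uint8_t *out, const uint8_t *in)
{
  uint8_t k = *(const uint8_t *)ctx;
  for (int i = 0; i < BLOCKSIZE; i++)
    out[i] = in[i] ^ k;
}

static size_t toy_bulk(const void *ctx, uint8_t *out, const uint8_t *in,
                       size_t nblocks, int encrypt)
{
  (void)encrypt;   /* XOR is its own inverse */
  uint8_t k = *(const uint8_t *)ctx;
  for (size_t i = 0; i < nblocks * BLOCKSIZE; i++)
    out[i] = in[i] ^ k;
  return 0;
}
```

Both paths must produce identical output; the bulk path only saves per-call overhead. The related ARM CE change follows the same idea: a wrapper that simply returns the result of the function it calls lets the compiler emit a tail call instead of a call followed by a return.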
* Simplify AES key schedule implementation | Jussi Kivilinna | 2022-07-31 | 1 file | -80/+24

    * cipher/rijndael-armv8-ce.c (_gcry_aes_armv8_ce_setkey): New key
    schedule with simplified structure and less stack usage.
    * cipher/rijndael-internal.h (RIJNDAEL_context_s): Add
    'keyschedule32b'.
    (keyschenc32b): New.
    * cipher/rijndael-ppc-common.h (vec_u32): New.
    * cipher/rijndael-ppc.c (vec_bswap32_const): Remove.
    (_gcry_aes_sbox4_ppc8): Optimize for fewer instructions emitted.
    (keysched_idx): New.
    (_gcry_aes_ppc8_setkey): New key schedule with simplified structure.
    * cipher/rijndael-tables.h (rcon): Remove.
    * cipher/rijndael.c (sbox4): New.
    (do_setkey): New key schedule with simplified structure and less
    stack usage.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

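The commit removes the `rcon` table and adds an `sbox4` helper. As a rough illustration of why a round-constant table is not needed, here is a portable AES-128 key expansion sketch (per FIPS-197) in which the round constant is derived by doubling in GF(2^8) and SubWord goes through a hypothetical `sbox4_sketch()` helper. The S-box is generated at start-up rather than hard-coded to keep the block short. This is not libgcrypt's code, and plain table lookups like these are not constant-time.

```c
#include <stdint.h>

static uint8_t sbox[256];

#define ROTL8(x, s) ((uint8_t)(((x) << (s)) | ((x) >> (8 - (s)))))

/* Build the AES S-box: multiplicative inverse in GF(2^8) followed by the
   affine transform. p walks the powers of 3, q the matching inverses. */
static void init_sbox(void)
{
  uint8_t p = 1, q = 1;
  do
    {
      p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1b : 0)); /* p *= 3 */
      q ^= (uint8_t)(q << 1);                                  /* q /= 3 */
      q ^= (uint8_t)(q << 2);
      q ^= (uint8_t)(q << 4);
      if (q & 0x80)
        q ^= 0x09;
      sbox[p] = (uint8_t)(q ^ ROTL8(q, 1) ^ ROTL8(q, 2) ^ ROTL8(q, 3)
                          ^ ROTL8(q, 4) ^ 0x63);
    }
  while (p != 1);
  sbox[0] = 0x63;   /* 0 has no inverse and is handled separately */
}

/* SubWord: apply the S-box to each byte of a 32-bit word. */
static uint32_t sbox4_sketch(uint32_t w)
{
  return ((uint32_t)sbox[(w >> 24) & 0xff] << 24)
       | ((uint32_t)sbox[(w >> 16) & 0xff] << 16)
       | ((uint32_t)sbox[(w >> 8) & 0xff] << 8)
       | (uint32_t)sbox[w & 0xff];
}

/* AES-128 key expansion into 44 big-endian words. init_sbox() must have
   been called first. rcon is computed on the fly instead of a table. */
static void aes128_expand_key(const uint8_t key[16], uint32_t w[44])
{
  uint8_t rcon = 0x01;
  for (int i = 0; i < 4; i++)
    w[i] = ((uint32_t)key[4 * i] << 24) | ((uint32_t)key[4 * i + 1] << 16)
         | ((uint32_t)key[4 * i + 2] << 8) | key[4 * i + 3];
  for (int i = 4; i < 44; i++)
    {
      uint32_t t = w[i - 1];
      if (i % 4 == 0)
        {
          /* RotWord, SubWord, then XOR the round constant. */
          t = sbox4_sketch((t << 8) | (t >> 24)) ^ ((uint32_t)rcon << 24);
          /* Next round constant: multiply by x in GF(2^8). */
          rcon = (uint8_t)((rcon << 1) ^ ((rcon & 0x80) ? 0x1b : 0));
        }
      w[i] = w[i - 4] ^ t;
    }
}
```

With the FIPS-197 Appendix A.1 key `2b7e151628aed2a6abf7158809cf4f3c`, this yields w[4] = `a0fafe17` and w[43] = `b6630ca6`.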
* cipher: move CBC/CFB/CTR self-tests to tests/basic | Jussi Kivilinna | 2022-05-11 | 1 file | -1/+0

    * cipher/Makefile.am: Remove 'cipher-selftest.c' and
    'cipher-selftest.h'.
    * cipher/cipher-selftest.c: Remove (refactor these tests to
    tests/basic.c).
    * cipher/cipher-selftest.h: Remove.
    * cipher/blowfish.c (selftest_ctr, selftest_cbc, selftest_cfb): Remove.
    (selftest): Remove CTR/CBC/CFB bulk self-tests.
    * cipher/camellia-glue.c (selftest_ctr_128, selftest_cbc_128)
    (selftest_cfb_128): Remove.
    (selftest): Remove CTR/CBC/CFB bulk self-tests.
    * cipher/cast5.c (selftest_ctr, selftest_cbc, selftest_cfb): Remove.
    (selftest): Remove CTR/CBC/CFB bulk self-tests.
    * cipher/des.c (bulk_selftest_setkey, selftest_ctr, selftest_cbc)
    (selftest_cfb): Remove.
    (selftest): Remove CTR/CBC/CFB bulk self-tests.
    * cipher/rijndael.c (selftest_basic_128, selftest_basic_192)
    (selftest_basic_256): Allocate context from stack instead of heap and
    handle alignment manually.
    (selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Remove.
    (selftest): Remove CTR/CBC/CFB bulk self-tests.
    * cipher/serpent.c (selftest_ctr_128, selftest_cbc_128)
    (selftest_cfb_128): Remove.
    (selftest): Remove CTR/CBC/CFB bulk self-tests.
    * cipher/sm4.c (selftest_ctr_128, selftest_cbc_128)
    (selftest_cfb_128): Remove.
    (selftest): Remove CTR/CBC/CFB bulk self-tests.
    * cipher/twofish.c (selftest_ctr, selftest_cbc, selftest_cfb): Remove.
    (selftest): Remove CTR/CBC/CFB bulk self-tests.
    * tests/basic.c (buf_xor, cipher_cbc_bulk_test, buf_xor_2dst)
    (cipher_cfb_bulk_test, cipher_ctr_bulk_test): New.
    (check_ciphers): Run cipher_cbc_bulk_test(), cipher_cfb_bulk_test()
    and cipher_ctr_bulk_test() for block ciphers.
    ---

    The CBC/CFB/CTR bulk self-tests are computationally heavy and slow
    down use cases where an application opens a cipher context once, does
    its processing and exits. A better place for these tests is
    `tests/basic`.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

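The new `cipher_cbc_bulk_test()` and related functions in `tests/basic.c` are only named in the commit above. The general idea of such a test, checking a multi-block ("bulk") code path against a simple one-block-at-a-time reference for many input lengths, can be sketched as follows. Everything here is hypothetical: the toy block cipher is a keyed byte rotation plus XOR (invertible, not secure), and `cbc_dec_ref`/`cbc_dec_bulk` merely stand in for a generic and an optimized implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BS 16       /* block size in bytes */
#define MAXBLK 32   /* longest input tried by the test */

/* Toy invertible "block cipher" (NOT secure): rotate bytes, XOR key. */
static void toy_enc(const uint8_t *k, uint8_t *out, const uint8_t *in)
{
  for (int i = 0; i < BS; i++)
    out[i] = in[(i + 1) % BS] ^ k[i];
}

static void toy_dec(const uint8_t *k, uint8_t *out, const uint8_t *in)
{
  for (int i = 0; i < BS; i++)
    out[(i + 1) % BS] = in[i] ^ k[i];
}

static void xor_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                      size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] ^ b[i];
}

/* CBC encryption, one block at a time; leaves last ciphertext in iv. */
static void cbc_enc(const uint8_t *k, uint8_t *iv, uint8_t *out,
                    const uint8_t *in, size_t nblocks)
{
  uint8_t tmp[BS];
  for (size_t i = 0; i < nblocks; i++)
    {
      xor_bytes(tmp, in + i * BS, iv, BS);
      toy_enc(k, out + i * BS, tmp);
      memcpy(iv, out + i * BS, BS);
    }
}

/* Reference CBC decryption: strictly one block per step, out-of-place. */
static void cbc_dec_ref(const uint8_t *k, uint8_t *iv, uint8_t *out,
                        const uint8_t *in, size_t nblocks)
{
  uint8_t tmp[BS];
  for (size_t i = 0; i < nblocks; i++)
    {
      toy_dec(k, tmp, in + i * BS);
      xor_bytes(out + i * BS, tmp, iv, BS);
      memcpy(iv, in + i * BS, BS);
    }
}

/* "Bulk" CBC decryption: decrypt every block first (the independent work
   that SIMD code spreads across lanes), then XOR with the previous
   ciphertext. Out-of-place only. */
static void cbc_dec_bulk(const uint8_t *k, uint8_t *iv, uint8_t *out,
                         const uint8_t *in, size_t nblocks)
{
  for (size_t i = 0; i < nblocks; i++)
    toy_dec(k, out + i * BS, in + i * BS);
  for (size_t i = 0; i < nblocks; i++)
    xor_bytes(out + i * BS, out + i * BS, i ? in + (i - 1) * BS : iv, BS);
  if (nblocks > 0)
    memcpy(iv, in + (nblocks - 1) * BS, BS);
}

/* Compare bulk against reference for 1..MAXBLK blocks, including the
   chaining value left in iv. Returns 0 on success, -1 on mismatch. */
static int cbc_bulk_test_sketch(void)
{
  uint8_t k[BS], iv0[BS], pt[MAXBLK * BS], ct[MAXBLK * BS];
  uint8_t ref[MAXBLK * BS], bulk[MAXBLK * BS];
  for (int i = 0; i < BS; i++)
    {
      k[i] = (uint8_t)(7 * i + 1);
      iv0[i] = (uint8_t)(0xa0 + i);
    }
  for (int i = 0; i < MAXBLK * BS; i++)
    pt[i] = (uint8_t)(13 * i);
  for (size_t n = 1; n <= MAXBLK; n++)
    {
      uint8_t iv_e[BS], iv_r[BS], iv_b[BS];
      memcpy(iv_e, iv0, BS);
      memcpy(iv_r, iv0, BS);
      memcpy(iv_b, iv0, BS);
      cbc_enc(k, iv_e, ct, pt, n);
      cbc_dec_ref(k, iv_r, ref, ct, n);
      cbc_dec_bulk(k, iv_b, bulk, ct, n);
      if (memcmp(ref, pt, n * BS) != 0 || memcmp(bulk, ref, n * BS) != 0
          || memcmp(iv_r, iv_b, BS) != 0)
        return -1;
    }
  return 0;
}
```

Testing every length up to some maximum matters because bulk implementations usually have separate code for full groups of blocks (for example 4 or 8 at a time) and for the leftover tail.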
* Add ARMv8-CE HW acceleration for GCM-SIV counter mode | Jussi Kivilinna | 2021-08-26 | 1 file | -0/+17

    * cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_ctr32le_enc_armv8_ce):
    New.
    * cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ctr32le_enc_armv8_ce):
    New.
    * cipher/rijndael-armv8-ce.c (_gcry_aes_ctr32le_enc_armv8_ce)
    (_gcry_aes_armv8_ce_ctr32le_enc): New.
    * cipher/rijndael.c (_gcry_aes_armv8_ce_ctr32le_enc): New prototype.
    (do_setkey): Add setup of 'bulk_ops->ctr32le_enc' for ARMv8-CE.
    --

    Benchmark on Cortex-A53 (aarch64):

    Before:
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM-SIV enc |     11.77 ns/B     81.03 MiB/s      7.63 c/B     647.9
        GCM-SIV dec |     11.92 ns/B     79.98 MiB/s      7.73 c/B     647.9
       GCM-SIV auth |      2.99 ns/B     318.9 MiB/s      1.94 c/B     648.0

    After (~2.4x faster):
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM-SIV enc |      4.66 ns/B     204.5 MiB/s      3.02 c/B     647.9
        GCM-SIV dec |      4.82 ns/B     198.0 MiB/s      3.12 c/B     647.9
       GCM-SIV auth |      3.00 ns/B     318.4 MiB/s      1.94 c/B     648.0

    GnuPG-bug-id: T4485
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

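In AES-GCM-SIV (RFC 8452) the counter block is advanced differently from ordinary big-endian CTR mode: its first four bytes are a little-endian 32-bit integer that wraps modulo 2^32, and the remaining twelve bytes never change. That is presumably what the `ctr32le` name in the commit above refers to. A portable sketch of the per-block counter handling follows; `ctr32le_inc` and `ctr32le_crypt` are hypothetical names, and the block cipher is passed in as a function pointer rather than being a real AES call.

```c
#include <stddef.h>
#include <stdint.h>

/* Increment bytes 0..3 of the counter block as a little-endian uint32_t,
   wrapping modulo 2^32; bytes 4..15 are left untouched. */
static void ctr32le_inc(uint8_t ctr[16])
{
  uint32_t c = (uint32_t)ctr[0] | ((uint32_t)ctr[1] << 8)
             | ((uint32_t)ctr[2] << 16) | ((uint32_t)ctr[3] << 24);
  c += 1;   /* unsigned arithmetic wraps, as required */
  ctr[0] = (uint8_t)c;
  ctr[1] = (uint8_t)(c >> 8);
  ctr[2] = (uint8_t)(c >> 16);
  ctr[3] = (uint8_t)(c >> 24);
}

typedef void (*block_enc_fn)(const void *key, uint8_t out[16],
                             const uint8_t in[16]);

/* CTR encryption of nblocks whole blocks using the 32-bit LE counter.
   Decryption is the same operation. */
static void ctr32le_crypt(const void *key, block_enc_fn enc, uint8_t ctr[16],
                          uint8_t *out, const uint8_t *in, size_t nblocks)
{
  uint8_t ks[16];
  for (size_t i = 0; i < nblocks; i++)
    {
      enc(key, ks, ctr);            /* keystream block = E(counter) */
      for (int j = 0; j < 16; j++)
        out[i * 16 + j] = in[i * 16 + j] ^ ks[j];
      ctr32le_inc(ctr);
    }
}
```

An accelerated version computes several counter values up front and encrypts them in parallel; the wrap-around rule (no carry into byte 4) is the detail that distinguishes it from the regular CTR path.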
* AES: setup cipher object bulk routines with optimized versions | Jussi Kivilinna | 2018-06-19 | 1 file | -12/+34

    * cipher/rijndael-aesni.c (_gcry_aes_aesni_prepare_decryption):
    Rename...
    (do_aesni_prepare_decryption): ... to this.
    (_gcry_aes_aesni_prepare_decryption): New.
    (_gcry_aes_aesni_cfb_enc, _gcry_aes_aesni_cbc_enc)
    (_gcry_aes_aesni_ctr_enc, _gcry_aes_aesni_cfb_dec)
    (_gcry_aes_aesni_cbc_dec): Reorder parameters to match bulk
    operations.
    (_gcry_aes_aesni_cbc_dec, aesni_ocb_dec)
    (_gcry_aes_aesni_xts_dec): Check and prepare decryption.
    (_gcry_aes_aesni_ocb_crypt, _gcry_aes_aesni_ocb_auth): Change return
    type to size_t.
    * cipher/rijndael-armv8-ce.c (_gcry_aes_armv8_ce_cfb_enc)
    (_gcry_aes_armv8_ce_cbc_enc, _gcry_aes_armv8_ce_ctr_enc)
    (_gcry_aes_armv8_ce_cfb_dec, _gcry_aes_armv8_ce_cbc_dec): Reorder
    parameters to match bulk operations.
    (_gcry_aes_armv8_ce_cbc_dec, _gcry_aes_armv8_ce_ocb_crypt)
    (_gcry_aes_armv8_ce_xts_dec): Check and prepare decryption.
    (_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_ocb_auth): Change
    return type to size_t.
    * cipher/rijndael-ssse3-amd64.c (_gcry_ssse3_prepare_decryption):
    Rename...
    (do_ssse3_prepare_decryption): ... to this.
    (_gcry_ssse3_prepare_decryption): New.
    (_gcry_aes_ssse3_cfb_enc, _gcry_aes_ssse3_cbc_enc)
    (_gcry_aes_ssse3_ctr_enc, _gcry_aes_ssse3_cfb_dec)
    (_gcry_aes_ssse3_cbc_dec): Reorder parameters to match bulk
    operations.
    (_gcry_aes_ssse3_cbc_dec, ssse3_ocb_dec): Check and prepare
    decryption.
    (_gcry_aes_ssse3_ocb_crypt, _gcry_aes_ssse3_ocb_auth): Change return
    type to size_t.
    * cipher/rijndael.c (_gcry_aes_aesni_cfb_enc, _gcry_aes_aesni_cbc_enc)
    (_gcry_aes_aesni_ctr_enc, _gcry_aes_aesni_cfb_dec)
    (_gcry_aes_aesni_cbc_dec, _gcry_aes_aesni_ocb_crypt)
    (_gcry_aes_aesni_ocb_auth, _gcry_aes_aesni_xts_crypt)
    (_gcry_aes_ssse3_cfb_enc, _gcry_aes_ssse3_cbc_enc)
    (_gcry_aes_ssse3_ctr_enc, _gcry_aes_ssse3_cfb_dec)
    (_gcry_aes_ssse3_cbc_dec, _gcry_aes_ssse3_ocb_crypt)
    (_gcry_aes_ssse3_ocb_auth, _gcry_aes_ssse3_xts_crypt)
    (_gcry_aes_armv8_ce_cfb_enc, _gcry_aes_armv8_ce_cbc_enc)
    (_gcry_aes_armv8_ce_ctr_enc, _gcry_aes_armv8_ce_cfb_dec)
    (_gcry_aes_armv8_ce_cbc_dec, _gcry_aes_armv8_ce_ocb_crypt)
    (_gcry_aes_armv8_ce_ocb_auth, _gcry_aes_armv8_ce_xts_crypt): Change
    prototypes to match bulk operations.
    (do_setkey): Setup bulk operations with optimized implementations.
    (_gcry_aes_cfb_enc, _gcry_aes_cbc_enc, _gcry_aes_ctr_enc)
    (_gcry_aes_cfb_dec, _gcry_aes_cbc_dec, _gcry_aes_ocb_crypt)
    (_gcry_aes_ocb_auth, _gcry_aes_xts_crypt): Update usage to match new
    prototypes; avoid prefetching and decryption preparation on optimized
    code paths.
    --

    Replace the cipher object's bulk operation functions with the
    optimized implementations directly, reducing per-call overhead.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

* Add ARMv8/CE acceleration for AES-XTS | Jussi Kivilinna | 2018-01-20 | 1 file | -0/+28

    * cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_xts_enc_armv8_ce)
    (_gcry_aes_xts_dec_armv8_ce): New.
    * cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_xts_enc_armv8_ce)
    (_gcry_aes_xts_dec_armv8_ce): New.
    * cipher/rijndael-armv8-ce.c (_gcry_aes_xts_enc_armv8_ce)
    (_gcry_aes_xts_dec_armv8_ce, xts_crypt_fn_t)
    (_gcry_aes_armv8_ce_xts_crypt): New.
    * cipher/rijndael.c (_gcry_aes_armv8_ce_xts_crypt): New.
    (_gcry_aes_xts_crypt) [USE_ARM_CE]: New.
    --

    Benchmark on Cortex-A53 (AArch64, 1152 MHz):

    Before:
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      4.88 ns/B     195.5 MiB/s      5.62 c/B
            XTS dec |      4.94 ns/B     192.9 MiB/s      5.70 c/B
                    =
     AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      5.55 ns/B     171.8 MiB/s      6.39 c/B
            XTS dec |      5.61 ns/B     169.9 MiB/s      6.47 c/B
                    =
     AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      6.22 ns/B     153.3 MiB/s      7.17 c/B
            XTS dec |      6.29 ns/B     151.7 MiB/s      7.24 c/B
                    =

    After (~2.6x faster):
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      1.83 ns/B     520.9 MiB/s      2.11 c/B
            XTS dec |      1.82 ns/B     524.9 MiB/s      2.09 c/B
                    =
     AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      1.97 ns/B     483.3 MiB/s      2.27 c/B
            XTS dec |      1.96 ns/B     486.9 MiB/s      2.26 c/B
                    =
     AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      2.11 ns/B     450.9 MiB/s      2.44 c/B
            XTS dec |      2.10 ns/B     453.8 MiB/s      2.42 c/B
                    =

    Benchmark on Cortex-A53 (AArch32, 1152 MHz):

    Before:
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      6.52 ns/B     146.2 MiB/s      7.51 c/B
            XTS dec |      6.57 ns/B     145.2 MiB/s      7.57 c/B
                    =
     AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      7.10 ns/B     134.3 MiB/s      8.18 c/B
            XTS dec |      7.11 ns/B     134.2 MiB/s      8.19 c/B
                    =
     AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      7.30 ns/B     130.7 MiB/s      8.41 c/B
            XTS dec |      7.38 ns/B     129.3 MiB/s      8.50 c/B
                    =

    After (~2.7x faster):
    Cipher:
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      2.33 ns/B     409.6 MiB/s      2.68 c/B
            XTS dec |      2.35 ns/B     405.3 MiB/s      2.71 c/B
                    =
     AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      2.53 ns/B     377.6 MiB/s      2.91 c/B
            XTS dec |      2.54 ns/B     375.5 MiB/s      2.93 c/B
                    =
     AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
            XTS enc |      2.75 ns/B     346.8 MiB/s      3.17 c/B
            XTS dec |      2.76 ns/B     345.2 MiB/s      3.18 c/B
                    =

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

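In XTS mode (IEEE 1619) each block j is processed as C_j = E_K1(P_j xor T_j) xor T_j, and between consecutive blocks the 128-bit tweak is multiplied by the primitive element alpha of GF(2^128), with the tweak stored as a little-endian byte string. The XTS assembly routines listed above must perform this same update, typically for several blocks at once. A byte-wise C sketch of the single step, using the hypothetical name `xts_mul_alpha`:

```c
#include <stdint.h>

/* Multiply the XTS tweak by alpha (x) in GF(2^128), reducing modulo
   x^128 + x^7 + x^2 + x + 1. Byte 0 is the least significant byte. */
static void xts_mul_alpha(uint8_t t[16])
{
  uint8_t carry = 0;
  for (int i = 0; i < 16; i++)
    {
      uint8_t next = (uint8_t)(t[i] >> 7);   /* bit shifted out of byte */
      t[i] = (uint8_t)((t[i] << 1) | carry);
      carry = next;
    }
  if (carry)
    t[0] ^= 0x87;   /* fold x^128 back in as x^7 + x^2 + x + 1 */
}
```

SIMD implementations usually do the same with a 64-bit lane shift plus a conditional XOR, so that several tweaks can be derived without a serial byte loop.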
* OCB ARM CE: Move ocb_get_l handling to assembly part | Jussi Kivilinna | 2016-12-10 | 1 file | -113/+16

    * cipher/rijndael-armv8-aarch32-ce.S: Add OCB 'L_{ntz(i)}'
    calculation.
    * cipher/rijndael-armv8-aarch64-ce.S: Ditto.
    * cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce)
    (_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce)
    (ocb_crypt_fn_t): Updated arguments.
    (_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_ocb_auth): Remove
    'ocb_get_l' handling and the splitting of input into 32-block chunks;
    instead, pass full buffers to assembly.
    --

    Performance on Cortex-A53 (AArch32):

    Before:
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            OCB enc |      1.63 ns/B     583.8 MiB/s      1.88 c/B
            OCB dec |      1.67 ns/B     572.1 MiB/s      1.92 c/B
           OCB auth |      1.33 ns/B     717.1 MiB/s      1.53 c/B

    After (~12% faster):
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            OCB enc |      1.47 ns/B     650.2 MiB/s      1.69 c/B
            OCB dec |      1.48 ns/B     644.5 MiB/s      1.70 c/B
           OCB auth |      1.19 ns/B     798.2 MiB/s      1.38 c/B

    Performance on Cortex-A53 (AArch64):

    Before:
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            OCB enc |      1.29 ns/B     738.5 MiB/s      1.49 c/B
            OCB dec |      1.32 ns/B     723.5 MiB/s      1.52 c/B
           OCB auth |      1.15 ns/B     827.0 MiB/s      1.33 c/B

    After (~8% faster):
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            OCB enc |      1.21 ns/B     789.1 MiB/s      1.39 c/B
            OCB dec |      1.21 ns/B     789.2 MiB/s      1.39 c/B
           OCB auth |      1.10 ns/B     867.0 MiB/s      1.27 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

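OCB (RFC 7253) updates its running offset for block i, counting from 1, as Offset_i = Offset_{i-1} xor L_{ntz(i)}, where ntz(i) is the number of trailing zero bits of i. Moving the 'L_{ntz(i)}' calculation into assembly, as this commit does, means the assembly selects the table entry itself for each block. A portable C sketch of that selection and offset update, assuming a hypothetical precomputed table `L[]` of 16-byte values (the doubling that fills the table is omitted):

```c
#include <stddef.h>
#include <stdint.h>

/* Number of trailing zero bits of i; i must be non-zero. */
static unsigned ntz64(uint64_t i)
{
  unsigned n = 0;
  while ((i & 1) == 0)
    {
      i >>= 1;
      n++;
    }
  return n;
}

/* Advance the offset across nblocks blocks starting at 1-based block
   index first_blkn, XORing in L[ntz(i)] for each block i. The table must
   have more entries than the largest ntz(i) encountered. */
static void ocb_offsets(uint8_t offset[16], const uint8_t L[][16],
                        uint64_t first_blkn, size_t nblocks)
{
  for (size_t b = 0; b < nblocks; b++)
    {
      const uint8_t *l = L[ntz64(first_blkn + b)];
      for (int j = 0; j < 16; j++)
        offset[j] ^= l[j];
    }
}
```

For blocks 1 to 4 the selected entries are L_0, L_1, L_0, L_2, so the offset after four blocks is the initial offset XOR L_1 XOR L_2.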
* OCB: Move large L handling from bottom to upper level | Jussi Kivilinna | 2016-12-10 | 1 file | -14/+6

    * cipher/cipher-ocb.c (_gcry_cipher_ocb_get_l): Remove.
    (ocb_get_L_big): New.
    (_gcry_cipher_ocb_authenticate): Handle L-big in the upper processing
    loop so that the lower level never sees the case where
    'aad_nblocks % 65536 == 0'; Add missing stack burn.
    (ocb_aad_finalize): Add missing stack burn.
    (ocb_crypt): Handle L-big in the upper processing loop so that the
    lower level never sees the case where 'data_nblocks % 65536 == 0'.
    * cipher/cipher-internal.h (_gcry_cipher_ocb_get_l): Remove.
    (ocb_get_l): Remove 'l_tmp' usage and simplify, since input is now
    more limited ('N is not a multiple of 65536').
    * cipher/rijndael-aesni.c (get_l): Remove.
    (aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Remove
    l_tmp; Use 'ocb_get_l'.
    * cipher/rijndael-ssse3-amd64.c (get_l): Remove.
    (ssse3_ocb_enc, ssse3_ocb_dec, _gcry_aes_ssse3_ocb_auth): Remove
    l_tmp; Use 'ocb_get_l'.
    * cipher/camellia-glue.c: Remove OCB l_tmp usage.
    * cipher/rijndael-armv8-ce.c: Ditto.
    * cipher/rijndael.c: Ditto.
    * cipher/serpent.c: Ditto.
    * cipher/twofish.c: Ditto.
    --

    Move large L value generation to the topmost level so that the
    lower-level ocb_get_l becomes both simpler and faster. This also helps
    when implementing OCB in assembly, as 'ocb_get_l' no longer contains a
    function call on its slow path.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

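The invariant behind this change is simple arithmetic: if the lower level can only look up L_0 to L_15, then ntz(i) <= 15 holds for every block index i that is not a multiple of 65536 = 2^16, so only those multiples need the 'L-big' path. The following hypothetical C sketch shows an upper-level loop that splits the input so the fast path never receives such an index; the statistics structure stands in for the real fast and slow processing, and `L_TABLE_SIZE` is an assumed name.

```c
#include <stddef.h>
#include <stdint.h>

#define L_TABLE_SIZE 16   /* assumed: L_0 .. L_15 are precomputed */

/* Number of trailing zero bits of i; i must be non-zero. */
static unsigned ntz64(uint64_t i)
{
  unsigned n = 0;
  for (; (i & 1) == 0; i >>= 1)
    n++;
  return n;
}

struct ocb_stats {
  size_t slow_blocks;      /* blocks that needed "L-big" */
  unsigned max_fast_ntz;   /* largest ntz(i) seen by the fast path */
};

/* Lower level (fast path): may assume ntz(i) < L_TABLE_SIZE for all i. */
static void fast_path(struct ocb_stats *st, uint64_t first, size_t n)
{
  for (size_t b = 0; b < n; b++)
    {
      unsigned z = ntz64(first + b);
      if (z > st->max_fast_ntz)
        st->max_fast_ntz = z;
    }
}

/* Upper level: hand the fast path runs that stop just before the next
   multiple of 65536, and process each multiple here on the slow path. */
static void process_blocks(struct ocb_stats *st, uint64_t blkn,
                           size_t nblocks)
{
  while (nblocks > 0)
    {
      if (blkn % 65536 == 0)
        {
          st->slow_blocks++;   /* would compute L_{ntz(blkn)} on demand */
          blkn++;
          nblocks--;
          continue;
        }
      size_t n = (size_t)(65536 - blkn % 65536);
      if (n > nblocks)
        n = nblocks;
      fast_path(st, blkn, n);
      blkn += n;
      nblocks -= n;
    }
}
```

Processing blocks 1 to 200000 this way sends exactly three blocks (65536, 131072 and 196608) to the slow path, while the fast path never sees an ntz above 15.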
* Add ARMv8/AArch32 Crypto Extension implementation of AES | Jussi Kivilinna | 2016-07-14 | 1 file | -0/+469

    * cipher/Makefile.am: Add 'rijndael-armv8-ce.c' and
    'rijndael-armv8-aarch32-ce.S'.
    * cipher/rijndael-armv8-aarch32-ce.S: New.
    * cipher/rijndael-armv8-ce.c: New.
    * cipher/rijndael-internal.h (USE_ARM_CE): New.
    (RIJNDAEL_context_s): Add 'use_arm_ce'.
    * cipher/rijndael.c [USE_ARM_CE] (_gcry_aes_armv8_ce_setkey)
    (_gcry_aes_armv8_ce_prepare_decryption)
    (_gcry_aes_armv8_ce_encrypt, _gcry_aes_armv8_ce_decrypt)
    (_gcry_aes_armv8_ce_cfb_enc, _gcry_aes_armv8_ce_cbc_enc)
    (_gcry_aes_armv8_ce_ctr_enc, _gcry_aes_armv8_ce_cfb_dec)
    (_gcry_aes_armv8_ce_cbc_dec, _gcry_aes_armv8_ce_ocb_crypt)
    (_gcry_aes_armv8_ce_ocb_auth): New.
    (do_setkey) [USE_ARM_CE]: Add ARM CE/AES HW feature check and key
    setup for ARM CE.
    (prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
    (_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
    (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth) [USE_ARM_CE]: Add ARM CE
    support.
    * configure.ac: Add 'rijndael-armv8-ce.lo' and
    'rijndael-armv8-aarch32-ce.lo'.
    --

    Improvement vs ARM assembly on Cortex-A53:

               AES-128  AES-192  AES-256
    CBC enc:    14.8x    12.8x    11.4x
    CBC dec:    21.4x    20.5x    19.4x
    CFB enc:    16.2x    13.6x    11.6x
    CFB dec:    21.6x    20.5x    19.4x
    CTR:        19.1x    18.6x    17.8x
    OCB enc:    16.0x    16.2x    16.1x
    OCB dec:    15.6x    15.9x    15.8x
    OCB auth:   18.3x    18.4x    18.0x

    Benchmark on Cortex-A53 (1152 MHz):

    Before:
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            ECB enc |     24.42 ns/B     39.06 MiB/s     28.13 c/B
            ECB dec |     25.07 ns/B     38.05 MiB/s     28.88 c/B
            CBC enc |     21.05 ns/B     45.30 MiB/s     24.25 c/B
            CBC dec |     21.16 ns/B     45.07 MiB/s     24.38 c/B
            CFB enc |     21.05 ns/B     45.31 MiB/s     24.25 c/B
            CFB dec |     21.38 ns/B     44.61 MiB/s     24.62 c/B
            OFB enc |     26.15 ns/B     36.47 MiB/s     30.13 c/B
            OFB dec |     26.15 ns/B     36.47 MiB/s     30.13 c/B
            CTR enc |     21.17 ns/B     45.06 MiB/s     24.38 c/B
            CTR dec |     21.16 ns/B     45.06 MiB/s     24.38 c/B
            CCM enc |     42.32 ns/B     22.53 MiB/s     48.75 c/B
            CCM dec |     42.32 ns/B     22.53 MiB/s     48.75 c/B
           CCM auth |     21.17 ns/B     45.06 MiB/s     24.38 c/B
            GCM enc |     22.08 ns/B     43.19 MiB/s     25.44 c/B
            GCM dec |     22.08 ns/B     43.18 MiB/s     25.44 c/B
           GCM auth |     0.923 ns/B    1032.8 MiB/s      1.06 c/B
            OCB enc |     26.20 ns/B     36.40 MiB/s     30.18 c/B
            OCB dec |     25.97 ns/B     36.73 MiB/s     29.91 c/B
           OCB auth |     24.52 ns/B     38.90 MiB/s     28.24 c/B
                    =
     AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
            ECB enc |     27.83 ns/B     34.26 MiB/s     32.06 c/B
            ECB dec |     28.54 ns/B     33.42 MiB/s     32.88 c/B
            CBC enc |     24.47 ns/B     38.97 MiB/s     28.19 c/B
            CBC dec |     25.27 ns/B     37.74 MiB/s     29.11 c/B
            CFB enc |     25.08 ns/B     38.02 MiB/s     28.89 c/B
            CFB dec |     25.31 ns/B     37.68 MiB/s     29.16 c/B
            OFB enc |     29.57 ns/B     32.25 MiB/s     34.06 c/B
            OFB dec |     29.57 ns/B     32.25 MiB/s     34.06 c/B
            CTR enc |     25.24 ns/B     37.78 MiB/s     29.08 c/B
            CTR dec |     25.24 ns/B     37.79 MiB/s     29.08 c/B
            CCM enc |     49.81 ns/B     19.15 MiB/s     57.38 c/B
            CCM dec |     49.80 ns/B     19.15 MiB/s     57.37 c/B
           CCM auth |     24.58 ns/B     38.80 MiB/s     28.32 c/B
            GCM enc |     26.15 ns/B     36.47 MiB/s     30.13 c/B
            GCM dec |     26.11 ns/B     36.52 MiB/s     30.08 c/B
           GCM auth |     0.923 ns/B    1033.0 MiB/s      1.06 c/B
            OCB enc |     29.59 ns/B     32.23 MiB/s     34.09 c/B
            OCB dec |     29.42 ns/B     32.42 MiB/s     33.89 c/B
           OCB auth |     27.92 ns/B     34.16 MiB/s     32.16 c/B
                    =
     AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
            ECB enc |     31.20 ns/B     30.57 MiB/s     35.94 c/B
            ECB dec |     31.80 ns/B     29.99 MiB/s     36.63 c/B
            CBC enc |     27.83 ns/B     34.27 MiB/s     32.06 c/B
            CBC dec |     27.87 ns/B     34.21 MiB/s     32.11 c/B
            CFB enc |     27.88 ns/B     34.20 MiB/s     32.12 c/B
            CFB dec |     28.16 ns/B     33.87 MiB/s     32.44 c/B
            OFB enc |     32.93 ns/B     28.96 MiB/s     37.94 c/B
            OFB dec |     32.93 ns/B     28.96 MiB/s     37.94 c/B
            CTR enc |     27.95 ns/B     34.13 MiB/s     32.19 c/B
            CTR dec |     27.95 ns/B     34.12 MiB/s     32.20 c/B
            CCM enc |     55.88 ns/B     17.07 MiB/s     64.38 c/B
            CCM dec |     55.88 ns/B     17.07 MiB/s     64.38 c/B
           CCM auth |     27.95 ns/B     34.12 MiB/s     32.20 c/B
            GCM enc |     28.86 ns/B     33.05 MiB/s     33.25 c/B
            GCM dec |     28.87 ns/B     33.04 MiB/s     33.25 c/B
           GCM auth |     0.923 ns/B    1033.0 MiB/s      1.06 c/B
            OCB enc |     32.96 ns/B     28.94 MiB/s     37.97 c/B
            OCB dec |     32.73 ns/B     29.14 MiB/s     37.70 c/B
           OCB auth |     31.29 ns/B     30.48 MiB/s     36.04 c/B

    After:
     AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
            ECB enc |      5.10 ns/B     187.0 MiB/s      5.88 c/B
            ECB dec |      5.27 ns/B     181.0 MiB/s      6.07 c/B
            CBC enc |      1.41 ns/B     675.8 MiB/s      1.63 c/B
            CBC dec |     0.992 ns/B     961.7 MiB/s      1.14 c/B
            CFB enc |      1.30 ns/B     732.4 MiB/s      1.50 c/B
            CFB dec |     0.991 ns/B     962.7 MiB/s      1.14 c/B
            OFB enc |      7.05 ns/B     135.2 MiB/s      8.13 c/B
            OFB dec |      7.05 ns/B     135.2 MiB/s      8.13 c/B
            CTR enc |      1.11 ns/B     856.9 MiB/s      1.28 c/B
            CTR dec |      1.11 ns/B     857.0 MiB/s      1.28 c/B
            CCM enc |      2.58 ns/B     369.8 MiB/s      2.97 c/B
            CCM dec |      2.58 ns/B     369.5 MiB/s      2.97 c/B
           CCM auth |      1.58 ns/B     605.2 MiB/s      1.82 c/B
            GCM enc |      2.04 ns/B     467.9 MiB/s      2.35 c/B
            GCM dec |      2.04 ns/B     466.6 MiB/s      2.35 c/B
           GCM auth |     0.923 ns/B    1033.0 MiB/s      1.06 c/B
            OCB enc |      1.64 ns/B     579.8 MiB/s      1.89 c/B
            OCB dec |      1.66 ns/B     574.5 MiB/s      1.91 c/B
           OCB auth |      1.33 ns/B     715.5 MiB/s      1.54 c/B
                    =
     AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
            ECB enc |      5.64 ns/B     169.0 MiB/s      6.50 c/B
            ECB dec |      5.81 ns/B     164.3 MiB/s      6.69 c/B
            CBC enc |      1.90 ns/B     502.1 MiB/s      2.19 c/B
            CBC dec |      1.24 ns/B     771.7 MiB/s      1.42 c/B
            CFB enc |      1.84 ns/B     517.1 MiB/s      2.12 c/B
            CFB dec |      1.23 ns/B     772.5 MiB/s      1.42 c/B
            OFB enc |      7.60 ns/B     125.5 MiB/s      8.75 c/B
            OFB dec |      7.60 ns/B     125.6 MiB/s      8.75 c/B
            CTR enc |      1.36 ns/B     702.7 MiB/s      1.56 c/B
            CTR dec |      1.36 ns/B     702.5 MiB/s      1.56 c/B
            CCM enc |      3.31 ns/B     287.8 MiB/s      3.82 c/B
            CCM dec |      3.31 ns/B     288.0 MiB/s      3.81 c/B
           CCM auth |      2.06 ns/B     462.1 MiB/s      2.38 c/B
            GCM enc |      2.28 ns/B     418.4 MiB/s      2.63 c/B
            GCM dec |      2.28 ns/B     418.0 MiB/s      2.63 c/B
           GCM auth |     0.923 ns/B    1032.8 MiB/s      1.06 c/B
            OCB enc |      1.83 ns/B     520.1 MiB/s      2.11 c/B
            OCB dec |      1.84 ns/B     517.8 MiB/s      2.12 c/B
           OCB auth |      1.52 ns/B     626.1 MiB/s      1.75 c/B
                    =
     AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
            ECB enc |      5.86 ns/B     162.7 MiB/s      6.75 c/B
            ECB dec |      6.02 ns/B     158.3 MiB/s      6.94 c/B
            CBC enc |      2.44 ns/B     390.5 MiB/s      2.81 c/B
            CBC dec |      1.45 ns/B     656.4 MiB/s      1.67 c/B
            CFB enc |      2.39 ns/B     399.5 MiB/s      2.75 c/B
            CFB dec |      1.45 ns/B     656.8 MiB/s      1.67 c/B
            OFB enc |      7.81 ns/B     122.1 MiB/s      9.00 c/B
            OFB dec |      7.81 ns/B     122.1 MiB/s      9.00 c/B
            CTR enc |      1.57 ns/B     605.8 MiB/s      1.81 c/B
            CTR dec |      1.57 ns/B     605.9 MiB/s      1.81 c/B
            CCM enc |      4.07 ns/B     234.3 MiB/s      4.69 c/B
            CCM dec |      4.07 ns/B     234.1 MiB/s      4.69 c/B
           CCM auth |      2.61 ns/B     365.7 MiB/s      3.00 c/B
            GCM enc |      2.50 ns/B     381.9 MiB/s      2.88 c/B
            GCM dec |      2.49 ns/B     382.3 MiB/s      2.87 c/B
           GCM auth |     0.926 ns/B    1029.7 MiB/s      1.07 c/B
            OCB enc |      2.05 ns/B     465.6 MiB/s      2.36 c/B
            OCB dec |      2.06 ns/B     462.0 MiB/s      2.38 c/B
           OCB auth |      1.74 ns/B     548.4 MiB/s      2.00 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
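The `do_setkey` change above adds an "ARM CE/AES HW feature check" before selecting the Crypto Extension code. libgcrypt routes this through its own hardware-feature detection; as a generic illustration only, a program on 64-bit ARM Linux can read the kernel-provided capability bits with `getauxval(AT_HWCAP)` and test `HWCAP_AES`. The sketch below, with the hypothetical name `have_armv8_aes`, assumes Linux on AArch64. On 32-bit ARM Linux the AES bit is reported as `HWCAP2_AES` in `AT_HWCAP2` instead, which this sketch does not handle, and every other platform simply reports 0.

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_AES */
#endif

/* Returns 1 if the CPU reports the ARMv8 AES instructions, else 0. */
static int have_armv8_aes(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_AES)
  return (getauxval(AT_HWCAP) & HWCAP_AES) != 0;
#else
  return 0;   /* unknown platform: keep using the generic C code */
#endif
}
```

A key-setup routine would call such a check once and, if it succeeds, install the accelerated encrypt/decrypt and bulk functions; otherwise it keeps the portable implementation.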