| Commit message | Author | Age | Files | Lines |
| |
* cipher/camellia-aarch64.S: Align functions to 16 bytes.
* cipher/chacha20-aarch64.S: Likewise.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
* cipher/crc-armv8-aarch64-ce.S: Likewise.
* cipher/rijndael-aarch64.S: Likewise.
* cipher/rijndael-armv8-aarch64-ce.S: Likewise.
* cipher/sha1-armv8-aarch64-ce.S: Likewise.
* cipher/sha256-armv8-aarch64-ce.S: Likewise.
* cipher/sha512-armv8-aarch64-ce.S: Likewise.
* cipher/sm3-aarch64.S: Likewise.
* cipher/sm3-armv8-aarch64-ce.S: Likewise.
* cipher/sm4-aarch64.S: Likewise.
* cipher/sm4-armv8-aarch64-ce.S: Likewise.
* cipher/sm4-armv9-aarch64-sve-ce.S: Likewise.
* cipher/twofish-aarch64.S: Likewise.
* mpi/aarch64/mpih-add1.S: Likewise.
* mpi/aarch64/mpih-mul1.S: Likewise.
* mpi/aarch64/mpih-mul2.S: Likewise.
* mpi/aarch64/mpih-mul3.S: Likewise.
* mpi/aarch64/mpih-sub1.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
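
As a rough illustration of what 16-byte alignment means for a function's start address, the sketch below rounds an address up to the next multiple of 16. The helper name `align_up` is hypothetical and does not come from the commit; in the assembly sources the assembler does this itself through its alignment directive.

```python
def align_up(addr: int, alignment: int = 16) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding alignment-1 and clearing the low bits rounds up without branching.
    return (addr + alignment - 1) & ~(alignment - 1)
```

An already aligned address is returned unchanged, so padding is only inserted where a function would otherwise start off a 16-byte boundary.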
| |
* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ecb_crypt'.
* cipher/cipher.c (do_ecb_crypt): Use bulk function if available.
* cipher/rijndael-aesni.c (do_aesni_enc_vec8): Change asm label
'.Ldeclast' to '.Lenclast'.
(_gcry_aes_aesni_ecb_crypt): New.
* cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce)
(_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce): Change
return value from void to size_t.
(ocb_crypt_fn_t, xts_crypt_fn_t): Remove.
(_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_xts_crypt): Remove
indirect function call; Return value from called function (allows tail
call optimization).
(_gcry_aes_armv8_ce_ocb_auth): Return value from called function (allows
tail call optimization).
(_gcry_aes_ecb_enc_armv8_ce, _gcry_aes_ecb_dec_armv8_ce)
(_gcry_aes_armv8_ce_ecb_crypt): New.
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_ecb_crypt_amd64): New.
* cipher/rijndael-vaes.c (_gcry_vaes_avx2_ecb_crypt_amd64)
(_gcry_aes_vaes_ecb_crypt): New.
* cipher/rijndael.c (_gcry_aes_aesni_ecb_crypt)
(_gcry_aes_vaes_ecb_crypt, _gcry_aes_armv8_ce_ecb_crypt): New.
(do_setkey): Setup ECB bulk function for x86 AESNI/VAES and ARM CE.
--
Benchmark on AMD Ryzen 9 7900X:
Before (OCB for reference):
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.128 ns/B 7460 MiB/s 0.720 c/B 5634±1
ECB dec | 0.134 ns/B 7103 MiB/s 0.753 c/B 5608
OCB enc | 0.029 ns/B 32930 MiB/s 0.163 c/B 5625
OCB dec | 0.029 ns/B 32738 MiB/s 0.164 c/B 5625
After:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.028 ns/B 33761 MiB/s 0.159 c/B 5625
ECB dec | 0.028 ns/B 33917 MiB/s 0.158 c/B 5625
GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
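
The "use bulk function if available" dispatch described above can be sketched as follows. This is a minimal model, not libgcrypt code: `ecb_crypt`, `toy_block` and `toy_bulk` are hypothetical names, and the "cipher" is a placeholder XOR rather than AES.

```python
from typing import Callable, Optional

BLOCK_SIZE = 16


def toy_block(block: bytes) -> bytes:
    """Placeholder single-block transform (NOT a real cipher)."""
    return bytes(b ^ 0x5A for b in block)


def toy_bulk(data: bytes, nblocks: int) -> bytes:
    """Placeholder bulk transform handling all blocks in one call."""
    return bytes(b ^ 0x5A for b in data[: nblocks * BLOCK_SIZE])


def ecb_crypt(
    data: bytes,
    crypt_block: Callable[[bytes], bytes],
    bulk_ecb: Optional[Callable[[bytes, int], bytes]] = None,
) -> bytes:
    """Process whole blocks, preferring the bulk routine when one is set."""
    if len(data) % BLOCK_SIZE:
        raise ValueError("ECB input must be a multiple of the block size")
    nblocks = len(data) // BLOCK_SIZE
    if bulk_ecb is not None:
        # One call for the whole buffer; this is where an accelerated
        # implementation can process several blocks in parallel.
        return bulk_ecb(data, nblocks)
    out = bytearray()
    for i in range(nblocks):
        out += crypt_block(data[i * BLOCK_SIZE : (i + 1) * BLOCK_SIZE])
    return bytes(out)
```

Both paths must produce identical output; the bulk path only removes the per-block call overhead and lets the implementation keep several independent blocks in flight.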
| |
* cipher/asm-common-aarch64.h (ret_spec_stop): New.
* cipher/asm-poly1305-aarch64.h: Use 'ret_spec_stop' for 'ret'
instruction.
* cipher/camellia-aarch64.S: Likewise.
* cipher/chacha20-aarch64.S: Likewise.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
* cipher/crc-armv8-aarch64-ce.S: Likewise.
* cipher/rijndael-aarch64.S: Likewise.
* cipher/rijndael-armv8-aarch64-ce.S: Likewise.
* cipher/sha1-armv8-aarch64-ce.S: Likewise.
* cipher/sha256-armv8-aarch64-ce.S: Likewise.
* cipher/sm3-aarch64.S: Likewise.
* cipher/twofish-aarch64.S: Likewise.
* mpi/aarch64/mpih-add1.S: Likewise.
* mpi/aarch64/mpih-mul1.S: Likewise.
* mpi/aarch64/mpih-mul2.S: Likewise.
* mpi/aarch64/mpih-mul3.S: Likewise.
* mpi/aarch64/mpih-sub1.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-armv8-aarch64-ce.S (vk14): Remove.
(vklast, __, _): New.
(aes_preload_keys): Setup vklast.
(do_aes_one128/192/256): Split to ...
(do_aes_one_part1, do_aes_part2_128/192/256): ... these and add
interleave ops.
(do_aes_one128/192/256): New using above part1 and part2 macros.
(aes_round_4): Rename to ...
(aes_round_4_multikey): ... this and allow different key used for
parallel blocks.
(aes_round_4): New using above multikey macro.
(aes_lastround_4): Reorder AES round and xor instructions, allow
different last key for parallel blocks.
(do_aes_4_128/192/256): Split to ...
(do_aes_4_part1_multikey, do_aes_4_part1)
(do_aes_4_part2_128/192/256): ... these.
(do_aes_4_128/192/256): New using above part1 and part2 macros.
(CLEAR_REG): Use movi for clearing registers.
(aes_clear_keys): Remove branching and clear all key registers.
(_gcry_aes_enc_armv8_ce, _gcry_aes_dec_armv8_ce): Adjust to macro
changes.
(_gcry_aes_cbc_enc_armv8_ce, _gcry_aes_cbc_dec_armv8_ce)
(_gcry_aes_cfb_enc_armv8_ce, _gcry_aes_cfb_dec_armv8_ce)
(_gcry_aes_ctr32le_enc_armv8_ce): Apply entry/loop-body/exit
optimization for better interleaving of input/output processing;
First/last round key and input/output xoring optimization to reduce
critical path length.
(_gcry_aes_ctr_enc_armv8_ce): Add fast path for counter incrementing
without byte-swaps when counter does not overflow 8-bit; Apply
entry/loop-body/exit optimization for better interleaving of
input/output processing; First/last round key and input/output
xoring optimization to reduce critical path length.
(_gcry_aes_ocb_enc_armv8_ce, _gcry_aes_ocb_dec_armv8_ce): Add aligned
processing for nblk and OCB offsets; Apply entry/loop-body/exit
optimization for better interleaving of input/output processing;
First/last round key and input/output xoring optimization to reduce
critical path length; Change to use same function body macro for
both encryption and decryption.
(_gcry_aes_xts_enc_armv8_ce, _gcry_aes_xts_dec_armv8_ce): Apply
entry/loop-body/exit optimization for better interleaving of
input/output processing; First/last round key and input/output
xoring optimization to reduce critical path length; Change to use
same function body macro for both encryption and decryption.
--
Benchmark on AWS Graviton2 (2500Mhz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 0.663 ns/B 1439 MiB/s 1.66 c/B
CBC dec | 0.288 ns/B 3310 MiB/s 0.720 c/B
CFB enc | 0.657 ns/B 1453 MiB/s 1.64 c/B
CFB dec | 0.288 ns/B 3313 MiB/s 0.720 c/B
CTR enc | 0.314 ns/B 3039 MiB/s 0.785 c/B
XTS enc | 0.357 ns/B 2674 MiB/s 0.891 c/B
XTS dec | 0.358 ns/B 2666 MiB/s 0.894 c/B
OCB enc | 0.343 ns/B 2784 MiB/s 0.856 c/B
OCB dec | 0.341 ns/B 2795 MiB/s 0.853 c/B
GCM-SIV enc | 0.526 ns/B 1813 MiB/s 1.31 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte perf increase
CBC enc | 0.500 ns/B 1906 MiB/s 1.25 c/B +33%
CBC dec | 0.263 ns/B 3622 MiB/s 0.658 c/B +9%
CFB enc | 0.500 ns/B 1906 MiB/s 1.25 c/B +31%
CFB dec | 0.263 ns/B 3620 MiB/s 0.658 c/B +9%
CTR enc | 0.264 ns/B 3618 MiB/s 0.659 c/B +19%
XTS enc | 0.350 ns/B 2722 MiB/s 0.876 c/B +2%
OCB enc | 0.275 ns/B 3468 MiB/s 0.687 c/B +25%
OCB dec | 0.276 ns/B 3459 MiB/s 0.689 c/B +24%
GCM-SIV enc | 0.494 ns/B 1929 MiB/s 1.24 c/B +6%
Benchmark on Cortex-A53 (1152Mhz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CBC enc | 1.41 ns/B 675.9 MiB/s 1.63 c/B
CBC dec | 0.910 ns/B 1048 MiB/s 1.05 c/B
CFB enc | 1.30 ns/B 732.2 MiB/s 1.50 c/B
CFB dec | 0.910 ns/B 1048 MiB/s 1.05 c/B
CTR enc | 1.03 ns/B 924.4 MiB/s 1.19 c/B
XTS enc | 1.25 ns/B 763.0 MiB/s 1.44 c/B
OCB enc | 1.21 ns/B 789.5 MiB/s 1.39 c/B
OCB dec | 1.21 ns/B 788.9 MiB/s 1.39 c/B
GCM-SIV enc | 1.92 ns/B 496.6 MiB/s 2.21 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte perf increase
CBC enc | 1.14 ns/B 836.6 MiB/s 1.31 c/B +24%
CBC dec | 0.843 ns/B 1132 MiB/s 0.971 c/B +8%
CFB enc | 1.19 ns/B 798.8 MiB/s 1.38 c/B +9%
CFB dec | 0.842 ns/B 1132 MiB/s 0.970 c/B +8%
CTR enc | 0.898 ns/B 1062 MiB/s 1.03 c/B +16%
XTS enc | 1.22 ns/B 779.9 MiB/s 1.41 c/B +2%
OCB enc | 0.992 ns/B 961.0 MiB/s 1.14 c/B +22%
OCB dec | 0.993 ns/B 960.5 MiB/s 1.14 c/B +22%
GCM-SIV enc | 1.88 ns/B 507.3 MiB/s 2.17 c/B +2%
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
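
The CTR fast path mentioned above (incrementing the counter without byte-swaps when the low 8 bits do not overflow) can be modelled roughly as below. This is a sketch under the assumption of a 16-byte big-endian counter block; the function names are hypothetical and the real assembly works on vector registers rather than byte strings.

```python
from typing import List


def ctr_blocks_generic(counter: bytes, n: int) -> List[bytes]:
    """Reference path: full 128-bit big-endian addition for every block."""
    value = int.from_bytes(counter, "big")
    return [((value + i) % (1 << 128)).to_bytes(16, "big") for i in range(n)]


def ctr_blocks_fast(counter: bytes, n: int) -> List[bytes]:
    """Only touch the last byte when no carry can leave the low 8 bits."""
    low = counter[15]
    if n == 0 or low + (n - 1) <= 0xFF:
        # No carry out of the low byte: the upper 15 bytes stay unchanged,
        # so no byte-swap or wide addition is needed.
        return [counter[:15] + bytes([low + i]) for i in range(n)]
    # A carry would propagate into higher bytes: use the generic path.
    return ctr_blocks_generic(counter, n)
```

The fast path applies to most calls, since a carry out of the low byte happens only once every 256 blocks.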
| |
* cipher/rijndael-armv8-aarch32-ce.S
(_gcry_aes_ctr32le_enc_armv8_ce): New.
* cipher/rijndael-armv8-aarch64-ce.S
(_gcry_aes_ctr32le_enc_armv8_ce): New.
* cipher/rijndael-armv8-ce.c
(_gcry_aes_ctr32le_enc_armv8_ce)
(_gcry_aes_armv8_ce_ctr32le_enc): New.
* cipher/rijndael.c
(_gcry_aes_armv8_ce_ctr32le_enc): New prototype.
(do_setkey): Add setup of 'bulk_ops->ctr32le_enc' for ARMv8-CE.
--
Benchmark on Cortex-A53 (aarch64):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM-SIV enc | 11.77 ns/B 81.03 MiB/s 7.63 c/B 647.9
GCM-SIV dec | 11.92 ns/B 79.98 MiB/s 7.73 c/B 647.9
GCM-SIV auth | 2.99 ns/B 318.9 MiB/s 1.94 c/B 648.0
After (~2.4x faster):
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM-SIV enc | 4.66 ns/B 204.5 MiB/s 3.02 c/B 647.9
GCM-SIV dec | 4.82 ns/B 198.0 MiB/s 3.12 c/B 647.9
GCM-SIV auth | 3.00 ns/B 318.4 MiB/s 1.94 c/B 648.0
GnuPG-bug-id: T4485
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
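
For readers unfamiliar with the counter layout that the 'ctr32le' name suggests, here is a minimal sketch. It assumes the AES-GCM-SIV convention from RFC 8452, where the first 4 bytes of the counter block hold a little-endian 32-bit counter that wraps at 2^32 and the remaining 12 bytes stay fixed. The function name `ctr32le_next` is hypothetical.

```python
def ctr32le_next(block: bytes) -> bytes:
    """Increment the little-endian 32-bit counter in bytes 0..3 of a block."""
    if len(block) != 16:
        raise ValueError("counter block must be 16 bytes")
    ctr = int.from_bytes(block[:4], "little")
    ctr = (ctr + 1) & 0xFFFFFFFF  # wrap around at 2**32, no carry into byte 4
    return ctr.to_bytes(4, "little") + block[4:]
```

Because the carry never leaves the first 32 bits, this counter mode cannot reuse the big-endian 128-bit CTR routine, which is why a separate bulk function is useful.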
| |
* cipher/asm-common-aarch64.h (GET_DATA_POINTER): New.
* cipher/chacha20-aarch64.S (GET_DATA_POINTER): Remove.
* cipher/cipher-gcm-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
* cipher/crc-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
* cipher/rijndael-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
* cipher/sha1-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
* cipher/sha256-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/asm-common-aarch64.h (CFI_STARTPROC, CFI_ENDPROC)
(CFI_REMEMBER_STATE, CFI_RESTORE_STATE, CFI_ADJUST_CFA_OFFSET)
(CFI_REL_OFFSET, CFI_DEF_CFA_REGISTER, CFI_REGISTER, CFI_RESTORE)
(DW_REGNO_SP, DW_SLEB128_7BIT, DW_SLEB128_28BIT, CFI_CFA_ON_STACK)
(CFI_REG_ON_STACK): New.
* cipher/camellia-aarch64.S: Add CFI directives.
* cipher/chacha20-aarch64.S: Add CFI directives.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Add CFI directives.
* cipher/crc-armv8-aarch64-ce.S: Add CFI directives.
* cipher/rijndael-aarch64.S: Add CFI directives.
* cipher/rijndael-armv8-aarch64-ce.S: Add CFI directives.
* cipher/sha1-armv8-aarch64-ce.S: Add CFI directives.
* cipher/sha256-armv8-aarch64-ce.S: Add CFI directives.
* cipher/twofish-aarch64.S: Add CFI directives.
* mpi/aarch64/mpih-add1.S: Add CFI directives.
* mpi/aarch64/mpih-mul1.S: Add CFI directives.
* mpi/aarch64/mpih-mul2.S: Add CFI directives.
* mpi/aarch64/mpih-mul3.S: Add CFI directives.
* mpi/aarch64/mpih-sub1.S: Add CFI directives.
* mpi/asm-common-aarch64.h: Include "../cipher/asm-common-aarch64.h".
(ELF): Remove.
--
This commit adds CFI directives that provide DWARF unwinding information,
so that debuggers can produce backtraces while executing code from 64-bit
ARM assembly files.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
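
The macro names DW_SLEB128_7BIT and DW_SLEB128_28BIT suggest that stack offsets are emitted in DWARF's signed LEB128 format. The sketch below shows that encoding in general form; it is an illustration of the DWARF format, not a transcription of the macros, and the function name `sleb128` is hypothetical.

```python
def sleb128(value: int) -> bytes:
    """Encode a signed integer as DWARF SLEB128 (7 payload bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: Python ints keep their sign
        sign_bit_set = bool(byte & 0x40)
        # Stop once the remaining value is pure sign extension of this byte.
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)  # continuation bit: more bytes follow
```

Values in the range -64..63 fit in a single byte, which is presumably the case the 7-bit variant covers; larger offsets need additional continuation bytes.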
| |
* cipher/camellia-aarch64.S (_gcry_camellia_arm_encrypt_block)
(_gcry_camellia_arm_decrypt_block): Make comment section about input
registers match usage.
* cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ocb_auth_armv8_ce): Use
'w12' and 'w7' instead of 'x12' and 'x7'.
(_gcry_aes_xts_enc_armv8_ce, _gcry_aes_xts_dec_armv8_ce): Fix function
prototype in comments.
* mpi/aarch64/mpih-add1.S: Use 32-bit registers for 32-bit mpi_size_t
parameters.
* mpi/aarch64/mpih-mul1.S: Ditto.
* mpi/aarch64/mpih-mul2.S: Ditto.
* mpi/aarch64/mpih-mul3.S: Ditto.
* mpi/aarch64/mpih-sub1.S: Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/asm-common-aarch64.h: New.
* cipher/camellia-aarch64.S: Use ELF macro, use x19 instead of x18.
* cipher/chacha20-aarch64.S: Use ELF macro, don't use GOT on windows.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Use ELF macro.
* cipher/rijndael-aarch64.S: Use ELF macro.
* cipher/rijndael-armv8-aarch64-ce.S: Use ELF macro.
* cipher/sha1-armv8-aarch64-ce.S: Use ELF macro.
* cipher/sha256-armv8-aarch64-ce.S: Use ELF macro.
* cipher/twofish-aarch64.S: Use ELF macro.
* configure.ac: Don't require .size and .type in aarch64 assembly check.
--
Don't require .type and .size in configure; they can be made
optional via a preprocessor macro.
This is mostly a mechanical change that wraps the .type and .size
directives in an ELF() macro, plus two manual changes needed
when targeting Windows:
- Don't load global symbols via a GOT (in chacha20)
- Don't use the x18 register (in camellia); back up and restore x19
in the prologue/epilogue and use that instead.
x18 is a platform-specific register: on Linux it is free for use
by user code, while on Windows and Darwin it is reserved for
platform use. Always use x19 instead of x18 for consistency.
Signed-off-by: Martin Storsjö <martin@martin.st>
| |
* cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_xts_enc_armv8_ce)
(_gcry_aes_xts_dec_armv8_ce): New.
* cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_xts_enc_armv8_ce)
(_gcry_aes_xts_dec_armv8_ce): New.
* cipher/rijndael-armv8-ce.c (_gcry_aes_xts_enc_armv8_ce)
(_gcry_aes_xts_dec_armv8_ce, xts_crypt_fn_t)
(_gcry_aes_armv8_ce_xts_crypt): New.
* cipher/rijndael.c (_gcry_aes_armv8_ce_xts_crypt): New.
(_gcry_aes_xts_crypt) [USE_ARM_CE]: New.
--
Benchmark on Cortex-A53 (AArch64, 1152 Mhz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 4.88 ns/B 195.5 MiB/s 5.62 c/B
XTS dec | 4.94 ns/B 192.9 MiB/s 5.70 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 5.55 ns/B 171.8 MiB/s 6.39 c/B
XTS dec | 5.61 ns/B 169.9 MiB/s 6.47 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 6.22 ns/B 153.3 MiB/s 7.17 c/B
XTS dec | 6.29 ns/B 151.7 MiB/s 7.24 c/B
=
After (~2.6x faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 1.83 ns/B 520.9 MiB/s 2.11 c/B
XTS dec | 1.82 ns/B 524.9 MiB/s 2.09 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 1.97 ns/B 483.3 MiB/s 2.27 c/B
XTS dec | 1.96 ns/B 486.9 MiB/s 2.26 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 2.11 ns/B 450.9 MiB/s 2.44 c/B
XTS dec | 2.10 ns/B 453.8 MiB/s 2.42 c/B
=
Benchmark on Cortex-A53 (AArch32, 1152 Mhz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 6.52 ns/B 146.2 MiB/s 7.51 c/B
XTS dec | 6.57 ns/B 145.2 MiB/s 7.57 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 7.10 ns/B 134.3 MiB/s 8.18 c/B
XTS dec | 7.11 ns/B 134.2 MiB/s 8.19 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 7.30 ns/B 130.7 MiB/s 8.41 c/B
XTS dec | 7.38 ns/B 129.3 MiB/s 8.50 c/B
=
After (~2.7x faster):
Cipher:
AES | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 2.33 ns/B 409.6 MiB/s 2.68 c/B
XTS dec | 2.35 ns/B 405.3 MiB/s 2.71 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 2.53 ns/B 377.6 MiB/s 2.91 c/B
XTS dec | 2.54 ns/B 375.5 MiB/s 2.93 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
XTS enc | 2.75 ns/B 346.8 MiB/s 3.17 c/B
XTS dec | 2.76 ns/B 345.2 MiB/s 3.18 c/B
=
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
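
The per-block work that distinguishes XTS from plain ECB is the tweak update between blocks. A minimal sketch of that step follows, using the standard XTS definition (multiplication by x in GF(2^128) with the tweak read as a little-endian integer and reduction constant 0x87). It is not taken from the assembly; the function name `xts_next_tweak` is hypothetical.

```python
def xts_next_tweak(tweak: bytes) -> bytes:
    """Multiply a 16-byte XTS tweak by x in GF(2^128)."""
    if len(tweak) != 16:
        raise ValueError("tweak must be 16 bytes")
    value = int.from_bytes(tweak, "little") << 1
    if value >> 128:
        # The bit shifted out of the top is folded back in with 0x87,
        # i.e. reduction by x^128 + x^7 + x^2 + x + 1.
        value = (value & ((1 << 128) - 1)) ^ 0x87
    return value.to_bytes(16, "little")
```

Each data block is XORed with the current tweak before and after the block cipher call, so a bulk implementation can compute several consecutive tweaks up front and process the blocks in parallel.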
| |
* cipher/cipher-gcm-armv8-aarch64-ce.S: Use '.cpu generic+simd+crypto'
instead of '.arch armv8-a+crypto'.
* cipher/rijndael-armv8-aarch64-ce.S: Ditto.
* cipher/sha1-armv8-aarch64-ce.S: Ditto.
* cipher/sha256-armv8-aarch64-ce.S: Ditto.
* configure.ac (gcry_cv_gcc_inline_asm_aarch64_neon): Ditto.
(gcry_cv_gcc_inline_asm_aarch64_crypto): Ditto; and include NEON
instructions to crypto instructions check.
--
GnuPG-bug-id: 2975
Reported-by: Kirill Ponomarev <kp@krion.cc>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
| |
* cipher/rijndael-armv8-aarch32-ce.S: Add OCB 'L_{ntz(i)}' calculation.
* cipher/rijndael-armv8-aarch64-ce.S: Ditto.
* cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce)
(_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce)
(ocb_crypt_fn_t): Updated arguments.
(_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_ocb_auth): Remove
'ocb_get_l' handling and splitting input to 32 block chunks, instead
pass full buffers to assembly.
--
Performance on Cortex-A53 (AArch32):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 1.63 ns/B 583.8 MiB/s 1.88 c/B
OCB dec | 1.67 ns/B 572.1 MiB/s 1.92 c/B
OCB auth | 1.33 ns/B 717.1 MiB/s 1.53 c/B
After (~12% faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 1.47 ns/B 650.2 MiB/s 1.69 c/B
OCB dec | 1.48 ns/B 644.5 MiB/s 1.70 c/B
OCB auth | 1.19 ns/B 798.2 MiB/s 1.38 c/B
Performance on Cortex-A53 (AArch64):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 1.29 ns/B 738.5 MiB/s 1.49 c/B
OCB dec | 1.32 ns/B 723.5 MiB/s 1.52 c/B
OCB auth | 1.15 ns/B 827.0 MiB/s 1.33 c/B
After (~8% faster):
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 1.21 ns/B 789.1 MiB/s 1.39 c/B
OCB dec | 1.21 ns/B 789.2 MiB/s 1.39 c/B
OCB auth | 1.10 ns/B 867.0 MiB/s 1.27 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
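
The 'L_{ntz(i)}' calculation refers to the OCB offset update, where block i uses the precomputed value L indexed by the number of trailing zero bits of i (RFC 7253: Offset_i = Offset_{i-1} xor L_{ntz(i)}). The sketch below models that in plain Python. The names `ntz` and `ocb_offsets` and the contents of `l_table` are hypothetical; a real table is derived from the cipher key.

```python
from typing import List, Sequence


def ntz(i: int) -> int:
    """Number of trailing zero bits of a positive block index."""
    if i <= 0:
        raise ValueError("block index must be positive")
    return (i & -i).bit_length() - 1


def ocb_offsets(
    offset0: bytes, l_table: Sequence[bytes], nblocks: int, first_index: int = 1
) -> List[bytes]:
    """Return the offsets for blocks first_index .. first_index+nblocks-1."""
    offsets = []
    offset = offset0
    for i in range(first_index, first_index + nblocks):
        l_value = l_table[ntz(i)]
        # Offset_i = Offset_{i-1} xor L_{ntz(i)}
        offset = bytes(a ^ b for a, b in zip(offset, l_value))
        offsets.append(offset)
    return offsets
```

Computing the index inside the bulk routine is what removes the need for the caller to look up L values and split the input into fixed-size chunks.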
|
* cipher/Makefile.am: Add 'rijndael-armv8-aarch64-ce.S'.
* cipher/rijndael-armv8-aarch64-ce.S: New.
* cipher/rijndael-internal.h (USE_ARM_CE): Enable for ARMv8/AArch64.
* configure.ac: Add 'rijndael-armv8-aarch64-ce.lo' and
'rijndael-armv8-ce.lo' for ARMv8/AArch64.
--
Improvement vs AArch64 assembly on Cortex-A53:
           AES-128  AES-192  AES-256
CBC enc:    13.19x   13.53x   13.76x
CBC dec:    20.53x   21.91x   22.60x
CFB enc:    14.29x   14.50x   14.63x
CFB dec:    20.42x   21.69x   22.50x
CTR:        18.29x   19.61x   20.53x
OCB enc:    15.21x   16.32x   17.12x
OCB dec:    14.95x   16.11x   16.88x
OCB auth:   16.73x   17.93x   18.66x
Benchmark on Cortex-A53 (1152 Mhz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 21.86 ns/B 43.62 MiB/s 25.19 c/B
ECB dec | 22.68 ns/B 42.05 MiB/s 26.13 c/B
CBC enc | 18.66 ns/B 51.10 MiB/s 21.50 c/B
CBC dec | 18.72 ns/B 50.95 MiB/s 21.56 c/B
CFB enc | 18.61 ns/B 51.25 MiB/s 21.44 c/B
CFB dec | 18.61 ns/B 51.25 MiB/s 21.44 c/B
OFB enc | 22.84 ns/B 41.75 MiB/s 26.31 c/B
OFB dec | 22.84 ns/B 41.75 MiB/s 26.31 c/B
CTR enc | 18.89 ns/B 50.50 MiB/s 21.76 c/B
CTR dec | 18.89 ns/B 50.50 MiB/s 21.76 c/B
CCM enc | 37.55 ns/B 25.40 MiB/s 43.25 c/B
CCM dec | 37.55 ns/B 25.40 MiB/s 43.25 c/B
CCM auth | 18.77 ns/B 50.80 MiB/s 21.63 c/B
GCM enc | 20.18 ns/B 47.25 MiB/s 23.25 c/B
GCM dec | 20.18 ns/B 47.25 MiB/s 23.25 c/B
GCM auth | 1.30 ns/B 732.5 MiB/s 1.50 c/B
OCB enc | 19.67 ns/B 48.48 MiB/s 22.66 c/B
OCB dec | 19.73 ns/B 48.34 MiB/s 22.72 c/B
OCB auth | 19.46 ns/B 49.00 MiB/s 22.42 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 25.39 ns/B 37.56 MiB/s 29.25 c/B
ECB dec | 26.15 ns/B 36.47 MiB/s 30.13 c/B
CBC enc | 22.08 ns/B 43.19 MiB/s 25.44 c/B
CBC dec | 22.25 ns/B 42.87 MiB/s 25.63 c/B
CFB enc | 22.03 ns/B 43.30 MiB/s 25.38 c/B
CFB dec | 22.03 ns/B 43.29 MiB/s 25.38 c/B
OFB enc | 26.26 ns/B 36.32 MiB/s 30.25 c/B
OFB dec | 26.26 ns/B 36.32 MiB/s 30.25 c/B
CTR enc | 22.30 ns/B 42.76 MiB/s 25.69 c/B
CTR dec | 22.30 ns/B 42.76 MiB/s 25.69 c/B
CCM enc | 44.38 ns/B 21.49 MiB/s 51.13 c/B
CCM dec | 44.38 ns/B 21.49 MiB/s 51.13 c/B
CCM auth | 22.20 ns/B 42.97 MiB/s 25.57 c/B
GCM enc | 23.60 ns/B 40.41 MiB/s 27.19 c/B
GCM dec | 23.60 ns/B 40.41 MiB/s 27.19 c/B
GCM auth | 1.30 ns/B 732.4 MiB/s 1.50 c/B
OCB enc | 23.09 ns/B 41.31 MiB/s 26.60 c/B
OCB dec | 23.21 ns/B 41.09 MiB/s 26.74 c/B
OCB auth | 22.88 ns/B 41.68 MiB/s 26.36 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 28.76 ns/B 33.17 MiB/s 33.13 c/B
ECB dec | 29.46 ns/B 32.37 MiB/s 33.94 c/B
CBC enc | 25.45 ns/B 37.48 MiB/s 29.31 c/B
CBC dec | 25.50 ns/B 37.40 MiB/s 29.38 c/B
CFB enc | 25.39 ns/B 37.56 MiB/s 29.25 c/B
CFB dec | 25.39 ns/B 37.56 MiB/s 29.25 c/B
OFB enc | 29.62 ns/B 32.19 MiB/s 34.13 c/B
OFB dec | 29.62 ns/B 32.19 MiB/s 34.13 c/B
CTR enc | 25.67 ns/B 37.15 MiB/s 29.57 c/B
CTR dec | 25.67 ns/B 37.15 MiB/s 29.57 c/B
CCM enc | 51.11 ns/B 18.66 MiB/s 58.88 c/B
CCM dec | 51.11 ns/B 18.66 MiB/s 58.88 c/B
CCM auth | 25.56 ns/B 37.32 MiB/s 29.44 c/B
GCM enc | 26.96 ns/B 35.37 MiB/s 31.06 c/B
GCM dec | 26.98 ns/B 35.35 MiB/s 31.08 c/B
GCM auth | 1.30 ns/B 733.4 MiB/s 1.50 c/B
OCB enc | 26.45 ns/B 36.05 MiB/s 30.47 c/B
OCB dec | 26.53 ns/B 35.95 MiB/s 30.56 c/B
OCB auth | 26.24 ns/B 36.34 MiB/s 30.23 c/B
=
After:
Cipher:
AES | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 4.83 ns/B 197.5 MiB/s 5.56 c/B
ECB dec | 4.99 ns/B 191.1 MiB/s 5.75 c/B
CBC enc | 1.41 ns/B 675.5 MiB/s 1.63 c/B
CBC dec | 0.911 ns/B 1046.9 MiB/s 1.05 c/B
CFB enc | 1.30 ns/B 732.2 MiB/s 1.50 c/B
CFB dec | 0.911 ns/B 1046.7 MiB/s 1.05 c/B
OFB enc | 5.81 ns/B 164.3 MiB/s 6.69 c/B
OFB dec | 5.81 ns/B 164.3 MiB/s 6.69 c/B
CTR enc | 1.03 ns/B 924.0 MiB/s 1.19 c/B
CTR dec | 1.03 ns/B 924.1 MiB/s 1.19 c/B
CCM enc | 2.50 ns/B 381.8 MiB/s 2.88 c/B
CCM dec | 2.50 ns/B 381.7 MiB/s 2.88 c/B
CCM auth | 1.57 ns/B 606.1 MiB/s 1.81 c/B
GCM enc | 2.33 ns/B 408.5 MiB/s 2.69 c/B
GCM dec | 2.34 ns/B 408.4 MiB/s 2.69 c/B
GCM auth | 1.30 ns/B 732.1 MiB/s 1.50 c/B
OCB enc | 1.29 ns/B 736.6 MiB/s 1.49 c/B
OCB dec | 1.32 ns/B 724.4 MiB/s 1.52 c/B
OCB auth | 1.16 ns/B 819.6 MiB/s 1.34 c/B
=
AES192 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 5.48 ns/B 174.0 MiB/s 6.31 c/B
ECB dec | 5.64 ns/B 169.0 MiB/s 6.50 c/B
CBC enc | 1.63 ns/B 585.8 MiB/s 1.88 c/B
CBC dec | 1.02 ns/B 935.8 MiB/s 1.17 c/B
CFB enc | 1.52 ns/B 627.7 MiB/s 1.75 c/B
CFB dec | 1.02 ns/B 935.9 MiB/s 1.17 c/B
OFB enc | 6.46 ns/B 147.7 MiB/s 7.44 c/B
OFB dec | 6.46 ns/B 147.7 MiB/s 7.44 c/B
CTR enc | 1.14 ns/B 836.1 MiB/s 1.31 c/B
CTR dec | 1.14 ns/B 835.9 MiB/s 1.31 c/B
CCM enc | 2.83 ns/B 337.6 MiB/s 3.25 c/B
CCM dec | 2.82 ns/B 338.0 MiB/s 3.25 c/B
CCM auth | 1.79 ns/B 532.7 MiB/s 2.06 c/B
GCM enc | 2.44 ns/B 390.3 MiB/s 2.82 c/B
GCM dec | 2.44 ns/B 390.2 MiB/s 2.82 c/B
GCM auth | 1.30 ns/B 731.9 MiB/s 1.50 c/B
OCB enc | 1.41 ns/B 674.7 MiB/s 1.63 c/B
OCB dec | 1.44 ns/B 662.0 MiB/s 1.66 c/B
OCB auth | 1.28 ns/B 746.1 MiB/s 1.47 c/B
=
AES256 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 6.13 ns/B 155.5 MiB/s 7.06 c/B
ECB dec | 6.29 ns/B 151.5 MiB/s 7.25 c/B
CBC enc | 1.85 ns/B 516.8 MiB/s 2.13 c/B
CBC dec | 1.13 ns/B 845.6 MiB/s 1.30 c/B
CFB enc | 1.74 ns/B 549.5 MiB/s 2.00 c/B
CFB dec | 1.13 ns/B 846.1 MiB/s 1.30 c/B
OFB enc | 7.11 ns/B 134.2 MiB/s 8.19 c/B
OFB dec | 7.11 ns/B 134.2 MiB/s 8.19 c/B
CTR enc | 1.25 ns/B 763.5 MiB/s 1.44 c/B
CTR dec | 1.25 ns/B 763.4 MiB/s 1.44 c/B
CCM enc | 3.15 ns/B 302.9 MiB/s 3.63 c/B
CCM dec | 3.15 ns/B 302.9 MiB/s 3.63 c/B
CCM auth | 2.01 ns/B 474.2 MiB/s 2.32 c/B
GCM enc | 2.55 ns/B 374.2 MiB/s 2.94 c/B
GCM dec | 2.55 ns/B 373.7 MiB/s 2.94 c/B
GCM auth | 1.30 ns/B 732.2 MiB/s 1.50 c/B
OCB enc | 1.54 ns/B 617.6 MiB/s 1.78 c/B
OCB dec | 1.57 ns/B 606.8 MiB/s 1.81 c/B
OCB auth | 1.40 ns/B 679.8 MiB/s 1.62 c/B
=
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
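
The improvement factors quoted at the top of this message can be roughly reproduced from the ns/B columns of the tables above. The sketch below does that for three AES-128 rows; the helper name `speedup` is hypothetical, and because the table values are rounded, the recomputed factors differ slightly in the second decimal from the quoted ones.

```python
def speedup(before_ns_per_byte: float, after_ns_per_byte: float) -> float:
    """Ratio of time per byte before and after; above 1.0 means faster."""
    if after_ns_per_byte <= 0:
        raise ValueError("after value must be positive")
    return before_ns_per_byte / after_ns_per_byte


# (before ns/B, after ns/B) taken from the AES-128 rows above.
aes128_rows = {
    "CBC enc": (18.66, 1.41),
    "CBC dec": (18.72, 0.911),
    "CTR enc": (18.89, 1.03),
}

for mode, (before, after) in aes128_rows.items():
    print(f"{mode}: {speedup(before, after):.2f}x")
```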