* cipher/camellia-aarch64.S: Align functions to 16 bytes.
* cipher/chacha20-aarch64.S: Likewise.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
* cipher/crc-armv8-aarch64-ce.S: Likewise.
* cipher/rijndael-aarch64.S: Likewise.
* cipher/rijndael-armv8-aarch64-ce.S: Likewise.
* cipher/sha1-armv8-aarch64-ce.S: Likewise.
* cipher/sha256-armv8-aarch64-ce.S: Likewise.
* cipher/sha512-armv8-aarch64-ce.S: Likewise.
* cipher/sm3-aarch64.S: Likewise.
* cipher/sm3-armv8-aarch64-ce.S: Likewise.
* cipher/sm4-aarch64.S: Likewise.
* cipher/sm4-armv8-aarch64-ce.S: Likewise.
* cipher/sm4-armv9-aarch64-sve-ce.S: Likewise.
* cipher/twofish-aarch64.S: Likewise.
* mpi/aarch64/mpih-add1.S: Likewise.
* mpi/aarch64/mpih-mul1.S: Likewise.
* mpi/aarch64/mpih-mul2.S: Likewise.
* mpi/aarch64/mpih-mul3.S: Likewise.
* mpi/aarch64/mpih-sub1.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/asm-common-aarch64.h (SECTION_RODATA)
(GET_DATA_POINTER): New.
(GET_LOCAL_POINTER): Remove.
* cipher/camellia-aarch64.S: Move constant data to read-only data
section; remove unneeded '.ltorg'.
* cipher/chacha20-aarch64.S: Likewise.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
* cipher/crc-armv8-aarch64-ce.S: Likewise.
* cipher/rijndael-aarch64.S: Likewise.
* cipher/sha1-armv8-aarch64-ce.S: Likewise.
* cipher/sha256-armv8-aarch64-ce.S: Likewise.
* cipher/sm3-aarch64.S: Likewise.
* cipher/sm3-armv8-aarch64-ce.S: Likewise.
* cipher/sm4-aarch64.S: Likewise.
* cipher/sm4-armv9-aarch64-sve-ce.S: Likewise.
* cipher/twofish-aarch64.S: Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/asm-common-aarch64.h (GET_DATA_POINTER): Remove.
(GET_LOCAL_POINTER): New.
* cipher/camellia-aarch64.S: Use GET_LOCAL_POINTER instead of ADR
instruction directly.
* cipher/chacha20-aarch64.S: Use GET_LOCAL_POINTER instead of
GET_DATA_POINTER.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Likewise.
* cipher/crc-armv8-aarch64-ce.S: Likewise.
* cipher/sha1-armv8-aarch64-ce.S: Likewise.
* cipher/sha256-armv8-aarch64-ce.S: Likewise.
* cipher/sm3-aarch64.S: Likewise.
* cipher/sm3-armv8-aarch64-ce.S: Likewise.
* cipher/sm4-aarch64.S: Likewise.
---
Switch to using ADR instead of ADRP/LDR or ADRP/ADD to get
data pointers within assembly files. ADR is more portable across
targets and does not require GOT (Global Offset Table) entries
for the labels.
Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'sm4-aarch64.S'.
* cipher/sm4-aarch64.S: New.
* cipher/sm4.c (USE_AARCH64_SIMD): New.
(SM4_context) [USE_AARCH64_SIMD]: Add 'use_aarch64_simd'.
[USE_AARCH64_SIMD] (_gcry_sm4_aarch64_crypt)
(_gcry_sm4_aarch64_ctr_enc, _gcry_sm4_aarch64_cbc_dec)
(_gcry_sm4_aarch64_cfb_dec, _gcry_sm4_aarch64_crypt_blk1_8)
(sm4_aarch64_crypt_blk1_8): New.
(sm4_setkey): Enable ARMv8/AArch64 if supported by HW.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_AARCH64_SIMD]:
Add ARMv8/AArch64 bulk functions.
* configure.ac: Add 'sm4-aarch64.lo'.
--
This patch adds ARMv8/AArch64 SIMD bulk encryption/decryption for
SM4. The bulk functions process eight blocks in parallel.
Benchmark on T-Head Yitian-710 at 2.75 GHz:
Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC enc |     12.10 ns/B     78.81 MiB/s     33.28 c/B      2750
        CBC dec |      7.19 ns/B     132.6 MiB/s     19.77 c/B      2750
        CFB enc |     12.14 ns/B     78.58 MiB/s     33.37 c/B      2750
        CFB dec |      7.24 ns/B     131.8 MiB/s     19.90 c/B      2750
        CTR enc |      7.24 ns/B     131.7 MiB/s     19.90 c/B      2750
        CTR dec |      7.24 ns/B     131.7 MiB/s     19.91 c/B      2750
        GCM enc |      9.49 ns/B     100.4 MiB/s     26.11 c/B      2750
        GCM dec |      9.49 ns/B     100.5 MiB/s     26.10 c/B      2750
       GCM auth |      2.25 ns/B     423.1 MiB/s      6.20 c/B      2750
        OCB enc |      7.35 ns/B     129.8 MiB/s     20.20 c/B      2750
        OCB dec |      7.36 ns/B     129.6 MiB/s     20.23 c/B      2750
       OCB auth |      7.29 ns/B     130.8 MiB/s     20.04 c/B      2749
After (~55% faster):
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC enc |     12.10 ns/B     78.79 MiB/s     33.28 c/B      2750
        CBC dec |      4.63 ns/B     205.9 MiB/s     12.74 c/B      2749
        CFB enc |     12.14 ns/B     78.58 MiB/s     33.37 c/B      2750
        CFB dec |      4.64 ns/B     205.5 MiB/s     12.76 c/B      2750
        CTR enc |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
        CTR dec |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
        GCM enc |      4.88 ns/B     195.4 MiB/s     13.42 c/B      2750
        GCM dec |      4.88 ns/B     195.5 MiB/s     13.42 c/B      2750
       GCM auth |     0.189 ns/B      5048 MiB/s     0.520 c/B      2750
        OCB enc |      4.86 ns/B     196.0 MiB/s     13.38 c/B      2750
        OCB dec |      4.90 ns/B     194.7 MiB/s     13.47 c/B      2750
       OCB auth |      4.79 ns/B     199.0 MiB/s     13.18 c/B      2750
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>