* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ecb_crypt'.
* cipher/cipher.c (do_ecb_crypt): Use bulk function if available.
* cipher/rijndael-aesni.c (do_aesni_enc_vec8): Change asm label
'.Ldeclast' to '.Lenclast'.
(_gcry_aes_aesni_ecb_crypt): New.
* cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce)
(_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce): Change
return value from void to size_t.
(ocb_crypt_fn_t, xts_crypt_fn_t): Remove.
(_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_xts_crypt): Remove
indirect function call; Return value from called function (allows tail
call optimization).
(_gcry_aes_armv8_ce_ocb_auth): Return value from called function (allows
tail call optimization).
(_gcry_aes_ecb_enc_armv8_ce, _gcry_aes_ecb_dec_armv8_ce)
(_gcry_aes_armv8_ce_ecb_crypt): New.
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_ecb_crypt_amd64): New.
* cipher/rijndael-vaes.c (_gcry_vaes_avx2_ecb_crypt_amd64)
(_gcry_aes_vaes_ecb_crypt): New.
* cipher/rijndael.c (_gcry_aes_aesni_ecb_crypt)
(_gcry_aes_vaes_ecb_crypt, _gcry_aes_armv8_ce_ecb_crypt): New.
(do_setkey): Setup ECB bulk function for x86 AESNI/VAES and ARM CE.
--
Benchmark on AMD Ryzen 9 7900X:
Before (OCB for reference):
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.128 ns/B 7460 MiB/s 0.720 c/B 5634±1
ECB dec | 0.134 ns/B 7103 MiB/s 0.753 c/B 5608
OCB enc | 0.029 ns/B 32930 MiB/s 0.163 c/B 5625
OCB dec | 0.029 ns/B 32738 MiB/s 0.164 c/B 5625
After:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.028 ns/B 33761 MiB/s 0.159 c/B 5625
ECB dec | 0.028 ns/B 33917 MiB/s 0.158 c/B 5625
GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
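The core of this change is the dispatch in do_ecb_crypt(): use the bulk handler when the setkey path installed one, otherwise fall back to block-by-block processing. A minimal sketch of that pattern, with illustrative names and a toy XOR "cipher" standing in for AES (none of these identifiers are libgcrypt's):

```c
#include <stddef.h>

/* Sketch of the bulk-dispatch pattern this commit adds to do_ecb_crypt():
   if setkey installed an architecture-specific bulk handler, hand it all
   full blocks in one call; otherwise fall back to the per-block loop.
   All names here are illustrative, not libgcrypt's. */

#define BLOCKSIZE 16

typedef void (*ecb_bulk_fn_t) (void *ctx, void *out, const void *in,
                               size_t nblocks, int encrypt);
typedef void (*block_fn_t) (void *ctx, unsigned char *out,
                            const unsigned char *in);

struct toy_cipher
{
  void *ctx;
  ecb_bulk_fn_t ecb_crypt;   /* NULL when no bulk path is available */
  block_fn_t encrypt_block;
};

/* Toy per-block "cipher" standing in for AES: XOR with one key byte. */
static void
xor_block (void *ctx, unsigned char *out, const unsigned char *in)
{
  unsigned char k = *(unsigned char *) ctx;
  for (int i = 0; i < BLOCKSIZE; i++)
    out[i] = in[i] ^ k;
}

static void
toy_ecb_encrypt (struct toy_cipher *c, unsigned char *out,
                 const unsigned char *in, size_t nblocks)
{
  if (c->ecb_crypt)
    {
      /* One call covers many blocks, amortizing call overhead and
         letting the backend pipeline rounds across blocks. */
      c->ecb_crypt (c->ctx, out, in, nblocks, 1);
      return;
    }
  for (size_t i = 0; i < nblocks; i++)
    c->encrypt_block (c->ctx, out + i * BLOCKSIZE, in + i * BLOCKSIZE);
}
```

The benchmark above shows why this matters: the per-block fallback cannot keep the AES pipeline full, while one bulk call over many blocks can.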
* cipher/cipher-gcm.c (_gcry_cipher_gcm_setiv_zero): New.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt)
(_gcry_cipher_gcm_authenticate): Use _gcry_cipher_gcm_setiv_zero.
* cipher/cipher-internal.h (struct gcry_cipher_handle): Add aead field.
* cipher/cipher.c (_gcry_cipher_setiv): Check how setiv is invoked, to
reject direct invocation in FIPS mode.
(_gcry_cipher_setup_geniv, _gcry_cipher_geniv): New.
* doc/gcrypt.texi: Add explanation for two new functions.
* src/gcrypt-int.h (_gcry_cipher_setup_geniv, _gcry_cipher_geniv): New.
* src/gcrypt.h.in (enum gcry_cipher_geniv_methods): New.
(gcry_cipher_setup_geniv, gcry_cipher_geniv): New.
* src/libgcrypt.def (gcry_cipher_setup_geniv, gcry_cipher_geniv): Add.
* src/libgcrypt.vers: Likewise.
* src/visibility.c (gcry_cipher_setup_geniv, gcry_cipher_geniv): Add.
* src/visibility.h: Likewise.
--
GnuPG-bug-id: 4873
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* cipher/cipher-gcm-intel-pclmul.c (GCM_INTEL_USE_VPCLMUL_AVX512)
(GCM_INTEL_AGGR32_TABLE_INITIALIZED): New.
(ghash_setup_aggr16_avx2): Store H16 for aggr32 setup.
[GCM_USE_INTEL_VPCLMUL_AVX512] (GFMUL_AGGR32_ASM_VPCMUL_AVX512)
(gfmul_vpclmul_avx512_aggr32, gfmul_vpclmul_avx512_aggr32_le)
(gfmul_pclmul_avx512, gcm_lsh_avx512, load_h1h4_to_zmm1)
(ghash_setup_aggr8_avx512, ghash_setup_aggr16_avx512)
(ghash_setup_aggr32_avx512, swap128b_perm): New.
(_gcry_ghash_setup_intel_pclmul) [GCM_USE_INTEL_VPCLMUL_AVX512]: Enable
AVX512 implementation based on HW features.
(_gcry_ghash_intel_pclmul, _gcry_polyval_intel_pclmul): Add
VPCLMUL/AVX512 code path; Small tweaks to VPCLMUL/AVX2 code path; Tweaks
on register clearing.
--
Patch adds VPCLMUL/AVX512 accelerated implementation for GHASH (GCM) and
POLYVAL (GCM-SIV).
Benchmark on Intel Core i3-1115G4:
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM auth | 0.063 ns/B 15200 MiB/s 0.257 c/B 4090
GCM-SIV auth | 0.061 ns/B 15704 MiB/s 0.248 c/B 4090
After (ghash ~41% faster, polyval ~34% faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM auth | 0.044 ns/B 21614 MiB/s 0.181 c/B 4096±3
GCM-SIV auth | 0.045 ns/B 21108 MiB/s 0.185 c/B 4097±3
AES128-GCM / AES128-GCM-SIV encryption:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM enc | 0.084 ns/B 11306 MiB/s 0.346 c/B 4097±3
GCM-SIV enc | 0.086 ns/B 11026 MiB/s 0.354 c/B 4096±3
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-gcm-intel-pclmul.c (GCM_INTEL_USE_VPCLMUL_AVX2)
(GCM_INTEL_AGGR8_TABLE_INITIALIZED)
(GCM_INTEL_AGGR16_TABLE_INITIALIZED): New.
(gfmul_pclmul): Fixes to comments.
[GCM_USE_INTEL_VPCLMUL_AVX2] (GFMUL_AGGR16_ASM_VPCMUL_AVX2)
(gfmul_vpclmul_avx2_aggr16, gfmul_vpclmul_avx2_aggr16_le)
(gfmul_pclmul_avx2, gcm_lsh_avx2, load_h1h2_to_ymm1)
(ghash_setup_aggr8_avx2, ghash_setup_aggr16_avx2): New.
(_gcry_ghash_setup_intel_pclmul): Add 'hw_features' parameter; Setup
ghash and polyval function pointers for context; Add VPCLMUL/AVX2 code
path; Defer aggr8 and aggr16 table initialization until first use in
'_gcry_ghash_intel_pclmul' or '_gcry_polyval_intel_pclmul'.
[__x86_64__] (ghash_setup_aggr8): New.
(_gcry_ghash_intel_pclmul): Add VPCLMUL/AVX2 code path; Add call for
aggr8 table initialization.
(_gcry_polyval_intel_pclmul): Add VPCLMUL/AVX2 code path; Add call for
aggr8 table initialization.
* cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL] (_gcry_ghash_intel_pclmul)
(_gcry_polyval_intel_pclmul): Remove.
[GCM_USE_INTEL_PCLMUL] (_gcry_ghash_setup_intel_pclmul): Add
'hw_features' parameter.
(setupM) [GCM_USE_INTEL_PCLMUL]: Pass HW features to
'_gcry_ghash_setup_intel_pclmul'; Let '_gcry_ghash_setup_intel_pclmul'
setup function pointers.
* cipher/cipher-internal.h (GCM_USE_INTEL_VPCLMUL_AVX2): New.
(gcry_cipher_handle): Add member 'gcm.hw_impl_flags'.
--
Patch adds VPCLMUL/AVX2 accelerated implementation for GHASH (GCM) and
POLYVAL (GCM-SIV).
Benchmark on AMD Ryzen 5800X (zen3):
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM auth | 0.088 ns/B 10825 MiB/s 0.427 c/B 4850
GCM-SIV auth | 0.083 ns/B 11472 MiB/s 0.403 c/B 4850
After: (~1.93x faster)
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM auth | 0.045 ns/B 21098 MiB/s 0.219 c/B 4850
GCM-SIV auth | 0.043 ns/B 22181 MiB/s 0.209 c/B 4850
AES128-GCM / AES128-GCM-SIV encryption:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM enc | 0.079 ns/B 12073 MiB/s 0.383 c/B 4850
GCM-SIV enc | 0.076 ns/B 12500 MiB/s 0.370 c/B 4850
Benchmark on Intel Core i3-1115G4 (tigerlake):
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM auth | 0.080 ns/B 11919 MiB/s 0.327 c/B 4090
GCM-SIV auth | 0.075 ns/B 12643 MiB/s 0.309 c/B 4090
After: (~1.28x faster)
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM auth | 0.062 ns/B 15348 MiB/s 0.254 c/B 4090
GCM-SIV auth | 0.058 ns/B 16381 MiB/s 0.238 c/B 4090
AES128-GCM / AES128-GCM-SIV encryption:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM enc | 0.101 ns/B 9441 MiB/s 0.413 c/B 4090
GCM-SIV enc | 0.098 ns/B 9692 MiB/s 0.402 c/B 4089
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-aeswrap.c (_gcry_cipher_keywrap_decrypt)
(_gcry_cipher_keywrap_decrypt_padding): Merged into...
(_gcry_cipher_keywrap_decrypt_auto): ... this.
Write length information to struct gcry_cipher_handle.
* cipher/cipher-internal.h (struct gcry_cipher_handle): Add
u_mode.wrap.
* cipher/cipher.c (_gcry_cipher_setup_mode_ops): Use
_gcry_cipher_keywrap_decrypt_auto.
(_gcry_cipher_info): Support GCRYCTL_GET_KEYLEN for
GCRY_CIPHER_MODE_AESWRAP. Note that it's not the length of the KEK,
but the length of the unwrapped key.
* tests/aeswrap.c (check_one_with_padding): Add check
for length of unwrapped key.
--
Fixes-commit: 2914f169f95467b9c789000105773b38ad2dea5a
GnuPG-bug-id: 5752
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* src/gcrypt.h.in (GCRY_CIPHER_EXTENDED): New enum value.
* cipher/cipher-aeswrap.c (wrap): New.
(_gcry_cipher_keywrap_encrypt, unwrap): Use wrap.
(_gcry_cipher_keywrap_encrypt_padding): New.
(_gcry_cipher_keywrap_decrypt): Use unwrap.
(_gcry_cipher_keywrap_decrypt_padding): New.
* cipher/cipher-internal.h: Add declarations.
* cipher/cipher.c (_gcry_cipher_open_internal): Support
GCRY_CIPHER_EXTENDED.
(_gcry_cipher_setup_mode_ops): Extend for GCRY_CIPHER_MODE_AESWRAP.
* tests/aeswrap.c: Add two tests from RFC5649.
--
GnuPG-bug-id: 5752
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* cipher/cipher-gcm-intel-pclmul.c (gfmul_pclmul_aggr4)
(gfmul_pclmul_aggr8): Move assembly to new GFMUL_AGGRx_ASM* macros.
(GFMUL_AGGR4_ASM_1, GFMUL_AGGR4_ASM_2, gfmul_pclmul_aggr4_le)
(GFMUL_AGGR8_ASM, gfmul_pclmul_aggr8_le)
(_gcry_polyval_intel_pclmul): New.
* cipher/cipher-gcm-siv.c (do_polyval_buf): Use polyval function
if available.
* cipher/cipher-gcm.c (_gcry_polyval_intel_pclmul): New.
(setupM): Setup 'c->u_mode.gcm.polyval_fn' with accelerated polyval
function if available.
* cipher/cipher-internal.h (gcry_cipher_handle): Add member
'u_mode.gcm.polyval_fn'.
--
Benchmark on AMD Ryzen 7 5800X:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM-SIV enc | 0.150 ns/B 6337 MiB/s 0.730 c/B 4849
GCM-SIV dec | 0.163 ns/B 5862 MiB/s 0.789 c/B 4850
GCM-SIV auth | 0.119 ns/B 8022 MiB/s 0.577 c/B 4850
After (enc/dec ~26% faster, auth ~43% faster):
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM-SIV enc | 0.117 ns/B 8138 MiB/s 0.568 c/B 4850
GCM-SIV dec | 0.128 ns/B 7429 MiB/s 0.623 c/B 4850
GCM-SIV auth | 0.083 ns/B 11507 MiB/s 0.402 c/B 4851
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-gcm-siv.c (do_ctr_le32): Use bulk function if
available.
* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ctr32le_enc'.
* cipher/rijndael-aesni.c (_gcry_aes_aesni_ctr32le_enc): New.
* cipher/rijndael-vaes-avx2-amd64.S
(_gcry_vaes_avx2_ctr32le_enc_amd64, .Lle_addd_*): New.
* cipher/rijndael-vaes.c (_gcry_vaes_avx2_ctr32le_enc_amd64)
(_gcry_aes_vaes_ctr32le_enc): New.
* cipher/rijndael.c (_gcry_aes_aesni_ctr32le_enc)
(_gcry_aes_vaes_ctr32le_enc): New prototypes.
(do_setkey): Add setup of 'bulk_ops->ctr32le_enc' for AES-NI and
VAES.
* tests/basic.c (check_gcm_siv_cipher): Add large test-vector for
bulk ops testing.
--
Counter mode in GCM-SIV is little-endian on the first 4 bytes of
the counter block, unlike regular CTR mode which treats the full
block as a big-endian integer.
Benchmark on AMD Ryzen 7 5800X:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM-SIV enc | 1.00 ns/B 953.2 MiB/s 4.85 c/B 4850
GCM-SIV dec | 1.01 ns/B 940.1 MiB/s 4.92 c/B 4850
GCM-SIV auth | 0.118 ns/B 8051 MiB/s 0.575 c/B 4850
After (~6x faster):
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GCM-SIV enc | 0.150 ns/B 6367 MiB/s 0.727 c/B 4850
GCM-SIV dec | 0.161 ns/B 5909 MiB/s 0.783 c/B 4850
GCM-SIV auth | 0.118 ns/B 8051 MiB/s 0.574 c/B 4850
GnuPG-bug-id: T4485
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
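The note above is the heart of the change: GCM-SIV's counter is a 32-bit little-endian value in the first 4 bytes of the block, with the remaining 12 bytes fixed. A portable sketch of such an increment (illustrative, not libgcrypt's internal code):

```c
#include <stdint.h>

/* GCM-SIV style counter increment: the first 4 bytes of the 16-byte
   counter block form a little-endian 32-bit counter that wraps modulo
   2^32; bytes 4..15 stay fixed.  (Regular CTR mode instead increments
   the whole block as one big-endian number.)  Portable illustrative
   sketch, not libgcrypt's internal code. */
static void
ctr32le_inc (unsigned char block[16])
{
  uint32_t ctr = (uint32_t) block[0]
               | ((uint32_t) block[1] << 8)
               | ((uint32_t) block[2] << 16)
               | ((uint32_t) block[3] << 24);
  ctr++;
  block[0] = (unsigned char) (ctr & 0xff);
  block[1] = (unsigned char) ((ctr >> 8) & 0xff);
  block[2] = (unsigned char) ((ctr >> 16) & 0xff);
  block[3] = (unsigned char) ((ctr >> 24) & 0xff);
}
```

Because only a 4-byte prefix changes between consecutive blocks, a bulk implementation can precompute little-endian add constants (as the '.Lle_addd_*' tables above do) and generate many counter blocks per iteration.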
* cipher/Makefile.am: Add 'cipher-gcm-siv.c'.
* cipher/cipher-gcm-siv.c: New.
* cipher/cipher-gcm.c (_gcry_cipher_gcm_setupM): New.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'siv_keylen'.
(_gcry_cipher_gcm_setupM, _gcry_cipher_gcm_siv_encrypt)
(_gcry_cipher_gcm_siv_decrypt, _gcry_cipher_gcm_siv_set_nonce)
(_gcry_cipher_gcm_siv_authenticate)
(_gcry_cipher_gcm_siv_set_decryption_tag)
(_gcry_cipher_gcm_siv_get_tag, _gcry_cipher_gcm_siv_check_tag)
(_gcry_cipher_gcm_siv_setkey): New prototypes.
(cipher_block_bswap): New helper function.
* cipher/cipher.c (_gcry_cipher_open_internal): Add
'GCRY_CIPHER_MODE_GCM_SIV'; Refactor mode requirement checks for
better size optimization (check pointers & blocksize in same order
for all).
(cipher_setkey, cipher_reset, _gcry_cipher_setup_mode_ops)
(_gcry_cipher_setup_mode_ops, _gcry_cipher_info): Add GCM-SIV.
(_gcry_cipher_ctl): Handle 'set decryption tag' for GCM-SIV.
* doc/gcrypt.texi: Add GCM-SIV.
* src/gcrypt.h.in (GCRY_CIPHER_MODE_GCM_SIV): New.
(GCRY_SIV_BLOCK_LEN, gcry_cipher_set_decryption_tag): Add to comment
that these are also for GCM-SIV in addition to SIV mode.
* tests/basic.c (check_gcm_siv_cipher): New.
(check_cipher_modes): Check for GCM-SIV.
* tests/bench-slope.c (bench_gcm_siv_encrypt_do_bench)
(bench_gcm_siv_decrypt_do_bench, bench_gcm_siv_authenticate_do_bench)
(gcm_siv_encrypt_ops, gcm_siv_decrypt_ops)
(gcm_siv_authenticate_ops): New.
(cipher_modes): Add GCM-SIV.
(cipher_bench_one): Check key length requirement for GCM-SIV.
--
GnuPG-bug-id: T4485
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'cipher-siv.c'.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Rename to
_gcry_cipher_ctr_encrypt_ctx and add algo context parameter.
(_gcry_cipher_ctr_encrypt): New using _gcry_cipher_ctr_encrypt_ctx.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.siv'.
(_gcry_cipher_ctr_encrypt_ctx, _gcry_cipher_siv_encrypt)
(_gcry_cipher_siv_decrypt, _gcry_cipher_siv_set_nonce)
(_gcry_cipher_siv_authenticate, _gcry_cipher_siv_set_decryption_tag)
(_gcry_cipher_siv_get_tag, _gcry_cipher_siv_check_tag)
(_gcry_cipher_siv_setkey): New.
* cipher/cipher-siv.c: New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, _gcry_cipher_setup_mode_ops, _gcry_cipher_info): Add
GCRY_CIPHER_MODE_SIV handling.
(_gcry_cipher_ctl): Add GCRYCTL_SET_DECRYPTION_TAG handling.
* doc/gcrypt.texi: Add documentation for SIV mode.
* src/gcrypt.h.in (GCRYCTL_SET_DECRYPTION_TAG): New.
(GCRY_CIPHER_MODE_SIV): New.
(gcry_cipher_set_decryption_tag): New.
* tests/basic.c (check_siv_cipher): New.
(check_cipher_modes): Add call for 'check_siv_cipher'.
* tests/bench-slope.c (bench_encrypt_init): Use double size key for
SIV mode.
(bench_aead_encrypt_do_bench, bench_aead_decrypt_do_bench)
(bench_aead_authenticate_do_bench): Reset cipher context on each run.
(bench_aead_authenticate_do_bench): Support nonce-less operation.
(bench_siv_encrypt_do_bench, bench_siv_decrypt_do_bench)
(bench_siv_authenticate_do_bench, siv_encrypt_ops)
(siv_decrypt_ops, siv_authenticate_ops): New.
(cipher_modes): Add SIV mode benchmarks.
(cipher_bench_one): Restrict SIV mode testing to 16 byte block-size.
--
GnuPG-bug-id: T4486
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-gcm-ppc.c (ALIGNED_16): New.
(vec_store_he, vec_load_he): Remove WORDS_BIGENDIAN ifdef.
(vec_dup_byte_elem): New.
(_gcry_ghash_setup_ppc_vpmsum): Match function declaration with
prototype in cipher-gcm.c; Load C2 with VEC_LOAD_BE; Use
vec_dup_byte_elem; Align constants to 16 bytes.
(_gcry_ghash_ppc_vpmsum): Match function declaration with
prototype in cipher-gcm.c; Align constant to 16 bytes.
* cipher/cipher-gcm.c (ghash_ppc_vpmsum): Return value from
_gcry_ghash_ppc_vpmsum.
* cipher/cipher-internal.h (GCM_USE_PPC_VPMSUM): Remove requirement
for !WORDS_BIGENDIAN.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'cipher-gcm-ppc.c'.
* cipher/cipher-gcm-ppc.c: New.
* cipher/cipher-gcm.c [GCM_USE_PPC_VPMSUM] (_gcry_ghash_setup_ppc_vpmsum)
(_gcry_ghash_ppc_vpmsum, ghash_setup_ppc_vpmsum, ghash_ppc_vpmsum): New.
(setupM) [GCM_USE_PPC_VPMSUM]: Select ppc-vpmsum implementation if
HW feature "ppc-vcrypto" is available.
* cipher/cipher-internal.h (GCM_USE_PPC_VPMSUM): New.
(gcry_cipher_handle): Move 'ghash_fn' at end of 'gcm' block to align
'gcm_table' to 16 bytes.
* configure.ac: Add 'cipher-gcm-ppc.lo'.
* tests/basic.c (_check_gcm_cipher): New AES256 test vector.
* AUTHORS: Add 'CRYPTOGAMS'.
* LICENSES: Add original license to 3-clause-BSD section.
--
https://dev.gnupg.org/D501:
10-20X speedup.
However this Power 9 machine is faster than the one used for the last
Power 9 benchmarks of the optimized versions, so while better than the
last patch, the improvement is not all due to the code.
Before:
GCM enc | 4.23 ns/B 225.3 MiB/s - c/B
GCM dec | 3.58 ns/B 266.2 MiB/s - c/B
GCM auth | 3.34 ns/B 285.3 MiB/s - c/B
After:
GCM enc | 0.370 ns/B 2578 MiB/s - c/B
GCM dec | 0.371 ns/B 2571 MiB/s - c/B
GCM auth | 0.159 ns/B 6003 MiB/s - c/B
Signed-off-by: Shawn Landden <shawn@git.icu>
[jk: coding style fixes, Makefile.am integration, patch from Differential
to git, commit changelog, fixed few compiler warnings]
GnuPG-bug-id: 5040
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'asm-inline-s390x.h'.
* cipher/asm-inline-s390x.h: New.
* cipher/cipher-gcm.c [GCM_USE_S390X_CRYPTO] (ghash_s390x_kimd): New.
(setupM) [GCM_USE_S390X_CRYPTO]: Add setup for s390x GHASH function.
* cipher/cipher-internal.h (GCM_USE_S390X_CRYPTO): New.
* cipher/rijndael-s390x.c (u128_t, km_functions_e): Move to
'asm-inline-s390x.h'.
(aes_s390x_gcm_crypt): New.
(_gcry_aes_s390x_setup_acceleration): Use 'km_function_to_mask'; Add
setup for GCM bulk function.
--
This patch adds zSeries acceleration for GHASH and AES-GCM.
Benchmarks (z15, 5.2Ghz):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
GCM enc | 2.64 ns/B 361.6 MiB/s 13.71 c/B
GCM dec | 2.64 ns/B 361.3 MiB/s 13.72 c/B
GCM auth | 2.58 ns/B 370.1 MiB/s 13.40 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
GCM enc | 0.059 ns/B 16066 MiB/s 0.309 c/B
GCM dec | 0.059 ns/B 16114 MiB/s 0.308 c/B
GCM auth | 0.057 ns/B 16747 MiB/s 0.296 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-gcm.c (do_ghash_buf): Properly handle the case
where 'unused' gets filled to the full blocksize.
(gcm_crypt_inner): New.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt): Use
'gcm_crypt_inner'.
* cipher/cipher-internal.h (cipher_bulk_ops_t): Add 'gcm_crypt'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
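The fixed case is the classic partial-block buffering pattern: leftover bytes accumulate in 'unused' and must be flushed exactly when the buffer reaches the blocksize. A toy sketch of that pattern (names are illustrative; a block counter stands in for the actual GHASH update):

```c
#include <stddef.h>
#include <string.h>

#define BLOCKSIZE 16

struct toy_ghash
{
  unsigned char buf[BLOCKSIZE];
  size_t unused;        /* bytes currently buffered */
  size_t blocks_done;   /* stand-in for the actual GHASH update */
};

static void
toy_ghash_write (struct toy_ghash *h, const unsigned char *in, size_t len)
{
  if (h->unused)
    {
      /* Top up the buffered partial block first. */
      size_t n = BLOCKSIZE - h->unused;
      if (n > len)
        n = len;
      memcpy (h->buf + h->unused, in, n);
      h->unused += n;
      in += n;
      len -= n;
      if (h->unused == BLOCKSIZE)   /* flush exactly when filled */
        {
          h->blocks_done++;          /* process h->buf here */
          h->unused = 0;
        }
    }
  while (len >= BLOCKSIZE)
    {
      h->blocks_done++;              /* process full blocks from 'in' */
      in += BLOCKSIZE;
      len -= BLOCKSIZE;
    }
  if (len)
    {
      memcpy (h->buf, in, len);      /* keep the tail for next call */
      h->unused = len;
    }
}
```

The subtle point is the flush condition: if a call leaves 'unused' equal to the full blocksize without processing it, the buffered block is silently dropped or double-counted on the next call.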
* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ofb_enc'.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt): Use bulk encryption
function if defined.
* tests/basic.c (check_bulk_cipher_modes): Add OFB-AES test vectors.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (cipher_mode_ops_t, cipher_bulk_ops_t): New.
(gcry_cipher_handle): Define members 'mode_ops' and 'bulk' using new
types.
* cipher/cipher.c (_gcry_cipher_open_internal): Remove bulk function
setup.
(cipher_setkey): Pass context bulk function pointer to algorithm setkey
function.
* cipher/cipher-selftest.c (_gcry_selftest_helper_cbc)
(_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Remove bulk
function parameter; Use bulk function returned by setkey function.
* cipher/cipher-selftest.h (_gcry_selftest_helper_cbc)
(_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Remove bulk
function parameter.
* cipher/arcfour.c (arcfour_setkey): Change 'hd' parameter to
'bulk_ops'.
* cipher/blowfish.c (bf_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(selftest): Pass 'bulk_ops' to setkey function.
* cipher/camellia.c (camellia_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec)
(_gcry_camellia_cfb_dec, _gcry_camellia_ocb_crypt)
(_gcry_camellia_ocb_auth): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(selftest): Pass 'bulk_ops' to setkey function.
* cipher/cast5.c (cast_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec): Make
static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(selftest): Pass 'bulk_ops' to setkey function.
* cipher/chacha20.c (chacha20_setkey): Change 'hd' parameter to
'bulk_ops'.
* cipher/des.c (do_tripledes_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_3des_ctr_enc, _gcry_3des_cbc_dec, _gcry_3des_cfb_dec): Make
static.
(bulk_selftest_setkey): Change 'hd' parameter to 'bulk_ops'.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(do_des_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/gost28147.c (gost_setkey): Change 'hd' parameter to
'bulk_ops'.
* cipher/idea.c (idea_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/rfc2268.c (do_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/rijndael.c (do_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(rijndael_setkey): Change 'hd' parameter to 'bulk_ops'.
(_gcry_aes_cfb_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_enc)
(_gcry_aes_cbc_dec, _gcry_aes_ctr_enc, _gcry_aes_ocb_crypt)
(_gcry_aes_ocb_auth, _gcry_aes_xts_crypt): Make static.
(selftest_basic_128, selftest_basic_192, selftest_basic_256): Pass
'bulk_ops' to setkey function.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
* cipher/salsa20.c (salsa20_setkey): Change 'hd' parameter to
'bulk_ops'.
* cipher/seed.c (seed_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/serpent.c (serpent_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec, _gcry_serpent_cfb_dec)
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Make static.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Do not pass
bulk function to selftest helper.
* cipher/sm4.c (sm4_setkey): Change 'hd' parameter to 'bulk_ops'; Setup
'bulk_ops' with bulk acceleration functions.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth): Make static.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Do not pass
bulk function to selftest helper.
* cipher/twofish.c (twofish_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec)
(_gcry_twofish_cfb_dec, _gcry_twofish_ocb_crypt)
(_gcry_twofish_ocb_auth): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(selftest, main): Pass 'bulk_ops' to setkey function.
* src/cipher-proto.h: Forward declare 'cipher_bulk_ops_t'.
(gcry_cipher_setkey_t): Replace 'hd' with 'bulk_ops'.
* src/cipher.h: Remove bulk acceleration function prototypes for
'aes', 'blowfish', 'cast5', 'camellia', '3des', 'serpent', 'sm4' and
'twofish'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (gcry_cipher_handle): Add
'marks.allow_weak_key' flag.
* cipher/cipher.c (cipher_setkey): Do not handle weak key as error when
weak keys are allowed.
(cipher_reset): Preserve 'marks.allow_weak_key' flag on object reset.
(_gcry_cipher_ctl): Add handling for GCRYCTL_SET_ALLOW_WEAK_KEY.
* src/gcrypt.h.in (gcry_ctl_cmds): Add GCRYCTL_SET_ALLOW_WEAK_KEY.
* tests/basic.c (check_ecb_cipher): Add tests for weak key errors and
for GCRYCTL_SET_ALLOW_WEAK_KEY.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-gcm.c [GCM_TABLES_USE_U64] (do_fillM): Precalculate
M[32..63] values.
[GCM_TABLES_USE_U64] (do_ghash): Split processing of the two 64-bit
halves of the input into two separate loops; Use precalculated M[] values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_fillM): Precalculate
M[64..127] values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_ghash): Use precalculated
M[] values.
[GCM_USE_TABLES] (bshift): Avoid conditional execution for mask
calculation.
* cipher/cipher-internal.h (gcry_cipher_handle): Double gcm_table size.
--
Benchmark on Intel Haswell (amd64, --disable-hwf all):
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GMAC_AES | 2.79 ns/B 341.3 MiB/s 11.17 c/B 3998
After (~36% faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GMAC_AES | 2.05 ns/B 464.7 MiB/s 8.20 c/B 3998
Benchmark on Intel Haswell (win32, --disable-hwf all):
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GMAC_AES | 4.90 ns/B 194.8 MiB/s 19.57 c/B 3997
After (~36% faster):
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz
GMAC_AES | 3.58 ns/B 266.4 MiB/s 14.31 c/B 3999
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
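The (bshift) tweak is the standard branchless-mask idiom: derive an all-ones/all-zeros mask from the carry bit and XOR unconditionally instead of branching on secret data. A sketch of the idiom (illustrative; shown with GHASH's bit-reflected reduction constant 0xe1 in the top byte, applied to a single 64-bit word rather than the full 128-bit state):

```c
#include <stdint.h>

/* Branchless shift-with-reduction step of the kind the (bshift) change
   uses: instead of "if (carry) x ^= POLY", build an all-ones or
   all-zeros mask from the carry bit and XOR unconditionally.  This
   avoids a data-dependent branch.  Illustrative sketch of the idiom,
   not libgcrypt's exact code. */
static uint64_t
shift_right_reduce (uint64_t x, uint64_t poly)
{
  uint64_t carry = x & 1;
  uint64_t mask = 0 - carry;       /* 0x0 or 0xffff...ff, no branch */
  return (x >> 1) ^ (poly & mask);
}
```

Besides avoiding a mispredictable branch, executing the same instructions regardless of the carry bit keeps the timing independent of the data being hashed.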
* cipher/cipher-internal.h (cipher_block_add): New.
* cipher/blowfish.c (_gcry_blowfish_ctr_enc): Use new helper function
for CTR block increment.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc): Ditto.
* cipher/cast5.c (_gcry_cast5_ctr_enc): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/des.c (_gcry_3des_ctr_enc): Ditto.
* cipher/rijndael.c (_gcry_aes_ctr_enc): Ditto.
* cipher/serpent.c (_gcry_serpent_ctr_enc): Ditto.
* cipher/twofish.c (_gcry_twofish_ctr_enc): Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
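The new cipher_block_add helper centralizes the CTR increment all these call sites previously open-coded: add a small value to a counter block treated as one big-endian integer, propagating carries from the last byte toward the first. A self-contained sketch of that kind of helper (illustrative, not the libgcrypt implementation):

```c
#include <stddef.h>

/* Big-endian block increment: treat the 16-byte block as one 128-bit
   big-endian integer and add 'add' to it, rippling carries from the
   least significant (last) byte upward.  Illustrative sketch. */
static void
block_add_be (unsigned char block[16], unsigned int add)
{
  unsigned int carry = add;
  for (int i = 15; i >= 0 && carry; i--)
    {
      carry += block[i];
      block[i] = (unsigned char) (carry & 0xff);
      carry >>= 8;
    }
}
```

The early exit once the carry is exhausted means the common case touches only the last byte or two of the block.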
* cipher/Makefile.am: Add 'cipher-gcm-armv7-neon.S'.
* cipher/cipher-gcm-armv7-neon.S: New.
* cipher/cipher-gcm.c [GCM_USE_ARM_NEON] (_gcry_ghash_setup_armv7_neon)
(_gcry_ghash_armv7_neon, ghash_setup_armv7_neon)
(ghash_armv7_neon): New.
(setupM) [GCM_USE_ARM_NEON]: Use armv7/neon implementation if
HWF_ARM_NEON is available.
* cipher/cipher-internal.h (GCM_USE_ARM_NEON): New.
--
Benchmark on Cortex-A53 (816 Mhz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
GMAC_AES | 34.81 ns/B 27.40 MiB/s 28.41 c/B
After (3.0x faster):
| nanosecs/byte mebibytes/sec cycles/byte
GMAC_AES | 11.49 ns/B 82.99 MiB/s 9.38 c/B
Reported-by: Yuriy M. Kaminskiy <yumkam@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (gcry_cipher_handle): Remove OCB L0L1L0.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_setkey): Ditto.
* cipher/rijndael-aesni.c (aesni_ocb_enc, aesni_ocb_dec)
(_gcry_aes_aesni_ocb_auth): Replace L0L1L0 use with L1.
--
Patch fixes an L0+L1+L0 thinko: this offset is the same as plain L1,
since L0 xor L1 xor L0 = L1.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
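The reasoning behind dropping the precomputed L0L1L0 value is a one-line XOR identity: offsetting by L0, then L1, then L0 again lands on the same offset as L1 alone. A trivial check on 64-bit stand-ins for the 128-bit OCB offset values (values are arbitrary):

```c
#include <stdint.h>

/* OCB offsets are chained by XOR, so applying L0, L1, L0 in sequence
   collapses to just L1: a ^ b ^ a == b.  Demonstrated on 64-bit words
   standing in for 128-bit offset values. */
static uint64_t
xor3 (uint64_t a, uint64_t b, uint64_t c)
{
  return a ^ b ^ c;
}
```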
* cipher/cipher-internal.h (gcry_cipher_handle): Mark areas of
u_mode.ocb that are and are not cleared by gcry_cipher_reset.
(_gcry_cipher_ocb_setkey): New.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Split
L-table generation to ...
(_gcry_cipher_ocb_setkey): ... this new function.
* cipher/cipher.c (cipher_setkey): Add handling for OCB mode.
(cipher_reset): Do not clear L-values for OCB mode.
--
OCB L-tables do not depend on nonce value, but only on cipher key.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/asm-poly1305-amd64.h: New.
* cipher/Makefile.am: Add 'asm-poly1305-amd64.h'.
* cipher/chacha20-amd64-avx2.S (QUARTERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New.
* cipher/chacha20-amd64-ssse3.S (QUARTERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1): New.
* cipher/chacha20.c (_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1)
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New prototypes.
(chacha20_encrypt_stream): Split tail to...
(do_chacha20_encrypt_stream_tail): ... new function.
(_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New.
* cipher/cipher-internal.h (_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New prototypes.
* cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt): Call
'_gcry_chacha20_poly1305_encrypt' if cipher is ChaCha20.
(_gcry_cipher_poly1305_decrypt): Call
'_gcry_chacha20_poly1305_decrypt' if cipher is ChaCha20.
* cipher/poly1305-internal.h (_gcry_cipher_poly1305_update_burn): New
prototype.
* cipher/poly1305.c (poly1305_blocks): Make static.
(_gcry_poly1305_update): Split main function body to ...
(_gcry_poly1305_update_burn): ... new function.
--
Benchmark on Intel Skylake (i5-6500, 3200 Mhz):
Before, 8-way AVX2:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.378 ns/B 2526 MiB/s 1.21 c/B
STREAM dec | 0.373 ns/B 2560 MiB/s 1.19 c/B
POLY1305 enc | 0.685 ns/B 1392 MiB/s 2.19 c/B
POLY1305 dec | 0.686 ns/B 1390 MiB/s 2.20 c/B
POLY1305 auth | 0.315 ns/B 3031 MiB/s 1.01 c/B
After, 8-way AVX2 (~36% faster):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
POLY1305 enc | 0.503 ns/B 1896 MiB/s 1.61 c/B
POLY1305 dec | 0.485 ns/B 1965 MiB/s 1.55 c/B
Benchmark on Intel Haswell (i7-4790K, 3998 Mhz):
Before, 8-way AVX2:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.318 ns/B 2999 MiB/s 1.27 c/B
STREAM dec | 0.317 ns/B 3004 MiB/s 1.27 c/B
POLY1305 enc | 0.586 ns/B 1627 MiB/s 2.34 c/B
POLY1305 dec | 0.586 ns/B 1627 MiB/s 2.34 c/B
POLY1305 auth | 0.271 ns/B 3524 MiB/s 1.08 c/B
After, 8-way AVX2 (~30% faster):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
POLY1305 enc | 0.452 ns/B 2108 MiB/s 1.81 c/B
POLY1305 dec | 0.440 ns/B 2167 MiB/s 1.76 c/B
Before, 4-way SSSE3:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.627 ns/B 1521 MiB/s 2.51 c/B
STREAM dec | 0.626 ns/B 1523 MiB/s 2.50 c/B
POLY1305 enc | 0.895 ns/B 1065 MiB/s 3.58 c/B
POLY1305 dec | 0.896 ns/B 1064 MiB/s 3.58 c/B
POLY1305 auth | 0.271 ns/B 3521 MiB/s 1.08 c/B
After, 4-way SSSE3 (~20% faster):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
POLY1305 enc | 0.733 ns/B 1301 MiB/s 2.93 c/B
POLY1305 dec | 0.726 ns/B 1314 MiB/s 2.90 c/B
Before, 1-way SSSE3:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
POLY1305 enc | 1.56 ns/B 609.6 MiB/s 6.25 c/B
POLY1305 dec | 1.56 ns/B 609.4 MiB/s 6.26 c/B
After, 1-way SSSE3 (~18% faster):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
POLY1305 enc | 1.31 ns/B 725.4 MiB/s 5.26 c/B
POLY1305 dec | 1.31 ns/B 727.3 MiB/s 5.24 c/B
For comparison to other libraries (on Intel i7-4790K, 3998 Mhz):
bench-slope-openssl: OpenSSL 1.1.1 11 Sep 2018
Cipher:
chacha20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.301 ns/B 3166.4 MiB/s 1.20 c/B
STREAM dec | 0.300 ns/B 3174.7 MiB/s 1.20 c/B
POLY1305 enc | 0.463 ns/B 2060.6 MiB/s 1.85 c/B
POLY1305 dec | 0.462 ns/B 2063.8 MiB/s 1.85 c/B
POLY1305 auth | 0.162 ns/B 5899.3 MiB/s 0.646 c/B
bench-slope-nettle: Nettle 3.4
Cipher:
chacha | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 1.65 ns/B 578.2 MiB/s 6.59 c/B
STREAM dec | 1.65 ns/B 578.2 MiB/s 6.59 c/B
POLY1305 enc | 2.05 ns/B 464.8 MiB/s 8.20 c/B
POLY1305 dec | 2.05 ns/B 464.7 MiB/s 8.20 c/B
POLY1305 auth | 0.404 ns/B 2359.1 MiB/s 1.62 c/B
bench-slope-botan: Botan 2.6.0
Cipher:
ChaCha | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc/dec | 0.855 ns/B 1116.0 MiB/s 3.42 c/B
POLY1305 enc | 1.60 ns/B 595.4 MiB/s 6.40 c/B
POLY1305 dec | 1.60 ns/B 595.8 MiB/s 6.40 c/B
POLY1305 auth | 0.752 ns/B 1268.3 MiB/s 3.01 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (gcry_cipher_handle): New pre-computed OCB
values L0L1 and L0L1L0; Swap dimensions for OCB L table.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Setup L0L1 and
L0L1L0 values.
(ocb_crypt): Process input in 24KiB chunks for better cache locality
for checksumming.
* cipher/rijndael-aesni.c (ALWAYS_INLINE): New macro for always
inlining functions, change all functions with 'inline' to use
ALWAYS_INLINE.
(NO_INLINE): New macro.
(aesni_prepare_2_6_variable, aesni_prepare_7_15_variable): Rename to...
(aesni_prepare_2_7_variable, aesni_prepare_8_15_variable): ...these and
adjust accordingly (xmm7 moved from *_7_15 to *_2_7).
(aesni_prepare_2_6, aesni_prepare_7_15): Rename to...
(aesni_prepare_2_7, aesni_prepare_8_15): ...these and adjust
accordingly.
(aesni_cleanup_2_6, aesni_cleanup_7_15): Rename to...
(aesni_cleanup_2_7, aesni_cleanup_8_15): ...these and adjust
accordingly.
(aesni_ocb_checksum): New.
(aesni_ocb_enc, aesni_ocb_dec): Calculate OCB offsets in parallel
with help of pre-computed offsets L0+L1 and L0+L1+L0; Do checksum
calculation as separate pass instead of inline; Use NO_INLINE.
(_gcry_aes_aesni_ocb_auth): Calculate OCB offsets in parallel
with help of pre-computed offsets L0+L1 and L0+L1+L0.
* cipher/rijndael-internal.h (RIJNDAEL_context_s) [USE_AESNI]: Add
'use_avx2' and 'use_avx'.
* cipher/rijndael.c (do_setkey) [USE_AESNI]: Set 'use_avx2' if
Intel AVX2 HW feature is available and 'use_avx' if Intel AVX HW
feature is available.
* tests/basic.c (do_check_ocb_cipher): New test vector; increase
size of temporary buffers for new test vector.
(check_ocb_cipher_largebuf_split): Make test plaintext non-uniform
for better checksum testing.
(check_ocb_cipher_checksum): New.
(check_ocb_cipher_largebuf): Call check_ocb_cipher_checksum.
(check_ocb_cipher): New expected tags for check_ocb_cipher_largebuf
test runs.
--
Benchmark on Haswell i7-4790K @ 4.0Ghz:
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 0.175 ns/B 5436 MiB/s 0.702 c/B
OCB dec | 0.184 ns/B 5184 MiB/s 0.736 c/B
OCB auth | 0.156 ns/B 6097 MiB/s 0.626 c/B
After (enc +2% faster, dec +7% faster):
OCB enc | 0.172 ns/B 5547 MiB/s 0.688 c/B
OCB dec | 0.171 ns/B 5582 MiB/s 0.683 c/B
OCB auth | 0.156 ns/B 6097 MiB/s 0.626 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/bufhelp.h (buf_get_he32, buf_put_he32, buf_get_he64)
(buf_put_he64): New.
* cipher/cipher-internal.h (cipher_block_cpy, cipher_block_xor)
(cipher_block_xor_1, cipher_block_xor_2dst, cipher_block_xor_n_copy_2)
(cipher_block_xor_n_copy): New.
* cipher/cipher-gcm-intel-pclmul.c
(_gcry_ghash_setup_intel_pclmul): Use assembly for swapping endianness
instead of buf_get_be64 and buf_cpy.
* cipher/blowfish.c: Use new cipher_block_* functions for cipher block
sized buf_cpy/xor* operations.
* cipher/camellia-glue.c: Ditto.
* cipher/cast5.c: Ditto.
* cipher/cipher-aeswrap.c: Ditto.
* cipher/cipher-cbc.c: Ditto.
* cipher/cipher-ccm.c: Ditto.
* cipher/cipher-cfb.c: Ditto.
* cipher/cipher-cmac.c: Ditto.
* cipher/cipher-ctr.c: Ditto.
* cipher/cipher-eax.c: Ditto.
* cipher/cipher-gcm.c: Ditto.
* cipher/cipher-ocb.c: Ditto.
* cipher/cipher-ofb.c: Ditto.
* cipher/cipher-xts.c: Ditto.
* cipher/des.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
--
This commit adds size-optimized functions for copying and xoring
cipher block sized buffers. These functions also allow GCC to use
inline auto-vectorization for block cipher copying and xoring on
higher optimization levels.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (gcry_cipher_handle): Add function pointers
for mode operations.
(_gcry_cipher_xts_crypt): Remove.
(_gcry_cipher_xts_encrypt, _gcry_cipher_xts_decrypt): New.
* cipher/cipher-xts.c (_gcry_cipher_xts_encrypt)
(_gcry_cipher_xts_decrypt): New.
* cipher/cipher.c (_gcry_cipher_setup_mode_ops): New.
(_gcry_cipher_open_internal): Setup mode routines.
(cipher_encrypt, cipher_decrypt): Remove.
(do_stream_encrypt, do_stream_decrypt, do_encrypt_none_unknown)
(do_decrypt_none_unknown): New.
(_gcry_cipher_encrypt, _gcry_cipher_decrypt, _gcry_cipher_setiv)
(_gcry_cipher_authenticate, _gcry_cipher_gettag)
(_gcry_cipher_checktag): Adapted to use mode routines through pointers.
--
Change to use mode operations through pointers to reduce per call
overhead for cipher operations.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-cbc.c (cbc_encrypt_inner, cbc_decrypt_inner)
(_gcry_cipher_cbc_cts_encrypt, _gcry_cipher_cbc_cts_decrypt): New.
(_gcry_cipher_cbc_encrypt, _gcry_cipher_cbc_decrypt): Remove CTS
handling.
* cipher/cipher-internal.h (_gcry_cipher_cbc_cts_encrypt)
(_gcry_cipher_cbc_cts_decrypt): New.
* cipher/cipher.c (cipher_encrypt, cipher_decrypt): Call CBC-CTS
handler if CBC-CTS flag is set.
--
Split CTS handling into a separate function for a small decrease in
CBC per-call overhead.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (_gcry_blocksize_shift): New.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt)
(_gcry_cipher_cbc_decrypt): Use bit-level operations instead of
division to get number of blocks and check input length against
blocksize.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt)
(_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-cmac.c (_gcry_cmac_write): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_crypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt)
(_gcry_cipher_ofb_decrypt): Ditto.
--
Integer division was causing 10 to 20 cycles per call overhead
for cipher modes on x86-64.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'cipher-eax.c'.
* cipher/cipher-cmac.c (cmac_write): Rename to ...
(_gcry_cmac_write): ... this; Take CMAC context as new input
parameter; Return error code.
(cmac_generate_subkeys): Rename to ...
(_gcry_cmac_generate_subkeys): ... this; Take CMAC context as new
input parameter; Return error code.
(cmac_final): Rename to ...
(_gcry_cmac_final): ... this; Take CMAC context as new input
parameter; Return error code.
(cmac_tag): Take CMAC context as new input parameter.
(_gcry_cmac_reset): New.
(_gcry_cipher_cmac_authenticate): Remove duplicate tag flag check;
Adapt to changes above.
(_gcry_cipher_cmac_get_tag): Adapt to changes above.
(_gcry_cipher_cmac_check_tag): Ditto.
(_gcry_cipher_cmac_set_subkeys): Ditto.
* cipher-eax.c: New.
* cipher-internal.h (gcry_cmac_context_t): New.
(gcry_cipher_handle): Update u_mode.cmac; Add u_mode.eax.
(_gcry_cmac_write, _gcry_cmac_generate_subkeys, _gcry_cmac_final)
(_gcry_cmac_reset, _gcry_cipher_eax_encrypt, _gcry_cipher_eax_decrypt)
(_gcry_cipher_eax_set_nonce, _gcry_cipher_eax_authenticate)
(_gcry_cipher_eax_get_tag, _gcry_cipher_eax_check_tag)
(_gcry_cipher_eax_setkey): New prototypes.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, cipher_encrypt, cipher_decrypt, _gcry_cipher_setiv)
(_gcry_cipher_authenticate, _gcry_cipher_gettag, _gcry_cipher_checktag)
(_gcry_cipher_info): Add EAX mode.
* doc/gcrypt.texi: Add EAX mode.
* src/gcrypt.h.in (GCRY_CIPHER_MODE_EAX): New.
* tests/basic.c (_check_gcm_cipher, _check_poly1305_cipher): Constify
test vectors array.
(_check_eax_cipher, check_eax_cipher): New.
(check_ciphers, check_cipher_modes): Add EAX mode.
* tests/bench-slope.c (bench_eax_encrypt_do_bench)
(bench_eax_decrypt_do_bench, bench_eax_authenticate_do_bench)
(eax_encrypt_ops, eax_decrypt_ops, eax_authenticate_ops): New.
(cipher_modes): Add EAX mode.
* tests/benchmark.c (cipher_bench): Add EAX mode.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (gcry_cipher_handle): Change bulk
XTS function to take cipher context.
* cipher/cipher-xts.c (_gcry_cipher_xts_crypt): Ditto.
* cipher/cipher.c (_gcry_cipher_open_internal): Setup AES-NI
XTS bulk function.
* cipher/rijndael-aesni.c (xts_gfmul_const, _gcry_aes_aesni_xts_enc)
(_gcry_aes_aesni_xts_enc, _gcry_aes_aesni_xts_crypt): New.
* cipher/rijndael.c (_gcry_aes_aesni_xts_crypt)
(_gcry_aes_xts_crypt): New.
* src/cipher.h (_gcry_aes_xts_crypt): New.
--
Benchmarks on Intel Core i7-4790K, 4.0Ghz (no turbo):
Before:
XTS enc | 1.66 ns/B 575.7 MiB/s 6.63 c/B
XTS dec | 1.66 ns/B 575.5 MiB/s 6.63 c/B
After (~6x faster):
XTS enc | 0.270 ns/B 3528.5 MiB/s 1.08 c/B
XTS dec | 0.272 ns/B 3511.5 MiB/s 1.09 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
--
GnuPG-bug-id: 3120
Reported-by: ka7 (klemens)
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* cipher/cipher-cfb.c (_gcry_cipher_cfb8_encrypt)
(_gcry_cipher_cfb8_decrypt): Add 8-bit variants of decrypt/encrypt
functions.
* cipher/cipher-internal.h (_gcry_cipher_cfb8_encrypt)
(_gcry_cipher_cfb8_decrypt): Ditto.
* cipher/cipher.c: Adjust code flow to work with GCRY_CIPHER_MODE_CFB8.
* tests/basic.c: Add tests for cfb8 with AES and 3DES.
--
Signed-off-by: Mathias L. Baumann <mathias.baumann at sociomantic.com>
[JK: edit changelog, fix email malformed patch]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'cipher-xts.c'.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'bulk.xts_crypt'
and 'u_mode.xts' members.
(_gcry_cipher_xts_crypt): New prototype.
* cipher/cipher-xts.c: New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, cipher_encrypt, cipher_decrypt): Add XTS mode handling.
* doc/gcrypt.texi: Add XTS mode to documentation.
* src/gcrypt.h.in (GCRY_CIPHER_MODE_XTS, GCRY_XTS_BLOCK_LEN): New.
* tests/basic.c (do_check_xts_cipher, check_xts_cipher): New.
(check_bulk_cipher_modes): Add XTS test-vectors.
(check_one_cipher_core, check_one_cipher, check_ciphers): Add XTS
testing support.
(check_cipher_modes): Add XTS test.
* tests/bench-slope.c (bench_xts_encrypt_init)
(bench_xts_encrypt_do_bench, bench_xts_decrypt_do_bench)
(xts_encrypt_ops, xts_decrypt_ops): New.
(cipher_modes, cipher_bench_one): Add XTS.
* tests/benchmark.c (cipher_bench): Add XTS testing.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-ocb.c (_gcry_cipher_ocb_get_l): Remove.
(ocb_get_L_big): New.
(_gcry_cipher_ocb_authenticate): L-big handling done in upper
processing loop, so that lower level never sees the case where
'aad_nblocks % 65536 == 0'; Add missing stack burn.
(ocb_aad_finalize): Add missing stack burn.
(ocb_crypt): L-big handling done in upper processing loop, so that
lower level never sees the case where 'data_nblocks % 65536 == 0'.
* cipher/cipher-internal.h (_gcry_cipher_ocb_get_l): Remove.
(ocb_get_l): Remove 'l_tmp' usage and simplify since input
is more limited now, 'N is not a multiple of 65536'.
* cipher/rijndael-aesni.c (get_l): Remove.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Remove
l_tmp; Use 'ocb_get_l'.
* cipher/rijndael-ssse3-amd64.c (get_l): Remove.
(ssse3_ocb_enc, ssse3_ocb_dec, _gcry_aes_ssse3_ocb_auth): Remove
l_tmp; Use 'ocb_get_l'.
* cipher/camellia-glue.c: Remove OCB l_tmp usage.
* cipher/rijndael-armv8-ce.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
--
Move large L value generation to up-most level to simplify lower level
ocb_get_l for greater performance and simpler implementation. This helps
implementing OCB in assembly, as 'ocb_get_l' no longer has a function
call on the slow path.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'cipher-gcm-armv8-aarch64-ce.S'.
* cipher/cipher-gcm-armv8-aarch64-ce.S: New.
* cipher/cipher-internal.h (GCM_USE_ARM_PMULL): Enable on
ARMv8/AArch64.
--
Benchmark on Cortex-A53 (1152 Mhz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
GMAC_AES | 15.54 ns/B 61.36 MiB/s 17.91 c/B
After (11.9x faster):
| nanosecs/byte mebibytes/sec cycles/byte
GMAC_AES | 1.30 ns/B 731.5 MiB/s 1.50 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'cipher-gcm-armv8-aarch32-ce.S'.
* cipher/cipher-gcm-armv8-aarch32-ce.S: New.
* cipher/cipher-gcm.c [GCM_USE_ARM_PMULL]
(_gcry_ghash_setup_armv8_ce_pmull, _gcry_ghash_armv8_ce_pmull)
(ghash_setup_armv8_ce_pmull, ghash_armv8_ce_pmull): New.
(setupM) [GCM_USE_ARM_PMULL]: Enable ARM PMULL implementation if
HWF_ARM_PMULL HW feature flag is enabled.
* cipher/cipher-gcm.h (GCM_USE_ARM_PMULL): New.
--
Benchmark on Cortex-A53 (1152 Mhz):
Before:
| nanosecs/byte mebibytes/sec cycles/byte
GMAC_AES | 24.10 ns/B 39.57 MiB/s 27.76 c/B
After (~26x faster):
| nanosecs/byte mebibytes/sec cycles/byte
GMAC_AES | 0.924 ns/B 1032.2 MiB/s 1.06 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (gcry_cipher_handle): Add fields
aad_leftover and aad_nleftover to u_mode.ocb.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Clear
aad_nleftover.
(_gcry_cipher_ocb_authenticate): Add buffering and factor some code out
to ...
(ocb_aad_finalize): New.
(compute_tag_if_needed): Call new function.
* tests/basic.c (check_ocb_cipher_splitaad): New.
(check_ocb_cipher): Call new function.
(main): Also call check_cipher_modes with --cipher-modes.
--
It is more convenient to not require full blocks for
gcry_cipher_authenticate. Other modes than OCB do this as well.
Note that the size of the context structure is not increased because
other modes require more context data.
Signed-off-by: Werner Koch <wk@gnupg.org>
* configure.ac (available_digests_64): Merge with available_digests.
(available_kdfs_64): Merge with available_kdfs.
<64 bit datatype test>: Bail out if no such type is available.
* src/types.h: Emit #error if no u64 can be defined.
(PROPERLY_ALIGNED_TYPE): Always add u64 type.
* cipher/bithelp.h: Remove all code paths which handle the
case of !HAVE_U64_TYPEDEF.
* cipher/bufhelp.h: Ditto.
* cipher/cipher-ccm.c: Ditto.
* cipher/cipher-gcm.c: Ditto.
* cipher/cipher-internal.h: Ditto.
* cipher/cipher.c: Ditto.
* cipher/hash-common.h: Ditto.
* cipher/md.c: Ditto.
* cipher/poly1305.c: Ditto.
* cipher/scrypt.c: Ditto.
* cipher/tiger.c: Ditto.
* src/g10lib.h: Ditto.
* tests/basic.c: Ditto.
* tests/bench-slope.c: Ditto.
* tests/benchmark.c: Ditto.
--
Given that SHA-2 and some other algorithms require a 64 bit type it
no longer makes sense to conditionally compile some parts when
the platform does not provide such a type.
GnuPG-bug-id: 1815.
Signed-off-by: Werner Koch <wk@gnupg.org>
* cipher/cipher-internal.h (ocb_get_l): New.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate)
(ocb_crypt): Use 'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/camellia-glue.c (get_l): Remove.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Precalculate
offset array when block count matches parallel operation size; Use
'ocb_get_l' instead of 'get_l'.
* cipher/rijndael-aesni.c (get_l): Add fast path for 75% most common
offsets.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Precalculate
offset array when block count matches parallel operation size.
* cipher/rijndael-ssse3-amd64.c (get_l): Add fast path for 75% most
common offsets.
* cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Use
'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/serpent.c (get_l): Remove.
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Precalculate
offset array when block count matches parallel operation size; Use
'ocb_get_l' instead of 'get_l'.
* cipher/twofish.c (get_l): Remove.
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Use 'ocb_get_l'
instead of 'get_l'.
--
Patch optimizes OCB offset calculation for generic code and
assembly implementations with parallel block processing.
Benchmark of OCB AES-NI on Intel Haswell:
$ tests/bench-slope --cpu-mhz 3201 cipher aes
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CTR enc | 0.274 ns/B 3483.9 MiB/s 0.876 c/B
CTR dec | 0.273 ns/B 3490.0 MiB/s 0.875 c/B
OCB enc | 0.289 ns/B 3296.1 MiB/s 0.926 c/B
OCB dec | 0.299 ns/B 3189.9 MiB/s 0.957 c/B
OCB auth | 0.260 ns/B 3670.0 MiB/s 0.832 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
CTR enc | 0.273 ns/B 3489.4 MiB/s 0.875 c/B
CTR dec | 0.273 ns/B 3487.5 MiB/s 0.875 c/B
OCB enc | 0.248 ns/B 3852.8 MiB/s 0.792 c/B
OCB dec | 0.261 ns/B 3659.5 MiB/s 0.834 c/B
OCB auth | 0.227 ns/B 4205.5 MiB/s 0.726 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate)
(ocb_crypt): Change bulk function to return number of unprocessed
blocks.
* src/cipher.h (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth)
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth)
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth)
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Change return type
to 'size_t'.
* cipher/camellia-glue.c (get_l): Only if USE_AESNI_AVX or
USE_AESNI_AVX2 defined.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Change return type
to 'size_t' and return remaining blocks; Remove unaccelerated common
code path. Enable remaining common code only if USE_AESNI_AVX or
USE_AESNI_AVX2 defined; Remove unaccelerated common code.
* cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Change
return type to 'size_t' and return zero.
* cipher/serpent.c (get_l): Only if USE_SSE2, USE_AVX2 or USE_NEON
defined.
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Change return type
to 'size_t' and return remaining blocks; Remove unaccelerated common
code path. Enable remaining common code only if USE_SSE2, USE_AVX2 or
USE_NEON defined; Remove unaccelerated common code.
* cipher/twofish.c (get_l): Only if USE_AMD64_ASM defined.
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Change return type
to 'size_t' and return remaining blocks; Remove unaccelerated common
code path. Enable remaining common code only if USE_AMD64_ASM defined;
Remove unaccelerated common code.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_setup_intel_pclmul)
(_gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector
registers before use and restore after.
* cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency
on !defined(__WIN64__).
* cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable,
aesni_prepare, aesni_prepare_2_6, aesni_cleanup)
(aesni_cleanup_2_6): New.
[!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New.
(_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc)
(_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec)
(_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use
'aesni_prepare_2_6'.
* cipher/rijndael-internal.h (USE_SSSE3): Enable if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS.
(USE_AESNI): Remove dependency on !defined(__WIN64__)
* cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS]
(vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New.
[!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New.
(vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use
'vpaes_ssse3_prepare'.
(_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use
'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'.
[HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to
exclude '.type' and '.size' markers from assembly code, as they are
not supported on WIN64/COFF objects.
* configure.ac (gcry_cv_gcc_attribute_ms_abi)
(gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi)
(gcry_cv_gcc_default_abi_is_sysv_abi)
(gcry_cv_gcc_win64_platform_as_ok): New checks.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Do not enable when
__WIN64__ defined.
* cipher/rijndael-internal.h (USE_AESNI): Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-internal.h (gcry_cipher_handle): Add bulk.ocb_crypt
and bulk.ocb_auth.
(_gcry_cipher_ocb_get_l): New prototype.
* cipher/cipher-ocb.c (get_l): Rename to ...
(_gcry_cipher_ocb_get_l): ... this.
(_gcry_cipher_ocb_authenticate, ocb_crypt): Use bulk function when
available.
* cipher/cipher.c (_gcry_cipher_open_internal): Setup OCB bulk
functions for AES.
* cipher/rijndael-aesni.c (get_l, aesni_ocb_enc, aesni_ocb_dec)
(_gcry_aes_aesni_ocb_crypt, _gcry_aes_aesni_ocb_auth): New.
* cipher/rijndael.c [USE_AESNI] (_gcry_aes_aesni_ocb_crypt)
(_gcry_aes_aesni_ocb_auth): New prototypes.
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): New.
* src/cipher.h (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): New
prototypes.
* tests/basic.c (check_ocb_cipher_largebuf): New.
(check_ocb_cipher): Add large buffer encryption/decryption test.
--
Patch adds bulk encryption/decryption/authentication code for AES-NI
accelerated AES.
Benchmark on Intel i5-4570 (3200 Mhz, turbo off):
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 2.12 ns/B 449.7 MiB/s 6.79 c/B
OCB dec | 2.12 ns/B 449.6 MiB/s 6.79 c/B
OCB auth | 2.07 ns/B 459.9 MiB/s 6.64 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
OCB enc | 0.292 ns/B 3262.5 MiB/s 0.935 c/B
OCB dec | 0.297 ns/B 3212.2 MiB/s 0.950 c/B
OCB auth | 0.260 ns/B 3666.1 MiB/s 0.832 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-ocb.c: New.
* cipher/Makefile.am (libcipher_la_SOURCES): Add cipher-ocb.c
* cipher/cipher-internal.h (OCB_BLOCK_LEN, OCB_L_TABLE_SIZE): New.
(gcry_cipher_handle): Add fields marks.finalize and u_mode.ocb.
* cipher/cipher.c (_gcry_cipher_open_internal): Add OCB mode.
(_gcry_cipher_open_internal): Setup default taglen of OCB.
(cipher_reset): Clear OCB specific data.
(cipher_encrypt, cipher_decrypt, _gcry_cipher_authenticate)
(_gcry_cipher_gettag, _gcry_cipher_checktag): Call OCB functions.
(_gcry_cipher_setiv): Add OCB specific nonce setting.
(_gcry_cipher_ctl): Add GCRYCTL_FINALIZE and GCRYCTL_SET_TAGLEN
* src/gcrypt.h.in (GCRYCTL_SET_TAGLEN): New.
(gcry_cipher_final): New.
* cipher/bufhelp.h (buf_xor_1): New.
* tests/basic.c (hex2buffer): New.
(check_ocb_cipher): New.
(main): Call it here. Add option --cipher-modes.
* tests/bench-slope.c (bench_aead_encrypt_do_bench): Call
gcry_cipher_final.
(bench_aead_decrypt_do_bench): Ditto.
(bench_aead_authenticate_do_bench): Ditto. Check error code.
(bench_ocb_encrypt_do_bench): New.
(bench_ocb_decrypt_do_bench): New.
(bench_ocb_authenticate_do_bench): New.
(ocb_encrypt_ops): New.
(ocb_decrypt_ops): New.
(ocb_authenticate_ops): New.
(cipher_modes): Add them.
(cipher_bench_one): Skip wrong block length for OCB.
* tests/benchmark.c (cipher_bench): Add field noncelen to MODES. Add
OCB support.
--
See the comments on top of cipher/cipher-ocb.c for the patent status
of the OCB mode.
The implementation has not yet been optimized and as such is not faster
than the other AEAD modes. A first candidate for optimization is the
double_block function. Large improvements can be expected by writing
an AES ECB function to work on multiple blocks.
Signed-off-by: Werner Koch <wk@gnupg.org>
draft-irtf-cfrg-chacha20-poly1305-03
* cipher/cipher-internal.h (gcry_cipher_handle): Use separate byte
counters for AAD and data in Poly1305.
* cipher/cipher-poly1305.c (poly1305_fill_bytecount): Remove.
(poly1305_fill_bytecounts, poly1305_do_padding): New.
(poly1305_aad_finish): Fill padding to Poly1305 and do not fill AAD
length.
(_gcry_cipher_poly1305_authenticate, _gcry_cipher_poly1305_encrypt)
(_gcry_cipher_poly1305_decrypt): Update AAD and data length separately.
(_gcry_cipher_poly1305_tag): Fill padding and bytecounts to Poly1305.
(_gcry_cipher_poly1305_setkey, _gcry_cipher_poly1305_setiv): Reset
AAD and data byte counts; only allow 96-bit IV.
* cipher/cipher.c (_gcry_cipher_open_internal): Limit Poly1305-AEAD to
ChaCha20 cipher.
* tests/basic.c (_check_poly1305_cipher): Update test-vectors.
(check_ciphers): Limit Poly1305-AEAD checks to ChaCha20.
* tests/bench-slope.c (cipher_bench_one): Ditto.
--
Latest Internet-Draft version for "ChaCha20 and Poly1305 for IETF protocols"
has added additional padding to Poly1305-AEAD and limited the supported
IV size to 96 bits:
https://www.ietf.org/rfcdiff?url1=draft-nir-cfrg-chacha20-poly1305-03&difftype=--html&submit=Go!&url2=draft-irtf-cfrg-chacha20-poly1305-03
Patch makes the Poly1305-AEAD implementation match these changes and limits
Poly1305-AEAD to ChaCha20 only.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'cipher-gcm-intel-pclmul.c'.
* cipher/cipher-gcm-intel-pclmul.c: New.
* cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL]
(_gcry_ghash_setup_intel_pclmul, _gcry_ghash_intel_pclmul): New
prototypes.
[GCM_USE_INTEL_PCLMUL] (gfmul_pclmul, gfmul_pclmul_aggr4): Move
to 'cipher-gcm-intel-pclmul.c'.
(ghash): Rename to...
(ghash_internal): ...this and move GCM_USE_INTEL_PCLMUL part to new
function in 'cipher-gcm-intel-pclmul.c'.
(setupM): Move GCM_USE_INTEL_PCLMUL part to new function in
'cipher-gcm-intel-pclmul.c'; Add selection of ghash function based
on available HW acceleration.
(do_ghash_buf): Change use of 'ghash' to 'c->u_mode.gcm.ghash_fn'.
* cipher/internal.h (ghash_fn_t): New.
(gcry_cipher_handle): Remove 'use_intel_pclmul'; Add 'ghash_fn'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/Makefile.am: Add 'cipher-poly1305.c'.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.poly1305'.
(_gcry_cipher_poly1305_encrypt, _gcry_cipher_poly1305_decrypt)
(_gcry_cipher_poly1305_setiv, _gcry_cipher_poly1305_authenticate)
(_gcry_cipher_poly1305_get_tag, _gcry_cipher_poly1305_check_tag): New.
* cipher/cipher-poly1305.c: New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, cipher_encrypt, cipher_decrypt, _gcry_cipher_setiv)
(_gcry_cipher_authenticate, _gcry_cipher_gettag)
(_gcry_cipher_checktag): Handle 'GCRY_CIPHER_MODE_POLY1305'.
(cipher_setiv): Move handling of 'GCRY_CIPHER_MODE_GCM' to ...
(_gcry_cipher_setiv): ... here, as with other modes.
* src/gcrypt.h.in: Add 'GCRY_CIPHER_MODE_POLY1305'.
* tests/basic.c (_check_poly1305_cipher, check_poly1305_cipher): New.
(check_ciphers): Add Poly1305 check.
(check_cipher_modes): Call 'check_poly1305_cipher'.
* tests/bench-slope.c (bench_gcm_encrypt_do_bench): Rename to
bench_aead_... and take nonce as argument.
(bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench): Ditto.
(bench_gcm_encrypt_do_bench, bench_gcm_decrypt_do_bench)
(bench_gcm_authenticate_do_bench, bench_poly1305_encrypt_do_bench)
(bench_poly1305_decrypt_do_bench)
(bench_poly1305_authenticate_do_bench, poly1305_encrypt_ops)
(poly1305_decrypt_ops, poly1305_authenticate_ops): New.
(cipher_modes): Add Poly1305.
(cipher_bench_one): Add special handling for Poly1305.
--
Patch adds Poly1305 based AEAD cipher mode to libgcrypt. ChaCha20 variant
of this mode is proposed for use in TLS and ipsec:
https://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04
http://tools.ietf.org/html/draft-nir-ipsecme-chacha20-poly1305-02
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-ccm.c: Move code inside [HAVE_U64_TYPEDEF].
[HAVE_U64_TYPEDEF] (_gcry_cipher_ccm_set_lengths): Use 'u64' for
data lengths.
[!HAVE_U64_TYPEDEF] (_gcry_cipher_ccm_encrypt)
(_gcry_cipher_ccm_decrypt, _gcry_cipher_ccm_set_nonce)
(_gcry_cipher_ccm_authenticate, _gcry_cipher_ccm_get_tag)
(_gcry_cipher_ccm_check_tag): Dummy functions returning
GPG_ERR_NOT_SUPPORTED.
* cipher/cipher-internal.h (gcry_cipher_handle.u_mode.ccm)
(_gcry_cipher_ccm_set_lengths): Move inside [HAVE_U64_TYPEDEF] and use
u64 instead of size_t for CCM data lengths.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_reset)
(_gcry_cipher_ctl) [!HAVE_U64_TYPEDEF]: Return GPG_ERR_NOT_SUPPORTED
for CCM.
(_gcry_cipher_ctl) [HAVE_U64_TYPEDEF]: Use u64 for
GCRYCTL_SET_CCM_LENGTHS length parameters.
* tests/basic.c: Do not use CCM if !HAVE_U64_TYPEDEF.
* tests/bench-slope.c: Ditto.
* tests/benchmark.c: Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-gcm.c: Change all 'c->u_iv.iv' to
'c->u_mode.gcm.u_ghash_key.key'.
(_gcry_cipher_gcm_setkey): New.
(_gcry_cipher_gcm_initiv): Move ghash initialization to function above.
* cipher/cipher-internal.h (gcry_cipher_handle): Add
'u_mode.gcm.u_ghash_key'; Reorder 'u_mode.gcm' members for partial
clearing in gcry_cipher_reset.
(_gcry_cipher_gcm_setkey): New prototype.
* cipher/cipher.c (cipher_setkey): Add GCM setkey.
(cipher_reset): Clear 'u_mode' only partially for GCM.
--
GHASH tables can be generated at setkey time. No need to regenerate
for every new IV.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher/cipher-gcm.c (do_ghash_buf): Add buffering for less than
blocksize length input and padding handling.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt): Add handling
for AAD padding and check if data has already been padded.
(_gcry_cipher_gcm_authenticate): Check that AAD or data has not been
padded yet.
(_gcry_cipher_gcm_initiv): Clear padding marks.
(_gcry_cipher_gcm_tag): Add finalization and padding; Clear sensitive
data from cipher handle, since they are not used after generating tag.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.gcm.macbuf',
'u_mode.gcm.mac_unused', 'u_mode.gcm.ghash_data_finalized' and
'u_mode.gcm.ghash_aad_finalized'.
* tests/basic.c (check_gcm_cipher): Rename to...
(_check_gcm_cipher): ...this and add handling for different buffer step
lengths; Enable per byte buffer testing.
(check_gcm_cipher): Call _check_gcm_cipher with different buffer step
sizes.
--
Until now, GCM was expecting full data to be input in one go. This patch adds
support for feeding data continuously (for encryption/decryption/aad).
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>