path: root/cipher/cipher-internal.h
Commit message | Author | Age | Files | Lines
* rijndael: add ECB acceleration (for benchmarking purposes) | Jussi Kivilinna | 2022-10-26 | 1 | -0/+2
* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ecb_crypt'.
* cipher/cipher.c (do_ecb_crypt): Use bulk function if available.
* cipher/rijndael-aesni.c (do_aesni_enc_vec8): Change asm label '.Ldeclast' to '.Lenclast'.
(_gcry_aes_aesni_ecb_crypt): New.
* cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_ecb_enc_armv8_ce)
(_gcry_aes_ecb_dec_armv8_ce): New.
* cipher/rijndael-armv8-ce.c (_gcry_aes_ocb_enc_armv8_ce)
(_gcry_aes_ocb_dec_armv8_ce, _gcry_aes_ocb_auth_armv8_ce): Change return value from void to size_t.
(ocb_crypt_fn_t, xts_crypt_fn_t): Remove.
(_gcry_aes_armv8_ce_ocb_crypt, _gcry_aes_armv8_ce_xts_crypt): Remove indirect function call; Return value from called function (allows tail call optimization).
(_gcry_aes_armv8_ce_ocb_auth): Return value from called function (allows tail call optimization).
(_gcry_aes_ecb_enc_armv8_ce, _gcry_aes_ecb_dec_armv8_ce)
(_gcry_aes_armv8_ce_ecb_crypt): New.
* cipher/rijndael-vaes-avx2-amd64.S (_gcry_vaes_avx2_ecb_crypt_amd64): New.
* cipher/rijndael-vaes.c (_gcry_vaes_avx2_ecb_crypt_amd64)
(_gcry_aes_vaes_ecb_crypt): New.
* cipher/rijndael.c (_gcry_aes_aesni_ecb_crypt)
(_gcry_aes_vaes_ecb_crypt, _gcry_aes_armv8_ce_ecb_crypt): New.
(do_setkey): Setup ECB bulk function for x86 AESNI/VAES and ARM CE.
--
Benchmark on AMD Ryzen 9 7900X:

Before (OCB for reference):
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 ECB enc  |     0.128 ns/B      7460 MiB/s     0.720 c/B    5634±1
 ECB dec  |     0.134 ns/B      7103 MiB/s     0.753 c/B      5608
 OCB enc  |     0.029 ns/B     32930 MiB/s     0.163 c/B      5625
 OCB dec  |     0.029 ns/B     32738 MiB/s     0.164 c/B      5625

After:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 ECB enc  |     0.028 ns/B     33761 MiB/s     0.159 c/B      5625
 ECB dec  |     0.028 ns/B     33917 MiB/s     0.158 c/B      5625

GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
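As context for the 'ecb_crypt' bulk hook: a minimal C sketch of the dispatch pattern this commit uses. Only the names cipher_bulk_ops and do_ecb_crypt come from the commit; the types and signatures below are illustrative, not libgcrypt's actual ones.

    #include <stddef.h>

    /* Hypothetical bulk-ops table; the real cipher_bulk_ops has many more
       entries (cbc_enc, ctr_enc, ocb_crypt, ...). */
    typedef struct
    {
      void (*ecb_crypt) (void *ctx, void *outbuf, const void *inbuf,
                         size_t nblocks, int encrypt);
    } bulk_ops_sketch_t;

    /* Sketch of a do_ecb_crypt-style dispatcher: take the accelerated bulk
       path when the algorithm's setkey installed one, else fall back to the
       generic one-block-at-a-time primitive. */
    static void
    ecb_crypt_sketch (void *ctx, const bulk_ops_sketch_t *bulk,
                      unsigned char *out, const unsigned char *in,
                      size_t nblocks, size_t blocksize, int encrypt,
                      void (*blockfn) (void *, unsigned char *,
                                       const unsigned char *))
    {
      if (bulk->ecb_crypt)
        bulk->ecb_crypt (ctx, out, in, nblocks, encrypt);  /* HW-accelerated */
      else
        for (; nblocks; nblocks--, in += blocksize, out += blocksize)
          blockfn (ctx, out, in);                          /* generic path */
    }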
* cipher: Support internal generation of IV for AEAD cipher mode. | NIIBE Yutaka | 2022-08-25 | 1 | -0/+8
* cipher/cipher-gcm.c (_gcry_cipher_gcm_setiv_zero): New.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt)
(_gcry_cipher_gcm_authenticate): Use _gcry_cipher_gcm_setiv_zero.
* cipher/cipher-internal.h (struct gcry_cipher_handle): Add aead field.
* cipher/cipher.c (_gcry_cipher_setiv): Check calling setiv to reject direct invocation in FIPS mode.
(_gcry_cipher_setup_geniv, _gcry_cipher_geniv): New.
* doc/gcrypt.texi: Add explanation for the two new functions.
* src/gcrypt-int.h (_gcry_cipher_setup_geniv, _gcry_cipher_geniv): New.
* src/gcrypt.h.in (enum gcry_cipher_geniv_methods): New.
(gcry_cipher_setup_geniv, gcry_cipher_geniv): New.
* src/libgcrypt.def (gcry_cipher_setup_geniv, gcry_cipher_geniv): Add.
* src/libgcrypt.vers: Likewise.
* src/visibility.c (gcry_cipher_setup_geniv, gcry_cipher_geniv): Add.
* src/visibility.h: Likewise.
--
GnuPG-bug-id: 4873
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
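The point of gcry_cipher_setup_geniv/gcry_cipher_geniv is that the library, not the caller, constructs each IV, which is why FIPS mode can reject direct setiv calls. A common construction behind such APIs is a fixed field plus an invocation counter, as in NIST SP 800-38D; the sketch below is a hypothetical illustration of that idea, not libgcrypt's implementation.

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical deterministic IV generator: 4 fixed bytes followed by a
       64-bit big-endian invocation counter, forming a 96-bit AEAD nonce.
       The crucial property is that no (fixed, counter) pair repeats under
       one key. */
    struct geniv_state { uint8_t fixed[4]; uint64_t counter; };

    static int
    geniv_next (struct geniv_state *st, uint8_t iv[12])
    {
      uint64_t c = st->counter;
      if (c == UINT64_MAX)
        return -1;                    /* exhausted: never reuse an IV */
      st->counter = c + 1;
      memcpy (iv, st->fixed, 4);
      for (int i = 0; i < 8; i++)     /* big-endian counter in bytes 4..11 */
        iv[4 + i] = (uint8_t) (c >> (56 - 8 * i));
      return 0;
    }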
* ghash|polyval: add x86_64 VPCLMUL/AVX512 accelerated implementation | Jussi Kivilinna | 2022-03-07 | 1 | -0/+8
* cipher/cipher-gcm-intel-pclmul.c (GCM_INTEL_USE_VPCLMUL_AVX512)
(GCM_INTEL_AGGR32_TABLE_INITIALIZED): New.
(ghash_setup_aggr16_avx2): Store H16 for aggr32 setup.
[GCM_USE_INTEL_VPCLMUL_AVX512] (GFMUL_AGGR32_ASM_VPCMUL_AVX512)
(gfmul_vpclmul_avx512_aggr32, gfmul_vpclmul_avx512_aggr32_le)
(gfmul_pclmul_avx512, gcm_lsh_avx512, load_h1h4_to_zmm1)
(ghash_setup_aggr8_avx512, ghash_setup_aggr16_avx512)
(ghash_setup_aggr32_avx512, swap128b_perm): New.
(_gcry_ghash_setup_intel_pclmul) [GCM_USE_INTEL_VPCLMUL_AVX512]: Enable AVX512 implementation based on HW features.
(_gcry_ghash_intel_pclmul, _gcry_polyval_intel_pclmul): Add VPCLMUL/AVX512 code path; Small tweaks to VPCLMUL/AVX2 code path; Tweaks on register clearing.
--
Patch adds VPCLMUL/AVX512 accelerated implementation for GHASH (GCM) and POLYVAL (GCM-SIV).

Benchmark on Intel Core i3-1115G4:

Before:
              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM auth     |     0.063 ns/B     15200 MiB/s     0.257 c/B      4090
 GCM-SIV auth |     0.061 ns/B     15704 MiB/s     0.248 c/B      4090

After (ghash ~41% faster, polyval ~34% faster):
              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM auth     |     0.044 ns/B     21614 MiB/s     0.181 c/B    4096±3
 GCM-SIV auth |     0.045 ns/B     21108 MiB/s     0.185 c/B    4097±3

AES128-GCM / AES128-GCM-SIV encryption:
              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM enc      |     0.084 ns/B     11306 MiB/s     0.346 c/B    4097±3
 GCM-SIV enc  |     0.086 ns/B     11026 MiB/s     0.354 c/B    4096±3

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* ghash|polyval: add x86_64 VPCLMUL/AVX2 accelerated implementation | Jussi Kivilinna | 2022-03-06 | 1 | -0/+11
* cipher/cipher-gcm-intel-pclmul.c (GCM_INTEL_USE_VPCLMUL_AVX2)
(GCM_INTEL_AGGR8_TABLE_INITIALIZED)
(GCM_INTEL_AGGR16_TABLE_INITIALIZED): New.
(gfmul_pclmul): Fixes to comments.
[GCM_USE_INTEL_VPCLMUL_AVX2] (GFMUL_AGGR16_ASM_VPCMUL_AVX2)
(gfmul_vpclmul_avx2_aggr16, gfmul_vpclmul_avx2_aggr16_le)
(gfmul_pclmul_avx2, gcm_lsh_avx2, load_h1h2_to_ymm1)
(ghash_setup_aggr8_avx2, ghash_setup_aggr16_avx2): New.
(_gcry_ghash_setup_intel_pclmul): Add 'hw_features' parameter; Setup ghash and polyval function pointers for context; Add VPCLMUL/AVX2 code path; Defer aggr8 and aggr16 table initialization until first use in '_gcry_ghash_intel_pclmul' or '_gcry_polyval_intel_pclmul'.
[__x86_64__] (ghash_setup_aggr8): New.
(_gcry_ghash_intel_pclmul): Add VPCLMUL/AVX2 code path; Add call for aggr8 table initialization.
(_gcry_polyval_intel_pclmul): Add VPCLMUL/AVX2 code path; Add call for aggr8 table initialization.
* cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL] (_gcry_ghash_intel_pclmul)
(_gcry_polyval_intel_pclmul): Remove.
[GCM_USE_INTEL_PCLMUL] (_gcry_ghash_setup_intel_pclmul): Add 'hw_features' parameter.
(setupM) [GCM_USE_INTEL_PCLMUL]: Pass HW features to '_gcry_ghash_setup_intel_pclmul'; Let '_gcry_ghash_setup_intel_pclmul' setup function pointers.
* cipher/cipher-internal.h (GCM_USE_INTEL_VPCLMUL_AVX2): New.
(gcry_cipher_handle): Add member 'gcm.hw_impl_flags'.
--
Patch adds VPCLMUL/AVX2 accelerated implementation for GHASH (GCM) and POLYVAL (GCM-SIV).

Benchmark on AMD Ryzen 5800X (zen3):

Before:
              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM auth     |     0.088 ns/B     10825 MiB/s     0.427 c/B      4850
 GCM-SIV auth |     0.083 ns/B     11472 MiB/s     0.403 c/B      4850

After (~1.93x faster):
              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM auth     |     0.045 ns/B     21098 MiB/s     0.219 c/B      4850
 GCM-SIV auth |     0.043 ns/B     22181 MiB/s     0.209 c/B      4850

AES128-GCM / AES128-GCM-SIV encryption:
              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM enc      |     0.079 ns/B     12073 MiB/s     0.383 c/B      4850
 GCM-SIV enc  |     0.076 ns/B     12500 MiB/s     0.370 c/B      4850

Benchmark on Intel Core i3-1115G4 (tigerlake):

Before:
              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM auth     |     0.080 ns/B     11919 MiB/s     0.327 c/B      4090
 GCM-SIV auth |     0.075 ns/B     12643 MiB/s     0.309 c/B      4090

After (~1.28x faster):
              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM auth     |     0.062 ns/B     15348 MiB/s     0.254 c/B      4090
 GCM-SIV auth |     0.058 ns/B     16381 MiB/s     0.238 c/B      4090

AES128-GCM / AES128-GCM-SIV encryption:
              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM enc      |     0.101 ns/B      9441 MiB/s     0.413 c/B      4090
 GCM-SIV enc  |     0.098 ns/B      9692 MiB/s     0.402 c/B      4089

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher: Add an API to retrieve unwrapped key length for KWP. | NIIBE Yutaka | 2022-01-05 | 1 | -7/+7
* cipher/cipher-aeswrap.c (_gcry_cipher_keywrap_decrypt)
(_gcry_cipher_keywrap_decrypt_padding): Merged into...
(_gcry_cipher_keywrap_decrypt_auto): ... this.  Write length information to struct gcry_cipher_handle.
* cipher/cipher-internal.h (struct gcry_cipher_handle): Add u_mode.wrap.
* cipher/cipher.c (_gcry_cipher_setup_mode_ops): Use _gcry_cipher_keywrap_decrypt_auto.
(_gcry_cipher_info): Support GCRYCTL_GET_KEYLEN for GCRY_CIPHER_MODE_AESWRAP.  Note that it is not the length of the KEK, but the length of the unwrapped key.
* tests/aeswrap.c (check_one_with_padding): Add check for length of unwrapped key.
--
Fixes-commit: 2914f169f95467b9c789000105773b38ad2dea5a
GnuPG-bug-id: 5752
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* cipher: Add support of Key wrap with padding (KWP). | NIIBE Yutaka | 2022-01-03 | 1 | -2/+11
* src/gcrypt.h.in (GCRY_CIPHER_EXTENDED): New enum value.
* cipher/cipher-aeswrap.c (wrap): New.
(_gcry_cipher_keywrap_encrypt, unwrap): Use wrap.
(_gcry_cipher_keywrap_encrypt_padding): New.
(_gcry_cipher_keywrap_decrypt): Use unwrap.
(_gcry_cipher_keywrap_decrypt_padding): New.
* cipher/cipher-internal.h: Add declarations.
* cipher/cipher.c (_gcry_cipher_open_internal): Support GCRY_CIPHER_EXTENDED.
(_gcry_cipher_setup_mode_ops): Extend for GCRY_CIPHER_MODE_AESWRAP.
* tests/aeswrap.c: Add two tests from RFC 5649.
--
GnuPG-bug-id: 5752
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
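For reference, the padded variant differs from plain KW mainly in its alternative initial value: RFC 5649 prescribes AIV = 0xA65959A6 || MLI, where MLI is the message length in octets as a 32-bit big-endian integer. A sketch (the helper name is illustrative):

    #include <stdint.h>

    /* Build the RFC 5649 alternative initial value for KWP: the 4-byte
       constant A6 59 59 A6 followed by the 32-bit big-endian message
       length.  On unwrap, this field is checked to recover and verify the
       unwrapped key length. */
    static void
    kwp_build_aiv (uint8_t aiv[8], uint32_t msg_len)
    {
      aiv[0] = 0xA6; aiv[1] = 0x59; aiv[2] = 0x59; aiv[3] = 0xA6;
      aiv[4] = (uint8_t) (msg_len >> 24);
      aiv[5] = (uint8_t) (msg_len >> 16);
      aiv[6] = (uint8_t) (msg_len >> 8);
      aiv[7] = (uint8_t) msg_len;
    }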
* Add intel-pclmul accelerated POLYVAL for GCM-SIV | Jussi Kivilinna | 2021-11-15 | 1 | -0/+3
* cipher/cipher-gcm-intel-pclmul.c (gfmul_pclmul_aggr4)
(gfmul_pclmul_aggr8): Move assembly to new GFMUL_AGGRx_ASM* macros.
(GFMUL_AGGR4_ASM_1, GFMUL_AGGR4_ASM_2, gfmul_pclmul_aggr4_le)
(GFMUL_AGGR8_ASM, gfmul_pclmul_aggr8_le)
(_gcry_polyval_intel_pclmul): New.
* cipher/cipher-gcm-siv.c (do_polyval_buf): Use polyval function if available.
* cipher/cipher-gcm.c (_gcry_polyval_intel_pclmul): New.
(setupM): Setup 'c->u_mode.gcm.polyval_fn' with accelerated polyval function if available.
* cipher/cipher-internal.h (gcry_cipher_handle): Add member 'u_mode.gcm.polyval_fn'.
--
Benchmark on AMD Ryzen 7 5800X:

Before:
 AES          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM-SIV enc  |     0.150 ns/B      6337 MiB/s     0.730 c/B      4849
 GCM-SIV dec  |     0.163 ns/B      5862 MiB/s     0.789 c/B      4850
 GCM-SIV auth |     0.119 ns/B      8022 MiB/s     0.577 c/B      4850

After (enc/dec ~26% faster, auth ~43% faster):
 AES          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM-SIV enc  |     0.117 ns/B      8138 MiB/s     0.568 c/B      4850
 GCM-SIV dec  |     0.128 ns/B      7429 MiB/s     0.623 c/B      4850
 GCM-SIV auth |     0.083 ns/B     11507 MiB/s     0.402 c/B      4851

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add x86 HW acceleration for GCM-SIV counter mode | Jussi Kivilinna | 2021-08-26 | 1 | -0/+2
* cipher/cipher-gcm-siv.c (do_ctr_le32): Use bulk function if available.
* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ctr32le_enc'.
* cipher/rijndael-aesni.c (_gcry_aes_aesni_ctr32le_enc): New.
* cipher/rijndael-vaes-avx2-amd64.S (_gcry_vaes_avx2_ctr32le_enc_amd64, .Lle_addd_*): New.
* cipher/rijndael-vaes.c (_gcry_vaes_avx2_ctr32le_enc_amd64)
(_gcry_aes_vaes_ctr32le_enc): New.
* cipher/rijndael.c (_gcry_aes_aesni_ctr32le_enc)
(_gcry_aes_vaes_ctr32le_enc): New prototypes.
(do_setkey): Add setup of 'bulk_ops->ctr32le_enc' for AES-NI and VAES.
* tests/basic.c (check_gcm_siv_cipher): Add large test vector for bulk ops testing.
--
Counter mode in GCM-SIV is little-endian on the first 4 bytes of the counter block, unlike regular CTR mode, which works big-endian on the full block.

Benchmark on AMD Ryzen 7 5800X:

Before:
 AES          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM-SIV enc  |      1.00 ns/B     953.2 MiB/s      4.85 c/B      4850
 GCM-SIV dec  |      1.01 ns/B     940.1 MiB/s      4.92 c/B      4850
 GCM-SIV auth |     0.118 ns/B      8051 MiB/s     0.575 c/B      4850

After (~6x faster):
 AES          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GCM-SIV enc  |     0.150 ns/B      6367 MiB/s     0.727 c/B      4850
 GCM-SIV dec  |     0.161 ns/B      5909 MiB/s     0.783 c/B      4850
 GCM-SIV auth |     0.118 ns/B      8051 MiB/s     0.574 c/B      4850

GnuPG-bug-id: T4485
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
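A sketch of the counter step these ctr32le bulk functions implement: per RFC 8452, only the first 32 bits of the counter block form a little-endian counter that wraps mod 2^32, while the remaining 12 bytes stay fixed. The function name is illustrative.

    #include <stdint.h>

    /* GCM-SIV counter increment: bytes 0..3 are a little-endian 32-bit
       counter; bytes 4..15 are untouched.  Written with explicit byte
       arithmetic so it is endian-independent. */
    static void
    ctr32le_increment (uint8_t block[16])
    {
      uint32_t ctr = (uint32_t) block[0]
                   | ((uint32_t) block[1] << 8)
                   | ((uint32_t) block[2] << 16)
                   | ((uint32_t) block[3] << 24);
      ctr++;                            /* wraps mod 2^32 by definition */
      block[0] = (uint8_t) ctr;
      block[1] = (uint8_t) (ctr >> 8);
      block[2] = (uint8_t) (ctr >> 16);
      block[3] = (uint8_t) (ctr >> 24);
    }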
* Add AES-GCM-SIV mode (RFC 8452) | Jussi Kivilinna | 2021-08-26 | 1 | -1/+55
* cipher/Makefile.am: Add 'cipher-gcm-siv.c'.
* cipher/cipher-gcm-siv.c: New.
* cipher/cipher-gcm.c (_gcry_cipher_gcm_setupM): New.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'siv_keylen'.
(_gcry_cipher_gcm_setupM, _gcry_cipher_gcm_siv_encrypt)
(_gcry_cipher_gcm_siv_decrypt, _gcry_cipher_gcm_siv_set_nonce)
(_gcry_cipher_gcm_siv_authenticate)
(_gcry_cipher_gcm_siv_set_decryption_tag)
(_gcry_cipher_gcm_siv_get_tag, _gcry_cipher_gcm_siv_check_tag)
(_gcry_cipher_gcm_siv_setkey): New prototypes.
(cipher_block_bswap): New helper function.
* cipher/cipher.c (_gcry_cipher_open_internal): Add 'GCRY_CIPHER_MODE_GCM_SIV'; Refactor mode requirement checks for better size optimization (check pointers & blocksize in same order for all).
(cipher_setkey, cipher_reset, _gcry_cipher_setup_mode_ops)
(_gcry_cipher_setup_mode_ops, _gcry_cipher_info): Add GCM-SIV.
(_gcry_cipher_ctl): Handle 'set decryption tag' for GCM-SIV.
* doc/gcrypt.texi: Add GCM-SIV.
* src/gcrypt.h.in (GCRY_CIPHER_MODE_GCM_SIV): New.
(GCRY_SIV_BLOCK_LEN, gcry_cipher_set_decryption_tag): Add to comment that these are also for GCM-SIV in addition to SIV mode.
* tests/basic.c (check_gcm_siv_cipher): New.
(check_cipher_modes): Check for GCM-SIV.
* tests/bench-slope.c (bench_gcm_siv_encrypt_do_bench)
(bench_gcm_siv_decrypt_do_bench, bench_gcm_siv_authenticate_do_bench)
(gcm_siv_encrypt_ops, gcm_siv_decrypt_ops)
(gcm_siv_authenticate_ops): New.
(cipher_modes): Add GCM-SIV.
(cipher_bench_one): Check key length requirement for GCM-SIV.
--
GnuPG-bug-id: T4485
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add SIV mode (RFC 5297) | Jussi Kivilinna | 2021-08-26 | 1 | -0/+57
* cipher/Makefile.am: Add 'cipher-siv.c'.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Rename to _gcry_cipher_ctr_encrypt_ctx and add algo context parameter.
(_gcry_cipher_ctr_encrypt): New using _gcry_cipher_ctr_encrypt_ctx.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.siv'.
(_gcry_cipher_ctr_encrypt_ctx, _gcry_cipher_siv_encrypt)
(_gcry_cipher_siv_decrypt, _gcry_cipher_siv_set_nonce)
(_gcry_cipher_siv_authenticate, _gcry_cipher_siv_set_decryption_tag)
(_gcry_cipher_siv_get_tag, _gcry_cipher_siv_check_tag)
(_gcry_cipher_siv_setkey): New.
* cipher/cipher-siv.c: New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, _gcry_cipher_setup_mode_ops, _gcry_cipher_info): Add GCRY_CIPHER_MODE_SIV handling.
(_gcry_cipher_ctl): Add GCRYCTL_SET_DECRYPTION_TAG handling.
* doc/gcrypt.texi: Add documentation for SIV mode.
* src/gcrypt.h.in (GCRYCTL_SET_DECRYPTION_TAG): New.
(GCRY_CIPHER_MODE_SIV): New.
(gcry_cipher_set_decryption_tag): New.
* tests/basic.c (check_siv_cipher): New.
(check_cipher_modes): Add call for 'check_siv_cipher'.
* tests/bench-slope.c (bench_encrypt_init): Use double size key for SIV mode.
(bench_aead_encrypt_do_bench, bench_aead_decrypt_do_bench)
(bench_aead_authenticate_do_bench): Reset cipher context on each run.
(bench_aead_authenticate_do_bench): Support nonce-less operation.
(bench_siv_encrypt_do_bench, bench_siv_decrypt_do_bench)
(bench_siv_authenticate_do_bench, siv_encrypt_ops)
(siv_decrypt_ops, siv_authenticate_ops): New.
(cipher_modes): Add SIV mode benchmarks.
(cipher_bench_one): Restrict SIV mode testing to 16 byte block-size.
--
GnuPG-bug-id: T4486
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher-gcm-ppc: add big-endian support | Jussi Kivilinna | 2021-04-01 | 1 | -1/+1
* cipher/cipher-gcm-ppc.c (ALIGNED_16): New.
(vec_store_he, vec_load_he): Remove WORDS_BIGENDIAN ifdef.
(vec_dup_byte_elem): New.
(_gcry_ghash_setup_ppc_vpmsum): Match function declaration with prototype in cipher-gcm.c; Load C2 with VEC_LOAD_BE; Use vec_dup_byte_elem; Align constants to 16 bytes.
(_gcry_ghash_ppc_vpmsum): Match function declaration with prototype in cipher-gcm.c; Align constant to 16 bytes.
* cipher/cipher-gcm.c (ghash_ppc_vpmsum): Return value from _gcry_ghash_ppc_vpmsum.
* cipher/cipher-internal.h (GCM_USE_PPC_VPMSUM): Remove requirement for !WORDS_BIGENDIAN.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* VPMSUMD acceleration for GCM mode on PPC | Shawn Landden | 2021-03-07 | 1 | -3/+15
* cipher/Makefile.am: Add 'cipher-gcm-ppc.c'.
* cipher/cipher-gcm-ppc.c: New.
* cipher/cipher-gcm.c [GCM_USE_PPC_VPMSUM] (_gcry_ghash_setup_ppc_vpmsum)
(_gcry_ghash_ppc_vpmsum, ghash_setup_ppc_vpsum, ghash_ppc_vpmsum): New.
(setupM) [GCM_USE_PPC_VPMSUM]: Select ppc-vpmsum implementation if HW feature "ppc-vcrypto" is available.
* cipher/cipher-internal.h (GCM_USE_PPC_VPMSUM): New.
(gcry_cipher_handle): Move 'ghash_fn' to end of 'gcm' block to align 'gcm_table' to 16 bytes.
* configure.ac: Add 'cipher-gcm-ppc.lo'.
* tests/basic.c (_check_gcm_cipher): New AES256 test vector.
* AUTHORS: Add 'CRYPTOGAMS'.
* LICENSES: Add original license to 3-clause-BSD section.
--
https://dev.gnupg.org/D501: 10-20X speed.

However, this Power 9 machine is faster than the one used for the previous Power 9 benchmarks of the optimized versions, so while the results are better than the last patch, not all of the improvement is due to the code.

Before:
 GCM enc  |      4.23 ns/B     225.3 MiB/s      - c/B
 GCM dec  |      3.58 ns/B     266.2 MiB/s      - c/B
 GCM auth |      3.34 ns/B     285.3 MiB/s      - c/B

After:
 GCM enc  |     0.370 ns/B      2578 MiB/s      - c/B
 GCM dec  |     0.371 ns/B      2571 MiB/s      - c/B
 GCM auth |     0.159 ns/B      6003 MiB/s      - c/B

Signed-off-by: Shawn Landden <shawn@git.icu>
[jk: coding style fixes, Makefile.am integration, patch from Differential to git, commit changelog, fixed few compiler warnings]
GnuPG-bug-id: 5040
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add bulk AES-GCM acceleration for s390x/zSeries | Jussi Kivilinna | 2020-12-18 | 1 | -0/+6
* cipher/Makefile.am: Add 'asm-inline-s390x.h'.
* cipher/asm-inline-s390x.h: New.
* cipher/cipher-gcm.c [GCM_USE_S390X_CRYPTO] (ghash_s390x_kimd): New.
(setupM) [GCM_USE_S390X_CRYPTO]: Add setup for s390x GHASH function.
* cipher/cipher-internal.h (GCM_USE_S390X_CRYPTO): New.
* cipher/rijndael-s390x.c (u128_t, km_functions_e): Move to 'asm-inline-s390x.h'.
(aes_s390x_gcm_crypt): New.
(_gcry_aes_s390x_setup_acceleration): Use 'km_function_to_mask'; Add setup for GCM bulk function.
--
This patch adds zSeries acceleration for GHASH and AES-GCM.

Benchmarks (z15, 5.2Ghz):

Before:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
 GCM enc  |      2.64 ns/B     361.6 MiB/s     13.71 c/B
 GCM dec  |      2.64 ns/B     361.3 MiB/s     13.72 c/B
 GCM auth |      2.58 ns/B     370.1 MiB/s     13.40 c/B

After:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
 GCM enc  |     0.059 ns/B     16066 MiB/s     0.309 c/B
 GCM dec  |     0.059 ns/B     16114 MiB/s     0.308 c/B
 GCM auth |     0.057 ns/B     16747 MiB/s     0.296 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add bulk function interface for GCM mode | Jussi Kivilinna | 2020-12-18 | 1 | -0/+2
* cipher/cipher-gcm.c (do_ghash_buf): Proper handling for the case where 'unused' gets filled to full blocksize.
(gcm_crypt_inner): New.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt): Use 'gcm_crypt_inner'.
* cipher/cipher-internal.h (cipher_bulk_ops_t): Add 'gcm_crypt'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add bulk function interface for OFB mode | Jussi Kivilinna | 2020-12-18 | 1 | -0/+2
* cipher/cipher-internal.h (cipher_bulk_ops): Add 'ofb_enc'.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt): Use bulk encryption function if defined.
* tests/basic.c (check_bulk_cipher_modes): Add OFB-AES test vectors.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
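For orientation, the generic path that the new 'ofb_enc' hook bypasses looks roughly like the sketch below (names illustrative): the cipher repeatedly encrypts the feedback register and the keystream is XORed onto the data, so the same routine serves encryption and decryption.

    #include <stddef.h>
    #include <stdint.h>

    /* OFB sketch: keystream block = E_K(previous keystream block); the
       keystream is independent of the data. */
    static void
    ofb_crypt_sketch (void *ctx, uint8_t iv[16], uint8_t *out,
                      const uint8_t *in, size_t nblocks,
                      void (*blockfn) (void *, uint8_t *, const uint8_t *))
    {
      while (nblocks--)
        {
          blockfn (ctx, iv, iv);        /* advance the keystream register */
          for (int i = 0; i < 16; i++)
            out[i] = in[i] ^ iv[i];     /* same operation en/decrypts */
          out += 16;
          in += 16;
        }
    }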
* cipher: setup bulk functions at each algorithm's key setup | Jussi Kivilinna | 2020-09-27 | 1 | -45/+48
* cipher/cipher-internal.h (cipher_mode_ops_t, cipher_bulk_ops_t): New.
(gcry_cipher_handle): Define members 'mode_ops' and 'bulk' using new types.
* cipher/cipher.c (_gcry_cipher_open_internal): Remove bulk function setup.
(cipher_setkey): Pass context bulk function pointer to algorithm setkey function.
* cipher/cipher-selftest.c (_gcry_selftest_helper_cbc)
(_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Remove bulk function parameter; Use bulk function returned by setkey function.
* cipher/cipher-selftest.h (_gcry_selftest_helper_cbc)
(_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Remove bulk function parameter.
* cipher/arcfour.c (arcfour_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/blowfish.c (bf_setkey): Change 'hd' parameter to 'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function to selftest helper.
(selftest): Pass 'bulk_ops' to setkey function.
* cipher/camellia.c (camellia_setkey): Change 'hd' parameter to 'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec)
(_gcry_camellia_cfb_dec, _gcry_camellia_ocb_crypt)
(_gcry_camellia_ocb_auth): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function to selftest helper.
(selftest): Pass 'bulk_ops' to setkey function.
* cipher/cast5.c (cast_setkey): Change 'hd' parameter to 'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function to selftest helper.
(selftest): Pass 'bulk_ops' to setkey function.
* cipher/chacha20.c (chacha20_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/cast5.c (do_tripledes_setkey): Change 'hd' parameter to 'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_3des_ctr_enc, _gcry_3des_cbc_dec, _gcry_3des_cfb_dec): Make static.
(bulk_selftest_setkey): Change 'hd' parameter to 'bulk_ops'.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function to selftest helper.
(do_des_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/gost28147.c (gost_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/idea.c (idea_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/rfc2268.c (do_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/rijndael.c (do_setkey): Change 'hd' parameter to 'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(rijndael_setkey): Change 'hd' parameter to 'bulk_ops'.
(_gcry_aes_cfb_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_enc)
(_gcry_aes_cbc_dec, _gcry_aes_ctr_enc, _gcry_aes_ocb_crypt)
(_gcry_aes_ocb_auth, _gcry_aes_xts_crypt): Make static.
(selftest_basic_128, selftest_basic_192, selftest_basic_256): Pass 'bulk_ops' to setkey function.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function to selftest helper.
* cipher/salsa20.c (salsa20_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/seed.c (seed_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/serpent.c (serpent_setkey): Change 'hd' parameter to 'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec, _gcry_serpent_cfb_dec)
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Make static.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Do not pass bulk function to selftest helper.
* cipher/sm4.c (sm4_setkey): Change 'hd' parameter to 'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth): Make static.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Do not pass bulk function to selftest helper.
* cipher/twofish.c (twofish_setkey): Change 'hd' parameter to 'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec)
(_gcry_twofish_cfb_dec, _gcry_twofish_ocb_crypt)
(_gcry_twofish_ocb_auth): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function to selftest helper.
(selftest, main): Pass 'bulk_ops' to setkey function.
* src/cipher-proto.h: Forward declare 'cipher_bulk_ops_t'.
(gcry_cipher_setkey_t): Replace 'hd' with 'bulk_ops'.
* src/cipher.h: Remove bulk acceleration function prototypes for 'aes', 'blowfish', 'cast5', 'camellia', '3des', 'serpent', 'sm4' and 'twofish'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add gcry_cipher_ctl command to allow weak keys in testing use-cases | Jussi Kivilinna | 2020-02-02 | 1 | -0/+1
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'marks.allow_weak_key' flag.
* cipher/cipher.c (cipher_setkey): Do not handle weak key as error when weak keys are allowed.
(cipher_reset): Preserve 'marks.allow_weak_key' flag on object reset.
(_gcry_cipher_ctl): Add handling for GCRYCTL_SET_ALLOW_WEAK_KEY.
* src/gcrypt.h.in (gcry_ctl_cmds): Add GCRYCTL_SET_ALLOW_WEAK_KEY.
* tests/basic.c (check_ecb_cipher): Add tests for weak key errors and for GCRYCTL_SET_ALLOW_WEAK_KEY.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Optimizations for generic table-based GCM implementations | Jussi Kivilinna | 2019-04-27 | 1 | -2/+2
* cipher/cipher-gcm.c [GCM_TABLES_USE_U64] (do_fillM): Precalculate M[32..63] values.
[GCM_TABLES_USE_U64] (do_ghash): Split processing of the two 64-bit halves of the input into two separate loops; Use precalculated M[] values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_fillM): Precalculate M[64..127] values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_ghash): Use precalculated M[] values.
[GCM_USE_TABLES] (bshift): Avoid conditional execution for mask calculation.
* cipher/cipher-internal.h (gcry_cipher_handle): Double gcm_table size.
--
Benchmark on Intel Haswell (amd64, --disable-hwf all):

Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |      2.79 ns/B     341.3 MiB/s     11.17 c/B      3998

After (~36% faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |      2.05 ns/B     464.7 MiB/s      8.20 c/B      3998

Benchmark on Intel Haswell (win32, --disable-hwf all):

Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |      4.90 ns/B     194.8 MiB/s     19.57 c/B      3997

After (~36% faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |      3.58 ns/B     266.4 MiB/s     14.31 c/B      3999

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
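The bshift change mentioned above removes a conditional branch when the reduction bit falls out during the field shift; a sketch of the branchless idea for the u64 code path (the reduction constant 0xE1 comes from the GCM specification, the rest is illustrative):

    #include <stdint.h>

    /* Shift a 128-bit GHASH field element (hi = most significant word)
       right by one bit and fold the shifted-out bit back in via the
       reduction polynomial.  '0 - carry' is all-ones when carry is 1 and
       zero otherwise, so no branch is needed. */
    static void
    bshift_sketch (uint64_t *hi, uint64_t *lo)
    {
      uint64_t carry = *lo & 1;                 /* bit about to fall out */
      *lo = (*lo >> 1) | (*hi << 63);
      *hi = (*hi >> 1) ^ ((0 - carry) & 0xe100000000000000ULL);
    }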
* Add helper function for adding value to cipher block | Jussi Kivilinna | 2019-03-31 | 1 | -0/+23
* cipher/cipher-internal.h (cipher_block_add): New.
* cipher/blowfish.c (_gcry_blowfish_ctr_enc): Use new helper function for CTR block increment.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc): Ditto.
* cipher/cast5.c (_gcry_cast5_ctr_enc): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/des.c (_gcry_3des_ctr_enc): Ditto.
* cipher/rijndael.c (_gcry_aes_ctr_enc): Ditto.
* cipher/serpent.c (_gcry_serpent_ctr_enc): Ditto.
* cipher/twofish.c (_gcry_twofish_ctr_enc): Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
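A sketch of the idea behind such a helper: treat the block as a big-endian integer and add a small value, propagating the carry from the last byte upward. The real cipher_block_add in cipher-internal.h is written against fixed block sizes; this generic version is illustrative only.

    #include <stddef.h>

    /* Add 'add' to a big-endian counter block of 'blocksize' bytes, as a
       CTR-mode increment would.  The loop stops early once the carry has
       been absorbed. */
    static void
    block_add_sketch (unsigned char *block, unsigned int add, size_t blocksize)
    {
      unsigned int carry = add;
      size_t i;

      for (i = blocksize; i > 0 && carry; i--)
        {
          carry += block[i - 1];
          block[i - 1] = (unsigned char) carry;  /* keep low 8 bits */
          carry >>= 8;                           /* propagate the rest */
        }
    }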
* Add ARMv7/NEON accelerated GCM implementation | Jussi Kivilinna | 2019-03-23 | 1 | -0/+9
* cipher/Makefile.am: Add 'cipher-gcm-armv7-neon.S'.
* cipher/cipher-gcm-armv7-neon.S: New.
* cipher/cipher-gcm.c [GCM_USE_ARM_NEON] (_gcry_ghash_setup_armv7_neon)
(_gcry_ghash_armv7_neon, ghash_setup_armv7_neon)
(ghash_armv7_neon): New.
(setupM) [GCM_USE_ARM_NEON]: Use armv7/neon implementation if HWF_ARM_NEON is available.
* cipher/cipher-internal.h (GCM_USE_ARM_NEON): New.
--
Benchmark on Cortex-A53 (816 Mhz):

Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |     34.81 ns/B     27.40 MiB/s     28.41 c/B

After (3.0x faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |     11.49 ns/B     82.99 MiB/s      9.38 c/B

Reported-by: Yuriy M. Kaminskiy <yumkam@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Do not precalculate OCB offset L0+L1+L0 | Jussi Kivilinna | 2019-01-27 | 1 | -1/+0
* cipher/cipher-internal.h (gcry_cipher_handle): Remove OCB L0L1L0.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_setkey): Ditto.
* cipher/rijndael-aesni.c (aesni_ocb_enc, aesni_ocb_dec)
(_gcry_aes_aesni_ocb_auth): Replace L0L1L0 use with L1.
--
Patch fixes the L0+L1+L0 thinko: since these offsets are combined with XOR, L0 xor L1 xor L0 is simply L1.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Calculate OCB L-tables when setting key instead of when setting nonce | Jussi Kivilinna | 2019-01-27 | 1 | -0/+6
* cipher/cipher-internal.h (gcry_cipher_handle): Mark areas of u_mode.ocb that are and are not cleared by gcry_cipher_reset.
(_gcry_cipher_ocb_setkey): New.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Split L-table generation to ...
(_gcry_cipher_ocb_setkey): ... this new function.
* cipher/cipher.c (cipher_setkey): Add handling for OCB mode.
(cipher_reset): Do not clear L-values for OCB mode.
--
OCB L-tables do not depend on nonce value, but only on cipher key.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
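The key-only dependence is visible in how the L-table is built: per RFC 7253, L_* = E_K(0^128), L_$ = double(L_*), L_0 = double(L_$) and L_i = double(L_{i-1}), where doubling is multiplication by x in GF(2^128). A sketch of the doubling step (function name illustrative):

    /* Double a 128-bit value in GF(2^128) as defined by RFC 7253: shift
       left by one bit and, if the top bit fell out, XOR 0x87 into the last
       byte.  The mask avoids a data-dependent branch. */
    static void
    ocb_double_sketch (unsigned char out[16], const unsigned char in[16])
    {
      unsigned char carry = in[0] >> 7;        /* top bit of the block */
      int i;

      for (i = 0; i < 15; i++)
        out[i] = (unsigned char) ((in[i] << 1) | (in[i + 1] >> 7));
      out[15] = (unsigned char) ((in[15] << 1) ^ ((0 - carry) & 0x87));
    }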
* Add stitched ChaCha20-Poly1305 SSSE3 and AVX2 implementations | Jussi Kivilinna | 2019-01-27 | 1 | -0/+9
* cipher/asm-poly1305-amd64.h: New.
* cipher/Makefile.am: Add 'asm-poly1305-amd64.h'.
* cipher/chacha20-amd64-avx2.S (QUATERROUND2): Add interleave operators.
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New.
* cipher/chacha20-amd64-ssse3.S (QUATERROUND2): Add interleave operators.
(_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1): New.
* cipher/chacha20.c (_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1)
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New prototypes.
(chacha20_encrypt_stream): Split tail to...
(do_chacha20_encrypt_stream_tail): ... new function.
(_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New.
* cipher/cipher-internal.h (_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New prototypes.
* cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt): Call '_gcry_chacha20_poly1305_encrypt' if cipher is ChaCha20.
(_gcry_cipher_poly1305_decrypt): Call '_gcry_chacha20_poly1305_decrypt' if cipher is ChaCha20.
* cipher/poly1305-internal.h (_gcry_cipher_poly1305_update_burn): New prototype.
* cipher/poly1305.c (poly1305_blocks): Make static.
(_gcry_poly1305_update): Split main function body to ...
(_gcry_poly1305_update_burn): ... new function.
--
Benchmark on Intel Skylake (i5-6500, 3200 Mhz):

Before, 8-way AVX2:
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 STREAM enc    |     0.378 ns/B      2526 MiB/s      1.21 c/B
 STREAM dec    |     0.373 ns/B      2560 MiB/s      1.19 c/B
 POLY1305 enc  |     0.685 ns/B      1392 MiB/s      2.19 c/B
 POLY1305 dec  |     0.686 ns/B      1390 MiB/s      2.20 c/B
 POLY1305 auth |     0.315 ns/B      3031 MiB/s      1.01 c/B

After, 8-way AVX2 (~36% faster):
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305 enc  |     0.503 ns/B      1896 MiB/s      1.61 c/B
 POLY1305 dec  |     0.485 ns/B      1965 MiB/s      1.55 c/B

Benchmark on Intel Haswell (i7-4790K, 3998 Mhz):

Before, 8-way AVX2:
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 STREAM enc    |     0.318 ns/B      2999 MiB/s      1.27 c/B
 STREAM dec    |     0.317 ns/B      3004 MiB/s      1.27 c/B
 POLY1305 enc  |     0.586 ns/B      1627 MiB/s      2.34 c/B
 POLY1305 dec  |     0.586 ns/B      1627 MiB/s      2.34 c/B
 POLY1305 auth |     0.271 ns/B      3524 MiB/s      1.08 c/B

After, 8-way AVX2 (~30% faster):
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305 enc  |     0.452 ns/B      2108 MiB/s      1.81 c/B
 POLY1305 dec  |     0.440 ns/B      2167 MiB/s      1.76 c/B

Before, 4-way SSSE3:
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 STREAM enc    |     0.627 ns/B      1521 MiB/s      2.51 c/B
 STREAM dec    |     0.626 ns/B      1523 MiB/s      2.50 c/B
 POLY1305 enc  |     0.895 ns/B      1065 MiB/s      3.58 c/B
 POLY1305 dec  |     0.896 ns/B      1064 MiB/s      3.58 c/B
 POLY1305 auth |     0.271 ns/B      3521 MiB/s      1.08 c/B

After, 4-way SSSE3 (~20% faster):
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305 enc  |     0.733 ns/B      1301 MiB/s      2.93 c/B
 POLY1305 dec  |     0.726 ns/B      1314 MiB/s      2.90 c/B

Before, 1-way SSSE3:
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305 enc  |      1.56 ns/B     609.6 MiB/s      6.25 c/B
 POLY1305 dec  |      1.56 ns/B     609.4 MiB/s      6.26 c/B

After, 1-way SSSE3 (~18% faster):
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305 enc  |      1.31 ns/B     725.4 MiB/s      5.26 c/B
 POLY1305 dec  |      1.31 ns/B     727.3 MiB/s      5.24 c/B

For comparison to other libraries (on Intel i7-4790K, 3998 Mhz):

bench-slope-openssl: OpenSSL 1.1.1  11 Sep 2018
 Cipher: chacha20
               |  nanosecs/byte   mebibytes/sec   cycles/byte
 STREAM enc    |     0.301 ns/B    3166.4 MiB/s      1.20 c/B
 STREAM dec    |     0.300 ns/B    3174.7 MiB/s      1.20 c/B
 POLY1305 enc  |     0.463 ns/B    2060.6 MiB/s      1.85 c/B
 POLY1305 dec  |     0.462 ns/B    2063.8 MiB/s      1.85 c/B
 POLY1305 auth |     0.162 ns/B    5899.3 MiB/s     0.646 c/B

bench-slope-nettle: Nettle 3.4
 Cipher: chacha
               |  nanosecs/byte   mebibytes/sec   cycles/byte
 STREAM enc    |      1.65 ns/B     578.2 MiB/s      6.59 c/B
 STREAM dec    |      1.65 ns/B     578.2 MiB/s      6.59 c/B
 POLY1305 enc  |      2.05 ns/B     464.8 MiB/s      8.20 c/B
 POLY1305 dec  |      2.05 ns/B     464.7 MiB/s      8.20 c/B
 POLY1305 auth |     0.404 ns/B    2359.1 MiB/s      1.62 c/B

bench-slope-botan: Botan 2.6.0
 Cipher: ChaCha
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 STREAM enc/dec |     0.855 ns/B    1116.0 MiB/s      3.42 c/B
 POLY1305 enc   |      1.60 ns/B     595.4 MiB/s      6.40 c/B
 POLY1305 dec   |      1.60 ns/B     595.8 MiB/s      6.40 c/B
 POLY1305 auth  |     0.752 ns/B    1268.3 MiB/s      3.01 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Optimizations for AES-NI OCB | Jussi Kivilinna | 2018-11-20 | 1 | -1/+3
* cipher/cipher-internal.h (gcry_cipher_handle): New pre-computed OCB values L0L1 and L0L1L0; Swap dimensions for OCB L table.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Setup L0L1 and L0L1L0 values.
(ocb_crypt): Process input in 24KiB chunks for better cache locality for checksumming.
* cipher/rijndael-aesni.c (ALWAYS_INLINE): New macro for always inlining functions; change all functions with 'inline' to use ALWAYS_INLINE.
(NO_INLINE): New macro.
(aesni_prepare_2_6_variable, aesni_prepare_7_15_variable): Rename to...
(aesni_prepare_2_7_variable, aesni_prepare_8_15_variable): ...these and adjust accordingly (xmm7 moved from *_7_15 to *_2_7).
(aesni_prepare_2_6, aesni_prepare_7_15): Rename to...
(aesni_prepare_2_7, aesni_prepare_8_15): ...these and adjust accordingly.
(aesni_cleanup_2_6, aesni_cleanup_7_15): Rename to...
(aesni_cleanup_2_7, aesni_cleanup_8_15): ...these and adjust accordingly.
(aesni_ocb_checksum): New.
(aesni_ocb_enc, aesni_ocb_dec): Calculate OCB offsets in parallel with help of pre-computed offsets L0+L1 and L0+L1+L0; Do checksum calculation as separate pass instead of inline; Use NO_INLINE.
(_gcry_aes_aesni_ocb_auth): Calculate OCB offsets in parallel with help of pre-computed offsets L0+L1 and L0+L1+L0.
* cipher/rijndael-internal.h (RIJNDAEL_context_s) [USE_AESNI]: Add 'use_avx2' and 'use_avx'.
* cipher/rijndael.c (do_setkey) [USE_AESNI]: Set 'use_avx2' if Intel AVX2 HW feature is available and 'use_avx' if Intel AVX HW feature is available.
* tests/basic.c (do_check_ocb_cipher): New test vector; increase size of temporary buffers for new test vector.
(check_ocb_cipher_largebuf_split): Make test plaintext non-uniform for better checksum testing.
(check_ocb_cipher_checksum): New.
(check_ocb_cipher_largebuf): Call check_ocb_cipher_checksum.
(check_ocb_cipher): New expected tags for check_ocb_cipher_largebuf test runs.
--
Benchmark on Haswell i7-4970k @ 4.0Ghz:

Before:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
 OCB enc  |     0.175 ns/B      5436 MiB/s     0.702 c/B
 OCB dec  |     0.184 ns/B      5184 MiB/s     0.736 c/B
 OCB auth |     0.156 ns/B      6097 MiB/s     0.626 c/B

After (enc +2% faster, dec +7% faster):
 OCB enc  |     0.172 ns/B      5547 MiB/s     0.688 c/B
 OCB dec  |     0.171 ns/B      5582 MiB/s     0.683 c/B
 OCB auth |     0.156 ns/B      6097 MiB/s     0.626 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add size optimized cipher block copy and xor functions | Jussi Kivilinna | 2018-07-21 | 1 | -0/+141
* cipher/bufhelp.h (buf_get_he32, buf_put_he32, buf_get_he64)
(buf_put_he64): New.
* cipher/cipher-internal.h (cipher_block_cpy, cipher_block_xor)
(cipher_block_xor_1, cipher_block_xor_2dst, cipher_block_xor_n_copy_2)
(cipher_block_xor_n_copy): New.
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_setup_intel_pclmul): Use assembly for swapping endianness instead of buf_get_be64 and buf_cpy.
* cipher/blowfish.c: Use new cipher_block_* functions for cipher block sized buf_cpy/xor* operations.
* cipher/camellia-glue.c: Ditto.
* cipher/cast5.c: Ditto.
* cipher/cipher-aeswrap.c: Ditto.
* cipher/cipher-cbc.c: Ditto.
* cipher/cipher-ccm.c: Ditto.
* cipher/cipher-cfb.c: Ditto.
* cipher/cipher-cmac.c: Ditto.
* cipher/cipher-ctr.c: Ditto.
* cipher/cipher-eax.c: Ditto.
* cipher/cipher-gcm.c: Ditto.
* cipher/cipher-ocb.c: Ditto.
* cipher/cipher-ofb.c: Ditto.
* cipher/cipher-xts.c: Ditto.
* cipher/des.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
--
This commit adds size-optimized functions for copying and xoring cipher block sized buffers.  These functions also allow GCC to use inline auto-vectorization for block cipher copying and xoring on higher optimization levels.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
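The idea behind these helpers, sketched below: load the 16-byte block as two 64-bit words via memcpy (which compilers lower to plain word loads), so a block XOR becomes two word XORs instead of a byte loop. The real helpers in cipher-internal.h go through the buf_get_he64/buf_put_he64 accessors named above and also handle 8-byte blocks; this is a simplified illustration.

    #include <stdint.h>
    #include <string.h>

    /* XOR two 16-byte cipher blocks into dst, word-at-a-time.  memcpy
       sidesteps alignment and strict-aliasing issues and compiles down to
       unaligned word accesses on common targets. */
    static void
    block_xor_sketch (void *dst, const void *src1, const void *src2)
    {
      uint64_t a0, a1, b0, b1;

      memcpy (&a0, (const char *) src1 + 0, 8);
      memcpy (&a1, (const char *) src1 + 8, 8);
      memcpy (&b0, (const char *) src2 + 0, 8);
      memcpy (&b1, (const char *) src2 + 8, 8);
      a0 ^= b0;
      a1 ^= b1;
      memcpy ((char *) dst + 0, &a0, 8);
      memcpy ((char *) dst + 8, &a1, 8);
    }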
* Access cipher mode routines through routine pointers | Jussi Kivilinna | 2018-06-19 | 1 | -2/+24
* cipher/cipher-internal.h (gcry_cipher_handle): Add function pointers for mode operations.
(_gcry_cipher_xts_crypt): Remove.
(_gcry_cipher_xts_encrypt, _gcry_cipher_xts_decrypt): New.
* cipher/cipher-xts.c (_gcry_cipher_xts_encrypt)
(_gcry_cipher_xts_decrypt): New.
* cipher/cipher.c (_gcry_cipher_setup_mode_ops): New.
(_gcry_cipher_open_internal): Setup mode routines.
(cipher_encrypt, cipher_decrypt): Remove.
(do_stream_encrypt, do_stream_decrypt, do_encrypt_none_unknown)
(do_decrypt_none_unknown): New.
(_gcry_cipher_encrypt, _gcry_cipher_decrypt, _gcry_cipher_setiv)
(_gcry_cipher_authenticate, _gcry_cipher_gettag)
(_gcry_cipher_checktag): Adapted to use mode routines through pointers.
--
Change to use mode operations through pointers to reduce per-call overhead for cipher operations.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add separate handlers for CBC-CTS variant | Jussi Kivilinna | 2018-06-19 | 1 | -0/+8
* cipher/cipher-cbc.c (cbc_encrypt_inner, cbc_decrypt_inner)
(_gcry_cipher_cbc_cts_encrypt, _gcry_cipher_cbc_cts_decrypt): New.
(_gcry_cipher_cbc_encrypt, _gcry_cipher_cbc_decrypt): Remove CTS handling.
* cipher/cipher-internal.h (_gcry_cipher_cbc_cts_encrypt)
(_gcry_cipher_cbc_cts_decrypt): New.
* cipher/cipher.c (cipher_encrypt, cipher_decrypt): Call CBC-CTS handler if CBC-CTS flag is set.
--
Move CTS handling to a separate function for a small decrease in CBC per-call overhead.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Avoid division by spec->blocksize in cipher mode handlers | Jussi Kivilinna | 2018-06-19 | 1 | -0/+10
* cipher/cipher-internal.h (_gcry_blocksize_shift): New.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt)
(_gcry_cipher_cbc_decrypt): Use bit-level operations instead of division to get number of blocks and check input length against blocksize.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt)
(_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-cmac.c (_gcry_cmac_write): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_crypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt)
(_gcry_cipher_ofb_decrypt): Ditto.
--
Integer division was causing 10 to 20 cycles of per-call overhead for cipher modes on x86-64.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
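Since the only block sizes in play are 8 and 16 bytes, division and modulo reduce to a shift and a mask; a sketch of the trick (the real _gcry_blocksize_shift reads the blocksize from the cipher spec, the names here are illustrative):

    #include <stddef.h>

    /* Power-of-two blocksizes only: log2(16) = 4, log2(8) = 3. */
    static inline unsigned int
    blocksize_shift_sketch (size_t blocksize)
    {
      return blocksize == 16 ? 4 : 3;
    }

    /* inbuflen / blocksize and inbuflen % blocksize without a division
       instruction. */
    static size_t
    nblocks_sketch (size_t inbuflen, size_t blocksize)
    {
      unsigned int shift = blocksize_shift_sketch (blocksize);
      size_t tail = inbuflen & (((size_t) 1 << shift) - 1);

      (void) tail;   /* a non-zero tail means input is not block-aligned */
      return inbuflen >> shift;
    }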
* Add EAX mode | Jussi Kivilinna | 2018-01-20 | 1 | -7/+64
* cipher/Makefile.am: Add 'cipher-eax.c'.
* cipher/cipher-cmac.c (cmac_write): Rename to ...
(_gcry_cmac_write): ... this; Take CMAC context as new input parameter; Return error code.
(cmac_generate_subkeys): Rename to ...
(_gcry_cmac_generate_subkeys): ... this; Take CMAC context as new input parameter; Return error code.
(cmac_final): Rename to ...
(_gcry_cmac_final): ... this; Take CMAC context as new input parameter; Return error code.
(cmac_tag): Take CMAC context as new input parameter.
(_gcry_cmac_reset): New.
(_gcry_cipher_cmac_authenticate): Remove duplicate tag flag check; Adapt to changes above.
(_gcry_cipher_cmac_get_tag): Adapt to changes above.
(_gcry_cipher_cmac_check_tag): Ditto.
(_gcry_cipher_cmac_set_subkeys): Ditto.
* cipher-eax.c: New.
* cipher-internal.h (gcry_cmac_context_t): New.
(gcry_cipher_handle): Update u_mode.cmac; Add u_mode.eax.
(_gcry_cmac_write, _gcry_cmac_generate_subkeys, _gcry_cmac_final)
(_gcry_cmac_reset, _gcry_cipher_eax_encrypt, _gcry_cipher_eax_decrypt)
(_gcry_cipher_eax_set_nonce, _gcry_cipher_eax_authenticate)
(_gcry_cipher_eax_get_tag, _gcry_cipher_eax_check_tag)
(_gcry_cipher_eax_setkey): New prototypes.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, cipher_encrypt, cipher_decrypt, _gcry_cipher_setiv)
(_gcry_cipher_authenticate, _gcry_cipher_gettag, _gcry_cipher_checktag)
(_gcry_cipher_info): Add EAX mode.
* doc/gcrypt.texi: Add EAX mode.
* src/gcrypt.h.in (GCRY_CIPHER_MODE_EAX): New.
* tests/basic.c (_check_gcm_cipher, _check_poly1305_cipher): Constify test vectors array.
(_check_eax_cipher, check_eax_cipher): New.
(check_ciphers, check_cipher_modes): Add EAX mode.
* tests/bench-slope.c (bench_eax_encrypt_do_bench)
(bench_eax_decrypt_do_bench, bench_eax_authenticate_do_bench)
(eax_encrypt_ops, eax_decrypt_ops, eax_authenticate_ops): New.
(cipher_modes): Add EAX mode.
* tests/benchmark.c (cipher_bench): Add EAX mode.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add AES-NI acceleration for AES-XTS | Jussi Kivilinna | 2018-01-09 | 1 | -1/+1
* cipher/cipher-internal.h (gcry_cipher_handle): Change bulk XTS function to take cipher context.
* cipher/cipher-xts.c (_gcry_cipher_xts_crypt): Ditto.
* cipher/cipher.c (_gcry_cipher_open_internal): Setup AES-NI XTS bulk function.
* cipher/rijndael-aesni.c (xts_gfmul_const, _gcry_aes_aesni_xts_enc)
(_gcry_aes_aesni_xts_dec, _gcry_aes_aesni_xts_crypt): New.
* cipher/rijndael.c (_gcry_aes_aesni_xts_crypt)
(_gcry_aes_xts_crypt): New.
* src/cipher.h (_gcry_aes_xts_crypt): New.
--
Benchmarks on Intel Core i7-4790K, 4.0Ghz (no turbo):

Before:
 XTS enc |      1.66 ns/B     575.7 MiB/s      6.63 c/B
 XTS dec |      1.66 ns/B     575.5 MiB/s      6.63 c/B

After (~6x faster):
 XTS enc |     0.270 ns/B    3528.5 MiB/s      1.08 c/B
 XTS dec |     0.272 ns/B    3511.5 MiB/s      1.09 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
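Between consecutive blocks, XTS multiplies the 128-bit tweak by the primitive element of GF(2^128); in XTS (IEEE 1619) the tweak is little-endian, so the carry runs from byte 0 upward and folds back into byte 0 as 0x87. A generic C sketch of what the xts_gfmul_const path does with SSE shifts and shuffles (function name illustrative):

    /* Multiply the XTS tweak by x in GF(2^128), little-endian convention:
       left-shift the 128-bit value by one bit; if the top bit (MSB of
       byte 15) fell out, XOR 0x87 into byte 0.  Branchless via mask. */
    static void
    xts_gfmul_x_sketch (unsigned char tweak[16])
    {
      unsigned char carry = tweak[15] >> 7;    /* bit shifted out */
      int i;

      for (i = 15; i > 0; i--)
        tweak[i] = (unsigned char) ((tweak[i] << 1) | (tweak[i - 1] >> 7));
      tweak[0] = (unsigned char) ((tweak[0] << 1) ^ ((0 - carry) & 0x87));
    }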
* Spelling fixes in docs and comments. | NIIBE Yutaka | 2017-04-28 | 1 | -1/+1
--
GnuPG-bug-id: 3120
Reported-by: ka7 (klemens)
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
* Implement CFB with 8-bit mode | Mathias L. Baumann | 2017-02-04 | 1 | -0/+8
* cipher/cipher-cfb.c (_gcry_cipher_cfb8_encrypt)
(_gcry_cipher_cfb8_decrypt): Add 8-bit variants of encrypt/decrypt functions.
* cipher/cipher-internal.h (_gcry_cipher_cfb8_encrypt)
(_gcry_cipher_cfb8_decrypt): Ditto.
* cipher/cipher.c: Adjust code flow to work with GCRY_CIPHER_MODE_CFB8.
* tests/basic.c: Add tests for CFB-8 with AES and 3DES.
--
Signed-off-by: Mathias L. Baumann <mathias.baumann at sociomantic.com>
[JK: edited changelog, fixed patch mangled in email transit]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
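CFB-8 shifts one byte at a time through the feedback register instead of a whole block; a sketch of the encrypt direction (names illustrative; decryption is identical except that the received ciphertext byte, not the computed one, is fed back):

    #include <stddef.h>
    #include <string.h>

    /* CFB-8 sketch: each step encrypts the 16-byte shift register, XORs
       the first keystream byte with one plaintext byte, then shifts the
       register left by one byte and appends the new ciphertext byte. */
    static void
    cfb8_encrypt_sketch (void *ctx, unsigned char shiftreg[16],
                         unsigned char *out, const unsigned char *in,
                         size_t len,
                         void (*blockfn) (void *, unsigned char *,
                                          const unsigned char *))
    {
      unsigned char ks[16];

      while (len--)
        {
          blockfn (ctx, ks, shiftreg);           /* E_K(shift register) */
          unsigned char c = *in++ ^ ks[0];
          memmove (shiftreg, shiftreg + 1, 15);  /* shift left one byte */
          shiftreg[15] = c;                      /* feed ciphertext back */
          *out++ = c;
        }
    }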
* Add XTS cipher mode | Jussi Kivilinna | 2017-01-06 | 1 | -0/+15
* cipher/Makefile.am: Add 'cipher-xts.c'.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'bulk.xts_crypt' and 'u_mode.xts' members.
(_gcry_cipher_xts_crypt): New prototype.
* cipher/cipher-xts.c: New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, cipher_encrypt, cipher_decrypt): Add XTS mode handling.
* doc/gcrypt.texi: Add XTS mode to documentation.
* src/gcrypt.h.in (GCRY_CIPHER_MODE_XTS, GCRY_XTS_BLOCK_LEN): New.
* tests/basic.c (do_check_xts_cipher, check_xts_cipher): New.
(check_bulk_cipher_modes): Add XTS test vectors.
(check_one_cipher_core, check_one_cipher, check_ciphers): Add XTS testing support.
(check_cipher_modes): Add XTS test.
* tests/bench-slope.c (bench_xts_encrypt_init)
(bench_xts_encrypt_do_bench, bench_xts_decrypt_do_bench)
(xts_encrypt_ops, xts_decrypt_ops): New.
(cipher_modes, cipher_bench_one): Add XTS.
* tests/benchmark.c (cipher_bench): Add XTS testing.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* OCB: Move large L handling from bottom to upper level | Jussi Kivilinna | 2016-12-10 | 1 | -18/+18
* cipher/cipher-ocb.c (_gcry_cipher_ocb_get_l): Remove.
(ocb_get_L_big): New.
(_gcry_cipher_ocb_authenticate): L-big handling done in upper processing loop, so that lower level never sees the case where 'aad_nblocks % 65536 == 0'; Add missing stack burn.
(ocb_aad_finalize): Add missing stack burn.
(ocb_crypt): L-big handling done in upper processing loop, so that lower level never sees the case where 'data_nblocks % 65536 == 0'.
* cipher/cipher-internal.h (_gcry_cipher_ocb_get_l): Remove.
(ocb_get_l): Remove 'l_tmp' usage and simplify, since input is now more limited: 'N is not a multiple of 65536'.
* cipher/rijndael-aesni.c (get_l): Remove.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Remove l_tmp; Use 'ocb_get_l'.
* cipher/rijndael-ssse3-amd64.c (get_l): Remove.
(ssse3_ocb_enc, ssse3_ocb_dec, _gcry_aes_ssse3_ocb_auth): Remove l_tmp; Use 'ocb_get_l'.
* cipher/camellia-glue.c: Remove OCB l_tmp usage.
* cipher/rijndael-armv8-ce.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
--
Move large-L value generation to the upmost level to simplify the lower-level ocb_get_l for greater performance and a simpler implementation.  This helps implementing OCB in assembly, as 'ocb_get_l' no longer has a function call on its slow path.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
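After this change the lower-level ocb_get_l only ever sees block indices i with i % 65536 != 0, so ntz(i), the number of trailing zero bits, always falls below the 16-entry L-table and the lookup needs no fallback. Roughly (a sketch, not the actual inline function):

    #include <stdint.h>

    /* OCB needs offset L_{ntz(i)} for block number i.  With the large-L
       case hoisted to the caller, ntz(i) < 16 always holds here, so this
       is a plain table lookup.  __builtin_ctzll is the GCC/Clang
       trailing-zero-count builtin; i must be non-zero. */
    static const unsigned char *
    ocb_get_l_sketch (const unsigned char L[16][16], uint64_t i)
    {
      return L[__builtin_ctzll (i)];
    }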
* Add ARMv8/AArch64 Crypto Extension implementation of GCM | Jussi Kivilinna | 2016-09-05 | 1 | -0/+4
* cipher/Makefile.am: Add 'cipher-gcm-armv8-aarch64-ce.S'.
* cipher/cipher-gcm-armv8-aarch64-ce.S: New.
* cipher/cipher-internal.h (GCM_USE_ARM_PMULL): Enable on ARMv8/AArch64.
--
Benchmark on Cortex-A53 (1152 Mhz):

Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |     15.54 ns/B     61.36 MiB/s     17.91 c/B

After (11.9x faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |      1.30 ns/B     731.5 MiB/s      1.50 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add ARMv8/AArch32 Crypto Extension implementation of GCM | Jussi Kivilinna | 2016-07-14 | 1 | -0/+10
* cipher/Makefile.am: Add 'cipher-gcm-armv8-aarch32-ce.S'.
* cipher/cipher-gcm-armv8-aarch32-ce.S: New.
* cipher/cipher-gcm.c (_gcry_ghash_setup_armv8_ce_pmull, _gcry_ghash_armv8_ce_pmull)
(ghash_setup_armv8_ce_pmull, ghash_armv8_ce_pmull): New.
(setupM) [GCM_USE_ARM_PMULL]: Enable ARM PMULL implementation if the HWF_ARM_PMULL HW feature flag is enabled.
* cipher/cipher-gcm.h (GCM_USE_ARM_PMULL): New.
--
Benchmark on Cortex-A53 (1152 Mhz):

Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |     24.10 ns/B     39.57 MiB/s     27.76 c/B

After (~26x faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |     0.924 ns/B    1032.2 MiB/s      1.06 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* cipher: Buffer data from gcry_cipher_authenticate in OCB mode. | Werner Koch | 2016-04-12 | 1 | -0/+6
* cipher/cipher-internal.h (gcry_cipher_handle): Add fields aad_leftover and aad_nleftover to u_mode.ocb.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Clear aad_nleftover.
(_gcry_cipher_ocb_authenticate): Add buffering and factor some code out to ...
(ocb_aad_finalize): ... this new function.
(compute_tag_if_needed): Call new function.
* tests/basic.c (check_ocb_cipher_splitaad): New.
(check_ocb_cipher): Call new function.
(main): Also call check_cipher_modes with --cipher-modes.
--
It is more convenient not to require full blocks for gcry_cipher_authenticate.  Modes other than OCB handle this as well.

Note that the size of the context structure is not increased, because other modes require more context data.

Signed-off-by: Werner Koch <wk@gnupg.org>
* Always require a 64 bit integer type | Werner Koch | 2016-03-18 | 1 | -6/+1
* configure.ac (available_digests_64): Merge with available_digests.
(available_kdfs_64): Merge with available_kdfs.
<64 bit datatype test>: Bail out if no such type is available.
* src/types.h: Emit #error if no u64 can be defined.
(PROPERLY_ALIGNED_TYPE): Always add u64 type.
* cipher/bithelp.h: Remove all code paths which handle the case of !HAVE_U64_TYPEDEF.
* cipher/bufhelp.h: Ditto.
* cipher/cipher-ccm.c: Ditto.
* cipher/cipher-gcm.c: Ditto.
* cipher/cipher-internal.h: Ditto.
* cipher/cipher.c: Ditto.
* cipher/hash-common.h: Ditto.
* cipher/md.c: Ditto.
* cipher/poly1305.c: Ditto.
* cipher/scrypt.c: Ditto.
* cipher/tiger.c: Ditto.
* src/g10lib.h: Ditto.
* tests/basic.c: Ditto.
* tests/bench-slope.c: Ditto.
* tests/benchmark.c: Ditto.
--
Given that SHA-2 and some other algorithms require a 64 bit type, it no longer makes sense to conditionally compile some parts when the platform does not provide such a type.

GnuPG-bug-id: 1815
Signed-off-by: Werner Koch <wk@gnupg.org>
* Optimize OCB offset calculation | Jussi Kivilinna | 2015-08-10 | 1 | -0/+20
* cipher/cipher-internal.h (ocb_get_l): New.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate)
(ocb_crypt): Use 'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/camellia-glue.c (get_l): Remove.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Precalculate offset array when block count matches parallel operation size; Use 'ocb_get_l' instead of 'get_l'.
* cipher/rijndael-aesni.c (get_l): Add fast path for 75% most common offsets.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Precalculate offset array when block count matches parallel operation size.
* cipher/rijndael-ssse3-amd64.c (get_l): Add fast path for 75% most common offsets.
* cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Use 'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/serpent.c (get_l): Remove.
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Precalculate offset array when block count matches parallel operation size; Use 'ocb_get_l' instead of 'get_l'.
* cipher/twofish.c (get_l): Remove.
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Use 'ocb_get_l' instead of 'get_l'.
--
Patch optimizes OCB offset calculation for generic code and assembly implementations with parallel block processing.

Benchmark of OCB AES-NI on Intel Haswell:

 $ tests/bench-slope --cpu-mhz 3201 cipher aes

Before:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
 CTR enc  |     0.274 ns/B    3483.9 MiB/s     0.876 c/B
 CTR dec  |     0.273 ns/B    3490.0 MiB/s     0.875 c/B
 OCB enc  |     0.289 ns/B    3296.1 MiB/s     0.926 c/B
 OCB dec  |     0.299 ns/B    3189.9 MiB/s     0.957 c/B
 OCB auth |     0.260 ns/B    3670.0 MiB/s     0.832 c/B

After:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
 CTR enc  |     0.273 ns/B    3489.4 MiB/s     0.875 c/B
 CTR dec  |     0.273 ns/B    3487.5 MiB/s     0.875 c/B
 OCB enc  |     0.248 ns/B    3852.8 MiB/s     0.792 c/B
 OCB dec  |     0.261 ns/B    3659.5 MiB/s     0.834 c/B
 OCB auth |     0.227 ns/B    4205.5 MiB/s     0.726 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Reduce amount of duplicated code in OCB bulk implementations | Jussi Kivilinna | 2015-07-27 | 1 | -3/+4
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate)
(ocb_crypt): Change bulk function to return number of unprocessed blocks.
* src/cipher.h (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth)
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth)
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth)
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Change return type to 'size_t'.
* cipher/camellia-glue.c (get_l): Only if USE_AESNI_AVX or USE_AESNI_AVX2 defined.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Change return type to 'size_t' and return remaining blocks; Remove unaccelerated common code path; Enable remaining common code only if USE_AESNI_AVX or USE_AESNI_AVX2 defined.
* cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Change return type to 'size_t' and return zero.
* cipher/serpent.c (get_l): Only if USE_SSE2, USE_AVX2 or USE_NEON defined.
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Change return type to 'size_t' and return remaining blocks; Remove unaccelerated common code path; Enable remaining common code only if USE_SSE2, USE_AVX2 or USE_NEON defined.
* cipher/twofish.c (get_l): Only if USE_AMD64_ASM defined.
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Change return type to 'size_t' and return remaining blocks; Remove unaccelerated common code path; Enable remaining common code only if USE_AMD64_ASM defined.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Enable AES/AES-NI, AES/SSSE3 and GCM/PCLMUL implementations on WIN64 | Jussi Kivilinna | 2015-05-01 | 1 | -3/+1
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul) [__WIN64__]: Store non-volatile vector registers before use and restore them after.
* cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Remove dependency on !defined(__WIN64__).
* cipher/rijndael-aesni.c [__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare, aesni_prepare_2_6, aesni_cleanup, aesni_cleanup_2_6): New.
[!__WIN64__] (aesni_prepare_2_6_variable, aesni_prepare_2_6): New.
(_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_cbc_enc)
(_gcry_aesni_ctr_enc, _gcry_aesni_cfb_dec, _gcry_aesni_cbc_dec)
(_gcry_aesni_ocb_crypt, _gcry_aesni_ocb_auth): Use 'aesni_prepare_2_6'.
* cipher/rijndael-internal.h (USE_SSSE3): Enable if HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS or HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS.
(USE_AESNI): Remove dependency on !defined(__WIN64__).
* cipher/rijndael-ssse3-amd64.c [HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare, vpaes_ssse3_cleanup): New.
[!HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (vpaes_ssse3_prepare): New.
(vpaes_ssse3_prepare_enc, vpaes_ssse3_prepare_dec): Use 'vpaes_ssse3_prepare'.
(_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption): Use 'vpaes_ssse3_prepare' and 'vpaes_ssse3_cleanup'.
[HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS] (X): Add masking macro to exclude '.type' and '.size' markers from assembly code, as they are not supported on WIN64/COFF objects.
* configure.ac (gcry_cv_gcc_attribute_ms_abi)
(gcry_cv_gcc_attribute_sysv_abi, gcry_cv_gcc_default_abi_is_ms_abi)
(gcry_cv_gcc_default_abi_is_sysv_abi)
(gcry_cv_gcc_win64_platform_as_ok): New checks.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Disable GCM and AES-NI assembly implementations for WIN64 | Jussi Kivilinna | 2015-05-01 | 1 | -1/+3
* cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): Do not enable when __WIN64__ is defined.
* cipher/rijndael-internal.h (USE_AESNI): Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add OCB bulk crypt/auth functions for AES/AES-NI | Jussi Kivilinna | 2015-04-18 | 1 | -0/+5
* cipher/cipher-internal.h (gcry_cipher_handle): Add bulk.ocb_crypt and bulk.ocb_auth.
(_gcry_cipher_ocb_get_l): New prototype.
* cipher/cipher-ocb.c (get_l): Rename to ...
(_gcry_cipher_ocb_get_l): ... this.
(_gcry_cipher_ocb_authenticate, ocb_crypt): Use bulk function when available.
* cipher/cipher.c (_gcry_cipher_open_internal): Setup OCB bulk functions for AES.
* cipher/rijndael-aesni.c (get_l, aesni_ocb_enc, aesni_ocb_dec)
(_gcry_aes_aesni_ocb_crypt, _gcry_aes_aesni_ocb_auth): New.
* cipher/rijndael.c [USE_AESNI] (_gcry_aes_aesni_ocb_crypt)
(_gcry_aes_aesni_ocb_auth): New prototypes.
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): New.
* src/cipher.h (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): New prototypes.
* tests/basic.c (check_ocb_cipher_largebuf): New.
(check_ocb_cipher): Add large buffer encryption/decryption test.
--
Patch adds bulk encryption/decryption/authentication code for AES-NI accelerated AES.

Benchmark on Intel i5-4570 (3200 Mhz, turbo off):

Before:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
 OCB enc  |      2.12 ns/B     449.7 MiB/s      6.79 c/B
 OCB dec  |      2.12 ns/B     449.6 MiB/s      6.79 c/B
 OCB auth |      2.07 ns/B     459.9 MiB/s      6.64 c/B

After:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte
 OCB enc  |     0.292 ns/B    3262.5 MiB/s     0.935 c/B
 OCB dec  |     0.297 ns/B    3212.2 MiB/s     0.950 c/B
 OCB auth |     0.260 ns/B    3666.1 MiB/s     0.832 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
* Add OCB cipher modeWerner Koch2015-01-161-2/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/cipher-ocb.c: New. * cipher/Makefile.am (libcipher_la_SOURCES): Add cipher-ocb.c. * cipher/cipher-internal.h (OCB_BLOCK_LEN, OCB_L_TABLE_SIZE): New. (gcry_cipher_handle): Add fields marks.finalize and u_mode.ocb. * cipher/cipher.c (_gcry_cipher_open_internal): Add OCB mode. (_gcry_cipher_open_internal): Setup default taglen of OCB. (cipher_reset): Clear OCB specific data. (cipher_encrypt, cipher_decrypt, _gcry_cipher_authenticate) (_gcry_cipher_gettag, _gcry_cipher_checktag): Call OCB functions. (_gcry_cipher_setiv): Add OCB specific nonce setting. (_gcry_cipher_ctl): Add GCRYCTL_FINALIZE and GCRYCTL_SET_TAGLEN. * src/gcrypt.h.in (GCRYCTL_SET_TAGLEN): New. (gcry_cipher_final): New. * cipher/bufhelp.h (buf_xor_1): New. * tests/basic.c (hex2buffer): New. (check_ocb_cipher): New. (main): Call it here. Add option --cipher-modes. * tests/bench-slope.c (bench_aead_encrypt_do_bench): Call gcry_cipher_final. (bench_aead_decrypt_do_bench): Ditto. (bench_aead_authenticate_do_bench): Ditto. Check error code. (bench_ocb_encrypt_do_bench): New. (bench_ocb_decrypt_do_bench): New. (bench_ocb_authenticate_do_bench): New. (ocb_encrypt_ops): New. (ocb_decrypt_ops): New. (ocb_authenticate_ops): New. (cipher_modes): Add them. (cipher_bench_one): Skip wrong block length for OCB. * tests/benchmark.c (cipher_bench): Add field noncelen to MODES. Add OCB support. -- See the comments on top of cipher/cipher-ocb.c for the patent status of the OCB mode. The implementation has not yet been optimized and as such is not faster than the other AEAD modes. A first candidate for optimization is the double_block function. Large improvements can be expected by writing an AES ECB function to work on multiple blocks. Signed-off-by: Werner Koch <wk@gnupg.org>
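On the double_block optimization candidate: OCB's offset update multiplies by x in GF(2^128), i.e. a one-bit left shift of the 128-bit block with a conditional XOR of the reduction constant 0x87 into the low byte. A portable byte-wise sketch, independent of libgcrypt's internal types:

    /* Double a 16-byte block interpreted MSB-first, per the OCB spec. */
    static void
    double_block (unsigned char b[16])
    {
      unsigned char carry = b[0] >> 7;  /* top bit shifted out */
      int i;

      for (i = 0; i < 15; i++)
        b[i] = (b[i] << 1) | (b[i + 1] >> 7);
      b[15] = (b[15] << 1) ^ (carry ? 0x87 : 0);
    }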
* Poly1305-AEAD: updated implementation to match ↵Jussi Kivilinna2014-12-231-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | draft-irtf-cfrg-chacha20-poly1305-03 * cipher/cipher-internal.h (gcry_cipher_handle): Use separate byte counters for AAD and data in Poly1305. * cipher/cipher-poly1305.c (poly1305_fill_bytecount): Remove. (poly1305_fill_bytecounts, poly1305_do_padding): New. (poly1305_aad_finish): Fill padding to Poly1305 and do not fill AAD length. (_gcry_cipher_poly1305_authenticate, _gcry_cipher_poly1305_encrypt) (_gcry_cipher_poly1305_decrypt): Update AAD and data length separately. (_gcry_cipher_poly1305_tag): Fill padding and bytecounts to Poly1305. (_gcry_cipher_poly1305_setkey, _gcry_cipher_poly1305_setiv): Reset AAD and data byte counts; only allow 96-bit IV. * cipher/cipher.c (_gcry_cipher_open_internal): Limit Poly1305-AEAD to ChaCha20 cipher. * tests/basic.c (_check_poly1305_cipher): Update test-vectors. (check_ciphers): Limit Poly1305-AEAD checks to ChaCha20. * tests/bench-slope.c (cipher_bench_one): Ditto. -- Latest Internet-Draft version for "ChaCha20 and Poly1305 for IETF protocols" has added additional padding to Poly1305-AEAD and limited the supported IV size to 96 bits: https://www.ietf.org/rfcdiff?url1=draft-nir-cfrg-chacha20-poly1305-03&difftype=--html&submit=Go!&url2=draft-irtf-cfrg-chacha20-poly1305-03 Patch makes the Poly1305-AEAD implementation match the changes and limits Poly1305-AEAD to ChaCha20 only. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
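In the -03 draft the Poly1305 input becomes AAD, zero padding to a 16-byte boundary, ciphertext, zero padding again, and finally both lengths as 64-bit little-endian values. A sketch of that final length block, assuming a generic poly1305_update primitive rather than libgcrypt's internal API:

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed primitive; libgcrypt's internal interface differs. */
    void poly1305_update (void *state, const unsigned char *buf, size_t len);

    /* Feed le64(aadlen) || le64(ctlen) as the final 16-byte block. */
    static void
    poly1305_fill_bytecounts (void *state, uint64_t aadlen, uint64_t ctlen)
    {
      unsigned char lens[16];
      int i;

      for (i = 0; i < 8; i++)
        {
          lens[i]     = (unsigned char)(aadlen >> (8 * i)); /* little endian */
          lens[8 + i] = (unsigned char)(ctlen  >> (8 * i));
        }
      poly1305_update (state, lens, 16);
    }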
* GCM: move Intel PCLMUL accelerated implementation to separate fileJussi Kivilinna2014-12-121-5/+8
| | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Add 'cipher-gcm-intel-pclmul.c'. * cipher/cipher-gcm-intel-pclmul.c: New. * cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL] (_gcry_ghash_setup_intel_pclmul, _gcry_ghash_intel_pclmul): New prototypes. [GCM_USE_INTEL_PCLMUL] (gfmul_pclmul, gfmul_pclmul_aggr4): Move to 'cipher-gcm-intel-pclmul.c'. (ghash): Rename to... (ghash_internal): ...this and move GCM_USE_INTEL_PCLMUL part to new function in 'cipher-gcm-intel-pclmul.c'. (setupM): Move GCM_USE_INTEL_PCLMUL part to new function in 'cipher-gcm-intel-pclmul.c'; Add selection of ghash function based on available HW acceleration. (do_ghash_buf): Change use of 'ghash' to 'c->u_mode.gcm.ghash_fn'. * cipher/cipher-internal.h (ghash_fn_t): New. (gcry_cipher_handle): Remove 'use_intel_pclmul'; Add 'ghash_fn'. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
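A hedged sketch of the new dispatch: a function-pointer typedef replaces the old 'use_intel_pclmul' flag, and setupM selects an implementation once per key based on detected hardware features (signatures approximated from the ChangeLog):

    typedef unsigned int (*ghash_fn_t) (gcry_cipher_hd_t c, byte *result,
                                        const byte *buf, size_t nblocks);

    static void
    setupM (gcry_cipher_hd_t c, byte *h)
    {
    #ifdef GCM_USE_INTEL_PCLMUL
      if (_gcry_get_hw_features () & HWF_INTEL_PCLMUL)
        {
          c->u_mode.gcm.ghash_fn = _gcry_ghash_intel_pclmul;
          _gcry_ghash_setup_intel_pclmul (c);
          return;
        }
    #endif
      fillM (c, h);                     /* software gcm_table for H */
      c->u_mode.gcm.ghash_fn = ghash_internal;
    }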
* Add Poly1305 based cipher AEAD modeJussi Kivilinna2014-05-121-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/Makefile.am: Add 'cipher-poly1305.c'. * cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.poly1305'. (_gcry_cipher_poly1305_encrypt, _gcry_cipher_poly1305_decrypt) (_gcry_cipher_poly1305_setiv, _gcry_cipher_poly1305_authenticate) (_gcry_cipher_poly1305_get_tag, _gcry_cipher_poly1305_check_tag): New. * cipher/cipher-poly1305.c: New. * cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey) (cipher_reset, cipher_encrypt, cipher_decrypt, _gcry_cipher_setiv) (_gcry_cipher_authenticate, _gcry_cipher_gettag) (_gcry_cipher_checktag): Handle 'GCRY_CIPHER_MODE_POLY1305'. (cipher_setiv): Move handling of 'GCRY_CIPHER_MODE_GCM' to ... (_gcry_cipher_setiv): ... here, as with other modes. * src/gcrypt.h.in: Add 'GCRY_CIPHER_MODE_POLY1305'. * tests/basic.c (_check_poly1305_cipher, check_poly1305_cipher): New. (check_ciphers): Add Poly1305 check. (check_cipher_modes): Call 'check_poly1305_cipher'. * tests/bench-slope.c (bench_gcm_encrypt_do_bench): Rename to bench_aead_... and take nonce as argument. (bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench): Ditto. (bench_gcm_encrypt_do_bench, bench_gcm_decrypt_do_bench) (bench_gcm_authenticate_do_bench, bench_poly1305_encrypt_do_bench) (bench_poly1305_decrypt_do_bench) (bench_poly1305_authenticate_do_bench, poly1305_encrypt_ops) (poly1305_decrypt_ops, poly1305_authenticate_ops): New. (cipher_modes): Add Poly1305. (cipher_bench_one): Add special handling for Poly1305. -- Patch adds Poly1305 based AEAD cipher mode to libgcrypt. ChaCha20 variant of this mode is proposed for use in TLS and ipsec: https://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04 http://tools.ietf.org/html/draft-nir-ipsecme-chacha20-poly1305-02 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
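For reference, usage through the public API looks roughly like this sketch (error handling mostly elided; the 8-byte nonce matches the draft-agl ChaCha20 variant current at the time, before the later 96-bit-IV restriction):

    #include <gcrypt.h>

    /* Demo only: encrypt 'buf' in place and produce the 16-byte tag. */
    static gcry_error_t
    poly1305_aead_demo (const unsigned char key[32],
                        const unsigned char nonce[8],
                        unsigned char *buf, size_t buflen,
                        unsigned char tag[16])
    {
      gcry_cipher_hd_t hd;
      gcry_error_t err;

      err = gcry_cipher_open (&hd, GCRY_CIPHER_CHACHA20,
                              GCRY_CIPHER_MODE_POLY1305, 0);
      if (err)
        return err;
      gcry_cipher_setkey (hd, key, 32);
      gcry_cipher_setiv (hd, nonce, 8);               /* 64-bit nonce */
      gcry_cipher_encrypt (hd, buf, buflen, NULL, 0); /* in place */
      gcry_cipher_gettag (hd, tag, 16);
      gcry_cipher_close (hd);
      return 0;
    }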
* Use u64 for CCM data lengthsJussi Kivilinna2013-12-151-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | * cipher/cipher-ccm.c: Move code inside [HAVE_U64_TYPEDEF]. [HAVE_U64_TYPEDEF] (_gcry_cipher_ccm_set_lengths): Use 'u64' for data lengths. [!HAVE_U64_TYPEDEF] (_gcry_cipher_ccm_encrypt) (_gcry_cipher_ccm_decrypt, _gcry_cipher_ccm_set_nonce) (_gcry_cipher_ccm_authenticate, _gcry_cipher_ccm_get_tag) (_gcry_cipher_ccm_check_tag): Dummy functions returning GPG_ERR_NOT_SUPPORTED. * cipher/cipher-internal.h (gcry_cipher_handle.u_mode.ccm) (_gcry_cipher_ccm_set_lengths): Move inside [HAVE_U64_TYPEDEF] and use u64 instead of size_t for CCM data lengths. * cipher/cipher.c (_gcry_cipher_open_internal, cipher_reset) (_gcry_cipher_ctl) [!HAVE_U64_TYPEDEF]: Return GPG_ERR_NOT_SUPPORTED for CCM. (_gcry_cipher_ctl) [HAVE_U64_TYPEDEF]: Use u64 for GCRYCTL_SET_CCM_LENGTHS length parameters. * tests/basic.c: Do not use CCM if !HAVE_U64_TYPEDEF. * tests/bench-slope.c: Ditto. * tests/benchmark.c: Ditto. -- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
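Since CCM encodes all lengths into its first block, callers must announce them before feeding any data. A minimal usage sketch against the public API (the wrapper name is hypothetical; the array-of-u64 calling convention for GCRYCTL_SET_CCM_LENGTHS is the documented one):

    #include <stdint.h>
    #include <gcrypt.h>

    static gcry_error_t
    ccm_set_lengths (gcry_cipher_hd_t hd, uint64_t encryptlen,
                     uint64_t aadlen, uint64_t taglen)
    {
      uint64_t params[3];

      params[0] = encryptlen;  /* bytes that will be encrypted */
      params[1] = aadlen;      /* bytes of additional authenticated data */
      params[2] = taglen;      /* requested tag length in bytes */
      return gcry_cipher_ctl (hd, GCRYCTL_SET_CCM_LENGTHS,
                              params, sizeof params);
    }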
* GCM: Move gcm_table initialization to setkeyJussi Kivilinna2013-11-211-9/+21
| | | | | | | | | | | | | | | | | | | * cipher/cipher-gcm.c: Change all 'c->u_iv.iv' to 'c->u_mode.gcm.u_ghash_key.key'. (_gcry_cipher_gcm_setkey): New. (_gcry_cipher_gcm_initiv): Move ghash initialization to function above. * cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.gcm.u_ghash_key'; Reorder 'u_mode.gcm' members for partial clearing in gcry_cipher_reset. (_gcry_cipher_gcm_setkey): New prototype. * cipher/cipher.c (cipher_setkey): Add GCM setkey. (cipher_reset): Clear 'u_mode' only partially for GCM. -- GHASH tables can be generated at setkey time. No need to regenerate for every new IV. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
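A sketch of the resulting split, with field and helper names taken from the ChangeLog but control flow approximated: setkey derives the hash key H and builds the table, so setiv no longer has to touch either:

    #include <string.h>

    /* Hedged sketch: H = E_K(0^128) and the GHASH table are now computed
       once, at setkey time. */
    static void
    _gcry_cipher_gcm_setkey (gcry_cipher_hd_t c)
    {
      memset (c->u_mode.gcm.u_ghash_key.key, 0, GCRY_GCM_BLOCK_LEN);
      /* Encrypt the all-zero block to obtain the hash key H. */
      c->spec->encrypt (&c->context.c,
                        c->u_mode.gcm.u_ghash_key.key,
                        c->u_mode.gcm.u_ghash_key.key);
      /* Build the multiplication table once; reused for every new IV. */
      setupM (c, c->u_mode.gcm.u_ghash_key.key);
    }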
* GCM: Add support for split data buffers and online operationJussi Kivilinna2013-11-201-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | * cipher/cipher-gcm.c (do_ghash_buf): Add buffering for less-than-blocksize input and padding handling. (_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt): Add handling for AAD padding and check whether data has already been padded. (_gcry_cipher_gcm_authenticate): Check that AAD or data has not been padded yet. (_gcry_cipher_gcm_initiv): Clear padding marks. (_gcry_cipher_gcm_tag): Add finalization and padding; Clear sensitive data from cipher handle, since they are not used after generating tag. * cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.gcm.macbuf', 'u_mode.gcm.mac_unused', 'u_mode.gcm.ghash_data_finalized' and 'u_mode.gcm.ghash_aad_finalized'. * tests/basic.c (check_gcm_cipher): Rename to... (_check_gcm_cipher): ...this and add handling for different buffer step lengths; Enable per byte buffer testing. (check_gcm_cipher): Call _check_gcm_cipher with different buffer step sizes. -- Until now, GCM expected the full data to be input in one go. This patch adds support for feeding data continuously (for encryption/decryption/aad). Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
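A hedged sketch of the buffering idea in do_ghash_buf: sub-blocksize tails are stashed in macbuf and GHASH only ever runs on full 16-byte blocks (field names from the ChangeLog; the ghash_blocks helper name is illustrative):

    #include <string.h>

    /* Illustrative stand-in for the internal per-block ghash routine. */
    static void ghash_blocks (gcry_cipher_hd_t c, unsigned char *hash,
                              const unsigned char *buf, size_t nblocks);

    static void
    do_ghash_buf (gcry_cipher_hd_t c, unsigned char *hash,
                  const unsigned char *buf, size_t buflen)
    {
      unsigned int unused = c->u_mode.gcm.mac_unused;

      while (buflen)
        {
          if (unused || buflen < GCRY_GCM_BLOCK_LEN)
            {
              /* Top up the buffered partial block. */
              size_t n = GCRY_GCM_BLOCK_LEN - unused;
              if (n > buflen)
                n = buflen;
              memcpy (c->u_mode.gcm.macbuf + unused, buf, n);
              unused += n;
              buf += n;
              buflen -= n;
              if (unused < GCRY_GCM_BLOCK_LEN)
                break;          /* Not a full block yet; wait for more. */
              ghash_blocks (c, hash, c->u_mode.gcm.macbuf, 1);
              unused = 0;
            }
          else
            {
              /* Bulk path: hash all remaining full blocks directly. */
              size_t nblocks = buflen / GCRY_GCM_BLOCK_LEN;
              ghash_blocks (c, hash, buf, nblocks);
              buf += nblocks * GCRY_GCM_BLOCK_LEN;
              buflen -= nblocks * GCRY_GCM_BLOCK_LEN;
            }
        }
      c->u_mode.gcm.mac_unused = unused;
    }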