| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* build-aux/db2any: Update copyright notice.
* cipher/arcfour.c, cipher/blowfish.ccipher/cast5.c: Likewise.
* cipher/crc-armv8-ce.c, cipher/crc-intel-pclmul.c: Likewise.
* cipher/crc-ppc.c, cipher/crc.c, cipher/des.c: Likewise.
* cipher/md2.c, cipher/md4.c, cipher/md5.c: Likewise.
* cipher/primegen.c, cipher/rfc2268.c, cipher/rmd160.c: Likewise.
* cipher/seed.c, cipher/serpent.c, cipher/tiger.c: Likewise.
* cipher/twofish.c: Likewise.
* mpi/alpha/mpih-add1.S, mpi/alpha/mpih-lshift.S: Likewise.
* mpi/alpha/mpih-mul1.S, mpi/alpha/mpih-mul2.S: Likewise.
* mpi/alpha/mpih-mul3.S, mpi/alpha/mpih-rshift.S: Likewise.
* mpi/alpha/mpih-sub1.S, mpi/alpha/udiv-qrnnd.S: Likewise.
* mpi/amd64/mpih-add1.S, mpi/amd64/mpih-lshift.S: Likewise.
* mpi/amd64/mpih-mul1.S, mpi/amd64/mpih-mul2.S: Likewise.
* mpi/amd64/mpih-mul3.S, mpi/amd64/mpih-rshift.S: Likewise.
* mpi/amd64/mpih-sub1.S, mpi/config.links: Likewise.
* mpi/generic/mpih-add1.c, mpi/generic/mpih-lshift.c: Likewise.
* mpi/generic/mpih-mul1.c, mpi/generic/mpih-mul2.c: Likewise.
* mpi/generic/mpih-mul3.c, mpi/generic/mpih-rshift.c: Likewise.
* mpi/generic/mpih-sub1.c, mpi/generic/udiv-w-sdiv.c: Likewise.
* mpi/hppa/mpih-add1.S, mpi/hppa/mpih-lshift.S: Likewise.
* mpi/hppa/mpih-rshift.S, mpi/hppa/mpih-sub1.S: Likewise.
* mpi/hppa/udiv-qrnnd.S, mpi/hppa1.1/mpih-mul1.S: Likewise.
* mpi/hppa1.1/mpih-mul2.S, mpi/hppa1.1/mpih-mul3.S: Likewise.
* mpi/hppa1.1/udiv-qrnnd.S, mpi/i386/mpih-add1.S: Likewise.
* mpi/i386/mpih-lshift.S, mpi/i386/mpih-mul1.S: Likewise.
* mpi/i386/mpih-mul2.S, mpi/i386/mpih-mul3.S: Likewise.
* mpi/i386/mpih-rshift.S, mpi/i386/mpih-sub1.S: Likewise.
* mpi/i386/syntax.h, mpi/longlong.h: Likewise.
* mpi/m68k/mc68020/mpih-mul1.S, mpi/m68k/mc68020/mpih-mul2.S: Likewise.
* mpi/m68k/mc68020/mpih-mul3.S, mpi/m68k/mpih-add1.S: Likewise.
* mpi/m68k/mpih-lshift.S, mpi/m68k/mpih-rshift.S: Likewise.
* mpi/m68k/mpih-sub1.S, mpi/m68k/syntax.h: Likewise.
* mpi/mips3/mpih-add1.S, mpi/mips3/mpih-lshift.S: Likewise.
* mpi/mips3/mpih-mul1.S, mpi/mips3/mpih-mul2.S: Likewise.
* mpi/mips3/mpih-mul3.S, mpi/mips3/mpih-rshift.S: Likewise.
* mpi/mips3/mpih-sub1.S, mpi/mpi-add.c: Likewise.
* mpi/mpi-bit.c, mpi/mpi-cmp.c, mpi/mpi-div.c: Likewise.
* mpi/mpi-gcd.c, mpi/mpi-inline.c, mpi/mpi-inline.h: Likewise.
* mpi/mpi-internal.h, mpi/mpi-mpow.c, mpi/mpi-mul.c: Likewise.
* mpi/mpi-scan.c, mpi/mpih-div.c, mpi/mpih-mul.c: Likewise.
* mpi/pa7100/mpih-lshift.S, mpi/pa7100/mpih-rshift.S: Likewise.
* mpi/power/mpih-add1.S, mpi/power/mpih-lshift.S: Likewise.
* mpi/power/mpih-mul1.S, mpi/power/mpih-mul2.S: Likewise.
* mpi/power/mpih-mul3.S, mpi/power/mpih-rshift.S: Likewise.
* mpi/power/mpih-sub1.S, mpi/powerpc32/mpih-add1.S: Likewise.
* mpi/powerpc32/mpih-lshift.S, mpi/powerpc32/mpih-mul1.S: Likewise.
* mpi/powerpc32/mpih-mul2.S, mpi/powerpc32/mpih-mul3.S: Likewise.
* mpi/powerpc32/mpih-rshift.S, mpi/powerpc32/mpih-sub1.S: Likewise.
* mpi/powerpc32/syntax.h, mpi/sparc32/mpih-add1.S: Likewise.
* mpi/sparc32/mpih-lshift.S, mpi/sparc32/mpih-rshift.S: Likewise.
* mpi/sparc32/udiv.S, mpi/sparc32v8/mpih-mul1.S: Likewise.
* mpi/sparc32v8/mpih-mul2.S, mpi/sparc32v8/mpih-mul3.S: Likewise.
* mpi/supersparc/udiv.S: Likewise.
* random/random.h, random/rndegd.c: Likewise.
* src/cipher.h, src/libgcrypt.def, src/libgcrypt.vers: Likewise.
* src/missing-string.c, src/mpi.h, src/secmem.h: Likewise.
* src/stdmem.h, src/types.h: Likewise.
* tests/aeswrap.c, tests/curves.c, tests/hmac.c: Likewise.
* tests/keygrip.c, tests/prime.c, tests/random.c: Likewise.
* tests/t-kdf.c, tests/testapi.c: Likewise.
--
GnuPG-bug-id: 6271
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bulkhelp.h (bulk_crypt_fn_t): Make 'ctx' non-constant and
change 'num_blks' from 'unsigned int' to 'size_t'.
* cipher/camellia-glue.c (camellia_encrypt_blk1_32)
(camellia_encrypt_blk1_64, camellia_decrypt_blk1_32)
(camellia_decrypt_blk1_64): Adjust to match 'bulk_crypt_fn_t'.
* cipher/serpent.c (serpent_crypt_blk1_16, serpent_encrypt_blk1_16)
(serpent_decrypt_blk1_16): Likewise.
* cipher/sm4.c (crypt_blk1_16_fn_t, _gcry_sm4_aesni_avx_crypt_blk1_8)
(sm4_aesni_avx_crypt_blk1_16, _gcry_sm4_aesni_avx2_crypt_blk1_16)
(sm4_aesni_avx2_crypt_blk1_16, _gcry_sm4_gfni_avx2_crypt_blk1_16)
(sm4_gfni_avx2_crypt_blk1_16, _gcry_sm4_gfni_avx512_crypt_blk1_16)
(_gcry_sm4_gfni_avx512_crypt_blk32, sm4_gfni_avx512_crypt_blk1_16)
(_gcry_sm4_aarch64_crypt_blk1_8, sm4_aarch64_crypt_blk1_16)
(_gcry_sm4_armv8_ce_crypt_blk1_8, sm4_armv8_ce_crypt_blk1_16)
(_gcry_sm4_armv9_sve_ce_crypt, sm4_armv9_sve_ce_crypt_blk1_16)
(sm4_crypt_blocks, sm4_crypt_blk1_32, sm4_encrypt_blk1_32)
(sm4_decrypt_blk1_32): Likewise.
* cipher/twofish.c (twofish_crypt_blk1_16, twofish_encrypt_blk1_16)
(twofish_decrypt_blk1_16): Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/serpent-armv7-neon.S (_gcry_serpent_neon_blk8): New.
* cipher/serpent-avx2-amd64.S (_gcry_serpent_avx2_blk16): New.
* cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_blk8): New.
* cipher/serpent.c (_gcry_serpent_sse2_blk8)
(_gcry_serpent_avx2_blk16, _gcry_serpent_neon_blk8)
(_gcry_serpent_xts_crypt, _gcry_serpent_ecb_crypt)
(serpent_crypt_blk1_16, serpent_encrypt_blk1_16)
(serpent_decrypt_blk1_16): New.
(serpent_setkey): Setup XTS and ECB bulk functions.
--
Benchmark on AMD Ryzen 9 7900X:
Before:
SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 5.42 ns/B 176.0 MiB/s 30.47 c/B 5625
ECB dec | 4.82 ns/B 197.9 MiB/s 27.11 c/B 5625
XTS enc | 5.57 ns/B 171.3 MiB/s 31.31 c/B 5625
XTS dec | 4.99 ns/B 191.1 MiB/s 28.07 c/B 5625
After:
SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 0.708 ns/B 1347 MiB/s 3.98 c/B 5625
ECB dec | 0.694 ns/B 1373 MiB/s 3.91 c/B 5625
XTS enc | 0.766 ns/B 1246 MiB/s 4.31 c/B 5625
XTS dec | 0.754 ns/B 1264 MiB/s 4.24 c/B 5625
GnuPG-bug-id: T6242
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
| |
* cipher/serpent.c (_gcry_serpent_ocb_crypt)
(_gcry_serpent_ocb_auth) [USE_NEON]: Cast "Ls" to 'const void **'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Remove 'cipher-selftest.c' and 'cipher-selftest.h'.
* cipher/cipher-selftest.c: Remove (refactor these tests to
tests/basic.c).
* cipher/cipher-selftest.h: Remove.
* cipher/blowfish.c (selftest_ctr, selftest_cbc, selftest_cfb): Remove.
(selftest): Remove CTR/CBC/CFB bulk self-tests.
* cipher/camellia-glue.c (selftest_ctr_128, selftest_cbc_128)
(selftest_cfb_128): Remove.
(selftest): Remove CTR/CBC/CFB bulk self-tests.
* cipher/cast5.c (selftest_ctr, selftest_cbc, selftest_cfb): Remove.
(selftest): Remove CTR/CBC/CFB bulk self-tests.
* cipher/des.c (bulk_selftest_setkey, selftest_ctr, selftest_cbc)
(selftest_cfb): Remove.
(selftest): Remove CTR/CBC/CFB bulk self-tests.
* cipher/rijndael.c (selftest_basic_128, selftest_basic_192)
(selftest_basic_256): Allocate context from stack instead of heap and
handle alignment manually.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Remove.
(selftest): Remove CTR/CBC/CFB bulk self-tests.
* cipher/serpent.c (selftest_ctr_128, selftest_cbc_128)
(selftest_cfb_128): Remove.
(selftest): Remove CTR/CBC/CFB bulk self-tests.
* cipher/sm4.c (selftest_ctr_128, selftest_cbc_128)
(selftest_cfb_128): Remove.
(selftest): Remove CTR/CBC/CFB bulk self-tests.
* cipher/twofish.c (selftest_ctr, selftest_cbc, selftest_cfb): Remove.
(selftest): Remove CTR/CBC/CFB bulk self-tests.
* tests/basic.c (buf_xor, cipher_cbc_bulk_test, buf_xor_2dst)
(cipher_cfb_bulk_test, cipher_ctr_bulk_test): New.
(check_ciphers): Run cipher_cbc_bulk_test(), cipher_cfb_bulk_test() and
cipher_ctr_bulk_test() for block ciphers.
---
CBC/CFB/CTR bulk self-tests are quite computationally heavy and
slow down use cases where application opens cipher context once,
does processing and exits. Better place for these tests is in
`tests/basic`.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bulkhelp.h: New.
* cipher/camellia-glue.c (_gcry_camellia_ocb_crypt)
(_gcry_camellia_ocb_crypt): Use new
`bulk_ocb_prepare_L_pointers_array_blkXX` function for OCB L pointer
array setup.
* cipher/serpent.c (_gcry_serpent_ocb_crypt)
(_gcry_serpent_ocb_auth): Likewise.
* cipher/sm4.c (_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth): Likewise.
* cipher/twofish.c (_gcry_twofish_ocb_crypt)
(_gcry_twofish_ocb_auth): Likewise.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/blake2.c: Use const.
* cipher/camellia-glue.c, cipher/cipher.c, cipher/crc.c: Likewise.
* cipher/des.c, cipher/gost28147.c, cipher/gostr3411-94.c: Likewise.
* cipher/keccak.c, cipher/mac-cmac.c, cipher/mac-gmac.c: Likewise.
* cipher/mac-hmac.c, cipher/mac-internal.h: Likewise.
* cipher/mac-poly1305.c, cipher/mac.c, cipher/md.c: Likewise.
* cipher/md.c, cipher/md2.c, cipher/md4.c, cipher/md5.c: Likewise.
* cipher/pubkey.c, cipher/rfc2268.c, cipher/rijndael.c: Likewise.
* cipher/rmd160.c, cipher/seed.c, cipher/serpent.c: Likewise.
* cipher/sha1.c, cipher/sha256.c, cipher/sha512.c: Likewise.
* cipher/sm3.c, cipher/sm4.c, cipher/stribog.c: Likewise.
* cipher/pubkey.c, cipher/rfc2268.c, cipher/rijndael.c: Likewise.
* src/cipher-proto.h, src/cipher.h: Likewise.
--
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/idea.c (do_setkey): Return GPG_ERR_INV_KEYLEN.
* cipher/rfc2268.c (setkey_core): Likewise.
* cipher/serpent.c (serpent_setkey_internal): Likewise.
(serpent_setkey): Likewise.
--
Reported-by: Guido Vranken <guidovranken@gmail.com>
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher-internal.h (cipher_mode_ops_t, cipher_bulk_ops_t): New.
(gcry_cipher_handle): Define members 'mode_ops' and 'bulk' using new
types.
* cipher/cipher.c (_gcry_cipher_open_internal): Remove bulk function
setup.
(cipher_setkey): Pass context bulk function pointer to algorithm setkey
function.
* cipher/cipher-selftest.c (_gcry_selftest_helper_cbc)
(_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Remove bulk
function parameter; Use bulk function returned by setkey function.
* cipher/cipher-selftest.h (_gcry_selftest_helper_cbc)
(_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Remove bulk
function parameter.
* cipher/arcfour.c (arcfour_setkey): Change 'hd' parameter to
'bulk_ops'.
* cipher/blowfish.c (bf_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(selftest): Pass 'bulk_ops' to setkey function.
* cipher/camellia.c (camellia_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec)
(_gcry_camellia_cfb_dec, _gcry_camellia_ocb_crypt)
(_gcry_camellia_ocb_auth): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(selftest): Pass 'bulk_ops' to setkey function.
* cipher/cast5.c (cast_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec): Make
static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(selftest): Pass 'bulk_ops' to setkey function.
* cipher/chacha20.c (chacha20_setkey): Change 'hd' parameter to
'bulk_ops'.
* cipher/cast5.c (do_tripledes_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_3des_ctr_enc, _gcry_3des_cbc_dec, _gcry_3des_cfb_dec): Make
static.
(bulk_selftest_setkey): Change 'hd' parameter to 'bulk_ops'.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(do_des_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/gost28147.c (gost_setkey): Change 'hd' parameter to
'bulk_ops'.
* cipher/idea.c (idea_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/rfc2268.c (do_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/rijndael.c (do_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(rijndael_setkey): Change 'hd' parameter to 'bulk_ops'.
(_gcry_aes_cfb_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_enc)
(_gcry_aes_cbc_dec, _gcry_aes_ctr_enc, _gcry_aes_ocb_crypt)
(_gcry_aes_ocb_auth, _gcry_aes_xts_crypt): Make static.
(selftest_basic_128, selftest_basic_192, selftest_basic_256): Pass
'bulk_ops' to setkey function.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
* cipher/salsa20.c (salsa20_setkey): Change 'hd' parameter to
'bulk_ops'.
* cipher/seed.c (seed_setkey): Change 'hd' parameter to 'bulk_ops'.
* cipher/serpent.c (serpent_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec, _gcry_serpent_cfb_dec)
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Make static.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Do not pass
bulk function to selftest helper.
* cipher/sm4.c (sm4_setkey): Change 'hd' parameter to 'bulk_ops'; Setup
'bulk_ops' with bulk acceleration functions.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth): Make static.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Do not pass
bulk function to selftest helper.
* cipher/twofish.c (twofish_setkey): Change 'hd' parameter to
'bulk_ops'; Setup 'bulk_ops' with bulk acceleration functions.
(_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec)
(_gcry_twofish_cfb_dec, _gcry_twofish_ocb_crypt)
(_gcry_twofish_ocb_auth): Make static.
(selftest_ctr, selftest_cbc, selftest_cfb): Do not pass bulk function
to selftest helper.
(selftest, main): Pass 'bulk_ops' to setkey function.
* src/cipher-proto.h: Forward declare 'cipher_bulk_ops_t'.
(gcry_cipher_setkey_t): Replace 'hd' with 'bulk_ops'.
* src/cipher.h: Remove bulk acceleration function prototypes for
'aes', 'blowfish', 'cast5', 'camellia', '3des', 'serpent', 'sm4' and
'twofish'.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher-internal.h (cipher_block_add): New.
* cipher/blowfish.c (_gcry_blowfish_ctr_enc): Use new helper function
for CTR block increment.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc): Ditto.
* cipher/cast5.c (_gcry_cast5_ctr_enc): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/des.c (_gcry_3des_ctr_enc): Ditto.
* cipher/rijndael.c (_gcry_aes_ctr_enc): Ditto.
* cipher/serpent.c (_gcry_serpent_ctr_enc): Ditto.
* cipher/twofish.c (_gcry_twofish_ctr_enc): Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (buf_get_he32, buf_put_he32, buf_get_he64)
(buf_put_he64): New.
* cipher/cipher-internal.h (cipher_block_cpy, cipher_block_xor)
(cipher_block_xor_1, cipher_block_xor_2dst, cipher_block_xor_n_copy_2)
(cipher_block_xor_n_copy): New.
* cipher/cipher-gcm-intel-pclmul.c
(_gcry_ghash_setup_intel_pclmul): Use assembly for swapping endianness
instead of buf_get_be64 and buf_cpy.
* cipher/blowfish.c: Use new cipher_block_* functions for cipher block
sized buf_cpy/xor* operations.
* cipher/camellia-glue.c: Ditto.
* cipher/cast5.c: Ditto.
* cipher/cipher-aeswrap.c: Ditto.
* cipher/cipher-cbc.c: Ditto.
* cipher/cipher-ccm.c: Ditto.
* cipher/cipher-cfb.c: Ditto.
* cipher/cipher-cmac.c: Ditto.
* cipher/cipher-ctr.c: Ditto.
* cipher/cipher-eax.c: Ditto.
* cipher/cipher-gcm.c: Ditto.
* cipher/cipher-ocb.c: Ditto.
* cipher/cipher-ofb.c: Ditto.
* cipher/cipher-xts.c: Ditto.
* cipher/des.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
--
This commit adds size-optimized functions for copying and xoring
cipher block sized buffers. These functions also allow GCC to use
inline auto-vectorization for block cipher copying and xoring on
higher optimization levels.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher.c (cipher_setkey): Pass cipher object pointer to
cipher's setkey function.
* cipher/arcfour.c: Add gcry_cipher_hd_t parameter for setkey
functions and update selftests to pass NULL pointer.
* cipher/blowfish.c: Ditto.
* cipher/camellia-glue.c: Ditto.
* cipher/cast5.c: Ditto.
* cipher/chacha20.c: Ditto.
* cipher/cipher-selftest.c: Ditto.
* cipher/des.c: Ditto.
* cipher/gost28147.c: Ditto.
* cipher/idea.c: Ditto.
* cipher/rfc2268.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/salsa20.c: Ditto.
* cipher/seed.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
* src/cipher-proto.h: Ditto.
--
This allows setkey function to replace bulk cipher operations
with faster alternative.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher-ocb.c (_gcry_cipher_ocb_get_l): Remove.
(ocb_get_L_big): New.
(_gcry_cipher_ocb_authenticate): L-big handling done in upper
processing loop, so that lower level never sees the case where
'aad_nblocks % 65536 == 0'; Add missing stack burn.
(ocb_aad_finalize): Add missing stack burn.
(ocb_crypt): L-big handling done in upper processing loop, so that
lower level never sees the case where 'data_nblocks % 65536 == 0'.
* cipher/cipher-internal.h (_gcry_cipher_ocb_get_l): Remove.
(ocb_get_l): Remove 'l_tmp' usage and simplify since input
is more limited now, 'N is not multiple of 65536'.
* cipher/rijndael-aesni.c (get_l): Remove.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Remove
l_tmp; Use 'ocb_get_l'.
* cipher/rijndael-ssse3-amd64.c (get_l): Remove.
(ssse3_ocb_enc, ssse3_ocb_dec, _gcry_aes_ssse3_ocb_auth): Remove
l_tmp; Use 'ocb_get_l'.
* cipher/camellia-glue.c: Remove OCB l_tmp usage.
* cipher/rijndael-armv8-ce.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
--
Move large L value generation to up-most level to simplify lower level
ocb_get_l for greater performance and simpler implementation. This helps
implementing OCB in assembly as 'ocb_get_l' no longer has function call
on slow-path.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
| |
* cipher/serpent.c (serpent128_oids, serpent192_oids)
(serpent256_oids): New. Add them to the specs blow.
(serpent128_aliases): Add "SERPENT-128".
(serpent256_aliases, serpent192_aliases): New.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-glue.c (_gcry_camellia_aesni_avx_ocb_enc)
(_gcry_camellia_aesni_avx_ocb_dec, _gcry_camellia_aesni_avx_ocb_auth)
(_gcry_camellia_aesni_avx2_ocb_enc, _gcry_camellia_aesni_avx2_ocb_dec)
(_gcry_camellia_aesni_avx2_ocb_auth, _gcry_camellia_ocb_crypt)
(_gcry_camellia_ocb_auth): Change 'Ls' from pointer array to u64 array.
* cipher/serpent.c (_gcry_serpent_sse2_ocb_enc)
(_gcry_serpent_sse2_ocb_dec, _gcry_serpent_sse2_ocb_auth)
(_gcry_serpent_avx2_ocb_enc, _gcry_serpent_avx2_ocb_dec)
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Ditto.
* cipher/twofish.c (_gcry_twofish_amd64_ocb_enc)
(_gcry_twofish_amd64_ocb_dec, _gcry_twofish_amd64_ocb_auth)
(twofish_amd64_ocb_enc, twofish_amd64_ocb_dec, twofish_amd64_ocb_auth)
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Ditto.
--
Pointers on x32 are 32-bit, but amd64 assembly implementations
expect 64-bit pointers. Pass 'Ls' array to 64-bit integers so
that input arrays has correct format for assembly functions.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-glue.c (_gcry_camellia_ocb_crypt)
(_gcry_camellia_ocb_auth): Precalculate Ls array always, instead of
just if 'blkn % <parallel blocks> == 0'.
* cipher/serpent.c (_gcry_serpent_ocb_crypt)
(_gcry_serpent_ocb_auth): Ditto.
* cipher/rijndael-aesni.c (get_l): Remove low-bit checks.
(aes_ocb_enc, aes_ocb_dec, _gcry_aes_aesni_ocb_auth): Handle leading
blocks until block counter is multiple of 4, so that parallel block
processing loop can use 'c->u_mode.ocb.L' array directly.
* tests/basic.c (check_ocb_cipher_largebuf): Rename to...
(check_ocb_cipher_largebuf_split): ...this and add option to process
large buffer as two split buffers.
(check_ocb_cipher_largebuf): New.
--
Patch simplifies source and reduce object size.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher-internal.h (ocb_get_l): New.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate)
(ocb_crypt): Use 'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/camellia-glue.c (get_l): Remove.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Precalculate
offset array when block count matches parallel operation size; Use
'ocb_get_l' instead of 'get_l'.
* cipher/rijndael-aesni.c (get_l): Add fast path for 75% most common
offsets.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Precalculate
offset array when block count matches parallel operation size.
* cipher/rijndael-ssse3-amd64.c (get_l): Add fast path for 75% most
common offsets.
* cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Use
'ocb_get_l' instead of '_gcry_cipher_ocb_get_l'.
* cipher/serpent.c (get_l): Remove.
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Precalculate
offset array when block count matches parallel operation size; Use
'ocb_get_l' instead of 'get_l'.
* cipher/twofish.c (get_l): Remove.
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Use 'ocb_get_l'
instead of 'get_l'.
--
Patch optimizes OCB offset calculation for generic code and
assembly implementations with parallel block processing.
Benchmark of OCB AES-NI on Intel Haswell:
$ tests/bench-slope --cpu-mhz 3201 cipher aes
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte
CTR enc | 0.274 ns/B 3483.9 MiB/s 0.876 c/B
CTR dec | 0.273 ns/B 3490.0 MiB/s 0.875 c/B
OCB enc | 0.289 ns/B 3296.1 MiB/s 0.926 c/B
OCB dec | 0.299 ns/B 3189.9 MiB/s 0.957 c/B
OCB auth | 0.260 ns/B 3670.0 MiB/s 0.832 c/B
After:
AES | nanosecs/byte mebibytes/sec cycles/byte
CTR enc | 0.273 ns/B 3489.4 MiB/s 0.875 c/B
CTR dec | 0.273 ns/B 3487.5 MiB/s 0.875 c/B
OCB enc | 0.248 ns/B 3852.8 MiB/s 0.792 c/B
OCB dec | 0.261 ns/B 3659.5 MiB/s 0.834 c/B
OCB auth | 0.227 ns/B 4205.5 MiB/s 0.726 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate)
(ocb_crypt): Change bulk function to return number of unprocessed
blocks.
* src/cipher.h (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth)
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth)
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth)
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Change return type
to 'size_t'.
* cipher/camellia-glue.c (get_l): Only if USE_AESNI_AVX or
USE_AESNI_AVX2 defined.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Change return type
to 'size_t' and return remaining blocks; Remove unaccelerated common
code path. Enable remaining common code only if USE_AESNI_AVX or
USE_AESNI_AVX2 defined; Remove unaccelerated common code.
* cipher/rijndael.c (_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth): Change
return type to 'size_t' and return zero.
* cipher/serpent.c (get_l): Only if USE_SSE2, USE_AVX2 or USE_NEON
defined.
(_gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): Change return type
to 'size_t' and return remaining blocks; Remove unaccelerated common
code path. Enable remaining common code only if USE_SSE2, USE_AVX2 or
USE_NEON defined; Remove unaccelerated common code.
* cipher/twofish.c (get_l): Only if USE_AMD64_ASM defined.
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Change return type
to 'size_t' and return remaining blocks; Remove unaccelerated common
code path. Enable remaining common code only if USE_AMD64_ASM defined;
Remove unaccelerated common code.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher.c (_gcry_cipher_open_internal): Setup OCB bulk
functions for Serpent.
* cipher/serpent-armv7-neon.S: Add OCB assembly functions.
* cipher/serpent-avx2-amd64.S: Add OCB assembly functions.
* cipher/serpent-sse2-amd64.S: Add OCB assembly functions.
* cipher/serpent.c (_gcry_serpent_sse2_ocb_enc)
(_gcry_serpent_sse2_ocb_dec, _gcry_serpent_sse2_ocb_auth)
(_gcry_serpent_neon_ocb_enc, _gcry_serpent_neon_ocb_dec)
(_gcry_serpent_neon_ocb_auth, _gcry_serpent_avx2_ocb_enc)
(_gcry_serpent_avx2_ocb_dec, _gcry_serpent_avx2_ocb_auth): New
prototypes.
(get_l, _gcry_serpent_ocb_crypt, _gcry_serpent_ocb_auth): New.
* src/cipher.h (_gcry_serpent_ocb_crypt)
(_gcry_serpent_ocb_auth): New.
* tests/basic.c (check_ocb_cipher): Add test-vector for serpent.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/serpent-avx2-amd64.S: Enable when
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
(ELF): New macro to mask lines with ELF specific commands.
* cipher/serpent-sse2-amd64.S: Enable when
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
(ELF): New macro to mask lines with ELF specific commands.
* cipher/chacha20.c (USE_SSE2, USE_AVX2): Enable when
HAVE_COMPATIBLE_GCC_WIN64_PLATFORM_AS defined.
[USE_SSE2 || USE_AVX2] (ASM_FUNC_ABI): New.
(_gcry_serpent_sse2_ctr_enc, _gcry_serpent_sse2_cbc_dec)
(_gcry_serpent_sse2_cfb_dec, _gcry_serpent_avx2_ctr_enc)
(_gcry_serpent_avx2_cbc_dec, _gcry_serpent_avx2_cfb_dec): Add
ASM_FUNC_ABI.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/salsa20.c (USE_ARM_NEON_ASM): Define only if
ENABLE_NEON_SUPPORT is defined.
* cipher/serpent.c (USE_NEON): Ditto.
* cipher/sha1.c (USE_NEON): Ditto.
* cipher/sha512.c (USE_ARM_NEON_ASM): Ditto.
--
The generic C source files must only include NEON support if that is
enabled. The dedicated ASM files are conditionally compiled and thus
do not need to use it.
GnuPG-bug-id: 1603
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/arcfour.c (do_encrypt_stream, encrypt_stream): Use 'size_t'
for buffer lengths.
* cipher/blowfish.c (_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec): Ditto.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc)
(_gcry_camellia_cbc_dec, _gcry_blowfish_cfb_dec): Ditto.
* cipher/cast5.c (_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec)
(_gcry_cast5_cfb_dec): Ditto.
* cipher/cipher-aeswrap.c (_gcry_cipher_aeswrap_encrypt)
(_gcry_cipher_aeswrap_decrypt): Ditto.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt)
(_gcry_cipher_cbc_decrypt): Ditto.
* cipher/cipher-ccm.c (_gcry_cipher_ccm_encrypt)
(_gcry_cipher_ccm_decrypt): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt)
(_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/cipher-internal.h (gcry_cipher_handle->bulk)
(_gcry_cipher_cbc_encrypt, _gcry_cipher_cbc_decrypt)
(_gcry_cipher_cfb_encrypt, _gcry_cipher_cfb_decrypt)
(_gcry_cipher_ofb_encrypt, _gcry_cipher_ctr_encrypt)
(_gcry_cipher_aeswrap_encrypt, _gcry_cipher_aeswrap_decrypt)
(_gcry_cipher_ccm_encrypt, _gcry_cipher_ccm_decrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_cbc_encrypt): Ditto.
* cipher/cipher-selftest.h (gcry_cipher_bulk_cbc_dec_t)
(gcry_cipher_bulk_cfb_dec_t, gcry_cipher_bulk_ctr_enc_t): Ditto.
* cipher/cipher.c (cipher_setkey, cipher_setiv, do_ecb_crypt)
(do_ecb_encrypt, do_ecb_decrypt, cipher_encrypt)
(cipher_decrypt): Ditto.
* cipher/rijndael.c (_gcry_aes_ctr_enc, _gcry_aes_cbc_dec)
(_gcry_aes_cfb_dec, _gcry_aes_cbc_enc, _gcry_aes_cfb_enc): Ditto.
* cipher/salsa20.c (salsa20_setiv, salsa20_do_encrypt_stream)
(salsa20_encrypt_stream, salsa20r12_encrypt_stream): Ditto.
* cipher/serpent.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
(_gcry_serpent_cfb_dec): Ditto.
* cipher/twofish.c (_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec)
(_gcry_twofish_cfb_dec): Ditto.
* src/cipher-proto.h (gcry_cipher_stencrypt_t)
(gcry_cipher_stdecrypt_t, cipher_setiv_fuct_t): Ditto.
* src/cipher.h (_gcry_aes_cfb_enc, _gcry_aes_cfb_dec)
(_gcry_aes_cbc_enc, _gcry_aes_cbc_dec, _gcry_aes_ctr_enc)
(_gcry_blowfish_cfb_dec, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_ctr_enc, _gcry_cast5_cfb_dec, _gcry_cast5_cbc_dec)
(_gcry_cast5_ctr_enc, _gcry_camellia_cfb_dec, _gcry_camellia_cbc_dec)
(_gcry_camellia_ctr_enc, _gcry_serpent_cfb_dec, _gcry_serpent_cbc_dec)
(_gcry_serpent_ctr_enc, _gcry_twofish_cfb_dec, _gcry_twofish_cbc_dec)
(_gcry_twofish_ctr_enc): Ditto.
--
On 64-bit platforms, cipher module internally converts 64-bit size_t values
to 32-bit unsigned integers.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/camellia-aesni-avx2-amd64.S
(_gcry_camellia_aesni_avx2_ctr_enc): Byte-swap before checking for
overflow handling.
* cipher/camellia-glue.c (selftest_ctr_128, selftest_cfb_128)
(selftest_cbc_128): Add 16 to nblocks.
* cipher/cipher-selftest.c (_gcry_selftest_helper_ctr): Add test with
non-overflowing IV and modify overflow IV to detect broken endianness
handling.
* cipher/serpent-avx2-amd64.S (_gcry_serpent_avx2_ctr_enc): Byte-swap
before checking for overflow handling; Fix crazy-mixed-endian IV
construction to big-endian.
* cipher/serpent.c (selftest_ctr_128, selftest_cfb_128)
(selftest_cbc_128): Add 8 to nblocks.
--
The selftest for CTR was setting counter-IV to all '0xff' except last byte.
This had the effect that even with broken endianness handling Serpent-AVX2 and
Camellia-AVX2 passed the tests.
Patch corrects the CTR selftest and fixes the broken implementations.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/serpent.c (SBOX, SBOX_INVERSE): Remove index argument.
(serpent_subkeys_generate): Use smaller temporary arrays for subkey
generation and perform stack clearing locally.
(serpent_setkey_internal): Use wipememory to clear stack and remove
_gcry_burn_stack.
(serpent_setkey): Remove unneeded _gcry_burn_stack.
--
Avoid using large arrays and large stack burning to gain extra speed for
key setup.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'serpent-armv7-neon.S'.
* cipher/serpent-armv7-neon.S: New.
* cipher/serpent.c (USE_NEON): New macro.
(serpent_context_t) [USE_NEON]: Add 'use_neon'.
[USE_NEON] (_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec)
(_gcry_serpent_neon_cbc_dec): New prototypes.
(serpent_setkey_internal) [USE_NEON]: Detect NEON support.
(_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec)
(_gcry_serpent_neon_cbc_dec) [USE_NEON]: Use NEON implementations
to process eight blocks in parallel.
* configure.ac [neonsupport]: Add 'serpent-armv7-neon.lo'.
--
Patch adds ARM NEON optimized implementation of Serpent cipher
to speed up parallelizable bulk operations.
Benchmarks on ARM Cortex-A8 (armhf, 1008 Mhz):
Old:
SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte
CBC dec | 43.53 ns/B 21.91 MiB/s 43.88 c/B
CFB dec | 44.77 ns/B 21.30 MiB/s 45.13 c/B
CTR enc | 45.21 ns/B 21.10 MiB/s 45.57 c/B
CTR dec | 45.21 ns/B 21.09 MiB/s 45.57 c/B
New:
SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte
CBC dec | 26.26 ns/B 36.32 MiB/s 26.47 c/B
CFB dec | 26.21 ns/B 36.38 MiB/s 26.42 c/B
CTR enc | 26.20 ns/B 36.40 MiB/s 26.41 c/B
CTR dec | 26.20 ns/B 36.40 MiB/s 26.41 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (buf_cpy): New.
(buf_xor, buf_xor_2dst): If buffers unaligned, always jump to per-byte
processing.
(buf_xor_n_copy_2): New.
(buf_xor_n_copy): Use 'buf_xor_n_copy_2'.
* cipher/blowfish.c (_gcry_blowfish_cbc_dec): Avoid extra memory copy
and use new 'buf_xor_n_copy_2'.
* cipher/camellia-glue.c (_gcry_camellia_cbc_dec): Ditto.
* cipher/cast5.c (_gcry_cast_cbc_dec): Ditto.
* cipher/serpent.c (_gcry_serpent_cbc_dec): Ditto.
* cipher/twofish.c (_gcry_twofish_cbc_dec): Ditto.
* cipher/rijndael.c (_gcry_aes_cbc_dec): Ditto.
(do_encrypt, do_decrypt): Use 'buf_cpy' instead of 'memcpy'.
(_gcry_aes_cbc_enc): Avoid copying IV, use 'last_iv' pointer instead.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt): Avoid copying IV,
update pointer to IV instead.
(_gcry_cipher_cbc_decrypt): Avoid extra memory copy and use new
'buf_xor_n_copy_2'.
(_gcry_cipher_cbc_encrypt, _gcry_cipher_cbc_decrypt): Avoid extra
accesses to c->spec, use 'buf_cpy' instead of memcpy.
* cipher/cipher-ccm.c (do_cbc_mac): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt)
(_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt)
(_gcry_cipher_ofb_decrypt): Ditto.
* cipher/cipher.c (do_ecb_encrypt, do_ecb_decrypt): Ditto.
--
Patch improves the speed of the generic block cipher mode code. Especially on
targets without faster unaligned memory accesses, the generic code was slower
than the algorithm specific bulk versions. With this patch, this issue should
be solved.
Tests on Cortex-A8; compiled for ARMv4, without unaligned-accesses:
Before:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 490ms 500ms 560ms 580ms 530ms 540ms 560ms 560ms 550ms 540ms 1080ms 1080ms
TWOFISH 230ms 230ms 290ms 300ms 260ms 240ms 290ms 290ms 240ms 240ms 520ms 510ms
DES 720ms 720ms 800ms 860ms 770ms 770ms 810ms 820ms 770ms 780ms - -
CAST5 340ms 340ms 440ms 250ms 390ms 250ms 440ms 430ms 260ms 250ms - -
After:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 500ms 490ms 520ms 520ms 530ms 520ms 530ms 540ms 500ms 520ms 1060ms 1070ms
TWOFISH 230ms 220ms 250ms 230ms 260ms 230ms 260ms 260ms 230ms 230ms 500ms 490ms
DES 720ms 720ms 750ms 760ms 740ms 750ms 770ms 770ms 760ms 760ms - -
CAST5 340ms 340ms 370ms 250ms 370ms 250ms 380ms 390ms 250ms 250ms - -
Tests on Cortex-A8; compiled for ARMv7-A, with unaligned-accesses:
Before:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 430ms 440ms 480ms 530ms 470ms 460ms 490ms 480ms 470ms 460ms 930ms 940ms
TWOFISH 220ms 220ms 250ms 230ms 240ms 230ms 270ms 250ms 230ms 240ms 480ms 470ms
DES 550ms 540ms 620ms 690ms 570ms 540ms 630ms 650ms 590ms 580ms - -
CAST5 300ms 300ms 380ms 230ms 330ms 230ms 380ms 370ms 230ms 230ms - -
After:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 430ms 430ms 460ms 450ms 460ms 450ms 470ms 470ms 460ms 470ms 900ms 930ms
TWOFISH 220ms 210ms 240ms 230ms 230ms 230ms 250ms 250ms 230ms 230ms 470ms 470ms
DES 540ms 540ms 580ms 570ms 570ms 570ms 560ms 620ms 580ms 570ms - -
CAST5 300ms 290ms 310ms 230ms 320ms 230ms 350ms 350ms 230ms 230ms - -
Tests on Intel Atom N160 (i386):
Before:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 380ms 380ms 410ms 420ms 400ms 400ms 410ms 410ms 390ms 400ms 820ms 800ms
TWOFISH 340ms 340ms 370ms 350ms 360ms 340ms 370ms 370ms 330ms 340ms 710ms 700ms
DES 660ms 650ms 710ms 740ms 680ms 700ms 700ms 710ms 680ms 680ms - -
CAST5 340ms 340ms 380ms 330ms 360ms 330ms 390ms 390ms 320ms 330ms - -
After:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 380ms 380ms 390ms 410ms 400ms 390ms 410ms 400ms 400ms 390ms 810ms 800ms
TWOFISH 330ms 340ms 350ms 360ms 350ms 340ms 380ms 370ms 340ms 360ms 700ms 710ms
DES 630ms 640ms 660ms 690ms 680ms 680ms 700ms 690ms 680ms 680ms - -
CAST5 340ms 330ms 350ms 330ms 370ms 340ms 380ms 390ms 330ms 330ms - -
Tests in Intel i5-4570 (x86-64):
Before:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 560ms 560ms 600ms 590ms 600ms 570ms 570ms 570ms 580ms 590ms 1200ms 1180ms
TWOFISH 240ms 240ms 270ms 160ms 260ms 160ms 250ms 250ms 160ms 160ms 430ms 430ms
DES 570ms 570ms 640ms 590ms 630ms 580ms 600ms 600ms 610ms 620ms - -
CAST5 410ms 410ms 470ms 150ms 470ms 150ms 450ms 450ms 150ms 160ms - -
After:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 560ms 560ms 590ms 570ms 580ms 570ms 570ms 570ms 590ms 590ms 1200ms 1200ms
TWOFISH 240ms 240ms 260ms 160ms 250ms 170ms 250ms 250ms 160ms 160ms 430ms 430ms
DES 570ms 570ms 620ms 580ms 630ms 570ms 600ms 590ms 620ms 620ms - -
CAST5 410ms 410ms 460ms 150ms 460ms 160ms 450ms 450ms 150ms 150ms - -
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* src/gcrypt-module.h (gcry_cipher_spec_t): Move to ...
* src/cipher-proto.h (gcry_cipher_spec_t): here. Merge with
cipher_extra_spec_t. Add fields ALGO and FLAGS. Set these fields in
all cipher modules.
* cipher/cipher.c: Change most code to replace the former module
system by a simpler system to gain information about the algorithms.
(disable_pubkey_algo): Simplified. Not anymore thread-safe, though.
* cipher/md.c (_gcry_md_selftest): Use correct structure. Not a real
problem because both define the same function as their first field.
* cipher/pubkey.c (_gcry_pk_selftest): Take care of the disabled flag.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bithelp.h (bswap32, bswap64, le_bswap32, be_bswap32)
(le_bswap64, be_bswap64): New.
* cipher/bufhelp.h (buf_get_be32, buf_get_le32, buf_put_le32)
(buf_put_be32, buf_get_be64, buf_get_le64, buf_put_be64)
(buf_put_le64): New.
* cipher/blowfish.c (do_encrypt_block, do_decrypt_block): Use new
endian conversion helpers.
(do_bf_setkey): Turn endian specific code to generic.
* cipher/camellia.c (GETU32, PUTU32): Use new endian conversion
helpers.
* cipher/cast5.c (rol): Remove, use rol from bithelp.
(F1, F2, F3): Fix to use rol from bithelp.
(do_encrypt_block, do_decrypt_block, do_cast_setkey): Use new endian
conversion helpers.
* cipher/des.c (READ_64BIT_DATA, WRITE_64BIT_DATA): Ditto.
* cipher/md4.c (transform, md4_final): Ditto.
* cipher/md5.c (transform, md5_final): Ditto.
* cipher/rmd160.c (transform, rmd160_final): Ditto.
* cipher/salsa20.c (LE_SWAP32, LE_READ_UINT32): Ditto.
* cipher/scrypt.c (READ_UINT64, LE_READ_UINT64, LE_SWAP32): Ditto.
* cipher/seed.c (GETU32, PUTU32): Ditto.
* cipher/serpent.c (byte_swap_32): Remove.
(serpent_key_prepare, serpent_encrypt_internal)
(serpent_decrypt_internal): Use new endian conversion helpers.
* cipher/sha1.c (transform, sha1_final): Ditto.
* cipher/sha256.c (transform, sha256_final): Ditto.
* cipher/sha512.c (__transform, sha512_final): Ditto.
* cipher/stribog.c (transform, stribog_final): Ditto.
* cipher/tiger.c (transform, tiger_final): Ditto.
* cipher/twofish.c (INPACK, OUTUNPACK): Ditto.
* cipher/whirlpool.c (buffer_to_block, block_to_buffer): Ditto.
* configure.ac (gcry_cv_have_builtin_bswap32): Check for compiler
provided __builtin_bswap32.
(gcry_cv_have_builtin_bswap64): Check for compiler provided
__builtin_bswap64.
--
Patch add helper functions that provide conversions to/from integers and
buffers of different endianess. Benefits are code cleanup and optimization
for architectures that have byte-swaping instructions and/or can do fast
unaligned memory accesses.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* src/gcrypt-module.h (gcry_cipher_encrypt_t)
(gcry_cipher_decrypt_t): Return 'unsigned int'.
* cipher/cipher.c (dummy_encrypt_block, dummy_decrypt_block): Return
zero.
(do_ecb_encrypt, do_ecb_decrypt): Get largest stack burn depth from
block cipher crypt function and burn stack at end.
* cipher/cipher-aeswrap.c (_gcry_cipher_aeswrap_encrypt)
(_gcry_cipher_aeswrap_decrypt): Ditto.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt)
(_gcry_cipher_cbc_decrypt): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt)
(_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_cbc_encrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt)
(_gcry_cipher_ofb_decrypt): Ditto.
* cipher/blowfish.c (encrypt_block, decrypt_block): Return burn stack
depth.
* cipher/camellia-glue.c (camellia_encrypt, camellia_decrypt): Ditto.
* cipher/cast5.c (encrypt_block, decrypt_block): Ditto.
* cipher/des.c (do_tripledes_encrypt, do_tripledes_decrypt)
(do_des_encrypt, do_des_decrypt): Ditto.
* cipher/idea.c (idea_encrypt, idea_decrypt): Ditto.
* cipher/rijndael.c (rijndael_encrypt, rijndael_decrypt): Ditto.
* cipher/seed.c (seed_encrypt, seed_decrypt): Ditto.
* cipher/serpent.c (serpent_encrypt, serpent_decrypt): Ditto.
* cipher/twofish.c (twofish_encrypt, twofish_decrypt): Ditto.
* cipher/rfc2268.c (encrypt_block, decrypt_block): New.
(_gcry_cipher_spec_rfc2268_40): Use encrypt_block and decrypt_block.
--
Patch moves stack burning from block ciphers and cipher mode loop to end of
cipher mode functions. This greatly reduces the overall CPU usage of the
problematic _gcry_burn_stack. Internal cipher module API is changed so
that encrypt/decrypt functions now return the stack burn depth as unsigned
int to cipher mode function.
(Note, patch also adds missing burn_stack for RFC2268_40 cipher).
_gcry_burn_stack CPU time (looping tests/benchmark cipher blowfish):
arch CPU Old New
i386 Intel-Haswell 4.1% 0.16%
x86_64 Intel-Haswell 3.4% 0.07%
armhf Cortex-A8 8.7% 0.14%
New vs. old (armhf/Cortex-A8):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
IDEA 1.05x 1.05x 1.04x 1.04x 1.04x 1.04x 1.07x 1.05x 1.04x 1.04x
3DES 1.04x 1.03x 1.04x 1.03x 1.04x 1.04x 1.04x 1.04x 1.04x 1.04x
CAST5 1.19x 1.20x 1.15x 1.00x 1.17x 1.00x 1.15x 1.05x 1.00x 1.00x
BLOWFISH 1.21x 1.22x 1.16x 1.00x 1.18x 1.00x 1.16x 1.16x 1.00x 1.00x
AES 1.09x 1.09x 1.00x 1.00x 1.00x 1.00x 1.07x 1.07x 1.00x 1.00x
AES192 1.11x 1.11x 1.00x 1.00x 1.00x 1.00x 1.08x 1.09x 1.01x 1.00x
AES256 1.07x 1.08x 1.01x .99x 1.00x 1.00x 1.07x 1.06x 1.00x 1.00x
TWOFISH 1.10x 1.09x 1.09x 1.00x 1.09x 1.00x 1.08x 1.09x 1.00x 1.00x
ARCFOUR 1.00x 1.00x
DES 1.07x 1.11x 1.06x 1.08x 1.07x 1.07x 1.06x 1.06x 1.06x 1.06x
TWOFISH128 1.10x 1.10x 1.09x 1.00x 1.09x 1.00x 1.08x 1.08x 1.00x 1.00x
SERPENT128 1.06x 1.07x 1.02x 1.00x 1.06x 1.00x 1.06x 1.05x 1.00x 1.00x
SERPENT192 1.07x 1.06x 1.03x 1.00x 1.06x 1.00x 1.06x 1.05x 1.00x 1.00x
SERPENT256 1.06x 1.07x 1.02x 1.00x 1.06x 1.00x 1.05x 1.06x 1.00x 1.00x
RFC2268_40 0.97x 1.01x 0.99x 0.98x 1.00x 0.97x 0.96x 0.96x 0.97x 0.97x
SEED 1.45x 1.54x 1.53x 1.56x 1.50x 1.51x 1.50x 1.50x 1.42x 1.42x
CAMELLIA128 1.08x 1.07x 1.06x 1.00x 1.07x 1.00x 1.06x 1.06x 1.00x 1.00x
CAMELLIA192 1.08x 1.08x 1.08x 1.00x 1.07x 1.00x 1.07x 1.07x 1.00x 1.00x
CAMELLIA256 1.08x 1.09x 1.07x 1.01x 1.08x 1.00x 1.07x 1.07x 1.00x 1.00x
SALSA20 .99x 1.00x
Raw data:
New (armhf/Cortex-A8):
Running each test 100 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
IDEA 8620ms 8680ms 9640ms 10010ms 9140ms 8960ms 9630ms 9660ms 9180ms 9180ms
3DES 13990ms 14000ms 14780ms 15300ms 14320ms 14370ms 14780ms 14780ms 14480ms 14480ms
CAST5 2980ms 2980ms 3780ms 2300ms 3290ms 2320ms 3770ms 4100ms 2320ms 2320ms
BLOWFISH 2740ms 2660ms 3530ms 2060ms 3050ms 2080ms 3530ms 3530ms 2070ms 2070ms
AES 2200ms 2330ms 2330ms 2450ms 2270ms 2270ms 2700ms 2690ms 2330ms 2320ms
AES192 2550ms 2670ms 2700ms 2910ms 2630ms 2640ms 3060ms 3060ms 2680ms 2690ms
AES256 2920ms 3010ms 3040ms 3190ms 3010ms 3000ms 3380ms 3420ms 3050ms 3050ms
TWOFISH 2790ms 2840ms 3300ms 2950ms 3010ms 2870ms 3310ms 3280ms 2940ms 2940ms
ARCFOUR 2050ms 2050ms
DES 5640ms 5630ms 6440ms 6970ms 5960ms 6000ms 6440ms 6440ms 6120ms 6120ms
TWOFISH128 2790ms 2840ms 3300ms 2950ms 3010ms 2890ms 3310ms 3290ms 2930ms 2930ms
SERPENT128 4530ms 4340ms 5210ms 4470ms 4740ms 4620ms 5020ms 5030ms 4680ms 4680ms
SERPENT192 4510ms 4340ms 5190ms 4460ms 4750ms 4620ms 5020ms 5030ms 4680ms 4680ms
SERPENT256 4540ms 4330ms 5220ms 4460ms 4730ms 4600ms 5030ms 5020ms 4680ms 4680ms
RFC2268_40 10530ms 7790ms 11140ms 9490ms 10650ms 10710ms 11710ms 11690ms 11000ms 11000ms
SEED 4530ms 4540ms 5050ms 5380ms 4760ms 4810ms 5060ms 5060ms 4850ms 4860ms
CAMELLIA128 2660ms 2630ms 3170ms 2750ms 2880ms 2740ms 3170ms 3170ms 2780ms 2780ms
CAMELLIA192 3430ms 3400ms 3930ms 3530ms 3650ms 3500ms 3940ms 3940ms 3570ms 3560ms
CAMELLIA256 3430ms 3390ms 3940ms 3500ms 3650ms 3510ms 3930ms 3940ms 3550ms 3550ms
SALSA20 1910ms 1900ms
Old (armhf/Cortex-A8):
Running each test 100 times.
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
IDEA 9030ms 9100ms 10050ms 10410ms 9540ms 9360ms 10350ms 10190ms 9560ms 9570ms
3DES 14580ms 14460ms 15300ms 15720ms 14880ms 14900ms 15350ms 15330ms 15030ms 15020ms
CAST5 3560ms 3570ms 4350ms 2300ms 3860ms 2330ms 4340ms 4320ms 2330ms 2320ms
BLOWFISH 3320ms 3250ms 4110ms 2060ms 3610ms 2080ms 4100ms 4090ms 2070ms 2070ms
AES 2390ms 2530ms 2320ms 2460ms 2280ms 2270ms 2890ms 2880ms 2330ms 2330ms
AES192 2830ms 2970ms 2690ms 2900ms 2630ms 2650ms 3320ms 3330ms 2700ms 2690ms
AES256 3110ms 3250ms 3060ms 3170ms 3000ms 3000ms 3610ms 3610ms 3050ms 3060ms
TWOFISH 3080ms 3100ms 3600ms 2940ms 3290ms 2880ms 3560ms 3570ms 2940ms 2930ms
ARCFOUR 2060ms 2050ms
DES 6060ms 6230ms 6850ms 7540ms 6380ms 6400ms 6830ms 6840ms 6500ms 6510ms
TWOFISH128 3060ms 3110ms 3600ms 2940ms 3290ms 2890ms 3560ms 3560ms 2940ms 2930ms
SERPENT128 4820ms 4630ms 5330ms 4460ms 5030ms 4620ms 5300ms 5300ms 4680ms 4680ms
SERPENT192 4830ms 4620ms 5320ms 4460ms 5040ms 4620ms 5300ms 5300ms 4680ms 4680ms
SERPENT256 4820ms 4640ms 5330ms 4460ms 5030ms 4620ms 5300ms 5300ms 4680ms 4660ms
RFC2268_40 10260ms 7850ms 11080ms 9270ms 10620ms 10380ms 11250ms 11230ms 10690ms 10710ms
SEED 6580ms 6990ms 7710ms 8370ms 7140ms 7240ms 7600ms 7610ms 6870ms 6900ms
CAMELLIA128 2860ms 2820ms 3360ms 2750ms 3080ms 2740ms 3350ms 3360ms 2790ms 2790ms
CAMELLIA192 3710ms 3680ms 4240ms 3520ms 3910ms 3510ms 4200ms 4210ms 3560ms 3560ms
CAMELLIA256 3700ms 3680ms 4230ms 3520ms 3930ms 3510ms 4200ms 4210ms 3550ms 3560ms
SALSA20 1900ms 1900ms
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/serpent-avx2-amd64.S (_gcry_serpent_avx2_ctr_enc)
(_gcry_serpent_avx2_cbc_dec, _gcry_serpent_avx2_cfb_dec): Change last
'vzeroupper' to 'vzeroall'.
* cipher/serpent.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
(_gcry_serpent_avx2_cfb_dec) [USE_AVX2]: Remove register clearing with
'vzeroall'.
--
AVX2 implementation was already clearing upper halfs of YMM registers at end of
assembly functions to prevent long SSE<->AVX transition stalls present on Intel
CPUs. Patch changes these 'vzeroupper' instructions to 'vzeroall' to fully
clear YMM registers. After this change register clearing in serpent.c in not
needed.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
| |
cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_ctr_enc)
(_gcry_serpent_sse2_cbc_dec, _gcry_serpent_sse2_cfb_dec): Clear used
XMM registers.
cipher/serpent.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
( _gcry_serpent_cfb_dec) [USE_SSE2]: Remove XMM register clearing from
bulk functions.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/blowfish-amd64.S: Enable only if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined.
* cipher/camellia-aesni-avx-amd64.S: Ditto.
* cipher/camellia-aesni-avx2-amd64.S: Ditto.
* cipher/cast5-amd64.S: Ditto.
* cipher/rinjdael-amd64.S: Ditto.
* cipher/serpent-avx2-amd64.S: Ditto.
* cipher/serpent-sse2-amd64.S: Ditto.
* cipher/twofish-amd64.S: Ditto.
* cipher/blowfish.c: Use AMD64 assembly implementation only if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined
* cipher/camellia-glue.c: Ditto.
* cipher/cast5.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
* configure.ac: Check gcc/as compatibility with AMD64 assembly
implementations.
--
Later these checks can be split and assembly implementations adapted to handle
different platforms, but for now disable AMD64 assembly implementations if
assembler does not look to be able to handle them.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'serpent-avx2-amd64.S'.
* cipher/serpent-avx2-amd64.S: New file.
* cipher/serpent.c (USE_AVX2): New macro.
(serpent_context_t) [USE_AVX2]: Add 'use_avx2'.
[USE_AVX2] (_gcry_serpent_avx2_ctr_enc, _gcry_serpent_avx2_cbc_dec)
(_gcry_serpent_avx2_cfb_dec): New prototypes.
(serpent_setkey_internal) [USE_AVX2]: Check for AVX2 capable hardware
and set 'use_avx2'.
(_gcry_serpent_ctr_enc) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cbc_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cfb_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks'
so that AVX2 codepaths are tested.
* configure.ac (serpent) [avx2support]: Add 'serpent-avx2-amd64.lo'.
--
Add new AVX2 implementation of Serpent that processes 16 blocks in parallel.
Speed old (SSE2) vs. new (AVX2) on Intel Core i5-4570:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.00x 1.00x 1.00x 2.10x 1.00x 2.16x 1.01x 1.00x 2.16x 2.18x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher-selftest.c (_gcry_selftest_helper_cbc_128)
(_gcry_selftest_helper_cfb_128, _gcry_selftest_helper_ctr_128): Renamed
functions from '<name>_128' to '<name>'.
(_gcry_selftest_helper_cbc, _gcry_selftest_helper_cfb)
(_gcry_selftest_helper_ctr): Make work with different block sizes.
* cipher/cipher-selftest.h (_gcry_selftest_helper_cbc_128)
(_gcry_selftest_helper_cfb_128, _gcry_selftest_helper_ctr_128): Renamed
prototypes from '<name>_128' to '<name>'.
* cipher/camellia-glue.c (selftest_ctr_128, selftest_cfb_128)
(selftest_ctr_128): Change to use new function names.
* cipher/rijndael.c (selftest_ctr_128, selftest_cfb_128)
(selftest_ctr_128): Change to use new function names.
* cipher/serpent.c (selftest_ctr_128, selftest_cfb_128)
(selftest_ctr_128): Change to use new function names.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher.c (gcry_cipher_open): Add bulf CFB decryption function
for Serpent.
* cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_cfb_dec): New
function.
* cipher/serpent.c (_gcry_serpent_sse2_cfb_dec): New prototype.
(_gcry_serpent_cfb_dec) New function.
(selftest_cfb_128) New function.
(selftest) Call selftest_cfb_128.
* src/cipher.h (_gcry_serpent_cfb_dec): New prototype.
--
Patch makes Serpent-CFB decryption 4.0 times faster on Intel Sandy-Bridge and
2.7 times faster on AMD K10.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (serpent): Add 'serpent-sse2-amd64.lo'.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add
'serpent-sse2-amd64.S'.
* cipher/cipher.c (gcry_cipher_open) [USE_SERPENT]: Register bulk
functions for CBC-decryption and CTR-mode.
* cipher/serpent.c (USE_SSE2): New macro.
[USE_SSE2] (_gcry_serpent_sse2_ctr_enc, _gcry_serpent_sse2_cbc_dec):
New prototypes to assembler functions.
(serpent_setkey): Set 'serpent_init_done' before calling serpent_test.
(_gcry_serpent_ctr_enc): New function.
(_gcry_serpent_cbc_dec): New function.
(selftest_ctr_128): New function.
(selftest_cbc_128): New function.
(selftest): Call selftest_ctr_128 and selftest_cbc_128.
* cipher/serpent-sse2-amd64.S: New file.
* src/cipher.h (_gcry_serpent_ctr_enc): New prototype.
(_gcry_serpent_cbc_dec): New prototype.
--
[v2]: Converted to SSE2, to support all amd64 processors (SSE2 is required
feature by AMD64 SysV ABI).
Patch adds word-sliced SSE2 implementation of Serpent for amd64 for speeding
up parallelizable workloads (CTR mode, CBC mode decryption). Implementation
processes eight blocks in parallel, with two four-block sets interleaved for
out-of-order scheduling.
Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.00x 0.99x 1.00x 3.98x 1.00x 1.01x 1.00x 1.01x 4.04x 4.04x
Speed old vs. new on AMD Phenom II X6 1055T:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.02x 1.01x 1.00x 2.83x 1.00x 1.00x 1.00x 1.00x 2.72x 2.72x
Speed old vs. new on Intel Core2 Duo T8100:
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.00x 1.02x 0.97x 4.02x 0.98x 1.01x 0.98x 1.00x 3.82x 3.91x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/serpent.c (SBOX0, SBOX1, SBOX2, SBOX3, SBOX4, SBOX5, SBOX6)
(SBOX7, SBOX0_INVERSE, SBOX1_INVERSE, SBOX2_INVERSE, SBOX3_INVERSE)
(SBOX4_INVERSE, SBOX5_INVERSE, SBOX6_INVERSE, SBOX7_INVERSE): Replace
with new definitions.
--
These new S-box definitions are from paper:
D. A. Osvik, “Speeding up Serpent,” in Third AES Candidate Conference,
(New York, New York, USA), p. 317–329, National Institute of Standards and
Technology, 2000. Available at http://www.ii.uib.no/~osvik/pub/aes3.ps.gz
Although these were optimized for two-operand instructions on i386 and for
old Pentium-1 processors, they are slightly faster on current processors
on i386 and x86-64. On ARM, the performance of these S-boxes is about the
same as with the old S-boxes.
new vs old speed ratios (AMD K10, x86-64):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.06x 1.02x 1.06x 1.02x 1.06x 1.06x 1.06x 1.05x 1.07x 1.07x
new vs old speed ratios (Intel Atom, i486):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.12x 1.15x 1.12x 1.15x 1.13x 1.11x 1.12x 1.12x 1.12x 1.13x
new vs old speed ratios (ARM Cortex A8):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
SERPENT128 1.04x 1.02x 1.02x 0.99x 1.02x 1.02x 1.03x 1.03x 1.01x 1.01x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/serpent.c (serpent_key_prepare): Fix misaligned access.
(serpent_setkey): Likewise.
(serpent_encrypt_internal): Likewise.
(serpent_decrypt_internal): Likewise.
(serpent_encrypt): Don't put an alignment-increasing cast.
(serpent_decrypt): Likewise.
(serpent_test): Likewise.
--
This is a port of the fix for the Libgcrypt code in GRUB:
http://bzr.savannah.gnu.org/lh/grub/trunk/grub/revision/3685
GRUB is FSF copyrighted and thus we can use this code without a DCO.
Note that the above fix was not correct and failed the selftests, thus
I fixed this fix.
GnuPG-bug-id: 1384
Signed-off-by: Werner Koch <wk@gnupg.org>
(cherry picked from commit 8eab66ad6852ec985bfb1e7fec35981d5e31148a)
|
|
|
|
| |
Check and install the standard git pre-commit hook.
|
|
|
|
|
| |
Ported some changes from 1.2 to here.
|
|
|
|
|
|
|
|
| |
(open_device): Set close-on-exit flags. Suggested by Max
Kellermann. Fixes Debian#403613.
Cleaned up last Makefile changes.
|
|
|
|
|
|
|
|
|
| |
* serpent.c: Updated from 1.2 branch:
s/u32_t/u32/ and s/byte_t/byte/. Too match what we have always
used and are using in all other files too
(serpent_test): Moved prototype out of a fucntion.
|
|
|
|
|
| |
outer scope.
|
|
|
|
|
|
|
| |
* serpent.c: Use "u32_t" instead of "unsigned long", do not
declare S-Box variables as "register". Fixes failure on
OpenBSD/sparc64, reported by Nikolay Sturm.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reformatted long lines. Don't include gcrypt-defs.h.
* ac.c (ac_key_identifiers): Made static.
* random.c (getfnc_gather_random,getfnc_fast_random_poll): Move
prototypes to ..
* rand-internal.h: .. here
* random.c (getfnc_gather_random): Include rndw32 gatherer.
* rndunix.c, rndw32.c, rndegd.c: Include them here.
* rndlinux.c (_gcry_rndlinux_gather_random): Prepend the _gcry_
prefix. Changed all callers.
* rndegd.c (_gcry_rndegd_gather_random): Likewise.
(_gcry_rndegd_connect_socket): Likewise.
* rndunix.c (_gcry_rndunix_gather_random): Likewise.
(waitpid): Made static.
* rndw32.c: Removed the old and unused winseed.dll cruft.
(_gcry_rndw32_gather_random_fast): Renamed from
gather_random_fast.
(_gcry_rndw32_gather_random): Renamed from gather_random. Note,
that the changes 2003-04-08 somehow got lost.
* sha512.c (sha512_init, sha384_init): Made static.
* cipher.c (do_ctr_decrypt): Removed "return" from this void
function.
* gcrypt.h (gcry_pk_testkey): Doc fix.
* libgcrypt.def: Manually wrote this file.
* build-def: This file should not be used anymore.
|
|
|
|
|
| |
* serpent.c: Fix an issue on big-endian systems.
|
|
* cipher.c: Add support for Serpent
* serpent.c: New file.
2003-08-10 Moritz Schulte <moritz@g10code.com>
* rsa.c (_gcry_rsa_blind, _gcry_rsa_unblind): Declare static.
|