| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (buf_cpy): Skip memcpy if length is zero.
* cipher/cipher-ccm.c (do_cbc_mac): Replace buffer copy loops with buf_cpy call.
* cipher/cipher-cmac.c (_gcry_cmac_write): Ditto.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate): Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS)
(bufhelp_int_s, buf_xor_1): Remove.
(buf_cpy, buf_xor, buf_xor_2dst, buf_xor_n_copy_2): Use
buf_put/buf_get helpers to handle unaligned memory accesses.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (buf_get_he32, buf_put_he32, buf_get_he64)
(buf_put_he64): New.
* cipher/cipher-internal.h (cipher_block_cpy, cipher_block_xor)
(cipher_block_xor_1, cipher_block_xor_2dst, cipher_block_xor_n_copy_2)
(cipher_block_xor_n_copy): New.
* cipher/cipher-gcm-intel-pclmul.c
(_gcry_ghash_setup_intel_pclmul): Use assembly for swapping endianness
instead of buf_get_be64 and buf_cpy.
* cipher/blowfish.c: Use new cipher_block_* functions for cipher block
sized buf_cpy/xor* operations.
* cipher/camellia-glue.c: Ditto.
* cipher/cast5.c: Ditto.
* cipher/cipher-aeswrap.c: Ditto.
* cipher/cipher-cbc.c: Ditto.
* cipher/cipher-ccm.c: Ditto.
* cipher/cipher-cfb.c: Ditto.
* cipher/cipher-cmac.c: Ditto.
* cipher/cipher-ctr.c: Ditto.
* cipher/cipher-eax.c: Ditto.
* cipher/cipher-gcm.c: Ditto.
* cipher/cipher-ocb.c: Ditto.
* cipher/cipher-ofb.c: Ditto.
* cipher/cipher-xts.c: Ditto.
* cipher/des.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
--
This commit adds size-optimized functions for copying and xoring
cipher block sized buffers. These functions also allow GCC to use
inline auto-vectorization for block cipher copying and xoring on
higher optimization levels.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (buf_eq_const): Rewrite logic.
--
New implementation for constant-time buffer comparing that
avoids generating conditional code in comparison loop.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
| |
* src/g10lib.h (LIKELY, UNLIKELY): New.
(gcry_assert): Use LIKELY for assert check.
(fast_wipememory2_unaligned_head): Use UNLIKELY for unaligned
branching.
* cipher/bufhelp.h (buf_cpy, buf_xor, buf_xor_1, buf_xor_2dst)
(buf_xor_n_copy_2): Ditto.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (BUFHELP_UNALIGNED_ACCESS): New, defined
if attributes 'packed', 'aligned' and 'may_alias' are supported.
(BUFHELP_FAST_UNALIGNED_ACCESS): Define if have
BUFHELP_UNALIGNED_ACCESS.
--
Now that compiler is properly told that reads from these types
may do not follow strict-aliasing and may be unaligned, we
enable use of these for all architectures and compiler will
emit more optimized, yet correct, code (for example, use
special unaligned read/write instructions instead of accessing
byte-by-byte).
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h [!BUFHELP_FAST_UNALIGNED_ACCESS]
(bufhelp_int_t): Add 'may_alias' attribute.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (gcry_cv_gcc_attribute_may_alias)
(HAVE_GCC_ATTRIBUTE_MAY_ALIAS): New check for 'may_alias' attribute.
* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS): Enable only if
HAVE_GCC_ATTRIBUTE_MAY_ALIAS is defined.
[BUFHELP_FAST_UNALIGNED_ACCESS] (bufhelp_int_t, bufhelp_u32_t)
(bufhelp_u64_t): Add 'may_alias' attribute.
* src/g10lib.h (fast_wipememory_t): Add HAVE_GCC_ATTRIBUTE_MAY_ALIAS
defined check; Add 'may_alias' attribute.
--
Attribute 'may_alias' was missing from bufhelp unaligned memory access
pointer types, and was causing problems with newer GCC versions (with
more aggressive optimization). This patch fixes broken Camellia-CFB
with '-O3 -flto' flags with GCC-6 on x86-64 and generic GCM with
default '-O2' on x32.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* configure.ac (available_digests_64): Merge with available_digests.
(available_kdfs_64): Merge with available_kdfs.
<64 bit datatype test>: Bail out if no such type is available.
* src/types.h: Emit #error if no u64 can be defined.
(PROPERLY_ALIGNED_TYPE): Always add u64 type.
* cipher/bithelp.h: Remove all code paths which handle the
case of !HAVE_U64_TYPEDEF.
* cipher/bufhelp.h: Ditto.
* cipher/cipher-ccm.c: Ditto.
* cipher/cipher-gcm.c: Ditto.
* cipher/cipher-internal.h: Ditto.
* cipher/cipher.c: Ditto.
* cipher/hash-common.h: Ditto.
* cipher/md.c: Ditto.
* cipher/poly1305.c: Ditto.
* cipher/scrypt.c: Ditto.
* cipher/tiger.c: Ditto.
* src/g10lib.h: Ditto.
* tests/basic.c: Ditto.
* tests/bench-slope.c: Ditto.
* tests/benchmark.c: Ditto.
--
Given that SHA-2 and some other algorithms require a 64 bit type it
does not make anymore sense to conditionally compile some part when
the platform does not provide such a type.
GnuPG-bug-id: 1815.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS): Disable for
__powerpc__ and __powerpc64__.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (buf_xor_1): Increment source pointer at tail
handling.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS): Enable only when
HAVE_GCC_ATTRIBUTE_PACKED and HAVE_GCC_ATTRIBUTE_ALIGNED are defined.
(bufhelp_int_t): New type.
(buf_cpy, buf_xor, buf_xor_1, buf_xor_2dst, buf_xor_n_copy_2): Use
'bufhelp_int_t'.
[BUFHELP_FAST_UNALIGNED_ACCESS] (bufhelp_u32_t, bufhelp_u64_t): New.
[BUFHELP_FAST_UNALIGNED_ACCESS] (buf_get_be32, buf_get_le32)
(buf_put_be32, buf_put_le32, buf_get_be64, buf_get_le64)
(buf_put_be64, buf_put_le64): Use 'bufhelp_uXX_t'.
* configure.ac (gcry_cv_gcc_attribute_packed): New.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h: Move include for uintptr_t to ...
* src/types.h: here. Check that config.h has been included.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/cipher-ocb.c: New.
* cipher/Makefile.am (libcipher_la_SOURCES): Add cipher-ocb.c
* cipher/cipher-internal.h (OCB_BLOCK_LEN, OCB_L_TABLE_SIZE): New.
(gcry_cipher_handle): Add fields marks.finalize and u_mode.ocb.
* cipher/cipher.c (_gcry_cipher_open_internal): Add OCB mode.
(_gcry_cipher_open_internal): Setup default taglen of OCB.
(cipher_reset): Clear OCB specific data.
(cipher_encrypt, cipher_decrypt, _gcry_cipher_authenticate)
(_gcry_cipher_gettag, _gcry_cipher_checktag): Call OCB functions.
(_gcry_cipher_setiv): Add OCB specific nonce setting.
(_gcry_cipher_ctl): Add GCRYCTL_FINALIZE and GCRYCTL_SET_TAGLEN
* src/gcrypt.h.in (GCRYCTL_SET_TAGLEN): New.
(gcry_cipher_final): New.
* cipher/bufhelp.h (buf_xor_1): New.
* tests/basic.c (hex2buffer): New.
(check_ocb_cipher): New.
(main): Call it here. Add option --cipher-modes.
* tests/bench-slope.c (bench_aead_encrypt_do_bench): Call
gcry_cipher_final.
(bench_aead_decrypt_do_bench): Ditto.
(bench_aead_authenticate_do_bench): Ditto. Check error code.
(bench_ocb_encrypt_do_bench): New.
(bench_ocb_decrypt_do_bench): New.
(bench_ocb_authenticate_do_bench): New.
(ocb_encrypt_ops): New.
(ocb_decrypt_ops): New.
(ocb_authenticate_ops): New.
(cipher_modes): Add them.
(cipher_bench_one): Skip wrong block length for OCB.
* tests/benchmark.c (cipher_bench): Add field noncelen to MODES. Add
OCB support.
--
See the comments on top of cipher/cipher-ocb.c for the patent status
of the OCB mode.
The implementation has not yet been optimized and as such is not faster
that the other AEAD modes. A first candidate for optimization is the
double_block function. Large improvements can be expected by writing
an AES ECB function to work on multiple blocks.
Signed-off-by: Werner Koch <wk@gnupg.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cipher/blowfish-amd64.S: Change utf-8 encoded copyright character to
'(C)'.
cipher/blowfish-arm.S: Ditto.
cipher/bufhelp.h: Ditto.
cipher/camellia-aesni-avx-amd64.S: Ditto.
cipher/camellia-aesni-avx2-amd64.S: Ditto.
cipher/camellia-arm.S: Ditto.
cipher/cast5-amd64.S: Ditto.
cipher/cast5-arm.S: Ditto.
cipher/cipher-ccm.c: Ditto.
cipher/cipher-cmac.c: Ditto.
cipher/cipher-gcm.c: Ditto.
cipher/cipher-selftest.c: Ditto.
cipher/cipher-selftest.h: Ditto.
cipher/mac-cmac.c: Ditto.
cipher/mac-gmac.c: Ditto.
cipher/mac-hmac.c: Ditto.
cipher/mac-internal.h: Ditto.
cipher/mac.c: Ditto.
cipher/rijndael-amd64.S: Ditto.
cipher/rijndael-arm.S: Ditto.
cipher/salsa20-amd64.S: Ditto.
cipher/salsa20-armv7-neon.S: Ditto.
cipher/serpent-armv7-neon.S: Ditto.
cipher/serpent-avx2-amd64.S: Ditto.
cipher/serpent-sse2-amd64.S: Ditto.
--
Avoid use of '©' for easier parsing of source for copyright information.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/Makefile.am: Add 'mac.c', 'mac-internal.h' and 'mac-hmac.c'.
* cipher/bufhelp.h (buf_eq_const): New.
* cipher/cipher-ccm.c (_gcry_cipher_ccm_tag): Use 'buf_eq_const' for
constant-time compare.
* cipher/mac-hmac.c: New.
* cipher/mac-internal.h: New.
* cipher/mac.c: New.
* doc/gcrypt.texi: Add documentation for MAC API.
* src/gcrypt-int.h [GPG_ERROR_VERSION_NUMBER < 1.13]
(GPG_ERR_MAC_ALGO): New.
* src/gcrypt.h.in (gcry_mac_handle, gcry_mac_hd_t, gcry_mac_algos)
(gcry_mac_flags, gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name)
(gcry_mac_reset, gcry_mac_test_algo): New.
* src/libgcrypt.def (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* src/libgcrypt.vers (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* src/visibility.c (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* src/visibility.h (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* tests/basic.c (check_one_mac, check_mac): New.
(main): Call 'check_mac'.
* tests/bench-slope.c (bench_print_header, bench_print_footer): Allow
variable algorithm name width.
(_cipher_bench, hash_bench): Update to above change.
(bench_hash_do_bench): Add 'gcry_md_reset'.
(bench_mac_mode, bench_mac_init, bench_mac_free, bench_mac_do_bench)
(mac_ops, mac_modes, mac_bench_one, _mac_bench, mac_bench): New.
(main): Add 'mac' benchmark options.
* tests/benchmark.c (mac_repetitions, mac_bench): New.
(main): Add 'mac' benchmark options.
--
Add MAC API, with HMAC algorithms. Internally uses HMAC functionality of the
MD module.
[v2]:
- Add documentation for MAC API.
- Change length argument for gcry_mac_read from size_t to size_t* for
returning number of written bytes.
[v3]:
- HMAC algorithm ids start from 101.
- Fix coding style for new files.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (buf_cpy): New.
(buf_xor, buf_xor_2dst): If buffers unaligned, always jump to per-byte
processing.
(buf_xor_n_copy_2): New.
(buf_xor_n_copy): Use 'buf_xor_n_copy_2'.
* cipher/blowfish.c (_gcry_blowfish_cbc_dec): Avoid extra memory copy
and use new 'buf_xor_n_copy_2'.
* cipher/camellia-glue.c (_gcry_camellia_cbc_dec): Ditto.
* cipher/cast5.c (_gcry_cast_cbc_dec): Ditto.
* cipher/serpent.c (_gcry_serpent_cbc_dec): Ditto.
* cipher/twofish.c (_gcry_twofish_cbc_dec): Ditto.
* cipher/rijndael.c (_gcry_aes_cbc_dec): Ditto.
(do_encrypt, do_decrypt): Use 'buf_cpy' instead of 'memcpy'.
(_gcry_aes_cbc_enc): Avoid copying IV, use 'last_iv' pointer instead.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt): Avoid copying IV,
update pointer to IV instead.
(_gcry_cipher_cbc_decrypt): Avoid extra memory copy and use new
'buf_xor_n_copy_2'.
(_gcry_cipher_cbc_encrypt, _gcry_cipher_cbc_decrypt): Avoid extra
accesses to c->spec, use 'buf_cpy' instead of memcpy.
* cipher/cipher-ccm.c (do_cbc_mac): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt)
(_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt)
(_gcry_cipher_ofb_decrypt): Ditto.
* cipher/cipher.c (do_ecb_encrypt, do_ecb_decrypt): Ditto.
--
Patch improves the speed of the generic block cipher mode code. Especially on
targets without faster unaligned memory accesses, the generic code was slower
than the algorithm specific bulk versions. With this patch, this issue should
be solved.
Tests on Cortex-A8; compiled for ARMv4, without unaligned-accesses:
Before:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 490ms 500ms 560ms 580ms 530ms 540ms 560ms 560ms 550ms 540ms 1080ms 1080ms
TWOFISH 230ms 230ms 290ms 300ms 260ms 240ms 290ms 290ms 240ms 240ms 520ms 510ms
DES 720ms 720ms 800ms 860ms 770ms 770ms 810ms 820ms 770ms 780ms - -
CAST5 340ms 340ms 440ms 250ms 390ms 250ms 440ms 430ms 260ms 250ms - -
After:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 500ms 490ms 520ms 520ms 530ms 520ms 530ms 540ms 500ms 520ms 1060ms 1070ms
TWOFISH 230ms 220ms 250ms 230ms 260ms 230ms 260ms 260ms 230ms 230ms 500ms 490ms
DES 720ms 720ms 750ms 760ms 740ms 750ms 770ms 770ms 760ms 760ms - -
CAST5 340ms 340ms 370ms 250ms 370ms 250ms 380ms 390ms 250ms 250ms - -
Tests on Cortex-A8; compiled for ARMv7-A, with unaligned-accesses:
Before:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 430ms 440ms 480ms 530ms 470ms 460ms 490ms 480ms 470ms 460ms 930ms 940ms
TWOFISH 220ms 220ms 250ms 230ms 240ms 230ms 270ms 250ms 230ms 240ms 480ms 470ms
DES 550ms 540ms 620ms 690ms 570ms 540ms 630ms 650ms 590ms 580ms - -
CAST5 300ms 300ms 380ms 230ms 330ms 230ms 380ms 370ms 230ms 230ms - -
After:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 430ms 430ms 460ms 450ms 460ms 450ms 470ms 470ms 460ms 470ms 900ms 930ms
TWOFISH 220ms 210ms 240ms 230ms 230ms 230ms 250ms 250ms 230ms 230ms 470ms 470ms
DES 540ms 540ms 580ms 570ms 570ms 570ms 560ms 620ms 580ms 570ms - -
CAST5 300ms 290ms 310ms 230ms 320ms 230ms 350ms 350ms 230ms 230ms - -
Tests on Intel Atom N160 (i386):
Before:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 380ms 380ms 410ms 420ms 400ms 400ms 410ms 410ms 390ms 400ms 820ms 800ms
TWOFISH 340ms 340ms 370ms 350ms 360ms 340ms 370ms 370ms 330ms 340ms 710ms 700ms
DES 660ms 650ms 710ms 740ms 680ms 700ms 700ms 710ms 680ms 680ms - -
CAST5 340ms 340ms 380ms 330ms 360ms 330ms 390ms 390ms 320ms 330ms - -
After:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 380ms 380ms 390ms 410ms 400ms 390ms 410ms 400ms 400ms 390ms 810ms 800ms
TWOFISH 330ms 340ms 350ms 360ms 350ms 340ms 380ms 370ms 340ms 360ms 700ms 710ms
DES 630ms 640ms 660ms 690ms 680ms 680ms 700ms 690ms 680ms 680ms - -
CAST5 340ms 330ms 350ms 330ms 370ms 340ms 380ms 390ms 330ms 330ms - -
Tests in Intel i5-4570 (x86-64):
Before:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 560ms 560ms 600ms 590ms 600ms 570ms 570ms 570ms 580ms 590ms 1200ms 1180ms
TWOFISH 240ms 240ms 270ms 160ms 260ms 160ms 250ms 250ms 160ms 160ms 430ms 430ms
DES 570ms 570ms 640ms 590ms 630ms 580ms 600ms 600ms 610ms 620ms - -
CAST5 410ms 410ms 470ms 150ms 470ms 150ms 450ms 450ms 150ms 160ms - -
After:
ECB/Stream CBC CFB OFB CTR CCM
--------------- --------------- --------------- --------------- --------------- ---------------
SEED 560ms 560ms 590ms 570ms 580ms 570ms 570ms 570ms 590ms 590ms 1200ms 1200ms
TWOFISH 240ms 240ms 260ms 160ms 250ms 170ms 250ms 250ms 160ms 160ms 430ms 430ms
DES 570ms 570ms 620ms 580ms 630ms 570ms 600ms 590ms 620ms 620ms - -
CAST5 410ms 410ms 460ms 150ms 460ms 160ms 450ms 450ms 150ms 150ms - -
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h [__aarch64__] (BUFHELP_FAST_UNALIGNED_ACCESS): Set
macro on AArch64.
--
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h [__powerpc__] (BUFHELP_FAST_UNALIGNED_ACCESS): Set
macro enabled.
[__powerpc64__] (BUFHELP_FAST_UNALIGNED_ACCESS): Ditto.
--
PowerPC can handle unaligned memory accesses fast, so enable fast
buffer handling in bufhelp.h.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bithelp.h (bswap32, bswap64, le_bswap32, be_bswap32)
(le_bswap64, be_bswap64): New.
* cipher/bufhelp.h (buf_get_be32, buf_get_le32, buf_put_le32)
(buf_put_be32, buf_get_be64, buf_get_le64, buf_put_be64)
(buf_put_le64): New.
* cipher/blowfish.c (do_encrypt_block, do_decrypt_block): Use new
endian conversion helpers.
(do_bf_setkey): Turn endian specific code to generic.
* cipher/camellia.c (GETU32, PUTU32): Use new endian conversion
helpers.
* cipher/cast5.c (rol): Remove, use rol from bithelp.
(F1, F2, F3): Fix to use rol from bithelp.
(do_encrypt_block, do_decrypt_block, do_cast_setkey): Use new endian
conversion helpers.
* cipher/des.c (READ_64BIT_DATA, WRITE_64BIT_DATA): Ditto.
* cipher/md4.c (transform, md4_final): Ditto.
* cipher/md5.c (transform, md5_final): Ditto.
* cipher/rmd160.c (transform, rmd160_final): Ditto.
* cipher/salsa20.c (LE_SWAP32, LE_READ_UINT32): Ditto.
* cipher/scrypt.c (READ_UINT64, LE_READ_UINT64, LE_SWAP32): Ditto.
* cipher/seed.c (GETU32, PUTU32): Ditto.
* cipher/serpent.c (byte_swap_32): Remove.
(serpent_key_prepare, serpent_encrypt_internal)
(serpent_decrypt_internal): Use new endian conversion helpers.
* cipher/sha1.c (transform, sha1_final): Ditto.
* cipher/sha256.c (transform, sha256_final): Ditto.
* cipher/sha512.c (__transform, sha512_final): Ditto.
* cipher/stribog.c (transform, stribog_final): Ditto.
* cipher/tiger.c (transform, tiger_final): Ditto.
* cipher/twofish.c (INPACK, OUTUNPACK): Ditto.
* cipher/whirlpool.c (buffer_to_block, block_to_buffer): Ditto.
* configure.ac (gcry_cv_have_builtin_bswap32): Check for compiler
provided __builtin_bswap32.
(gcry_cv_have_builtin_bswap64): Check for compiler provided
__builtin_bswap64.
--
Patch add helper functions that provide conversions to/from integers and
buffers of different endianess. Benefits are code cleanup and optimization
for architectures that have byte-swaping instructions and/or can do fast
unaligned memory accesses.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h [__arm__ && __ARM_FEATURE_UNALIGNED]: Enable
BUFHELP_FAST_UNALIGNED_ACCESS.
--
Newer ARM systems support unaligned memory accesses and on gcc-4.7 and onwards
this is identified by __ARM_FEATURE_UNALIGNED macro.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h (buf_xor, buf_xor_2dst, buf_xor_n_copy): Cast
to larger element pointer through (void *) to suppress -Wcast-error.
--
Patch disables bogus warnings caused by -Wcast-error. We know that byte
pointers are properly aligned at these phases, or that hardware can handle
unaligned accesses.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
|
|
|
|
|
|
|
|
| |
* cipher/bufhelp.h [HAVE_INTTYPES_H]: Include inttypes.h
--
According to the description of AC_TYPE_UINTPTR_T, this header should
also be included.
|
|
* cipher/Makefile.am (libcipher_la_SOURCES): Add 'bufhelp.h'.
* cipher/bufhelp.h: New.
* cipher/cipher-aeswrap.c (_gcry_cipher_aeswrap_encrypt)
(_gcry_cipher_aeswrap_decrypt): Use 'buf_xor' for buffer xoring.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt)
(_gcry_cipher_cbc_decrypt): Use 'buf_xor' for buffer xoring and remove
resulting unused variables.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt) Use 'buf_xor_2dst'
for buffer xoring and remove resulting unused variables.
(_gcry_cipher_cfb_decrypt): Use 'buf_xor_n_copy' for buffer xoring and
remove resulting unused variables.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Use 'buf_xor' for
buffer xoring and remove resulting unused variables.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt)
(_gcry_cipher_ofb_decrypt): Use 'buf_xor' for buffer xoring and remove
resulting used variables.
* cipher/rijndael.c (_gry_aes_cfb_enc): Use 'buf_xor_2dst' for buffer
xoring and remove resulting unused variables.
(_gry_aes_cfb_dev): Use 'buf_xor_n_copy' for buffer xoring and remove
resulting unused variables.
(_gry_aes_cbc_enc, _gry_aes_ctr_enc, _gry_aes_cbc_dec): Use 'buf_xor'
for buffer xoring and remove resulting unused variables.
--
Add faster helper functions for buffer xoring and replace byte buffer xor
loops. This give following speed up. Note that CTR speed up is from refactoring
code to use buf_xor() and removal of integer division/modulo operations issued
per each processed byte. This removal of div/mod most likely gives even greater
speed increase on CPU architechtures that do not have hardware division unit.
Benchmark ratios (old-vs-new, AMD Phenom II, x86-64):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
IDEA 0.99x 1.01x 1.06x 1.02x 1.03x 1.06x 1.04x 1.02x 1.58x 1.58x
3DES 1.00x 1.00x 1.01x 1.01x 1.02x 1.02x 1.02x 1.01x 1.22x 1.23x
CAST5 0.98x 1.00x 1.09x 1.03x 1.09x 1.09x 1.07x 1.07x 1.98x 1.95x
BLOWFISH 1.00x 1.00x 1.18x 1.05x 1.07x 1.07x 1.05x 1.05x 1.93x 1.91x
AES 1.00x 0.98x 1.18x 1.14x 1.13x 1.13x 1.14x 1.14x 1.18x 1.18x
AES192 0.98x 1.00x 1.13x 1.14x 1.13x 1.10x 1.14x 1.16x 1.15x 1.15x
AES256 0.97x 1.02x 1.09x 1.13x 1.13x 1.09x 1.10x 1.14x 1.11x 1.13x
TWOFISH 1.00x 1.00x 1.15x 1.17x 1.18x 1.16x 1.18x 1.13x 2.37x 2.31x
ARCFOUR 1.03x 0.97x
DES 1.01x 1.00x 1.04x 1.04x 1.04x 1.05x 1.05x 1.02x 1.56x 1.55x
TWOFISH128 0.97x 1.03x 1.18x 1.17x 1.18x 1.15x 1.15x 1.15x 2.37x 2.31x
SERPENT128 1.00x 1.00x 1.10x 1.11x 1.08x 1.09x 1.08x 1.06x 1.66x 1.67x
SERPENT192 1.00x 1.00x 1.07x 1.08x 1.08x 1.09x 1.08x 1.08x 1.65x 1.66x
SERPENT256 1.00x 1.00x 1.09x 1.09x 1.08x 1.09x 1.08x 1.06x 1.66x 1.67x
RFC2268_40 1.03x 0.99x 1.05x 1.02x 1.03x 1.03x 1.04x 1.03x 1.46x 1.46x
SEED 1.00x 1.00x 1.10x 1.10x 1.09x 1.09x 1.10x 1.07x 1.80x 1.76x
CAMELLIA128 1.00x 1.00x 1.23x 1.12x 1.15x 1.17x 1.15x 1.12x 2.15x 2.13x
CAMELLIA192 1.05x 1.03x 1.23x 1.21x 1.21x 1.16x 1.12x 1.25x 1.90x 1.90x
CAMELLIA256 1.03x 1.07x 1.10x 1.19x 1.08x 1.14x 1.12x 1.10x 1.90x 1.92x
Benchmark ratios (old-vs-new, AMD Phenom II, i386):
ECB/Stream CBC CFB OFB CTR
--------------- --------------- --------------- --------------- ---------------
IDEA 1.00x 1.00x 1.04x 1.05x 1.04x 1.02x 1.02x 1.02x 1.38x 1.40x
3DES 1.01x 1.00x 1.02x 1.04x 1.03x 1.01x 1.00x 1.02x 1.20x 1.20x
CAST5 1.00x 1.00x 1.03x 1.09x 1.07x 1.04x 1.13x 1.00x 1.74x 1.74x
BLOWFISH 1.04x 1.08x 1.03x 1.13x 1.07x 1.12x 1.03x 1.00x 1.78x 1.74x
AES 0.96x 1.00x 1.09x 1.08x 1.14x 1.13x 1.07x 1.03x 1.14x 1.09x
AES192 1.00x 1.03x 1.07x 1.03x 1.07x 1.07x 1.06x 1.03x 1.08x 1.11x
AES256 1.00x 1.00x 1.06x 1.06x 1.10x 1.06x 1.05x 1.03x 1.10x 1.10x
TWOFISH 0.95x 1.10x 1.13x 1.23x 1.05x 1.14x 1.09x 1.13x 1.95x 1.86x
ARCFOUR 1.00x 1.00x
DES 1.02x 0.98x 1.04x 1.04x 1.05x 1.02x 1.04x 1.00x 1.45x 1.48x
TWOFISH128 0.95x 1.10x 1.26x 1.19x 1.09x 1.14x 1.17x 1.00x 2.00x 1.91x
SERPENT128 1.02x 1.00x 1.08x 1.04x 1.10x 1.06x 1.08x 1.04x 1.42x 1.42x
SERPENT192 1.02x 1.02x 1.06x 1.06x 1.10x 1.08x 1.04x 1.06x 1.42x 1.42x
SERPENT256 1.02x 0.98x 1.06x 1.06x 1.10x 1.06x 1.04x 1.06x 1.42x 1.40x
RFC2268_40 1.00x 1.00x 1.02x 1.06x 1.04x 1.02x 1.02x 1.02x 1.35x 1.35x
SEED 1.00x 0.97x 1.11x 1.05x 1.06x 1.08x 1.08x 1.05x 1.56x 1.57x
CAMELLIA128 1.03x 0.97x 1.12x 1.14x 1.06x 1.10x 1.06x 1.06x 1.73x 1.59x
CAMELLIA192 1.06x 1.00x 1.13x 1.10x 1.11x 1.11x 1.15x 1.08x 1.57x 1.58x
CAMELLIA256 1.06x 1.03x 1.10x 1.10x 1.11x 1.11x 1.13x 1.08x 1.57x 1.62x
[v2]:
- include stdint.h only when it's available
- use uintptr_t instead of long and intptr_t
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
|