author     Karl Williamson <khw@cpan.org>  2017-11-15 10:19:33 -0700
committer  Karl Williamson <khw@cpan.org>  2017-11-23 14:18:51 -0700
commit     e17544a60909ed9555c0dad7cd24afc40eb736e7
tree       3e49108314dd819ad6880ebaeb4640c0e8b3494d  /doop.c
parent     46a08a6f3bc2ec1482773059c74749f47b161b01
Search for UTF-8 invariants by word
The functions is_utf8_invariant_string() and is_utf8_invariant_string_loc() are used in several places in the core and are part of the public API. This commit speeds them up significantly on ASCII (not EBCDIC) platforms by switching from per-byte to word-at-a-time parsing. (Per-byte parsing is retained for any initial bytes before the next word boundary, and for any final bytes that don't fill an entire word.)

The following results were obtained parsing a long string on a machine with a 64-bit word:

                byte   word
              ------ ------
      Ir      100.00 665.35
      Dr      100.00 797.03
      Dw      100.00 102.12
      COND    100.00 799.27
      IND     100.00  97.56
      COND_m  100.00 144.83
      IND_m   100.00  75.00
      Ir_m1   100.00 100.00
      Dr_m1   100.00 100.02
      Dw_m1   100.00 104.12
      Ir_mm   100.00 100.00
      Dr_mm   100.00 100.00
      Dw_mm   100.00 100.00

100% is the baseline; numbers larger than that are improvements. The COND measurement, for example, indicates that there are 1/8 as many conditional branches in the word-at-a-time version.
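The word-at-a-time idea can be illustrated with a small C sketch. This is a minimal, hypothetical example, not the code in this commit. It assumes an ASCII platform, where a byte is UTF-8 invariant exactly when its high bit is clear. The names sketch_utf8_invariant_loc() and sketch_utf8_invariant() are invented for illustration and are not part of Perl's API. The real is_utf8_invariant_string_loc() may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not Perl's implementation), assuming an ASCII
 * platform: a byte is UTF-8 invariant iff its high bit (0x80) is clear.
 * Returns true if all 'len' bytes at 's' are invariant; if 'ep' is non-NULL
 * it receives the address of the first variant byte, or s + len if none. */
static bool sketch_utf8_invariant_loc(const unsigned char *s, size_t len,
                                      const unsigned char **ep)
{
    const unsigned char *x = s;
    const unsigned char *send = s + len;
    /* 0x8080...80 sized to the machine word: 0x0101...01 * 0x80 */
    const uintptr_t high_bits = ((uintptr_t)-1 / 0xFF) * 0x80;

    /* Per-byte until x reaches a word boundary. */
    while (x < send && (uintptr_t)x % sizeof(uintptr_t) != 0) {
        if (*x & 0x80)
            goto variant;
        x++;
    }

    /* Word at a time: one AND tests sizeof(uintptr_t) bytes at once. */
    while ((size_t)(send - x) >= sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, x, sizeof word);  /* aligned load, no aliasing issues */
        if (word & high_bits)
            break;  /* some byte here is variant; the loop below finds it */
        x += sizeof word;
    }

    /* Per-byte for the trailing partial word (or the word that failed). */
    while (x < send) {
        if (*x & 0x80)
            goto variant;
        x++;
    }

    if (ep)
        *ep = send;
    return true;

variant:
    if (ep)
        *ep = x;
    return false;
}

/* Convenience wrapper that only answers yes/no. */
static bool sketch_utf8_invariant(const unsigned char *s, size_t len)
{
    return sketch_utf8_invariant_loc(s, len, NULL);
}
```

A nonzero result from the mask test only says that some byte in the word is variant. The sketch then falls back to the per-byte loop to find which one, which is what lets the `_loc` form report the exact position. This fallback also keeps the common all-ASCII case down to roughly one conditional branch per word instead of one per byte.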
Diffstat (limited to 'doop.c')
0 files changed, 0 insertions, 0 deletions