| field | value | date |
|---|---|---|
| author | Karl Williamson <khw@cpan.org> | 2017-11-15 10:19:33 -0700 |
| committer | Karl Williamson <khw@cpan.org> | 2017-11-23 14:18:51 -0700 |
| commit | e17544a60909ed9555c0dad7cd24afc40eb736e7 | |
| tree | 3e49108314dd819ad6880ebaeb4640c0e8b3494d /doop.c | |
| parent | 46a08a6f3bc2ec1482773059c74749f47b161b01 | |
Search for UTF-8 invariants by word
The functions is_utf8_invariant_string() and
is_utf8_invariant_string_loc() are used in several places in the core
and are part of the public API. This commit speeds them up
significantly on ASCII (not EBCDIC) platforms by switching from
per-byte to word-at-a-time parsing. (Per-byte parsing is retained for
any initial bytes before the next word boundary, and for any final
bytes that don't fill an entire word.)
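The scheme described above can be sketched in portable C. This is a minimal sketch under stated assumptions, not the code from the perl core: the names `invariant_prefix_len()` and `all_invariant()` are hypothetical, an ASCII platform is assumed (so a byte is UTF-8 invariant exactly when its high bit is clear), and `uintptr_t` stands in for the machine word. Bytes are checked one at a time until the pointer is word-aligned, then each whole word is tested against a mask with the high bit of every byte set, and the leftover tail is checked byte by byte. When a word fails the test, the sketch rescans that word per byte so it can report the exact location of the first variant byte, which is what a `_loc`-style function has to return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch only; assumes an ASCII platform, where a byte is UTF-8
 * invariant iff its high bit (0x80) is clear.
 * Hypothetical helper: returns how many leading bytes of s[0..len) are
 * invariant.  A result equal to len means the whole string is invariant. */
static size_t invariant_prefix_len(const unsigned char *s, size_t len)
{
    const unsigned char *p = s;
    const unsigned char *const end = s + len;

    /* High bit of every byte in a word: 0x8080808080808080 on a machine
     * with 64-bit words, 0x80808080 with 32-bit words. */
    const uintptr_t high_bits = (uintptr_t)-1 / 0xFF * 0x80;

    /* Per-byte: advance until p sits on a word boundary. */
    while (p < end && ((uintptr_t)p % sizeof(uintptr_t)) != 0) {
        if (*p & 0x80)
            return (size_t)(p - s);
        p++;
    }

    /* Word-at-a-time: one test covers sizeof(uintptr_t) bytes. */
    while ((size_t)(end - p) >= sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, p, sizeof word);  /* load a word without aliasing UB */
        if (word & high_bits)
            break;                      /* a variant byte is in this word */
        p += sizeof(uintptr_t);
    }

    /* Per-byte: handle the partial tail, or pinpoint the variant byte
     * inside the word that failed the test above. */
    while (p < end) {
        if (*p & 0x80)
            return (size_t)(p - s);
        p++;
    }
    return len;
}

/* Hypothetical analogue of a _loc-style check: returns 1 if all len bytes
 * are invariant; otherwise returns 0 and, if ep is non-NULL, stores a
 * pointer to the first variant byte in *ep. */
static int all_invariant(const unsigned char *s, size_t len,
                         const unsigned char **ep)
{
    assert(s != NULL);

    size_t n = invariant_prefix_len(s, len);
    if (n == len)
        return 1;
    if (ep)
        *ep = s + n;
    return 0;
}
```

For example, given the UTF-8 bytes of "plain ASCII text that spans several words, then café", `all_invariant()` returns 0 and sets `*ep` to the first byte of the two-byte sequence encoding "é". Reading the word through `memcpy()` is one common way to avoid C aliasing problems; compilers typically reduce it to a single load.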
The following results were obtained by parsing a long string on a
machine with 64-bit words:
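| Metric | byte | word | Measures |
|---|---|---|---|
| Ir | 100.00 | 665.35 | instructions read (executed) |
| Dr | 100.00 | 797.03 | data reads |
| Dw | 100.00 | 102.12 | data writes |
| COND | 100.00 | 799.27 | conditional branches |
| IND | 100.00 | 97.56 | indirect branches |
| COND_m | 100.00 | 144.83 | conditional branch mispredictions |
| IND_m | 100.00 | 75.00 | indirect branch mispredictions |
| Ir_m1 | 100.00 | 100.00 | level 1 instruction cache misses |
| Dr_m1 | 100.00 | 100.02 | level 1 data read misses |
| Dw_m1 | 100.00 | 104.12 | level 1 data write misses |
| Ir_mm | 100.00 | 100.00 | last-level cache instruction read misses |
| Dr_mm | 100.00 | 100.00 | last-level cache data read misses |
| Dw_mm | 100.00 | 100.00 | last-level cache data write misses |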
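The per-byte version is the 100% baseline; numbers larger than that are
improvements, meaning the word-at-a-time version incurs fewer of that
event. The COND measurement, for example, indicates that there are
about 1/8 as many conditional branches in the word-at-a-time version
(100 / 799.27 is roughly 0.125).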
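The 1/8 figure is what a simple iteration count predicts. The following is a back-of-the-envelope model, not a measurement and not part of the commit: it assumes each loop iteration costs the same fixed number of conditional branches in both versions, so the branch ratio equals the iteration ratio. The function names are hypothetical. A per-byte scan of n bytes takes n iterations, while the word version takes a per-byte head up to the first boundary, one iteration per whole word, and a per-byte tail. `relative_percent()` uses the same convention as the table above (baseline divided by new, times 100).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model: one loop iteration per byte in a purely
 * per-byte scan of n bytes. */
static size_t byte_iterations(size_t n)
{
    return n;
}

/* Hypothetical cost model for the word-at-a-time scan: per-byte
 * iterations until the first word boundary, one iteration per whole word,
 * then per-byte iterations for the leftover tail.  `misalign` is the
 * start address modulo the word size. */
static size_t word_iterations(size_t n, size_t word, size_t misalign)
{
    assert(word > 0 && misalign < word);

    size_t head = misalign ? word - misalign : 0;
    if (head > n)
        head = n;               /* string ends before reaching a boundary */

    size_t rest = n - head;
    return head + rest / word + rest % word;
}

/* Same convention as the table: baseline / candidate * 100, so a value
 * above 100 means the candidate needs fewer of the counted events. */
static double relative_percent(size_t baseline, size_t candidate)
{
    return 100.0 * (double)baseline / (double)candidate;
}
```

With n = 100000, 8-byte words, and a start 3 bytes past a boundary, the model gives 5 + 12499 + 3 = 12507 iterations for the word version versus 100000 for the byte version, about 800%. That is the same order as the COND result, though real branch counts also depend on how the compiler lays out each loop.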