Consolidate some regex OPS

The regular expression operation POSIXA works on any of the (currently) 16 POSIX classes (like \w and [:graph:]) under the regex modifier /a. This commit creates similar operations for the other modifiers: POSIXL (for /l), POSIXD (for /d), POSIXU (for /u), plus their complements. It causes these ops to be generated instead of the ALNUM, DIGIT, HORIZWS, SPACE, and VERTWS ops, as well as all their variants. The net saving is 22 regnode types.

The reason to do this is maintenance. As of this commit, there are 22 fewer node types for which code has to be maintained. The code for each variant was essentially the same logic, but on different operands. It would be easy to make a change to one copy and forget to make the corresponding change in the others. Indeed, this patch fixes [perl #114272], in which one copy was out of sync with the others.

This patch actually reduces the number of separate code paths to 5: POSIXA, NPOSIXA, POSIXL, POSIXD, and POSIXU. The complements of the last 3 use the same code path as their non-complemented versions, except that a variable is initialized differently. The code then XORs this variable with its result to do the complementing or not. Further, the POSIXD branch now just checks whether the target string being matched is UTF-8, and then jumps to either the POSIXU or the POSIXA code respectively. So there are effectively only 4 cases that are coded: POSIXA, NPOSIXA, POSIXL, and POSIXU. (POSIXA doesn't have to worry about UTF-8, while NPOSIXA does, hence these are coded separately for efficiency.)

Removing all this code saves memory. The output of the Linux size command shows that the perl executable shrank by 33K bytes on my platform compiled under -O0 (.7%) and by 18K bytes (1.3%) under -O2.

The reason this patch was doable was previous work in numbering the POSIX classes, so that they could be indexed in arrays and bit positions.

This is a large patch; I didn't see how to break it into smaller components.

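
The shared code path for a class and its complement can be sketched roughly as follows. This is only an illustration of the XOR idea described above, not the code in regexec.c: the enum, the `sk_` function names, the three-class table, and the ASCII-only tests are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical class numbers for this sketch; the real engine numbers
 * (currently) 16 POSIX classes. */
enum sk_class { SK_DIGIT, SK_SPACE, SK_WORD, SK_CLASS_COUNT };

/* ASCII-only membership test, selected by class number. */
static bool sk_is_class_ascii(enum sk_class classnum, unsigned char c)
{
    assert(classnum < SK_CLASS_COUNT);
    switch (classnum) {
    case SK_DIGIT:
        return c >= '0' && c <= '9';
    case SK_SPACE:
        return c == ' ' || (c >= '\t' && c <= '\r');
    case SK_WORD:
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z') || c == '_';
    default:
        return false;
    }
}

/* Return how many leading bytes of s match the class.  The same loop
 * serves the complemented form: 'complement' is initialized once by the
 * caller and XORed with each membership result. */
static size_t sk_match_run(enum sk_class classnum, bool complement,
                           const char *s, size_t len)
{
    size_t i = 0;
    while (i < len
           && (sk_is_class_ascii(classnum, (unsigned char)s[i]) ^ complement))
    {
        i++;
    }
    return i;
}
```

Calling `sk_match_run(SK_DIGIT, false, ...)` behaves like a run of \d, and passing `true` instead behaves like \D, with no second copy of the loop to keep in sync.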
I chose to make this code more efficient as opposed to saving even more memory. Thus there is a separate loop that is jumped to once we know we have to load a swash; this saves having to test whether the swash is loaded each time through the loop. I avoid loading the swash until absolutely necessary. In places in the previous version of this code, the swash was loaded whenever the input was UTF-8, even if it wasn't yet needed (and might never be, if the input didn't contain anything above Latin1), apparently to avoid the extra test per iteration.

The Perl test suite runs slightly faster on my platform with this patch under -O0, and the speeds are indistinguishable under -O2. This is in spite of these new POSIX regops being unknown to the regex optimizer (this will be addressed in future commits), and of extra machine instructions being required for each character (the xor, and some shifting and masking). I expect this is a result of better caching, and of not loading swashes unless absolutely necessary.
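
The two-loop arrangement described above might look something like the sketch below. It is a simplified stand-in under stated assumptions: it works on already-decoded code points rather than UTF-8 bytes, the "swash" is replaced by a hypothetical `sk_load_wide_table()` that only counts how often it is called, and the only wide digits recognized are U+0660 through U+0669. None of these names exist in the Perl source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how many times the (pretend) expensive load has run. */
static int sk_table_loads = 0;

/* Hypothetical stand-in for loading a swash. */
static void sk_load_wide_table(void)
{
    sk_table_loads++;
}

/* Hypothetical stand-in lookup for code points above Latin1. */
static bool sk_wide_is_digit(uint32_t cp)
{
    return cp >= 0x0660 && cp <= 0x0669;
}

/* Return the length of the leading run of digits in cps. */
static size_t sk_count_digits(const uint32_t *cps, size_t len)
{
    size_t i = 0;

    assert(cps != NULL || len == 0);

    /* Loop 1: everything seen so far fits in Latin1, so no table is
     * needed and none is loaded. */
    for (; i < len; i++) {
        if (cps[i] > 0xFF)
            goto need_table;
        if (!(cps[i] >= '0' && cps[i] <= '9'))
            return i;
    }
    return i;

  need_table:
    /* Load once, then continue in a loop that never has to ask whether
     * the table is loaded. */
    sk_load_wide_table();
    for (; i < len; i++) {
        uint32_t cp = cps[i];
        bool ok = (cp <= 0xFF) ? (cp >= '0' && cp <= '9')
                               : sk_wide_is_digit(cp);
        if (!ok)
            break;
    }
    return i;
}
```

Input that never leaves Latin1 finishes in the first loop without paying for the load at all, which is the behavior the commit message attributes the speedup to.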
author: Karl Williamson <public@khwilliamson.com> 2012-12-17 21:37:40 -0700
committer: Karl Williamson <public@khwilliamson.com> 2012-12-22 11:11:32 -0700
commit: 3018b823898645e44b8c37c70ac5c6302b031381 (patch)
tree: 0a26845e850bbc243726255ea67f9100c491d4ef /embed.fnc
parent: 7aee35ffd7ab21d1007b7bacdc860c9b48f32758 (diff)
download: perl-3018b823898645e44b8c37c70ac5c6302b031381.tar.gz
1 files changed, 1 insertions, 0 deletions
diff --git a/embed.fnc b/embed.fnc
index 5af5c97109..2a5b2b30fc 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2028,6 +2028,7 @@ Es	|U8	|regtail_study	|NN struct RExC_state_t *pRExC_state \
 
 #if defined(PERL_IN_REGEXEC_C)
 ERs	|bool	|isFOO_lc	|const U8 classnum|const U8 character
+ERs	|bool	|isFOO_utf8_lc	|const U8 classnum|NN const U8* character
 ERs	|I32	|regmatch	|NN regmatch_info *reginfo|NN char *startpos|NN regnode *prog
 ERs	|I32	|regrepeat	|NN const regexp *prog|NN char **startposp|NN const regnode *p|I32 max|int depth
 ERs	|I32	|regtry		|NN regmatch_info *reginfo|NN char **startposp