path: root/regcomp.h
Commit message | Author | Age | Files | Lines
* regcomp.c: Fix more alignment problems | Karl Williamson | 2014-02-19 | 1 | -20/+16
I believe this will fix the remaining alignment problems recently shown by gcc on HP-UX. It works on the procura machine. regnodes should not have stricter alignment than U32 requires, for reasons given in the comments this commit adds to the beginning of regcomp.h. Commit 31f05a37 added a new ANYOF regnode struct with a pointer field. This requires stricter alignment on some 64-bit platforms, and hence doesn't work on those platforms. This commit removes that regnode struct type, and instead stores the pointer it used via a more indirect, but already existing, mechanism that stores other data. The function that returns that other data is enlarged to return this new field as well. It now needs to be called from regcomp.c, so the previous commit renamed it and made it accessible from there. The "public" function that wraps this one is unchanged. (I put "public" in quotes here because I don't think anyone outside core is or should be using it, but since it has been publicly available for a long time, I'm treating the API as unchangeable. regcomp.c called this public function before this commit, but needs the additional data returned by the inner one.)
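The alignment constraint above can be illustrated in standalone C. This is a sketch with hypothetical struct and function names, not Perl's real regnode layout or its "data" mechanism: a struct built only from fields of at most 32 bits needs no more alignment than a U32, while adding a pointer raises the requirement to pointer alignment (8 bytes on typical 64-bit ABIs). Keeping the pointer in a side table and storing only a 32-bit index in the node avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layouts, loosely modelled on the commit description. */
typedef struct {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
    uint32_t arg;          /* 32-bit argument: alignment stays at most 4 */
} ex_node_u32;

typedef struct {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
    void    *extra;        /* pointer: 8-byte alignment on most 64-bit ABIs */
} ex_node_ptr;

static_assert(_Alignof(ex_node_u32) <= _Alignof(uint32_t),
              "a node of 32-bit fields is no stricter than U32");

/* Indirect alternative: keep pointers in a side table and store only a
 * 32-bit index into it in the node itself. */
typedef struct {
    void  *slots[16];
    size_t used;
} ex_data_table;

static uint32_t ex_data_add(ex_data_table *t, void *p) {
    assert(t->used < sizeof t->slots / sizeof t->slots[0]);
    t->slots[t->used] = p;
    return (uint32_t)t->used++;
}

static void *ex_data_get(const ex_data_table *t, uint32_t index) {
    assert(index < t->used);
    return t->slots[index];
}
```

On 32-bit platforms both layouts usually have the same alignment, which is consistent with the problem appearing only on some 64-bit builds.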
* regcomp.h: Allow compiler to perform calculation | Karl Williamson | 2014-02-19 | 1 | -1/+1
Instead of doing the calculation of how many bytes a 256-bit bitmap occupies, let the compiler do it. I believe we are not too far away from being able to allow applications to recompile Perl with a larger bitmap, using more memory to gain speed. ICU had an 8192-bit bitmap the last time I checked.
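Letting the compiler derive the byte count might look like the following sketch. The macro and type names are hypothetical and are not the actual regcomp.h identifiers; the point is that only the bit count is written by hand.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <string.h>   /* memset */

/* Hypothetical names: derive the byte count instead of hard-coding 32,
 * so changing the number of bits is a one-line edit. */
#define EX_BITMAP_BITS  256
#define EX_BITMAP_BYTES (EX_BITMAP_BITS / CHAR_BIT)

static_assert(EX_BITMAP_BITS % CHAR_BIT == 0,
              "bitmap size must be a whole number of bytes");

typedef struct {
    unsigned char bits[EX_BITMAP_BYTES];
} ex_bitmap;

static void ex_bitmap_clear(ex_bitmap *b) {
    memset(b->bits, 0, sizeof b->bits);
}

static void ex_bitmap_set(ex_bitmap *b, unsigned cp) {
    assert(cp < EX_BITMAP_BITS);
    b->bits[cp / CHAR_BIT] |= (unsigned char)(1u << (cp % CHAR_BIT));
}

static int ex_bitmap_test(const ex_bitmap *b, unsigned cp) {
    assert(cp < EX_BITMAP_BITS);
    return (b->bits[cp / CHAR_BIT] >> (cp % CHAR_BIT)) & 1;
}
```

With 8-bit bytes, raising EX_BITMAP_BITS to 8192 would make each bitmap 1024 bytes instead of 32, which is the memory-for-speed trade the message alludes to.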
* Change method of passing some info from regcomp to regexec | Karl Williamson | 2014-02-19 | 1 | -14/+6
For the last several releases, the fact that an ANYOF node could match something outside its bitmap has been passed to regexec.c by having its ARG field not be -1 (appropriately cast). A bit was set if the match could occur even if the target string was not UTF-8 encoded. This design was used to save a bit, as previously there was also a bit for matching UTF-8 strings. That design is no longer tenable, as a future commit will add a third (independent) reason for something to match outside the bitmap. This commit uses the current spare flag bit to indicate that the match can occur only if the target string is UTF-8.
* regcomp.h: Remove extraneous comment | Karl Williamson | 2014-02-19 | 1 | -7/+0
This is obsolete and is a partial copy of the up-to-date comment below it.
* regcomp.h: Free up flag bit in ANYOF nodes | Karl Williamson | 2014-02-19 | 1 | -10/+8
The previous commit removed the final use of the ANYOF_LOC bit.
* regexes: Remove uses of ANYOF_LOCALE flag | Karl Williamson | 2014-02-19 | 1 | -4/+2
This flag no longer adds any useful information and can be removed. An ANYOF node that depends on locale either matches a POSIX class like \d, or matches case-insensitively, or both. There are flags for both these cases, and to see if something matches locale, one merely needs to see if either flag is set. Not having to keep track of this extra flag simplifies things, and will allow it to be removed. There was a time when this flag was shared with one of the remaining locale ones, and there was relic code that allowed that sharing to be reinstated, which this commit also removes.
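The "either flag is set" test can be written as a single mask, as in this sketch. The flag names and values are hypothetical stand-ins for the two remaining locale-related ANYOF flags.

```c
#include <assert.h>

/* Hypothetical flag values; the real ANYOF flag names and bits differ. */
#define EX_ANYOF_POSIXL    0x02u   /* POSIX classes resolved at run time */
#define EX_ANYOF_LOC_FOLD  0x04u   /* case-insensitive under run-time locale */

static_assert((EX_ANYOF_POSIXL & EX_ANYOF_LOC_FOLD) == 0,
              "the two flags must be distinct bits");

/* No separate "locale" bit: a node depends on the locale exactly when at
 * least one of the two more specific flags is set. */
#define EX_ANYOF_LOCALE_FLAGS (EX_ANYOF_POSIXL | EX_ANYOF_LOC_FOLD)

static int ex_anyof_depends_on_locale(unsigned flags) {
    return (flags & EX_ANYOF_LOCALE_FLAGS) != 0;
}
```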
* regcomp.c: Simplify /l Synthetic Start Class construction | Karl Williamson | 2014-02-19 | 1 | -3/+12
The ANYOF_POSIXL flag is needed in general for ANYOF nodes to indicate whether the struct contains an extra U32 element used to hold the list of POSIX classes (like \w and [:punct:]) whose matches depend on the locale in effect at the time of runtime pattern matching. But the SSC always contains this U32, so it doesn't need to use the flag. Instead, if there aren't any such classes, the U32 will be zero. No longer keeping track of this flag during the assembly of the SSC simplifies things. At the completion of this process, the flag is set if the U32 is non-zero, to pass that information on to regexec.c so that it doesn't have to special-case things.
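The "set the flag at the end" step might be sketched as follows. The struct, flag value and function names are hypothetical; the idea taken from the message is that the 32-bit class word is always present in the SSC, so the flag can be derived from it once construction finishes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag value and layout, for illustration only. */
#define EX_ANYOF_POSIXL 0x02u

typedef struct {
    uint8_t  flags;
    uint32_t posixl;   /* one bit per POSIX class such as \w or [:punct:] */
} ex_ssc;

/* During construction only the word is updated; no flag bookkeeping. */
static void ex_ssc_add_posix_class(ex_ssc *ssc, unsigned class_index) {
    assert(class_index < 32);
    ssc->posixl |= (uint32_t)1 << class_index;
}

/* At the end, derive the flag from the word so the matcher can treat the
 * SSC like any other ANYOF node. */
static void ex_ssc_finalize(ex_ssc *ssc) {
    if (ssc->posixl != 0)
        ssc->flags |= EX_ANYOF_POSIXL;
}
```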
* Revert "Free up bit for regex ANYOF nodes" | Karl Williamson | 2014-02-15 | 1 | -5/+21
This reverts commit 34fdef848b1687b91892ba55e9e0c3430e0770f6, and adds comments referring to it, in case it is ever needed.
* Free up bit for regex ANYOF nodes | Karl Williamson | 2014-02-15 | 1 | -16/+10
This commit frees up a bit by using an extra regnode, instead of the flag, to pass the information to the regex engine. I originally thought that if this was needed, it should be the ANYOF_ABOVE_LATIN1_ALL bit, as that might speed some things up. But if we need to do this again, adding another node to get another bit, we want one that is mutually exclusive of the first one we did. Otherwise we start having to make 3 nodes instead of two to get the combinations:
    1 0
    0 1
    1 1
This combinatorial problem is avoided by using bits that are mutually exclusive, which ABOVE_LATIN1_ALL isn't. But the one freed by this commit, ANYOF_NON_UTF8_NON_ASCII_ALL, is only set under /d matching, and there are other bits that are set only under /l, so if we need to do this again, we should use one of those. I wrote this code when I thought I really needed a bit. Since then, I have figured out a better way to get the bit needed now. But I don't want to lose this code to posterity, so this commit is being made just long enough to get the commit number; then it will be reverted, adding comments referring to the commit number, so that it can easily be reconstructed when necessary.
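The combinatorial argument can be checked with a little arithmetic. In this sketch (hypothetical helper names) the counts include the plain node for the "no flags set" case: k freely combinable flags need one node type per subset, 2^k in all, whereas k mutually exclusive flags need only k + 1.

```c
#include <assert.h>

/* Node types needed when k flags may be combined freely: one per subset. */
static unsigned long ex_types_independent(unsigned k) {
    assert(k < 8 * sizeof(unsigned long));
    return 1UL << k;
}

/* Node types needed when at most one of the k flags is ever set at a
 * time: "none" plus one type per flag. */
static unsigned long ex_types_exclusive(unsigned k) {
    return (unsigned long)k + 1;
}
```

For two flags that is 4 types (the plain node plus the three combinations listed above) versus 3 when the flags are mutually exclusive.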
* regcomp.h: Rmv false comments | Karl Williamson | 2014-02-12 | 1 | -4/+4
I misread the code when I added these comments.
* eliminate RXf_ANCH_SINGLE | David Mitchell | 2014-02-07 | 1 | -2/+2
This macro defines two flag bits:
    #define PREGf_ANCH_SINGLE (PREGf_ANCH_SBOL|PREGf_ANCH_GPOS)
but it is only used twice in core (and not on CPAN), doesn't really add any value, and increases cognitive complexity.
* Add RXf_UNBOUNDED_QUANTIFIER and regexp->maxlen | Yves Orton | 2014-02-03 | 1 | -0/+2
The flag tells us that a pattern may match an infinitely long string. The new member in the regexp struct tells us how long the string might be. With these two items we can implement regexp-based $/.
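One way a record reader with a regex separator might use these two items is sketched below. The struct and function names are hypothetical mirrors of RXf_UNBOUNDED_QUANTIFIER and regexp->maxlen, and this is not Perl's implementation: it only computes how much of the current buffer must be carried into the next read so that a separator straddling the boundary is not missed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the two items the commit adds. */
typedef struct {
    int    unbounded;  /* like RXf_UNBOUNDED_QUANTIFIER */
    size_t maxlen;     /* like regexp->maxlen; meaningful only if bounded */
} ex_sep_info;

#define EX_KEEP_ALL ((size_t)-1)

/* Trailing characters to carry over between reads. */
static size_t ex_carry_over(const ex_sep_info *sep) {
    assert(sep != NULL);
    if (sep->unbounded)
        return EX_KEEP_ALL;          /* no finite bound: keep everything */
    /* A match of at most maxlen characters that crosses the boundary
     * starts at most maxlen - 1 characters before the buffer's end. */
    return sep->maxlen > 0 ? sep->maxlen - 1 : 0;
}
```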
* rename REG_SEEN_WHATEVER to REG_WHATEVER_SEEN to match RXf_ and PREGf_ convention | Yves Orton | 2014-01-31 | 1 | -12/+11
* Move the RXf_ANCH flags to intflags as PREGf_ANCH_xxx and add RXf_IS_ANCHORED as a replacement | Yves Orton | 2014-01-31 | 1 | -2/+9
The only requirement outside of the regex engine is to identify whether there is an anchor involved at all. So we move the 4 anchor flags to intflags and replace them with a single aggregate flag, RXf_IS_ANCHORED, in extflags. This frees up another 3 bits in extflags.
* move RXf_GPOS_SEEN and RXf_GPOS_FLOAT to intflags | Yves Orton | 2014-01-31 | 1 | -1/+3
This required removing the RXf_GPOS_CHECK mask, as it uses one flag that will stay in extflags for now (RXf_ANCH_GPOS), and one flag that moves to intflags (RXf_GPOS_SEEN). This mask is strange, however, as you can't have RXf_ANCH_GPOS without having RXf_GPOS_SEEN, so I don't know why we test both. Further investigation required.
* Rename RXf_CANY_SEEN to PREGf_CANY_SEEN and move from extflags to intflags | Yves Orton | 2014-01-31 | 1 | -0/+1
* move RXf_NOSCAN from extflags to intflags as PREGf_NOSCAN | Yves Orton | 2014-01-31 | 1 | -0/+5
Includes some improvements to how we dump regexps, so that when a regexp is for the standard perl engine we also show the intflags for the engine.
* regcomp.c: Change a variable and flag bit names | Karl Williamson | 2014-01-27 | 1 | -1/+1
The meaning of these was expanded two commits ago, so update the names to reflect this, to prevent future confusion.
* Work properly under UTF-8 LC_CTYPE locales | Karl Williamson | 2014-01-27 | 1 | -3/+23
This large (sorry, I couldn't figure out how to meaningfully split it up) commit causes Perl to fully support LC_CTYPE operations (case changing, character classification) in UTF-8 locales. As a side effect it resolves [perl #56820].

The basics are easy, but there were a lot of details, and one troublesome edge case discussed below. What essentially happens is that when the locale is changed to a UTF-8 one, a global variable is set TRUE (FALSE when changed to a non-UTF-8 locale). Within the scope of 'use locale', this variable is checked, and if TRUE, the code that Perl uses for non-locale behavior is used instead of the code for locale behavior. Since Perl's internal representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.

More work had to be done for regular expressions. There are three cases.

1) The character classes \w, [[:punct:]] needed no extra work, as the changes fall out from the base work.

2) Strings that are to be matched case-insensitively. These form EXACTFL regops (nodes). Notice that if such a string contains only above-Latin1 characters that match only themselves, the node can be downgraded to an EXACT-only node, which presents better optimization possibilities, as we now have a fixed string, known at compile time, that is required to be in the target string to match. Similarly, if all characters in the string match only other above-Latin1 characters case-insensitively, the node can be downgraded to a regular EXACTFU node (match, folding, using Unicode, not locale, rules). The code changes for this could be done without accepting UTF-8 locales fully, but there were edge cases which needed to be handled differently if I stopped there, so I continued on.

In an EXACTFL node, all such characters are now folded at compile time (just as before this commit), while the other characters whose folds are locale-dependent are left unfolded. This means that they have to be folded at execution time based on the locale in effect at the moment. Again, this isn't a change from before. The difference is that now some of the folds that need to be done at execution time (in regexec) are potentially multi-char. Some of the code in regexec was trivial to extend to account for this because of existing infrastructure, but the part dealing with regex quantifiers needed more work. Also the code that joins EXACTish nodes together had to be expanded to account for the possibility of multi-character folds within locale handling. This was fairly easy, because it already has infrastructure to handle these under somewhat different circumstances.

3) In bracketed character classes, represented by ANYOF nodes, a new inversion list was created giving the characters that should be matched by this node when the runtime locale is UTF-8. The list is ignored except under that circumstance. To do this, I created a new ANYOF type which has an extra SV for the inversion list.

The edge case that caused the most difficulty is folding involving the MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range character that folds to outside that range. The issue is that it doesn't naturally fall out that it will match the CAP MU. If we let the CAP MU fold to the small mu at compile time (which it can, because both are above-Latin1 and so the fold is the same no matter what locale is in effect), it could appear that the regnode can be downgraded away from EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case-insensitively match the CAP MU. This could be special-cased in regcomp and regexec, but I wanted to avoid that. Instead the mktables tables are set up to include the CAP MU as a character whose presence forbids the downgrading, so the special casing is in mktables, and not in the C code.
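The run-time switch described in the second paragraph might be sketched as below. Everything here is an assumption for illustration: the global's name, detecting UTF-8 through POSIX `nl_langinfo(CODESET)`, and a Unicode lowercase mapping restricted to Latin-1. Perl's own locale detection and case-changing code are more involved.

```c
#define _POSIX_C_SOURCE 200809L   /* expose nl_langinfo and strcasecmp */
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical stand-in for the global the commit adds: TRUE while the
 * current LC_CTYPE locale is a UTF-8 one. */
static int ex_in_utf8_ctype_locale = 0;

static int ex_codeset_is_utf8(const char *codeset) {
    return codeset != NULL
        && (strcasecmp(codeset, "UTF-8") == 0
            || strcasecmp(codeset, "UTF8") == 0);
}

/* To be called after every successful setlocale(LC_CTYPE, ...). */
static void ex_note_ctype_change(void) {
    ex_in_utf8_ctype_locale = ex_codeset_is_utf8(nl_langinfo(CODESET));
}

/* Unicode simple lowercase, restricted to the Latin-1 range. */
static int ex_unicode_tolower_latin1(int c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7))
        return c + 0x20;
    return c;
}

/* Under 'use locale': a UTF-8 locale gets the ordinary Unicode rules;
 * any other locale defers to the C library. */
static int ex_locale_tolower(int c) {
    assert(c >= 0 && c <= 0xFF);
    return ex_in_utf8_ctype_locale ? ex_unicode_tolower_latin1(c) : tolower(c);
}
```

A program starts in the "C" locale, whose codeset is not UTF-8, so the flag stays false until a UTF-8 locale is selected.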
* Rename regex internal flag bit | Karl Williamson | 2014-01-22 | 1 | -1/+1
This is a clearer name; it is used internally only in regcomp.c and regexec.c.
* Use bit instead of node for regex SSC | Karl Williamson | 2014-01-22 | 1 | -4/+16
The flag bits in regular expression ANYOF nodes are perennially in short supply. However, there are still plenty of regex node types available. So one solution to needing to pass more information is to create a node that encapsulates what is needed. That is what commit 9aa1e39f96ac28f6ce5d814d9a1eccf1464aba4a did to tell regexec.c that a particular ANYOF node is for the synthetic start class (SSC). However, this solution introduces other issues. If you have to express two things, then you need a regnode for A, a regnode for B, a regnode for both A and B, and another regnode for neither A nor B; with three things, you need 8 regnodes to express all possible combinations. This becomes unwieldy to write code for. The number of combinations goes way down if some of them are mutually exclusive. At the time of that commit, I thought that an SSC need not ever warn if matching against an above-Unicode code point. I was wrong, and that has been corrected earlier in the 5.19 series. But it finally came to me how to tell regexec that an ANYOF node is for the SSC without taking up a flag bit and without requiring a regnode type. The 'next_off' field in a regnode tells the engine the offset in the regex program to the node it's supposed to go to after processing this one. Since the SSC stands alone, its 'next_off' field is unused, and we can put anything we want in it. That, however, is not true of other ANYOF regnodes. But it turns out that there are certain values that will never be legitimate in the 'next_off' field in these, and so this commit uses one of those to signal that this ANYOF node is an SSC. regnodes come in various sizes, and the offset is in terms of how many of the smallest ones there are to the next node to look at. Since ANYOF nodes are large, the offset is always > 1, and so this commit uses 1 to indicate an SSC.
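The "impossible offset as a marker" idea can be sketched as follows. The header struct, the assumed ANYOF size of 10 units and the helper names are hypothetical; what comes from the message is that offsets count smallest-node units and that an ANYOF node spans more than one unit, so a genuine next_off of 1 cannot occur.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical regnode header; next_off counts smallest-node units. */
typedef struct {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
} ex_regnode;

/* Assumed ANYOF size in smallest-node units (it carries a bitmap). */
#define EX_ANYOF_UNITS 10

/* The next node starts after the whole ANYOF node, so an ordinary ANYOF
 * never has next_off == 1; that value can mark the stand-alone SSC. */
#define EX_SSC_MARKER 1

static_assert(EX_ANYOF_UNITS > EX_SSC_MARKER,
              "the marker must be an offset no real ANYOF node can have");

static void ex_mark_as_ssc(ex_regnode *anyof) {
    anyof->next_off = EX_SSC_MARKER;
}

static int ex_is_ssc(const ex_regnode *anyof) {
    return anyof->next_off == EX_SSC_MARKER;
}
```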
* regcomp.h: Reorder some #defines | Karl Williamson | 2013-12-31 | 1 | -8/+8
There are no logic changes. The previous commit changed the numbers for some of the bits. This commit rearranges things so that the #defines are again in numerical order.
* Re-order some flag bits to avoid potential branches | Karl Williamson | 2013-12-31 | 1 | -3/+4
The ANYOF_INVERT flag is used in every single pattern match of [bracketed character classes]. With backtracking, this can be a huge number. All the other flags' uses pale by comparison. I noticed that by making it the lowest bit, we don't have to use cBOOL, as the only possibilities are 0 and 1. cBOOL hopefully will be optimized away, but not always. This commit reorders some of the flag bits to make this one the lowest, and adds a compile-time check to make sure it isn't inadvertently changed.
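Why bit 0 avoids the boolean conversion is easy to show in a sketch. The flag values and function names are hypothetical, and the static_assert only imitates the spirit of the compile-time check the commit adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag assignment: the hot-path flag sits in bit 0. */
#define EX_ANYOF_INVERT 0x01u
#define EX_ANYOF_OTHER  0x02u

/* Compile-time guard so the bit cannot move without someone noticing. */
static_assert(EX_ANYOF_INVERT == 1, "ANYOF_INVERT must be the lowest bit");

/* Masking bit 0 already yields exactly 0 or 1, so it can be XORed with
 * the raw 0/1 match result with no conversion to bool. */
static int ex_anyof_result(unsigned flags, int matched /* 0 or 1 */) {
    return matched ^ (int)(flags & EX_ANYOF_INVERT);
}

/* Any other bit masks to 0 or 2 and must be normalized before use. */
static bool ex_has_other(unsigned flags) {
    return (flags & EX_ANYOF_OTHER) != 0;
}
```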
* Output regex above-Unicode matching in syn strt class | Karl Williamson | 2013-12-31 | 1 | -1/+1
A warning is supposed to be raised under some conditions when matching an above-Unicode code point against a Unicode property. Prior to this patch, if the synthetic start class excluded the code point, the warning would be skipped, even though a match against it was attempted.
* Convert regnode to a flag for [...] | Karl Williamson | 2013-12-31 | 1 | -4/+6
Prior to this commit, there were 3 types of ANYOF nodes; now there are two: regular, and one for the synthetic start class (ssc). This commit converts the third type, dealing with warning about matching \p{} against non-Unicode code points, into using the spare flag bit for ANYOF nodes. This allows this bit to apply to ssc ANYOF nodes, whereas previously it couldn't. There is a bug in which the warning isn't raised if the match is rejected by the optimizer, because of this inability. This bug will be fixed in a later commit. Another option would have been to create a new node type, an ANYOF_SSC_WARN_SUPER node. But this adds extra complications, and we have a spare bit that we might as well use. The comments give better possibilities for freeing up 2 bits should they be needed.
* regcomp.c: Split #define into two | Karl Williamson | 2013-12-31 | 1 | -0/+5
The synthetic start class regnode (SSC) and a bracketed character class node share much of the same data structure, including a flags field, and some of the same flag bits within it. Currently, only locale-related flags (under /l rules) are the same between the two during construction of the SSC. But a future commit will introduce another common flag. This commit creates an extra #define for use where we want the common flags, while retaining the existing one for use where we want the locale flags. The new #define is just a copy of the existing one, to be changed in the future commit.
* Avoid pointer churn in study_chunk recursion bitmap allocation | Yves Orton | 2013-11-24 | 1 | -0/+1
Since we can only recurse into a given paren (or the entire pattern) once, we know that the maximum recursion depth is the number of parens in the pattern (plus one for "whole pattern"). This means we can preallocate one large bitmap, and then use different chunks of it for each level. That avoids SAVEFREEPV costs for each bitmap, which are likely short anyway. (One could imagine an optimization where a flag somewhere lets us use the RExC_study_chunk_recursed pointer as a bitmap, so we don't have to allocate at all when we have fewer than 32 parens.) This removes the "recursed" argument from study_chunk() and replaces it with a "recursive_depth" argument which counts how deep we are in the bitmap "stack".
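The single preallocated "stack" of bitmaps might look like this sketch. The type and function names are hypothetical, and copying the caller's bitmap on entry to a deeper level is an assumption about how the levels are used, not something stated in the message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One allocation holds a bitmap per recursion level; a pattern with n
 * parens recurses at most n + 1 levels deep. */
typedef struct {
    unsigned char *base;
    size_t bytes_per_level;
    size_t levels;
} ex_recursed_stack;

static int ex_stack_init(ex_recursed_stack *s, size_t nparens) {
    s->levels = nparens + 1;
    /* one bit per paren plus one for the whole pattern, rounded up */
    s->bytes_per_level = (nparens + 1 + 7) / 8;
    s->base = calloc(s->levels, s->bytes_per_level);
    return s->base != NULL;
}

/* The bitmap for a given depth is just a slice of the shared block. */
static unsigned char *ex_stack_level(ex_recursed_stack *s, size_t depth) {
    assert(depth < s->levels);
    return s->base + depth * s->bytes_per_level;
}

/* Assumption: a deeper level starts from a copy of its caller's bitmap. */
static void ex_stack_enter(ex_recursed_stack *s, size_t depth) {
    assert(depth > 0 && depth < s->levels);
    memcpy(ex_stack_level(s, depth), ex_stack_level(s, depth - 1),
           s->bytes_per_level);
}

static void ex_stack_free(ex_recursed_stack *s) {
    free(s->base);
    s->base = NULL;
}
```

Each level costs pointer arithmetic rather than a separate allocation and its cleanup, which is the saving the commit describes.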
* regcomp.c: Move bit to different data structure | Karl Williamson | 2013-09-24 | 1 | -8/+17
Commit 899d20b99829f8ecdc14e1351b533bc62a354dea was used to free up a bit in a flags field that had run out of bits at the time. Further work has made that unnecessary, and this commit moves it back to the flags field, which even after this commit has a spare bit (intended to be used in a future commit). Doing so makes this bit "just one of the guys", so it can be operated on en masse with the others. This allows a little code to be removed, and the knowledge of this flag to be mostly confined to lower-level subroutines.
* Teach regex optimizer to handle above-Latin1 | Karl Williamson | 2013-09-24 | 1 | -8/+2
Until this commit, the regular expression optimizer has essentially punted on above-Latin1 code points. Under some circumstances, they would be taken into account, more or less, but often the generated synthetic start class would end up matching all above-Latin1 code points. With the advent of inversion lists, it becomes feasible to actually fully handle such code points, as inversion lists are a convenient way to express arbitrary lists of code points and take their union, intersection, etc. This commit changes the optimizer to use inversion lists for operating on the code points the synthetic start class can match. I don't much understand the overall operation of the optimizer. I'm told that previous porters found that perturbing it caused unexpected behaviors. I had promised to get this change into 5.18, but didn't. I'm trying to get it in early enough in the 5.20 preliminary series that any problems will surface before 5.20 ships. This commit doesn't change the macro-level logic, but does significantly change various micro-level things. Thus the 'and' and 'or' subroutines have been rewritten to use inversion lists. I'm pretty confident that they do what their names suggest. I re-derived the equations for what these operations should do, getting the same results in some cases, but extending others where the previous code mostly punted. The derivations are given in comments in the respective routines. Some of the code is greatly simplified, as it no longer has to treat above-Latin1 specially. It is now feasible for /i matching of above-Latin1 code points to know explicitly the folds that should be in the synthetic start class. But more preparatory work needs to be done before putting that into place. ...
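As a reminder of why inversion lists make such set operations tractable, here is a minimal sketch of membership and union on plain `unsigned` arrays. The function names are hypothetical; Perl's real inversion-list code in regcomp.c is separate and supports many more operations.

```c
#include <assert.h>
#include <stddef.h>

/* Inversion list: sorted code points; even indices start an included
 * range, odd indices start an excluded one. {65, 91, 97, 123} = [A-Za-z]. */

/* A code point is in the list iff an odd number of elements are <= it. */
static int ex_invlist_contains(const unsigned *list, size_t n, unsigned cp) {
    assert(list != NULL || n == 0);
    size_t lo = 0, hi = n;          /* binary search: first element > cp */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (int)(lo & 1);
}

/* Union of a and b into out (capacity >= na + nb); returns its length.
 * 'inside' counts how many inputs include the current code point; the
 * output changes state only when that count moves between 0 and 1. */
static size_t ex_invlist_union(const unsigned *a, size_t na,
                               const unsigned *b, size_t nb,
                               unsigned *out) {
    size_t i = 0, j = 0, k = 0;
    int inside = 0;

    while (i < na && j < nb) {
        unsigned cp;
        int starts;
        /* On ties, take a range start before a range end so that
         * adjacent ranges merge instead of leaving a gap. */
        if (a[i] < b[j] || (a[i] == b[j] && i % 2 == 0)) {
            cp = a[i];
            starts = (i++ % 2 == 0);
        } else {
            cp = b[j];
            starts = (j++ % 2 == 0);
        }
        if (starts) {
            if (inside++ == 0)
                out[k++] = cp;
        } else if (--inside == 0) {
            out[k++] = cp;
        }
    }

    /* An exhausted list of odd length extends to infinity, so nothing
     * more can change; otherwise the remaining list decides alone. */
    if (i == na && na % 2 == 0)
        while (j < nb) out[k++] = b[j++];
    else if (j == nb && nb % 2 == 0)
        while (i < na) out[k++] = a[i++];
    return k;
}
```

Because ranges above Latin-1 are represented exactly like those below it, no special case for above-Latin1 code points is needed.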
* regcomp.c: Add some static functions | Karl Williamson | 2013-09-24 | 1 | -0/+7
This commit adds some functions that are currently unused, but will be used in a future commit. This commit is essentially to make the differences smaller in that commit, as 'diff' is getting confused and not outputting the logical differences. The functions are added in a block at the beginning of the file to avoid the 'diff' issues. A later white-space-only commit will move them to more appropriate positions.
* Enlarge dummy regex pass1 compilation node | Karl Williamson | 2013-09-24 | 1 | -1/+1
In pass 1 of compiling regular expressions, the needed size is calculated. There is space allocated for a scratch node that can be used for the things that the real one will hold in pass 2. It is valid only while working on the current node, and gets overwritten by the next node. Until this commit, this scratch space was sized only for the smallest node type, meaning that larger types could not use it for scratch. Now it is sized to be the largest non-EXACTish node. We could make it an array of 256 + overhead bytes instead to be able to hold the EXACTish nodes, but I don't see a need for that now.
* Rename regex flag bit for clarity | Karl Williamson | 2013-09-24 | 1 | -9/+10
ANYOF_UNICODE_ALL doesn't mean every Unicode code point. It means those above the Latin1 range. Rename it, while retaining the old name for back compat.
* regcomp.h: Create new typedef synonym for clarity | Karl Williamson | 2013-09-24 | 1 | -8/+8
This commit finishes (at least for now) removing some of the overloading of the term 'class'. A 'regnode_charclass_class' node contains space for storing the POSIX classes it matches that are never defined until the moment of matching, because they are subject to the current run-time locale. This commit creates a typedef synonym, 'regnode_charclass_posixl', that doesn't reuse the term 'class' for two different purposes.
* regcomp.h: Parenthesize macro formal parameter | Karl Williamson | 2013-09-24 | 1 | -1/+1
Not doing so can cause problems, so it is standard procedure to parenthesize all parameters within a macro definition.
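The kind of problem meant here is illustrated by these hypothetical macros: without parentheses around the parameter (and around the whole body), operators in the argument or at the call site bind to the macro's expansion.

```c
#include <assert.h>

/* Unparenthesized parameter: EX_TWICE_BAD(1 + 2) expands to 1 + 2 * 2. */
#define EX_TWICE_BAD(x)  x * 2
#define EX_TWICE_GOOD(x) ((x) * 2)

/* Unparenthesized body: EX_SUM_BAD(1, 2) * 3 expands to (1) + (2) * 3. */
#define EX_SUM_BAD(a, b)  (a) + (b)
#define EX_SUM_GOOD(a, b) ((a) + (b))

static_assert(EX_TWICE_BAD(1 + 2) == 5, "precedence surprise");
static_assert(EX_TWICE_GOOD(1 + 2) == 6, "expected result");
static_assert(EX_SUM_BAD(1, 2) * 3 == 7, "precedence surprise");
static_assert(EX_SUM_GOOD(1, 2) * 3 == 9, "expected result");
```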
* regcomp.h: Add better named synonyms | Karl Williamson | 2013-09-24 | 1 | -25/+39
This continues the process started two commits ago of removing some of the overloading of the term 'class'. In this case, this commit adds some #defines referring to the portions of the regnode associated with bracketed character classes, the ANYOF node. Specifically, those portions that deal with the POSIX character classes, like \w and [:punct:] under /l (locale) matching, are renamed, substituting POSIXL for CLASS. POSIXL is already used for POSIX-related things under /l. I remember being terribly confused about this when I started reading this code: one had a class within a class. This should clarify things somewhat. The old names are retained in case files outside the core #include and use them (there are a few such in cpan).
* regcomp.h: Move #define | Karl Williamson | 2013-09-24 | 1 | -4/+4
This moves it to be adjacent to similar #defines.
* Add regnode struct for synthetic start class | Karl Williamson | 2013-09-24 | 1 | -0/+11
As part of extending the regular expression optimizer to properly handle above-Latin1 code points, I need an inversion list to contain which code points the synthetic start class (ssc) matches. The ssc currently is the same as a locale-aware ANYOF node, which uses the struct of a regular ANYOF node, plus some extra fields at the end. This commit creates a new typedef for ssc use, which is the locale-aware ANYOF node, plus an extra SV* at the end to hold the inversion list.
* regcomp.h: Add a couple #define synonyms | Karl Williamson | 2013-08-14 | 1 | -0/+2
Sometimes SIZE_ONLY isn't really clear as to what is going on. This adds PASS1 and PASS2 for such instances.
* regcomp.h, sv.c, utf8.c: Comment nits | Karl Williamson | 2013-08-10 | 1 | -3/+0
Remove an obsolete comment, fix typos in others, and reflow one block to fit into 79 columns.
* Possessive and non greedy quantifier modifiers are mutually exclusive | Yves Orton | 2013-06-13 | 1 | -7/+0
When I added support for possessive modifiers, it was possible to build perl so that they could be combined even if it made no sense to do so. Later on, in relation to Perl #118375, I got confused and thought they were undocumented but legal. So to prevent further confusion, and since nobody has ever mentioned it since they were added, I am removing the unused conditional build logic, and clearly documenting why they aren't allowed.
* eliminate PL_regdummy | David Mitchell | 2013-06-02 | 1 | -1/+1
This global (per-interpreter) var is just used during regex compilation as a placeholder to point RExC_emit at during the first (non-emitting) pass, to indicate not to emit anything. There's no need for it to be a global var: just add it as an extra field in the RExC_state_t struct instead.
* Revert "PATCH: regex longjmp flaws" | Nicholas Clark | 2013-03-19 | 1 | -3/+1
This reverts commit 595598ee1f247e72e06e4cfbe0f98406015df5cc. The netbsd - 5.0.2 compiler pointed out that the recent changes to add longjmps to speed up some regex compilations can result in clobbering a few values. These depend on the compiled code, and so didn't show up in other compilers' warnings. This patch reinitializes them after a longjmp. [With a lot of hand editing in regcomp.c, to propagate the changes through subsequent commits.]
* regex: Add pseudo-Posix class: 'cased' | Karl Williamson | 2012-12-31 | 1 | -0/+3
/[[:upper:]]/i and /[[:lower:]]/i should match the Unicode property \p{Cased}. This commit introduces a pseudo-Posix class, internally named 'cased', to represent this. This class isn't specifiable by the user, except through using either /[[:upper:]]/i or /[[:lower:]]/i. Debug output will say ':cased:'. When the regex parser encounters either :lower: or :upper: under /i, it changes them into :cased:, which already existing logic can handle just like any other class. This commit fixes the regression introduced in 3018b823898645e44b8c37c70ac5c6302b031381, and the fact that these have never worked under 'use locale'. The next commit will un-TODO the tests for these things.
* handy.h, regcomp.h, regexec.c: Sort initializers, switch() | Karl Williamson | 2012-12-31 | 1 | -12/+12
Until recently, these needed to be (or it made sense for them to be) in numerical order of what the rhs of each #define evaluates to. But now, they are all initialized to something else, and the numerical value is not even apparent. Alphabetical order gives a logical ordering to help a reader find things.
* regcomp.c: Free up ANYOF flag bit | Karl Williamson | 2012-12-28 | 1 | -18/+11
This frees up a flag bit for ANYOF regnodes. The freed bit is currently not needed for other uses; I decided to make the change now, while how to do it was fresh in my mind. There are fewer shifts and masks as a result, as well. This commit moves the information this bit contains to the otherwise unused 'next_off' field in the synthetic start class. This paradigm could be used to pass information to the regex matching code for just the synthetic start class, but the current bit is used just during compilation.
* Add new regnode for synthetic start class | Karl Williamson | 2012-12-28 | 1 | -11/+1
This creates a regnode specifically for the synthetic start class, which is a type of ANYOF node. The flag bit previously used to denote this is removed. This paves the way for this bit to be freed up, but first the other use of this bit must also be removed, which will be done in the next commit. There are now three ANYOF-type regnodes. This one should be called only in one place in regexec.c. The other special one is ANYOF_WARN_SUPER. A synthetic start class node should not do any warning, so there is no issue of having something need to be both types.
* regcomp.c, regcomp.h: White-space, comment only | Karl Williamson | 2012-12-28 | 1 | -3/+3
No code changes.
* regcomp.h: Split two ANYOF flag bits | Karl Williamson | 2012-12-28 | 1 | -5/+5
This essentially reverts 8b27d3db700fc2fce268e3d78e221a16ccaca2e8 and causes ANYOF nodes that are in locale but don't match things like \w to have a smaller node size.
* Free up regex ANYOF bit. | Karl Williamson | 2012-12-28 | 1 | -4/+0
This uses a regnode type, of which we have many available, to free up a bit in the ANYOF regnode flag field, which has none to spare and in which we are trying to have the same bit do double duty. This will enable us to remove some of that double duty in the next commit.
* regcomp.c: Clean up ANYOF_CLASS handling. | Karl Williamson | 2012-12-28 | 1 | -1/+1
The ANYOF_CLASS flag is used in ANYOF nodes (for [bracketed] classes and the synthetic start class) only when matching something like \w, [:punct:] etc., under /l (locale). It should not be set unless /l is specified. However, it was always getting set for the synthetic start class. This commit fixes that. The previous code was masking errors in which it was being tested for unnecessarily, and for much of the 5.17 series, the synthetic start class was always set to test for locale, which was a waste of CPU when no locale was specified.