delta/perl.git - github.com: perl/perl5.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Add RXf_UNBOUNDED_QUANTIFIER and regexp->maxlen	Yves Orton	2014-02-03	1	-0/+2
\| \| \| \| \| \| \| \| \|	The flag tells us that a pattern may match an infinitely long string. The new member in the regexp struct tells us how long the string might be. With these two items we can implement regexp based $/
*	rename REG_SEEN_WHATEVER to REG_WHATEVER_SEEN to match RXf_ and PREGf_ convention	Yves Orton	2014-01-31	1	-12/+11
*	Move the RXf_ANCH flags to intflags as PREGf_ANCH_xxx and add RXf_IS_ANCHORED as a replacement	Yves Orton	2014-01-31	1	-2/+9
\| \| \| \| \| \| \| \| \| \|	The only requirement outside of the regex engine is to identify that there is an anchor involved at all. So we move the 4 anchor flags to intflags and replace it with a single aggregate flag RXf_IS_ANCHORED in extflags. This frees up another 3 bits in extflags.
*	move RXf_GPOS_SEEN and RXf_GPOS_FLOAT to intflags	Yves Orton	2014-01-31	1	-1/+3
\| \| \| \| \| \| \| \|	This required removing the RXf_GPOS_CHECK mask as it uses one flag that will stay in extflags for now (RXf_ANCH_GPOS), and one flag that moves to intflags (RXf_GPOS_SEEN). This mask is strange however, as you can't have RXf_ANCH_GPOS without having RXf_GPOS_SEEN so I don't know why we test both. Further investigation required.
*	Rename RXf_CANY_SEEN to PREGf_CANY_SEEN and move from extflags to intflags	Yves Orton	2014-01-31	1	-0/+1
*	move RXf_NOSCAN from extflags to intflags as PREGf_NOSCAN	Yves Orton	2014-01-31	1	-0/+5
\| \| \| \| \|	Includes some improvements to how we dump regexps so that when a regexp is for the standard perl engine we also show the intflags for the engine
*	regcomp.c: Change a variable and flag bit names	Karl Williamson	2014-01-27	1	-1/+1
\| \| \| \| \|	The meaning of these was expanded two commits ago, so update the name to reflect this, to prevent future confusion
*	Work properly under UTF-8 LC_CTYPE locales	Karl Williamson	2014-01-27	1	-3/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This large (sorry, I couldn't figure out how to meaningfully split it up) commit causes Perl to fully support LC_CTYPE operations (case changing, character classification) in UTF-8 locales. As a side effect it resolves [perl #56820]. The basics are easy, but there were a lot of details, and one troublesome edge case discussed below. What essentially happens is that when the locale is changed to a UTF-8 one, a global variable is set TRUE (FALSE when changed to a non-UTF-8 locale). Within the scope of 'use locale', this variable is checked, and if TRUE, the code that Perl uses for non-locale behavior is used instead of the code for locale behavior. Since Perl's internal representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale. More work had to be done for regular expressions. There are three cases. 1) The character classes \w, [[:punct:]] needed no extra work, as the changes fall out from the base work. 2) Strings that are to be matched case-insensitively. These form EXACTFL regops (nodes). Notice that if such a string contains only characters above-Latin1 that match only themselves, that the node can be downgraded to an EXACT-only node, which presents better optimization possibilities, as we now have a fixed string known at compile time to be required to be in the target string to match. Similarly if all characters in the string match only other above-Latin1 characters case-insensitively, the node can be downgraded to a regular EXACTFU node (match, folding, using Unicode, not locale, rules). The code changes for this could be done without accepting UTF-8 locales fully, but there were edge cases which needed to be handled differently if I stopped there, so I continued on. 
In an EXACTFL node, all such characters are now folded at compile time (just as before this commit), while the other characters whose folds are locale-dependent are left unfolded. This means that they have to be folded at execution time based on the locale in effect at the moment. Again, this isn't a change from before. The difference is that now some of the folds that need to be done at execution time (in regexec) are potentially multi-char. Some of the code in regexec was trivial to extend to account for this because of existing infrastructure, but the part dealing with regex quantifiers had to have more work. Also the code that joins EXACTish nodes together had to be expanded to account for the possibility of multi-character folds within locale handling. This was fairly easy, because it already has infrastructure to handle these under somewhat different circumstances. 3) In bracketed character classes, represented by ANYOF nodes, a new inversion list was created giving the characters that should be matched by this node when the runtime locale is UTF-8. The list is ignored except under that circumstance. To do this, I created a new ANYOF type which has an extra SV for the inversion list. The edge case that caused the most difficulty is folding involving the MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range character that folds to outside that range. The issue is that it doesn't naturally fall out that it will match the CAP MU. If we let the CAP MU fold to the small mu at compile time (which it can because both are above-Latin1 and so the fold is the same no matter what locale is in effect), it could appear that the regnode can be downgraded away from EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case insensitively match the CAP MU. This could be special cased in regcomp and regexec, but I wanted to avoid that. 
Instead the mktables tables are set up to include the CAP MU as a character whose presence forbids the downgrading, so the special casing is in mktables, and not in the C code.
*	Rename regex internal flag bit	Karl Williamson	2014-01-22	1	-1/+1
\| \| \| \| \|	This is a clearer name; is used internally only in regcomp.c and regexec.c
*	Use bit instead of node for regex SSC	Karl Williamson	2014-01-22	1	-4/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The flag bits in regular expression ANYOF nodes are perennially in short supply. However there are still plenty of regex nodes possible. So one solution to needing to pass more information is to create a node that encapsulates what is needed. That is what commit 9aa1e39f96ac28f6ce5d814d9a1eccf1464aba4a did to tell regexec.c that a particular ANYOF node is for the synthetic start class (SSC). However this solution introduces other issues. If you have to express two things, then you need a regnode for A, a regnode for B, a regnode for both A and B, and another regnode for neither A nor B. With three things, you need 8 regnodes to express all possible combinations. This becomes unwieldy to write code for. The number of combinations goes way down if some of them are mutually exclusive. At the time of that commit, I thought that a SSC need not ever warn if matching against an above-Unicode code point. I was wrong, and that has been corrected earlier in the 5.19 series. But it finally came to me how to tell regexec that an ANYOF node is for the SSC without taking up a flag bit and without requiring a regnode type. The 'next_off' field in a regnode tells the engine the offset in the regex program to the node it's supposed to go to after processing this one. Since the SSC stands alone, its 'next_off' field is unused, and we can put anything we want in it. That, however, is not true of other ANYOF regnodes. But it turns out that there are certain values that will never be legitimate in the 'next_off' field in these, and so this commit uses one of those to signal that this ANYOF field is an SSC. regnodes come in various sizes, and the offset is in terms of how many of the smallest ones are there to the next node to look at. Since ANYOF nodes are large, the offset is always > 1, and so this commit uses 1 to indicate an SSC.
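The 'next_off' trick described in this message can be sketched roughly as follows. This is only an illustration under assumptions: the `demo_node` struct, the `DEMO_*` constants and both helper functions are hypothetical names, not the definitions in regcomp.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified node header; NOT Perl's real struct regnode. */
typedef struct {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;  /* offset to the next node, counted in smallest-node units */
} demo_node;

enum { DEMO_ANYOF = 1 };

/* An ordinary ANYOF node is larger than one unit, so its next_off is
 * always > 1. The value 1 is therefore free to act as a marker. */
#define DEMO_SSC_MARKER 1

/* Mark a stand-alone ANYOF node as a synthetic start class. */
static void demo_mark_ssc(demo_node *n)
{
    assert(n->type == DEMO_ANYOF);
    n->next_off = DEMO_SSC_MARKER;
}

/* Test for the marker; no flag bit and no extra node type is consumed. */
static int demo_is_ssc(const demo_node *n)
{
    return n->type == DEMO_ANYOF && n->next_off == DEMO_SSC_MARKER;
}
```

The design choice is that an otherwise unused field carries the information, so neither the scarce flag bits nor the regnode type space is touched.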
*	regcomp.h: Reorder some #defines	Karl Williamson	2013-12-31	1	-8/+8
\| \| \| \| \| \|	There are no logic changes. The previous commit changed the numbers for some of the bits. This commit re-arranges things so that the #defines are again in numerical order.
*	Re-order some flag bits to avoid potential branches	Karl Williamson	2013-12-31	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \|	The ANYOF_INVERT flag is used in every single pattern match of [bracketed character classes]. With backtracking, this can be a huge number. All the other flags' uses pale by comparison. I noticed that by making it the lowest bit, we don't have to use cBOOL, as the only possibilities are 0 and 1. cBOOL hopefully will be optimized away, but not always. This commit reorders some of the flag bits to make this one the lowest, and adds a compile check to make sure it isn't inadvertently changed.
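A minimal sketch of why the lowest bit helps, assuming made-up flag values (`DEMO_INVERT`, `DEMO_OTHER`) rather than the real ANYOF_* numbers: masking bit 0 yields exactly 0 or 1, so the result can be XORed into a 0/1 match result without first normalising it to a boolean.

```c
#include <assert.h>

/* Hypothetical flag values; the real ones live in regcomp.h. */
#define DEMO_INVERT 0x01  /* deliberately the lowest bit */
#define DEMO_OTHER  0x02

/* Compile-time check in the spirit of the commit: the array size becomes
 * negative, and compilation fails, if the invert bit is ever moved. */
typedef char demo_invert_must_be_bit0[(DEMO_INVERT == 1) ? 1 : -1];

/* 'match' is 0 or 1. (flags & DEMO_INVERT) is also 0 or 1, so no
 * boolean conversion is needed before the XOR. */
static int demo_apply_invert(int match, unsigned flags)
{
    assert(match == 0 || match == 1);
    return match ^ (int)(flags & DEMO_INVERT);
}
```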
*	Output regex above-Unicode matching in syn strt class	Karl Williamson	2013-12-31	1	-1/+1
\| \| \| \| \| \| \|	A warning is supposed to be raised under some conditions when matching an above-Unicode code point against a Unicode property. Prior to this patch, if the synthetic start class excluded the code point, the warning would be skipped, even though it was attempted to be matched.
*	Convert regnode to a flag for [...]	Karl Williamson	2013-12-31	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prior to this commit, there were 3 types of ANYOF nodes; now there are two: regular, and one for the synthetic start class (ssc). This commit converted the third type dealing with warning about matching \p{} against non-Unicode code points, into using the spare flag bit for ANYOF nodes. This allows this bit to apply to ssc ANYOF nodes, whereas previously it couldn't. There is a bug in which the warning isn't raised if the match is rejected by the optimizer, because of this inability. This bug will be fixed in a later commit. Another option would have been to create a new node-type which was an ANYOF_SSC_WARN_SUPER node. But this adds extra complications to things; and we have a spare bit that we might as well use. The comments give better possibilities for freeing up 2 bits should they be needed.
*	regcomp.c: Split #define into two	Karl Williamson	2013-12-31	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \|	The synthetic start class regnode (SSC) and a bracketed character class node share much of the same data structure, including a flags field, and some of the same flag bits within it. Currently, only locale-related flags (under /l rules) are the same between the two during construction of the SSC. But a future commit will introduce another common flag. This commit creates an extra #define for use where we want the common flags, while retaining the existing one for use where we want the locale flags. The new #define is just a copy of the existing one, to be changed in the future commit.
*	Avoid pointer churn in study_chunk recursion bitmap allocation	Yves Orton	2013-11-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we can only recurse into a given paren (or the entire pattern) once, we know that the maximum recursion depth is the number of parens in the pattern (plus one for "whole pattern"). This means we can preallocate one large bitmap, and then use different chunks of it for each level. That avoids SAVEFREEPV costs for each bitmap, which are likely short anyway. (One could imagine an optimization where a flag somewhere lets us use the RExC_study_chunk_recursed pointer as a bitmap, so we dont have to allocate all when we have less than 32 parens.) This removes the "recursed" argument from study_chunk() and replaces it with a "recursive_depth" argument which counts how deep we are in the bitmap "stack".
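The preallocation idea can be sketched as below, assuming hypothetical names throughout (`demo_recursed` and its helpers are illustrations, not the code in regcomp.c): one allocation holds a row of bits per recursion level, and each level indexes its own row by depth.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container: one block of memory, one bitmap row per depth. */
typedef struct {
    uint8_t *bits;       /* single allocation covering every level */
    size_t   row_bytes;  /* bytes per level */
    size_t   levels;     /* number of parens + 1 for "whole pattern" */
} demo_recursed;

static int demo_recursed_init(demo_recursed *r, size_t nparens)
{
    r->levels    = nparens + 1;
    r->row_bytes = (nparens + 1 + 7) / 8;  /* one bit per paren, plus bit 0 */
    r->bits      = (uint8_t *)calloc(r->levels, r->row_bytes);
    return r->bits != NULL;
}

/* Returns whether 'paren' was already visited at 'depth', then marks it. */
static int demo_test_and_set(demo_recursed *r, size_t depth, size_t paren)
{
    uint8_t *row;
    uint8_t  mask = (uint8_t)(1u << (paren % 8));
    int      was;

    assert(depth < r->levels);
    assert(paren < r->row_bytes * 8);
    row = r->bits + depth * r->row_bytes;  /* this level's slice */
    was = (row[paren / 8] & mask) != 0;
    row[paren / 8] |= mask;
    return was;
}

static void demo_recursed_free(demo_recursed *r)
{
    free(r->bits);
    r->bits = NULL;
}
```

Each level gets an independent row, so marking a paren at one depth does not affect another depth, and only one allocation and one free are needed.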
*	regcomp.c: Move bit to different data structure	Karl Williamson	2013-09-24	1	-8/+17
\| \| \| \| \| \| \| \| \| \| \| \|	Commit 899d20b99829f8ecdc14e1351b533bc62a354dea was used to free up a bit in a flags field that had run out of bits at the time. Further work has made that unnecessary, and this commit moves it back to the flags field, which even after this commit has a spare bit (which is intended to be used in a future commit). Doing so makes this bit "just one of the guys", so can be operated on en-masse with the others. This allows a little code to be removed, and the knowledge of this flag mostly confined to lower level subroutines.
*	Teach regex optimizer to handle above-Latin1	Karl Williamson	2013-09-24	1	-8/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until this commit, the regular expression optimizer has essentially punted on above-Latin1 code points. Under some circumstances, they would be taken into account, more or less, but often, the generated synthetic start class would end up matching all above-Latin1 code points. With the advent of inversion lists, it becomes feasible to actually fully handle such code points, as inversion lists are a convenient way to express arbitrary lists of code points and take their union, intersection, etc. This commit changes the optimizer to use inversion lists for operating on the code points the synthetic start class can match. I don't much understand the overall operation of the optimizer. I'm told that previous porters found that perturbing it caused unexpected behaviors. I had promised to get this change in 5.18, but didn't. I'm trying to get it in early enough into the 5.20 preliminary series that any problems will surface before 5.20 ships. This commit doesn't change the macro level logic, but does significantly change various micro level things. Thus the 'and' and 'or' subroutines have been rewritten to use inversion lists. I'm pretty confident that they do what their names suggest. I re-derived the equations for what these operations should do, getting the same results in some cases, but extending others where the previous code mostly punted. The derivations are given in comments in the respective routines. Some of the code is greatly simplified, as it no longer has to treat above-Latin1 specially. It is now feasible for /i matching of above-Latin1 code points to know explicitly the folds that should be in the synthetic start class. But more preparatory work needs to be done before putting that into place. ...
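For readers unfamiliar with inversion lists, here is a generic sketch of the data structure, assuming a plain sorted array of code points in which even-indexed entries start a range and odd-indexed entries end it (exclusive). The `demo_*` functions are hypothetical and are not Perl's own inversion list routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A code point is in the set if an odd number of entries are <= it.
 * An odd-length list therefore has a final range that runs to infinity. */
static int demo_contains(const uint32_t *v, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {                    /* binary search: count entries <= cp */
        size_t mid = lo + (hi - lo) / 2;
        if (v[mid] <= cp) lo = mid + 1; else hi = mid;
    }
    return (int)(lo & 1);
}

/* Merge two lists. need == 1 gives the union, need == 2 the intersection.
 * 'out' must have room for na + nb entries; returns the entry count. */
static size_t demo_combine(const uint32_t *a, size_t na,
                           const uint32_t *b, size_t nb,
                           int need, uint32_t *out)
{
    size_t i = 0, j = 0, n = 0;
    int depth = 0;                       /* how many inputs currently "inside" */

    while (i < na || j < nb) {
        int before = depth >= need;
        uint32_t cp = (j >= nb || (i < na && a[i] <= b[j])) ? a[i] : b[j];

        if (i < na && a[i] == cp) { depth += (i & 1) ? -1 : 1; i++; }
        if (j < nb && b[j] == cp) { depth += (j & 1) ? -1 : 1; j++; }
        if ((depth >= need) != before)   /* membership flips at this point */
            out[n++] = cp;
    }
    return n;
}
```

For example, the union of `{0x41, 0x5B}` (A to Z) with the open-ended list `{0x100}` (everything above Latin1) is the three-entry list `{0x41, 0x5B, 0x100}`, which is the kind of set the message describes the optimizer now being able to represent exactly.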
*	regcomp.c: Add some static functions	Karl Williamson	2013-09-24	1	-0/+7
\| \| \| \| \| \| \| \| \|	This commit adds some functions that are currently unused, but will be used in a future commit. This commit is essentially to make the differences smaller in that commit, as 'diff' is getting confused and not outputting the logical differences. The functions are added in a block at the beginning of the file to avoid the 'diff' issues. A later white-space only commit will move them to more appropriate positions.
*	Enlarge dummy regex pass1 compilation node	Karl Williamson	2013-09-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In pass 1 of compiling regular expressions, the needed size is calculated. There is space allocated for a scratch node that can be used for the things that the real one will hold in pass 2. It is valid only while working on the current node, and gets overwritten in the next node. Until this commit, this scratch space was sized only for the smallest node type, meaning that larger types could not use it for scratch. Now it is sized to be the largest non EXACTish node. We could make it an array of 256 + overhead bytes instead to be able to hold the EXACTish nodes, but I don't see a need for that now.
*	Rename regex flag bit for clarity	Karl Williamson	2013-09-24	1	-9/+10
\| \| \| \| \| \|	ANYOF_UNICODE_ALL doesn't mean every Unicode code point. It means those above the Latin1 range. Rename it, while retaining the old one for back compat.
*	regcomp.h: Create new typedef synonym for clarity	Karl Williamson	2013-09-24	1	-8/+8
\| \| \| \| \| \| \| \| \|	This commit finishes (at least for now) removing some of the overloading of the term class. A 'regnode_charclass_class' node contains space for storing the posix classes it matches that are never defined until the moment of matching because they are subject to the current run-time locale. This commit creates a typedef 'regnode_charclass_posixl' synonym that doesn't re-use the term 'class' for two different purposes.
*	regcomp.h: Parenthesize macro formal parameter	Karl Williamson	2013-09-24	1	-1/+1
\| \| \| \| \|	Not doing so can cause problems, so it is standard procedure to parenthesize all parameters within a macro definition.
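The kind of problem meant here can be shown with a generic example. The two macros below are hypothetical and are not the one this commit changed.

```c
#include <assert.h>

/* Parameter not parenthesized: the argument's operators mix with the body's. */
#define DEMO_DOUBLE_BAD(x)   (x * 2)

/* Parameter parenthesized: the argument is evaluated as a unit. */
#define DEMO_DOUBLE_GOOD(x)  ((x) * 2)

/* Expands to (1 + 2 * 2), which is 5, not the intended 6. */
static int demo_bad(void)  { return DEMO_DOUBLE_BAD(1 + 2); }

/* Expands to ((1 + 2) * 2), which is 6. */
static int demo_good(void) { return DEMO_DOUBLE_GOOD(1 + 2); }
```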
*	regcomp.h: Add better named synonyms	Karl Williamson	2013-09-24	1	-25/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This continues the process started two commits ago of removing some of the overloading of the term 'class'. In this case, this commit adds some #defines referring to the portions of the regnode associated with bracketed character classes, the ANYOF node. Specifically those portions that deal with the Posix character classes, like \w and [:punct:] under /l (locale) matching are renamed substituting POSIXL for CLASS. POSIXL is already used for POSIX-related things under /l. I remember being terribly confused when I started reading this code about this. One had a class within a class. This should clarify things somewhat. The old names are retained in case files outside the core #include and use it (there are a few such in cpan).
*	regcomp.h: Move #define	Karl Williamson	2013-09-24	1	-4/+4
\| \| \| \|	This moves it to be adjacent to similar #defines
*	Add regnode struct for synthetic start class	Karl Williamson	2013-09-24	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \|	As part of extending the regular expression optimizer to properly handle above Latin1 code points, I need an inversion list to contain which code points the synthetic start class (ssc) matches. The ssc currently is the same as a locale-aware ANYOF node, which uses the struct of a regular ANYOF node, plus some extra fields at the end. This commit creates a new typedef for ssc use, which is the locale-aware ANYOF node, plus an extra SV* at the end to hold the inversion list.
*	regcomp.h: Add a couple #define synonyms	Karl Williamson	2013-08-14	1	-0/+2
\| \| \| \| \|	Sometimes SIZE_ONLY isn't really clear as to what is going on. This adds PASS1 and PASS2 for such instances.
*	regcomp.h, sv.c, utf8.c: Comment nits	Karl Williamson	2013-08-10	1	-3/+0
\| \| \| \| \|	Remove obsolete comment, typos in others, plus reflow one block to fit into 79 columns
*	Possessive and non greedy quantifier modifiers are mutually exclusive	Yves Orton	2013-06-13	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	When I added support for possessive modifiers it was possible to build perl so that they could be combined even if it made no sense to do so. Later on in relation to Perl #118375 I got confused and thought they were undocumented but legal. So to prevent further confusion, and since nobody has ever mentioned it since they were added, I am removing the unused conditional build logic, and clearly documenting why they aren't allowed.
*	eliminate PL_regdummy	David Mitchell	2013-06-02	1	-1/+1
\| \| \| \| \| \| \|	This global (per-interpreter) var is just used during regex compilation as a placeholder to point RExC_emit at during the first (non-emitting) pass, to indicate to not to emit anything. There's no need for it to be a global var: just add it as an extra field in the RExC_state_t struct instead.
*	Revert "PATCH: regex longjmp flaws"	Nicholas Clark	2013-03-19	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 595598ee1f247e72e06e4cfbe0f98406015df5cc. The netbsd - 5.0.2 compiler pointed out that the recent changes to add longjmps to speed up some regex compilations can result in clobbering a few values. These depend on the compiled code, and so didn't show up in other compiler's warnings. This patch reinitializes them after a longjmp. [With a lot of hand editing in regcomp.c, to propagate the changes through subsequent commits.]
*	regex: Add pseudo-Posix class: 'cased'	Karl Williamson	2012-12-31	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	/[[:upper:]]/i and /[[:lower:]]/i should match the Unicode property \p{Cased}. This commit introduces a pseudo-Posix class, internally named 'cased', to represent this. This class isn't specifiable by the user, except through using either /[[:upper:]]/i or /[[:lower:]]/i. Debug output will say ':cased:'. The regex parsing either of :lower: or :upper: will change them into :cased:, where already existing logic can handle this, just like any other class. This commit fixes the regression introduced in 3018b823898645e44b8c37c70ac5c6302b031381, and that these have never worked under 'use locale'. The next commit will un-TODO the tests for these things.
*	handy.h, regcomp.h, regexec.c: Sort initializers, switch()	Karl Williamson	2012-12-31	1	-12/+12
\| \| \| \| \| \| \| \|	Until recently, these were needed to be (or it made sense to be) in numerical value of what the rhs of each #define evaluates to. But now, they are all initialized to something else, and the numerical value is not even apparent. Alphabetical order gives a logical ordering to help a reader find things.
*	regcomp.c: Free up ANYOF flag bit	Karl Williamson	2012-12-28	1	-18/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	This frees up a flag bit for ANYOF regnodes. The freed bit is currently not needed for other uses; I decided to make the change now, while how to do it was fresh in my mind. There are fewer shifts and masks as a result, as well. This commit moves the information this bit contains to the otherwise unused 'next_off' field in the synthetic start class. This paradigm could be used to pass information to the regex matching code for just the synthetic start class, but the current bit is used just during compilation.
*	Add new regnode for synthetic start class	Karl Williamson	2012-12-28	1	-11/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This creates a regnode specifically for the synthetic start class, which is a type of ANYOF node. The flag bit previously used to denote this is removed. This paves the way for this bit to be freed up, but first the other use of this bit must also be removed, which will be done in the next commit. There are now three ANYOF-type regnodes. This one should be called only in one place in regexec.c. The other special one is ANYOF_WARN_SUPER. A synthetic start class node should not do any warning, so there is no issue of having something need to be both types.
*	regcomp.c, regcomp.h: White-space, comment only	Karl Williamson	2012-12-28	1	-3/+3
\| \| \| \|	No code changes
*	regcomp.h: Split two ANYOF flag bits	Karl Williamson	2012-12-28	1	-5/+5
\| \| \| \| \| \|	This essentially reverts 8b27d3db700fc2fce268e3d78e221a16ccaca2e8 and causes ANYOF nodes that are in locale but don't match things like \w to have a smaller node size.
*	Free up regex ANYOF bit.	Karl Williamson	2012-12-28	1	-4/+0
\| \| \| \| \| \| \|	This uses a regnode type, of which we have many available, to free up a bit in the ANYOF regnode flag field, of which we have none, and are trying to have the same bit do double duty. This will enable us to remove some of that double duty in the next commit.
*	regcomp.c: Clean up ANYOF_CLASS handling.	Karl Williamson	2012-12-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	The ANYOF_CLASS flag is used in ANYOF nodes (for [bracketed] and the synthetic start class) only when matching something like \w, [:punct:] etc., under /l (locale). It should not be set unless /l is specified. However, it was always getting set for the synthetic start class. This commit fixes that. The previous code was masking errors in which it was being tested for unnecessarily, and for much of the 5.17 series, the synthetic start class was always set to test for locale, which was a waste of cpu when no locale was specified.
*	handy.h: Create isALPHANUMERIC() and kin	Karl Williamson	2012-12-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Perl has had an undocumented macro isALNUMC() for a long time. I want to document it, but the name is very obscure. Neither Yves nor I are sure what it is. My best guess is "C's alnum". It corresponds to /[[:alnum:]]/, and so its best name would be isALNUM(). But that is the name long given to what matches \w. A new synonym, isWORDCHAR(), has been in place for several releases for that, but the old isALNUM() should remain for backwards compatibility. I don't think that the name isALNUMC() should be published, as it is too close to isALNUM(). I finally came to the conclusion that isALPHANUMERIC() is the best name; it describes its purpose clearly; the disadvantage is its long length. I doubt that it will get much use, but we need something, I think, that we can publish to accomplish this functionality. This commit also converts core uses of isALNUMC to isALPHANUMERIC. (I intended to do that separately, but made a mistake in rebasing, and combined the two patches; and it seemed like not a big enough problem to separate them out again.)
*	use PERL_UNUSED_VAR rather than PERL_UNUSED_DECL	David Mitchell	2012-12-17	1	-2/+2
\| \| \| \| \|	PERL_UNUSED_DECL doesn't do anything under g++, so doing this silences some g++ warnings.
*	Change 4 byte bitmap to 32 bit single word	Karl Williamson	2012-12-09	1	-16/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I presume that the reason this bitmap was expressed in bytes was that the macros for dealing with that were already readily available and familiar, and because it could easily be grown. However, it's extremely unlikely that we would ever need more bits. This bit map is for the Posix character classes, and no one is making more of them. There is currently one spare bit available, and if we don't back out of the \s and [:space:] unification, a second will become available in 5.20 or 5.22. Using a single word is more efficient, so this changes to use that. Should we ever need more bits, we can change back.
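The difference between the two representations can be sketched like this, assuming hypothetical helper names and class numbers in the range 0..31 (the real macros are in regcomp.h):

```c
#include <assert.h>
#include <stdint.h>

/* Byte-array form: locate the byte, then the bit within it. */
static void demo_bytes_set(uint8_t map[4], unsigned c)
{
    assert(c < 32);
    map[c >> 3] |= (uint8_t)(1u << (c & 7));
}

static int demo_bytes_test(const uint8_t map[4], unsigned c)
{
    assert(c < 32);
    return (map[c >> 3] >> (c & 7)) & 1;
}

/* "Is anything set?" needs all four bytes examined. */
static int demo_bytes_any(const uint8_t map[4])
{
    return map[0] || map[1] || map[2] || map[3];
}

/* Single-word form: one shift per operation, one compare for "any". */
static void demo_word_set(uint32_t *w, unsigned c)
{
    assert(c < 32);
    *w |= (uint32_t)1 << c;
}

static int demo_word_test(uint32_t w, unsigned c)
{
    assert(c < 32);
    return (int)((w >> c) & 1);
}

static int demo_word_any(uint32_t w)
{
    return w != 0;
}
```

Whole-map operations such as clearing, testing for emptiness, or OR-ing two maps together become a single word operation in the second form, which is one plausible reading of "more efficient" in the message.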
*	regcomp.h: Revise #define setup and checking	Karl Williamson	2012-12-09	1	-12/+15
\| \| \| \| \| \|	This revises how these #defines are set up so that the order can change (as will be done in a later commit), and the only dependencies are on VERTWS and the max one from handy.h.
*	regexes: Add \v to table of latin1 char classes	Karl Williamson	2012-11-19	1	-0/+5
\| \| \| \| \| \| \|	This will be used in future commits to allow \v and \V to be treated consistently with other character classes. (Doing the same for \h isn't necessary, as it matches identically to [:blank:] in the entire Unicode range.)
*	regcomp.h: Make some #defines sequential	Karl Williamson	2012-11-19	1	-9/+11
\| \| \| \| \| \| \| \| \|	ANYOF_MAX is used as the upper boundary in loops. If we keep it larger than necessary, the loop does extraneous iterations. The #defines that come after ANYOF_MAX are moved down to start with it. This is useful in a later commit that will create an entry in l1_char_class_tab.h for vertical white space determination.
*	regcomp: Change name of #define to better reflect its purpose	Karl Williamson	2012-11-19	1	-0/+3
\| \| \| \| \|	ANYOF_MAX is used for two different purposes; this separates them and creates a separate #define for one of them.
*	Allow regexp-to-pvlv assignment	Father Chrysostomos	2012-10-30	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the xpvlv and regexp structs conflict, we have to find somewhere else to put the regexp struct. I was going to sneak it in SvPVX, allocating a buffer large enough to fit the regexp struct followed by the string, and have SvPVX - sizeof(regexp) point to the struct. But that would make all regexp flag-checking macros fatter, and those are used in hot code. So I came up with another method. Regexp stringification is not speed-critical. So we can move the regexp stringification out of re->sv_u and put it in the regexp struct. Then the regexp struct itself can be pointed to by re->sv_u. So SVt_REGEXPs will have re->sv_any and re->sv_u pointing to the same spot. PVLVs can then have sv->sv_any point to the xpvlv body as usual, but have sv->sv_u point to a regexp struct. All regexp member access can go through sv_u instead of sv_any, which will be no slower than before. Regular expressions will no longer be SvPOK, so we give sv_2pv special logic for regexps. We don’t need to make the regexp struct larger, as SvLEN is currently always 0 iff mother_re is set. So we can replace the SvLEN field with the pv. SvFAKE is never used without SvPOK or SvSCREAM also set. So we can use that to identify regexps.
*	regex: Rename macro to reflect its narrowed use	Karl Williamson	2012-10-14	1	-8/+5
\| \| \| \| \|	This macro is now only used under locale; its other use has now been removed. Change the name to reflect its only use.
*	regcomp.c: Add a less confusing #define alias	Karl Williamson	2012-09-26	1	-2/+4
\| \| \| \|	ALNUM (meaning \w) is too close to ALNUMC ([[:alnum:]]) for comfort
*	regcomp.h: Use handy.h constants	Karl Williamson	2012-07-24	1	-30/+33
\| \| \| \| \|	This synchronizes the ANYOF_FOO usages to the isFOO() usages. Future commits will take advantage of this relationship.