path: root/regcomp.h
Commit message (Author, Age, Files, Lines)
* Possessive and non greedy quantifier modifiers are mutually exclusive (Yves Orton, 2013-06-13, 1 file, -7/+0)
| When I added support for possessive modifiers it was possible to build perl so that they could be combined even if it made no sense to do so. Later on in relation to Perl #118375 I got confused and thought they were undocumented but legal. So to prevent further confusion, and since nobody has ever mentioned it since they were added, I am removing the unused conditional build logic, and clearly documenting why they aren't allowed.
* eliminate PL_regdummy (David Mitchell, 2013-06-02, 1 file, -1/+1)
| This global (per-interpreter) var is just used during regex compilation as a placeholder to point RExC_emit at during the first (non-emitting) pass, to indicate that nothing should be emitted. There's no need for it to be a global var: just add it as an extra field in the RExC_state_t struct instead.
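The two-pass idea described above can be sketched in C as follows. This is only an illustration under stated assumptions: `node_t`, `compile_state_t`, `emit_dummy` and the helper functions are hypothetical stand-ins, not the real `regnode` or `RExC_state_t` definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a regnode; the real one is more complex. */
typedef struct { unsigned char type; } node_t;

/* Hypothetical stand-in for the compile state struct. */
typedef struct {
    node_t *emit;        /* where the next node would be written */
    node_t  emit_dummy;  /* per-compilation placeholder instead of a global */
    size_t  size;        /* nodes counted during the sizing pass */
} compile_state_t;

/* Pass 1: aim emit at the placeholder, meaning "do not emit anything". */
void begin_sizing_pass(compile_state_t *st) {
    st->emit = &st->emit_dummy;
    st->size = 0;
}

/* Pass 2: aim emit at real storage. */
void begin_emit_pass(compile_state_t *st, node_t *program) {
    assert(program != NULL);
    st->emit = program;
}

/* The placeholder address doubles as the "sizing only" marker. */
int is_sizing(const compile_state_t *st) {
    return st->emit == &st->emit_dummy;
}

/* Count the node in pass 1, write it in pass 2. */
void add_node(compile_state_t *st, unsigned char type) {
    if (is_sizing(st)) {
        st->size++;
        return;
    }
    st->emit->type = type;
    st->emit++;
}
```

Because the placeholder lives inside the state struct, two compilations running with separate state structs never share it.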
* Revert "PATCH: regex longjmp flaws" (Nicholas Clark, 2013-03-19, 1 file, -3/+1)
| This reverts commit 595598ee1f247e72e06e4cfbe0f98406015df5cc. The netbsd - 5.0.2 compiler pointed out that the recent changes to add longjmps to speed up some regex compilations can result in clobbering a few values. These depend on the compiled code, and so didn't show up in other compilers' warnings. This patch reinitializes them after a longjmp. [With a lot of hand editing in regcomp.c, to propagate the changes through subsequent commits.]
* regex: Add pseudo-Posix class: 'cased' (Karl Williamson, 2012-12-31, 1 file, -0/+3)
| /[[:upper:]]/i and /[[:lower:]]/i should match the Unicode property \p{Cased}. This commit introduces a pseudo-Posix class, internally named 'cased', to represent this. This class isn't specifiable by the user, except through using either /[[:upper:]]/i or /[[:lower:]]/i. Debug output will say ':cased:'. The regex parsing either of :lower: or :upper: will change them into :cased:, where already existing logic can handle this, just like any other class. This commit fixes the regression introduced in 3018b823898645e44b8c37c70ac5c6302b031381, and the fact that these have never worked under 'use locale'. The next commit will un-TODO the tests for these things.
* handy.h, regcomp.h, regexec.c: Sort initializers, switch() (Karl Williamson, 2012-12-31, 1 file, -12/+12)
| Until recently, these needed to be (or it made sense for them to be) in the numerical order of what the rhs of each #define evaluates to. But now, they are all initialized to something else, and the numerical value is not even apparent. Alphabetical order gives a logical ordering to help a reader find things.
* regcomp.c: Free up ANYOF flag bit (Karl Williamson, 2012-12-28, 1 file, -18/+11)
| | | | | | | | | | | | | This frees up a flag bit for ANYOF regnodes. The freed bit is currently not needed for other uses; I decided to make the change now, while how to do it was fresh in my mind. There are fewer shifts and masks as a result, as well. This commit moves the information this bit contains to the otherwise unused 'next_off' field in the synthetic start class. This paradigm could be used to pass information to the regex matching code for just the synthetic start class, but the current bit is used just during compilation.
* Add new regnode for synthetic start class (Karl Williamson, 2012-12-28, 1 file, -11/+1)
| | | | | | | | | | | | | This creates a regnode specifically for the synthetic start class, which is a type of ANYOF node. The flag bit previously used to denote this is removed. This paves the way for this bit to be freed up, but first the other use of this bit must also be removed, which will be done in the next commit. There are now three ANYOF-type regnodes. This one should be called only in one place in regexec.c. The other special one is ANYOF_WARN_SUPER. A synthetic start class node should not do any warning, so there is no issue of having something need to be both types.
* regcomp.c, regcomp.h: White-space, comment only (Karl Williamson, 2012-12-28, 1 file, -3/+3)
| | | | No code changes
* regcomp.h: Split two ANYOF flag bits (Karl Williamson, 2012-12-28, 1 file, -5/+5)
| | | | | | This essentially reverts 8b27d3db700fc2fce268e3d78e221a16ccaca2e8 and causes ANYOF nodes that are in locale but don't match things like \w to have a smaller node size.
* Free up regex ANYOF bit. (Karl Williamson, 2012-12-28, 1 file, -4/+0)
| | | | | | | This uses a regnode type, of which we have many available, to free up a bit in the ANYOF regnode flag field, of which we have none, and are trying to have the same bit do double duty. This will enable us to remove some of that double duty in the next commit.
* regcomp.c: Clean up ANYOF_CLASS handling. (Karl Williamson, 2012-12-28, 1 file, -1/+1)
| | | | | | | | | | | The ANYOF_CLASS flag is used in ANYOF nodes (for [bracketed] and the synthetic start class) only when matching something like \w, [:punct:] etc., under /l (locale). It should not be set unless /l is specified. However, it was always getting set for the synthetic start class. This commit fixes that. The previous code was masking errors in which it was being tested for unnecessarily, and for much of the 5.17 series, the synthetic start class was always set to test for locale, which was a waste of cpu when no locale was specified.
* handy.h: Create isALPHANUMERIC() and kin (Karl Williamson, 2012-12-22, 1 file, -2/+2)
| Perl has had an undocumented macro isALNUMC() for a long time. I want to document it, but the name is very obscure. Neither Yves nor I are sure what it is. My best guess is "C's alnum". It corresponds to /[[:alnum:]]/, and so its best name would be isALNUM(). But that is the name long given to what matches \w. A new synonym, isWORDCHAR(), has been in place for several releases for that, but the old isALNUM() should remain for backwards compatibility. I don't think that the name isALNUMC() should be published, as it is too close to isALNUM(). I finally came to the conclusion that isALPHANUMERIC() is the best name; it describes its purpose clearly; the disadvantage is its long length. I doubt that it will get much use, but we need something, I think, that we can publish to accomplish this functionality. This commit also converts core uses of isALNUMC to isALPHANUMERIC. (I intended to do that separately, but made a mistake in rebasing, and combined the two patches; and it seemed like not a big enough problem to separate them out again.)
* use PERL_UNUSED_VAR rather than PERL_UNUSED_DECL (David Mitchell, 2012-12-17, 1 file, -2/+2)
| | | | | PERL_UNUSED_DECL doesn't do anything under g++, so doing this silences some g++ warnings.
* Change 4 byte bitmap to 32 bit single word (Karl Williamson, 2012-12-09, 1 file, -16/+14)
| | | | | | | | | | | | | | | I presume that the reason this bitmap was expressed in bytes was that the macros for dealing with that were already readily available and familiar, and because it could easily be grown. However, it's extremely unlikely that we would ever need more bits. This bit map is for the Posix character classes, and no one is making more of them. There is currently one spare bit available, and if we don't back out of the \s and [:space:] unification, a second will become available in 5.20 or 5.22. Using a single word is more efficient, so this changes to use that. Should we ever need more bits, we can change back.
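The trade-off described above can be illustrated with a small C sketch that keeps the same class bits both ways. The names (`class_bytes_t`, `class_word_t`, the `CLS_*` indices) are hypothetical and chosen for this example; they are not the macros in regcomp.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical class indices, for illustration only. */
enum { CLS_WORD = 0, CLS_NWORD = 1, CLS_DIGIT = 2, CLS_NDIGIT = 3, CLS_LIMIT = 32 };

/* Byte-array form: every access needs a byte index plus a bit mask. */
typedef struct { uint8_t bytes[4]; } class_bytes_t;

void bytes_set(class_bytes_t *b, unsigned c) {
    assert(c < CLS_LIMIT);
    b->bytes[c >> 3] |= (uint8_t)(1u << (c & 7));
}

int bytes_test(const class_bytes_t *b, unsigned c) {
    assert(c < CLS_LIMIT);
    return (b->bytes[c >> 3] >> (c & 7)) & 1;
}

/* "Is any class set?" has to look at all four bytes. */
int bytes_any(const class_bytes_t *b) {
    return (b->bytes[0] | b->bytes[1] | b->bytes[2] | b->bytes[3]) != 0;
}

/* Single-word form: one shift and one mask per access. */
typedef uint32_t class_word_t;

void word_set(class_word_t *w, unsigned c) {
    assert(c < CLS_LIMIT);
    *w |= UINT32_C(1) << c;
}

int word_test(class_word_t w, unsigned c) {
    assert(c < CLS_LIMIT);
    return (int)((w >> c) & 1u);
}

/* "Is any class set?" becomes a single comparison. */
int word_any(class_word_t w) {
    return w != 0;
}
```

The single-word form stops at 32 bits, which matches the commit's remark that more Posix classes are unlikely to appear.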
* regcomp.h: Revise #define setup and checking (Karl Williamson, 2012-12-09, 1 file, -12/+15)
| | | | | | This revises how these #defines are set up so that the order can change (as will be done in a later commit), and the only dependencies are on VERTWS and the max one from handy.h.
* regexes: Add \v to table of latin1 char classes (Karl Williamson, 2012-11-19, 1 file, -0/+5)
| | | | | | | This will be used in future commits to allow \v and \V to be treated consistently with other character classes. (Doing the same for \h isn't necessary, as it matches identically to [:blank:] in the entire Unicode range.)
* regcomp.h: Make some #defines sequential (Karl Williamson, 2012-11-19, 1 file, -9/+11)
| | | | | | | | | ANYOF_MAX is used as the upper boundary in loops. If we keep it larger than necessary, the loop does extraneous iterations. The #defines that come after ANYOF_MAX are moved down to start with it. This is useful in a later commit that will create an entry in l1_char_class_tab.h for vertical white space determination.
* regcomp: Change name of #define to better reflect its purpose (Karl Williamson, 2012-11-19, 1 file, -0/+3)
| | | | | ANYOF_MAX is used for two different purposes; this separates them and creates a separate #define for one of them.
* Allow regexp-to-pvlv assignment (Father Chrysostomos, 2012-10-30, 1 file, -4/+4)
| Since the xpvlv and regexp structs conflict, we have to find somewhere else to put the regexp struct. I was going to sneak it in SvPVX, allocating a buffer large enough to fit the regexp struct followed by the string, and have SvPVX - sizeof(regexp) point to the struct. But that would make all regexp flag-checking macros fatter, and those are used in hot code. So I came up with another method. Regexp stringification is not speed-critical. So we can move the regexp stringification out of re->sv_u and put it in the regexp struct. Then the regexp struct itself can be pointed to by re->sv_u. So SVt_REGEXPs will have re->sv_any and re->sv_u pointing to the same spot. PVLVs can then have sv->sv_any point to the xpvlv body as usual, but have sv->sv_u point to a regexp struct. All regexp member access can go through sv_u instead of sv_any, which will be no slower than before. Regular expressions will no longer be SvPOK, so we give sv_2pv special logic for regexps. We don’t need to make the regexp struct larger, as SvLEN is currently always 0 iff mother_re is set. So we can replace the SvLEN field with the pv. SvFAKE is never used without SvPOK or SvSCREAM also set. So we can use that to identify regexps.
* regex: Rename macro to reflect its narrowed use (Karl Williamson, 2012-10-14, 1 file, -8/+5)
| | | | | This macro is now only used under locale; its other use has now been removed. Change the name to reflect its only use.
* regcomp.c: Add a less confusing #define alias (Karl Williamson, 2012-09-26, 1 file, -2/+4)
| | | | ALNUM (meaning \w) is too close to ALNUMC ([[:alnum:]]) for comfort
* regcomp.h: Use handy.h constants (Karl Williamson, 2012-07-24, 1 file, -30/+33)
| | | | | This synchronizes the ANYOF_FOO usages to the isFOO() usages. Future commits will take advantage of this relationship.
* regcomp.c: Use data structure properties to remove tests (Karl Williamson, 2012-07-24, 1 file, -1/+1)
| | | | | | | The ANYOF_foo character class #defines really form an enum, with the property that the regular one is n, and its complement is n+1. So we can replace the tests in each case: of the switch, with a single test afterwards.
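The "regular one is n, complement is n+1" property described above can be sketched in C like this. The class names and the matching rules are hypothetical simplifications, not the real ANYOF_* definitions or the real switch in regcomp.c.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical classes: each regular class is even, its complement is +1. */
enum { CLS_WORD = 0, CLS_NWORD = 1, CLS_DIGIT = 2, CLS_NDIGIT = 3 };

/* The switch handles only the regular (even) classes. */
int matches_base(int base_class, unsigned char ch) {
    switch (base_class) {
    case CLS_WORD:
        return isalnum(ch) || ch == '_';
    case CLS_DIGIT:
        return isdigit(ch) != 0;
    default:
        return 0;
    }
}

/* One inversion test after the switch replaces a test in every case. */
int class_matches(int cls, unsigned char ch) {
    int complemented;
    int matched;

    assert(cls >= CLS_WORD && cls <= CLS_NDIGIT);
    complemented = cls & 1;                    /* odd value means complement */
    matched = matches_base(cls - complemented, ch);
    return complemented ? !matched : matched;
}
```

Adding a new class pair then needs one new `case` rather than two.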
* Only generate above-Uni warning for \p{}, \P{} (Karl Williamson, 2012-07-19, 1 file, -0/+4)
| This warning was being generated inappropriately during some internal operations, such as parsing a program; spotted by Tom Christiansen. The solution is to move the check for this situation out of the common code, and into the code where just \p{} and \P{} are handled. As mentioned in the commit's perldelta, there remains a bug [perl #114148], where no warning gets generated when it should.
* regcomp.h: Free up bit; downside is makes locale ANYOF nodes large (Karl Williamson, 2012-07-19, 1 file, -12/+17)
| There have been two flavors of ANYOF nodes under /l (locale) (for bracketed character classes). If a class didn't try to match things like [:word:], it was smaller by 4 bytes than one that did. A flag bit was used to indicate which size it was. By making all such nodes the larger size, whether needed or not, that bit can be freed to be used for other purposes. This only affects ANYOF nodes compiled under locale rules. The hope is to eventually get rid of these nodes anyway, by taking the suggestion of Yves Orton to compile regular expressions using the current locale, and automatically recompile the next time they are used after the locale changes. This commit is somewhat experimental, done early in the development cycle to see if there are any breakages. There are other ways to free up a bit, as explained in the comments. Best would be to split off nodes that match everything outside Latin1, freeing up the ANYOF_UNICODE_ALL bit. However, there currently would need to be two flavors of this, one also for ANYOFV. I'm currently working to eliminate the need for ANYOFV nodes (which aren't sufficient, [perl #89774]), so it's easiest to wait for this work to be done before doing the split, after which we can revert this change in order to gain back the space, but in the meantime, this will have had the opportunity to smoke out issues that I would like to know about.
* regcomp.h: Fix up comment (Karl Williamson, 2012-07-19, 1 file, -2/+2)
|
* propagate 'use re eval' into return from (??{}) (David Mitchell, 2012-06-13, 1 file, -0/+1)
| | | | | | | | | (??{}) returns a string which needs to be put through the regex compiler, and which may also contain (?{...}) - so any 'use re eval' in scope needs to be propagated into the inner environment. Achieve this by adding a new private flag - PREGf_USE_RE_EVAL - to the regex to indicate the use is in scope, and modify how the call to compile the inner pattern is done, to allow the use state to be passed in.
* eliminate OP_4tree type (David Mitchell, 2012-06-13, 1 file, -3/+0)
| | | | This was an alias to OP, and formerly used by the old re_eval mechanism
* eliminate REG_SEEN_EVAL (David Mitchell, 2012-06-13, 1 file, -1/+1)
| | | | | This flag was set during pattern compilation if a (?{}) was encountered; but is redundant now that we have pRExC_state->num_code_blocks.
* Fix up runtime regex codeblocks. (David Mitchell, 2012-06-13, 1 file, -3/+0)
| The previous commits in this branch have brought literal code blocks into the New World Order; now do the same for runtime blocks, i.e. those needing "use re 'eval'".
|
| The main user-visible changes from this commit are that:
|
| * the code is now fully parsed, rather than needing balanced {}'s; i.e. this now works:
|
|       my $code = q[ (?{ $a = '{' }) ];
|       use re 'eval';
|       /$code/
|
| * warnings and errors are now reported as coming from "(eval NNN)" rather than "(re_eval NNN)" (although see the next commit for some fixups to that). Indeed, the string "re_eval" has been expunged from the source and documentation.
|
| The big internal difference is that the sv_compile_2op() and sv_compile_2op_is_broken() functions are no longer used, and will be removed shortly.
|
| It works by the regex compiler detecting the presence of run-time code blocks, and feeding the whole pattern string back into the parser (where the run-time blocks are now seen as compile-time), then extracting out any compiled code blocks and adding them to the mix.
|
| For example, in the following:
|
|     $c = '(?{"runtime"})d';
|     use re 'eval';
|     /a(?{"literal"})\b'$c/
|
| At the point the regex compiler is called, the perl parser will already have compiled the literal code block and presented it to the regex engine. The engine examines the pattern string, sees two '(?{', but only one accounted for by the parser, and so constructs a short string to be evalled: based on the pattern, but with literal code-blocks blanked out, and \ and ' escaped. In the above example, the pattern string is
|
|     a(?{"literal"})\b'(?{"runtime"})d
|
| and we call eval_sv() with an SV containing the text
|
|     qr'a              \\b\'(?{"runtime"})d'
|
| The returned qr will contain the new code-block (and associated CV and pad) which can be extracted and added to the list of compiled code blocks of the original pattern.
|
| Note that with this scheme, the requirement for "use re 'eval'" is easily determined, and no longer requires all the pp_regcreset / PL_reginterp_cnt machinery, which will be removed shortly.
|
| Two subtleties of this scheme are that normally, \\ isn't collapsed into \ for literal regexes (unlike literal strings), and hints aren't inherited when using eval_sv(). We get round both of these by adding and setting a new flag, PL_reg_state.re_reparsing, which indicates that we are refeeding a pattern into the perl parser.
* add op_comp field to regexp_engine API (David Mitchell, 2012-06-13, 1 file, -1/+2)
| Perl's internal function for compiling regexes that knows about code blocks, Perl_re_op_compile, isn't part of the engine API. However, the way that regcomp.c is dual-lifed as ext/re/re_comp.c with debugging compiled in, means that Perl_re_op_compile is also compiled as my_re_op_compile. These days the mechanism to choose whether to call the main functions or the debugging my_* functions when 'use re debug' is in scope, is the re engine API jump table. Ergo, to ensure that my_re_op_compile gets called appropriately, this method needs adding to the jump table. So, I've added it, but documented as 'for perl internal use only, set to null in your engine'. I've also updated current_re_engine() to always return a pointer to a jump table, even if we're using the internal engine (formerly it returned null). This then allows us to use the simple condition (eng->op_comp) to determine whether the current engine supports code blocks.
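The jump-table arrangement described above might look roughly like the following C sketch. Everything here is hypothetical and heavily simplified: `engine_t`, `current_engine()`, and the two compile functions only mimic the shape of the idea and are not the real `regexp_engine` struct or `current_re_engine()`.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes used only by this sketch. */
enum { COMPILED_PLAIN = 1, COMPILED_WITH_BLOCKS = 2 };

/* Hypothetical engine jump table; op_comp is NULL when unsupported. */
typedef struct {
    const char *name;
    int (*comp)(const char *pattern);
    int (*op_comp)(const char *pattern, int n_code_blocks);
} engine_t;

/* Plain compile: knows nothing about code blocks. */
int plain_comp(const char *pattern) {
    assert(pattern != NULL);
    return COMPILED_PLAIN;
}

/* Code-block-aware compile. */
int core_op_comp(const char *pattern, int n_code_blocks) {
    assert(pattern != NULL && n_code_blocks >= 0);
    return COMPILED_WITH_BLOCKS;
}

const engine_t core_engine   = { "core",   plain_comp, core_op_comp };
const engine_t plugin_engine = { "plugin", plain_comp, NULL };

/* Always returns a table, even when no override is in effect. */
const engine_t *current_engine(const engine_t *override) {
    return override ? override : &core_engine;
}

/* The simple condition from the commit message: is op_comp set? */
int supports_code_blocks(const engine_t *eng) {
    assert(eng != NULL);
    return eng->op_comp != NULL;
}

/* Dispatch: engines without op_comp get the plain string only. */
int compile_pattern(const engine_t *override, const char *pattern, int n_code_blocks) {
    const engine_t *eng = current_engine(override);
    if (supports_code_blocks(eng))
        return eng->op_comp(pattern, n_code_blocks);
    return eng->comp(pattern);
}
```

Since `current_engine()` never returns NULL in this sketch, callers can test `eng->op_comp` directly without first checking the table pointer.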
* preserve code blocks in interpolated qr//s (David Mitchell, 2012-06-13, 1 file, -0/+1)
| This now works:
|
|     { my $x = 1; $r = qr/(??{$x})/ }
|     my $x = 2;
|     print "ok\n" if "1" =~ /^$r$/;
|
| When a qr// is interpolated into another pattern, the pattern is still recompiled using the stringified qr, but now the pre-compiled code blocks from the qr are reused rather than being re-compiled, so it behaves like a closure.
|
| Note that this makes some tests in regexp_qr_embed_thr.t fail, due to a pre-existing threads bug, which can be summarised as:
|
|     use threads;
|     my $s = threads->new(sub { return sub { $::x = 1} })->join;
|     $s->();
|     print "\$::x=[$::x]\n";
|
| which prints undef, not 1, since the *::x is cloned into the child thread, then cloned back into the parent as part of the CV (linked from the pad) being returned in the join. The cloning/join code isn't clever enough to realise that the back-cloned *::x is the same as the original *::x, so the main thread ends up with two copies.
|
| This manifests itself in the re tests as
|
|     my $re = threads->new( sub { qr/(?{$::x = 1 })/ })->join();
|
| where, since the returned qr// is now a closure, it suffers from the same glob duplication in the parent.
|
| So I've disabled 4 re_tests tests under threads for now.
* in re_op_compile(), keep code_blocks for qr// (David Mitchell, 2012-06-13, 1 file, -0/+2)
| | | | | | | | code_blocks is a temporary list of start/end indices and pointers to DO blocks, that is used during the regexp compilation. Change it so that in the qr// case, this structure is preserved (attached to regexp_internal), so that in a forthcoming commit it will be available for use when interpolating a qr within another pattern.
* make qr/(?{})/ behave with closures (David Mitchell, 2012-06-13, 1 file, -0/+1)
| With this commit, qr// with a literal (compile-time) code block will Do the Right Thing as regards closures and the scope of lexical vars; in particular, the following now creates three regexes that match 0, 1 and 2:
|
|     for my $i (0..2) {
|         push @r, qr/^(??{$i})$/;
|     }
|     "1" =~ $r[1]; # matches
|
| Previously, $i would be evaluated as undef in all 3 patterns.
|
| This is achieved by wrapping the compilation of the pattern within a new anonymous CV, which is then attached to the pattern. At run-time pp_qr() clones the CV as well as copying the REGEXP; and when the code block is executed, it does so using the pad of the cloned CV. Which makes everything come out all right in the wash. The CV is stored in a new field of the REGEXP, called qr_anoncv.
|
| Note that run-time qr//s are still not fixed, e.g. qr/$foo(?{...})/; nor is it yet fixed where the qr// is embedded within another pattern: continuing with the code example from above,
|
|     my $i = 99;
|     "1" =~ $r[1];       # bare qr: matches: correct!
|     "X99" =~ /X$r[1]/;  # embedded qr: matches: whoops, it's still seeing the wrong $i
* Mostly complete fix for literal /(?{..})/ blocks (David Mitchell, 2012-06-13, 1 file, -0/+1)
| Change the way that code blocks in patterns are parsed and executed, especially as regards lexical and scoping behaviour. (Note that this fix only applies to literal code blocks appearing within patterns: run-time patterns, and literals within qr//, are still done the old broken way for now).
|
| This change means that for literal /(?{..})/ and /(??{..})/:
|
| * the code block is now fully parsed in the same pass as the surrounding code, which means that the compiler no longer just does a simplistic count of balancing {} to find the limits of the code block; i.e. stuff like /(?{ $x = "{" })/ now works (in the same way that subscripts in double quoted strings always have: "$a{'{'}" )
|
| * Error and warning messages will now appear to emanate from the main body rather than an re_eval; e.g. the output from
|
|       #!/usr/bin/perl
|       /(?{ warn "boo" })/
|
|   has changed from
|
|       boo at (re_eval 1) line 1.
|
|   to
|
|       boo at /tmp/p line 2.
|
| * scope and closures now behave as you might expect; for example
|
|       for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ }
|
|   now prints "abc" rather than ""
|
| * with recursion, it now finds the lexical within the appropriate depth of pad: this code now prints "012" rather than "000":
|
|       sub recurse {
|           my ($n) = @_;
|           return if $n > 2;
|           "" =~ /^(?{print $n})/;
|           recurse($n+1);
|       }
|       recurse(0);
|
| * an earlier fix that stopped 'my' declarations within code blocks causing crashes, required the accumulating of two SAVECOMPPADs on the stack for each iteration of the code block; this is no longer needed;
|
| * UNITCHECK blocks within literal code blocks are now run as part of the main body of code (run-time code blocks still trigger an immediate call to the UNITCHECK block though)
|
| This is all achieved by building upon the efforts of the commits which led up to this; those altered the parser to parse literal code blocks directly, but up until now those code blocks were discarded by Perl_pmruntime and the block re-compiled using the original re_eval mechanism. As of this commit, for the non-qr and non-runtime variants, those code blocks are no longer thrown away. Instead:
|
| * the LISTOP generated by the parser, which contains all the code blocks plus OP_CONSTs that collectively make up the literal pattern, is now stored in a new field in PMOPs, called op_code_list. For example in /A(?{BLOCK})C/, the listop stored in op_code_list looks like
|
|       LIST
|           PUSHMARK
|           CONST['A']
|           NULL/special (aka a DO block)
|               BLOCK
|           CONST['(?{BLOCK})']
|           CONST['C']
|
| * each of the code blocks has its last op set to null and is individually run through the peephole optimiser, so each one becomes a little self-contained block of code, rather than a list of blocks that run into each other;
|
| * then in re_op_compile(), we concatenate the list of CONSTs to produce a string to be compiled, but at the same time we note any DO blocks and note the start and end positions of the corresponding CONST['(?{BLOCK})'];
|
| * (if the current regex engine isn't the built-in perl one, then we just throw away the code blocks and pass the concatenated string to the engine)
|
| * then during regex compilation, whenever we encounter a '(?{', we see if it matches the index of one of the pre-compiled blocks, and if so, we store a pointer to that block in an 'l' data slot, and use the end index to skip over the text of the code body. Conversely, if the index doesn't match, then we know that it's a run-time pattern and (for now), compile it in the old way.
|
| * During execution, when an EVAL op is encountered, if data->what is 'l', then we just use the pad that was in effect when the pattern was called; i.e. we use the current pad slot of the currently executing CV that the pattern is embedded within.
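The index-matching step in the list above (find a '(?{', check whether a pre-compiled block starts there, and if so skip to its end index) can be sketched in C as follows. The types and function names are hypothetical, and this sketch only counts blocks; the real compiler stores pointers in data slots and emits regnodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one pre-compiled literal block:
 * start = index of its "(?{", end = index just past its closing ")". */
typedef struct { size_t start; size_t end; } code_block_t;

typedef struct { int literal; int runtime; } scan_result_t;

/* Return the index of the block starting at pos, or -1 if none does. */
int find_block(const code_block_t *blocks, size_t n_blocks, size_t pos) {
    size_t i;
    for (i = 0; i < n_blocks; i++) {
        if (blocks[i].start == pos)
            return (int)i;
    }
    return -1;
}

/* Walk the pattern, classifying each "(?{" as literal or run-time. */
scan_result_t scan_pattern(const char *pattern,
                           const code_block_t *blocks, size_t n_blocks) {
    scan_result_t result = { 0, 0 };
    size_t len, pos = 0;

    assert(pattern != NULL);
    len = strlen(pattern);
    while (pos < len) {
        if (strncmp(pattern + pos, "(?{", 3) == 0) {
            int b = find_block(blocks, n_blocks, pos);
            if (b >= 0) {
                /* Known block: skip its whole body without parsing it. */
                result.literal++;
                pos = blocks[b].end;
                continue;
            }
            /* No matching index: treat it as a run-time block. */
            result.runtime++;
        }
        pos++;
    }
    return result;
}
```

Skipping by end index is what lets a literal block contain text such as '(?{' or an unbalanced '{' without confusing the scan.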
* update the editor hints for spaces, not tabs (Ricardo Signes, 2012-05-29, 1 file, -2/+2)
| | | | | This updates the editor hints in our files for Emacs and vim to request that tabs be inserted as spaces.
* regex: Fix some tricky fold problems (Karl Williamson, 2012-01-19, 1 file, -0/+1)
| | | | | | | | | | | | As described in the comments, this changes the design of handling the Unicode tricky fold characters to not generate a node for each possible sequence but to get them to work within EXACTFish nodes. The previous design(s) all used a node to handle these, which suffers from the downfall that it precludes legitimate matches that would cross the node boundary. The new design is described in the comments.
* Comment additions, typos, white-space. (Karl Williamson, 2012-01-13, 1 file, -0/+2)
| | | | And the reordering for clarity of one test
* Change __attribute_unused__ to PERL_UNUSED_DECL (Karl Williamson, 2011-11-09, 1 file, -1/+1)
| | | | The latter is the Perl standard way of making this declaration
* use __attribute__unused__ to silence -Wunused-but-set-variable (Robin Barker, 2011-05-19, 1 file, -1/+2)
|
* regcomp.h: Add comment (Karl Williamson, 2011-03-19, 1 file, -1/+1)
|
* regcomp.h: Add ANYOF_CLASS_SETALL() (Karl Williamson, 2011-03-19, 1 file, -0/+2)
| | | | | This macro sets all the bits of the class (for \w, etc) for use during initialization
* regex: Fix locale regression (Karl Williamson, 2011-03-18, 1 file, -31/+18)
| Things like \S have not been accessible to the synthetic start class under locale matching rules. They have been placed there, but the start class didn't know they were there. This patch sets ANYOF_CLASS in initializing the synthetic start class so that downstream code knows it is a charclass_class, and removes the code that partially allowed this bit to be shared, and which isn't needed in 5.14, and more thought would have to go into doing it than was reflected in the code. I can't come up with a test case that would verify that this works, because of general locale testing issues, other than by looking at a dump of the generated regex synthetic start class, but the dump isn't the same thing as the real behavior, and using one is also subject to breakage if the regex code changes in the slightest.
* regcomp.h: #define of ANYOF flags immune from inversion (Karl Williamson, 2011-03-08, 1 file, -0/+10)
|
* regex: /l in combo with others in syn start class (Karl Williamson, 2011-03-08, 1 file, -12/+10)
| | | | | | | | | Now that regexes can be combinations of different charset modifiers, a synthetic start class can match locale and non-locale both. locale should generally match only things in the bitmap for code points < 256. But a synthetic start class with a non-locale component can match such code points. This patch makes an exception for synthetic nodes that will be resolved if it passes and is matched again for real.
* regcomp.c: Move #defines to be in bit order (Karl Williamson, 2011-03-08, 1 file, -5/+5)
|
* regex: Remove obsolete code (Karl Williamson, 2011-02-28, 1 file, -19/+0)
| | | | | | | This code has been rendered obsolete in 5.14 by using a different mechanism altogether. This functionality is now provided at run-time, user-selectable, via the /u and /d regex modifiers. This code was for compile-time selection of which to use.
* bleadperl breaks RCLAMP/Text-Glob (Karl Williamson, 2011-02-25, 1 file, -6/+13)
| | | | | | | | This was from commit f424400810b6af341e96230836690da51c37b812 which came from needing a bit in an already-full flags field, and my faulty analysis that two bits could be shared. I found another mechanism to free up another bit, and now can separate these shared bits again.
* Free up bit in ANYOF flags (Karl Williamson, 2011-02-25, 1 file, -7/+16)
| | | | | | | | | | | | | | | | This is the foundation for fixing the regression RT #82610. My analysis was wrong that two bits could be shared, at least not without further work. This changes to use a different mechanism to pass needed information to regexec.c so that another bit can be freed up and, in a later commit, the two bits can become unshared again. The bit that is freed up is ANYOF_UTF8, which basically said there is something that is matched outside the ANYOF bitmap, and requires the target string to be in utf8. This changes things so the existence of something besides the bitmap indicates this, and so no flag is needed. The flag bit ANYOF_NONBITMAP_NON_UTF8 remains to indicate that there is something that should be matched outside the bitmap even if the target string isn't in utf8.
* regcomp.h: Remove obsolete define (Karl Williamson, 2011-02-24, 1 file, -3/+0)
|