path: root/op_reg_common.h
Commit message | Author | Age | Files | Lines
* handy.h: Create nBIT_MASK(n) macro | Karl Williamson | 2020-07-17 | 1 | -2/+2
  This encapsulates a common paradigm, making sure that it is done correctly for the platform's size.
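The "common paradigm" here is building a mask of the n lowest bits. A minimal sketch of that idea follows; the names `SKETCH_NBIT_MASK` and `sketch_low_bits` are hypothetical and are not taken from handy.h. It assumes the widest unsigned integer type is used so that the shift is not truncated on platforms with a narrow `int`.

```c
#include <stdint.h>

/* Hypothetical sketch: a mask with the low n bits set, computed in the
 * widest unsigned type. Valid only for 0 <= n < the width of uintmax_t. */
#define SKETCH_NBIT_MASK(n) ((UINTMAX_C(1) << (n)) - 1)

/* Keep only the low n bits of a value. */
static uintmax_t sketch_low_bits(uintmax_t value, unsigned n)
{
    return value & SKETCH_NBIT_MASK(n);
}
```

Writing `(1 << n) - 1` by hand each time invites mistakes when n exceeds the width of `int`; putting the expression in one macro keeps the type choice in one place.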
* silence some gcc -pedantic warnings | David Mitchell | 2015-06-19 | 1 | -2/+4
* op_reg_common.h: Add comment | Karl Williamson | 2015-03-09 | 1 | -0/+1
* Reserve a bit for the 're strict' subpragma. | Karl Williamson | 2015-01-13 | 1 | -6/+10
  This is another step in the process
* Add documentation for /n (non-capture) regexp flag. | Matthew Horsfall | 2014-12-30 | 1 | -1/+1
* Create bit for /n. | Karl Williamson | 2014-12-28 | 1 | -10/+13
* op_reg_common.h: Nits | Karl Williamson | 2014-10-06 | 1 | -9/+9
  Add missing U suffix to unsigned numeric constant; parenthesize macro expansions for safety.
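The point about parenthesizing macro expansions can be illustrated with a small sketch. All names below are hypothetical; the example only shows why an unparenthesized expansion is unsafe once it appears inside a larger expression.

```c
#define SKETCH_BASE_SHIFT 2

/* Unparenthesized: expands textually, so operator precedence can bite. */
#define SKETCH_NEXT_SHIFT_UNSAFE SKETCH_BASE_SHIFT + 1

/* Parenthesized: always behaves as a single operand. */
#define SKETCH_NEXT_SHIFT_SAFE (SKETCH_BASE_SHIFT + 1)

/* Expands to 2 + 1 * 2, which is not (2 + 1) * 2. */
static unsigned sketch_unsafe_times_two(void)
{
    return SKETCH_NEXT_SHIFT_UNSAFE * 2;
}

/* Expands to (2 + 1) * 2 as intended. */
static unsigned sketch_safe_times_two(void)
{
    return SKETCH_NEXT_SHIFT_SAFE * 2;
}
```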
* op_reg_common.h: Get blead to build in Jenkins | Karl Williamson | 2014-09-29 | 1 | -0/+2
  I don't understand why this compile error check is failing Jenkins, but am removing it for now to get things to work.
* Suppress some Solaris warnings | Karl Williamson | 2014-09-29 | 1 | -15/+15
  We get an integer overflow message when we left shift a 1 into the highest bit of a word. This changes the 1's into 1U's to indicate unsigned. This is done for all the flag bits in the affected word, as they could get reordered by someone in the future, unintentionally reintroducing this problem.
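A short sketch of the overflow being described, using hypothetical flag names and assuming `unsigned int` is at least 32 bits wide: shifting a signed `1` into bit 31 overflows a 32-bit `int`, whereas `1U` keeps the arithmetic unsigned and well defined.

```c
#include <stdint.h>

/* (1 << 31) would shift a signed int into its sign bit; 1U avoids that.
 * Every flag uses 1U so that reordering the bits later stays safe. */
#define SKETCH_FLAG_LOW (1U << 0)
#define SKETCH_FLAG_TOP (1U << 31)

/* Report whether the top flag bit is set. */
static int sketch_has_top(uint32_t flags)
{
    return (flags & SKETCH_FLAG_TOP) != 0;
}
```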
* op_reg_common.h: Update comment | Karl Williamson | 2014-09-29 | 1 | -2/+3
  The PL file previously referred to has been deleted, and replaced by a different one.
* op_reg_common.h: White-space only | Karl Williamson | 2014-09-29 | 1 | -4/+4
  Align columns vertically
* Make space for /xx flag | Karl Williamson | 2014-09-29 | 1 | -10/+12
  This doesn't actually use the flag yet. We no longer have to make version-dependent changes to ext/Devel-Peek/t/Peek.t, (it being in /ext) so this doesn't
* op_reg_common.h: #define in terms of more basic one | Karl Williamson | 2014-09-29 | 1 | -1/+1
  The mask to copy bits should always include at least the compile-time bits. By defining it in terms of the compile-time bits, we make it easier to change and understand.
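As a rough illustration of defining one mask in terms of another, the sketch below uses hypothetical names (none are the real op_reg_common.h symbols). The copy mask is written as the compile-time mask plus any extra bits, so a new compile-time flag is picked up automatically.

```c
/* Hypothetical compile-time flag bits. */
#define SKETCH_FLAG_A (1U << 0)
#define SKETCH_FLAG_B (1U << 1)
#define SKETCH_COMPILETIME (SKETCH_FLAG_A | SKETCH_FLAG_B)

/* Hypothetical extra bit that is copied but is not a compile-time flag. */
#define SKETCH_EXTRA (1U << 5)

/* The copy mask always includes at least the compile-time bits. */
#define SKETCH_COPYMASK (SKETCH_COMPILETIME | SKETCH_EXTRA)

/* Keep only the bits that should be copied from src. */
static unsigned sketch_copy_flags(unsigned src)
{
    return src & SKETCH_COPYMASK;
}
```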
* Up regex flags limit for (??{}) | Karl Williamson | 2014-09-29 | 1 | -2/+5
  Previously the regex pattern compilation flags needed for this construct would fit into an 8-bit byte. This conveniently fits into the flags structure element of a regnode. There are changes coming that require more than 8 bits, so in preparation, this commit adds an argument to the node that implements (??{}) (31-bits usable for flags), and moves the storage to that.
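The move from an 8-bit field to a wider node argument could look roughly like the sketch below. The struct layouts and function names are hypothetical and do not reproduce the real regnode definitions; the only detail taken from the message is that 31 bits are usable for flags.

```c
#include <stdint.h>

/* Hypothetical basic node: only 8 bits of flag storage. */
struct sketch_node {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
};

/* Hypothetical node with one extra 32-bit argument word. */
struct sketch_node_with_arg {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
    uint32_t arg;   /* used here to hold the compilation flags */
};

/* Store the pattern flags in the argument word, keeping 31 usable bits. */
static void sketch_store_flags(struct sketch_node_with_arg *n, uint32_t pattern_flags)
{
    n->arg = pattern_flags & 0x7FFFFFFFu;
}

/* Read the pattern flags back from the argument word. */
static uint32_t sketch_fetch_flags(const struct sketch_node_with_arg *n)
{
    return n->arg;
}
```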
* rework split() special case interaction with regex engine | Yves Orton | 2013-03-27 | 1 | -6/+17
  This patch resolves several issues at once. The parts are sufficiently interconnected that it is hard to break it down into smaller commits. The tickets open for these issues are:

      RT #94490 - split and constant folding
      RT #116086 - split "\x20" doesn't work as documented

  It additionally corrects some issues with cached regexes that were exposed by the split changes (and applied to them).

  It effectively reverts 5255171e6cd0accee6f76ea2980e32b3b5b8e171 and cccd1425414e6518c1fc8b7bcaccfb119320c513.

  Prior to this patch the special RXf_SKIPWHITE behavior of

      split(" ", $thing)

  was only available if Perl could resolve the first argument to split at compile time, meaning under various arcane situations.

  This manifested as oddities like

      my $delim = $cond ? " " : qr/\s+/;
      split $delim, $string;

  and

      split $cond ? " " : qr/\s+/, $string

  not behaving the same as:

      ($cond ? split(" ", $string) : split(/\s+/, $string))

  which isn't very convenient.

  This patch changes this by adding a new flag to the op_pmflags, PMf_SPLIT, which enables pp_regcomp() to know whether it was called as part of split, which allows the RXf_SPLIT to be passed into run time regex compilation. We also preserve the original flags so pattern caching works properly, by adding a new property to the regexp structure, "compflags", and related macros for accessing it. We preserve the original flags passed into the compilation process, so we can compare when we are trying to decide if we need to recompile.

  Note that this is essentially the opposite fix from the one applied originally to fix #94490 in 5255171e6cd0accee6f76ea2980e32b3b5b8e171.

  The reverted patch was meant to make:

      split( 0 || " ", $thing )            #1

  consistent with

      my $x=0; split( $x || " ", $thing )  #2

  and not with

      split( " ", $thing )                 #3

  This was reverted because it broke C<split("\x{20}", $thing)>, and because one might argue that it is not that #1 does the wrong thing, but rather that the behavior of #2 is wrong. In other words we might expect that all three should behave the same as #3, and that instead of "fixing" the behavior of #1 to be like #2, we should really fix the behavior of #2 to behave like #3. (Which is what we did.)

  Also, it doesn't make sense to move the special case detection logic further from the regex engine. We really want the regex engine to decide this stuff itself, otherwise split " ", ... wouldn't work properly with an alternate engine. (Imagine we add a special regexp meta pattern that behaves the same as " " does in a split /.../. For instance we might make split /(*SPLITWHITE)/ trigger the same behavior as split " ".)

  The other major change as result of this patch is it effectively reverts commit cccd1425414e6518c1fc8b7bcaccfb119320c513, which was intended to get rid of RXf_SPLIT and RXf_SKIPWHITE, and free up bits in the regex flags structure.

  But we don't want to get rid of these vars, and it turns out that RXf_SEEN_LOOKBEHIND is used only in the same situation as the new RXf_MODIFIES_VARS. So I have renamed RXf_SEEN_LOOKBEHIND to RXf_NO_INPLACE_SUBST, and then instead of using two vars we use only the one. Which in turn allows RXf_SPLIT and RXf_SKIPWHITE to have their bits back.
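The caching idea in this message (keep the flags that were originally passed to compilation, then compare them when deciding whether to recompile) might be sketched as follows. The struct, the field layout, and the function name are hypothetical; only the notion of a stored "compflags" value comes from the message.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag standing in for "compiled as part of split". */
#define SKETCH_FLAG_SPLIT (1U << 0)

/* Hypothetical cached regexp: the pattern text plus the flags that were
 * originally passed into compilation. */
struct sketch_regexp {
    const char *pattern;
    uint32_t    compflags;
};

/* A cached regexp can be reused only if both the pattern text and the
 * original compilation flags match the new request. */
static int sketch_needs_recompile(const struct sketch_regexp *cached,
                                  const char *pattern, uint32_t flags)
{
    return strcmp(cached->pattern, pattern) != 0
        || cached->compflags != flags;
}
```

Under this sketch, the same pattern text requested once with the split flag and once without it is treated as two different compilations, which is the property the message relies on for correct caching.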
* Fix comment references to removed regex ops | Karl Williamson | 2012-12-27 | 1 | -1/+1
  Commit 3018b823898645e44b8c37c70ac5c6302b031381 removed the regular expression operations (regnodes) that these comments refer to, replacing them with different ones. Update the comments to be accurate.
* regcomp.c: Simplify some node calculations | Karl Williamson | 2012-06-29 | 1 | -1/+3
  For the node types that have differing versions depending on the character set regex modifiers, /d, /l, /u, /a, and /aa, we can use the enum values as offsets from the base node number to derive the correct one. This eliminates a number of tests. Because there is no DIGITU node type, I added placeholders for it (and NDIGITU) to avoid some special casing of it (more important in future commits). We currently have many available node types, so can afford to waste these two.
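Using enum values as offsets from a base node number might look like the sketch below. The enum names and node numbers are hypothetical, and only four character sets are modeled (/aa is left out for brevity). It assumes the node variants are numbered consecutively in the same order as the character set enum, which is why a placeholder entry is needed where no real node exists.

```c
/* Hypothetical character set enum, in modifier order. */
typedef enum {
    SK_CS_DEPENDS = 0,  /* /d */
    SK_CS_LOCALE,       /* /l */
    SK_CS_UNICODE,      /* /u */
    SK_CS_ASCII         /* /a */
} sk_charset;

/* Hypothetical node numbers, laid out in the same order as sk_charset. */
enum {
    SK_DIGIT = 100,     /* base node (/d) */
    SK_DIGITL,          /* /l variant */
    SK_DIGITU,          /* placeholder so the offsets stay uniform */
    SK_DIGITA           /* /a variant */
};

/* Derive the node for a character set without a chain of tests. */
static int sk_node_for(int base_node, sk_charset cs)
{
    return base_node + (int)cs;
}
```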
* silence picky C compiler warning | David Mitchell | 2012-06-14 | 1 | -0/+4
  and add assert that a (U32 & mask) value can fit in a U8.
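One way to check that a masked 32-bit value fits in 8 bits is a preprocessor test on the mask itself, as in this sketch. The mask value and names are hypothetical, and the actual commit may use a different form of assertion.

```c
#include <stdint.h>

/* Hypothetical mask of the flag bits that get narrowed to 8 bits. */
#define SKETCH_NARROW_MASK 0x7Fu

/* Fail the build if the mask ever grows beyond what a byte can hold. */
#if SKETCH_NARROW_MASK > 0xFF
#  error "SKETCH_NARROW_MASK does not fit in 8 bits"
#endif

/* With the check above, the explicit cast cannot lose set bits. */
static uint8_t sketch_narrow(uint32_t flags)
{
    return (uint8_t)(flags & SKETCH_NARROW_MASK);
}
```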
* Bump several file copyright dates | Steffen Schwigon | 2012-01-19 | 1 | -1/+1
  Sync copyright dates with actual changes according to git history. [Plus run regen_perly.h to update the SHA-256 checksums, and regen/regcharclass.pl to update regcharclass.h]
* op_reg_common.h: Fix comment | Karl Williamson | 2011-05-18 | 1 | -3/+3
* Initial setup to accommodate /aa regex modifier | Karl Williamson | 2011-02-14 | 1 | -3/+4
  This changes the bits to add a new charset type for /aa, and other bookkeeping for it.
* op_reg_common.h: add explicit cast | Karl Williamson | 2011-01-18 | 1 | -1/+1
  A version of the g++ compiler isn't allowing the implicit cast of U32 to an enum. Change to use an explicit cast.
* Add /a regex modifier | Karl Williamson | 2011-01-17 | 1 | -1/+2
  This restricts certain constructs, like \w, to matching in the ASCII range only.
* op_reg_common: correct path in comment | Karl Williamson | 2011-01-16 | 1 | -1/+2
* Use multi-bit field for regex character set | Karl Williamson | 2011-01-16 | 1 | -5/+43
  The /d, /l, and /u regex modifiers are mutually exclusive. This patch changes the field that stores the character set to use more than one bit with an enum determining which one. This data structure more closely follows the semantics of their being mutually exclusive, and conserves bits as well, and is better expandable. A small API is added to set and query the bit field. This patch is not .xs source backwards compatible. A handful of cpan programs are affected.
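A multi-bit field with a small set/query API could be sketched as below. The shift, the mask width, and all names are hypothetical rather than the actual definitions in op_reg_common.h. The getter uses an explicit cast to the enum type, which also keeps the code acceptable to C++ compilers.

```c
#include <stdint.h>

/* Hypothetical mutually exclusive character sets. */
typedef enum {
    SK_CHARSET_DEPENDS = 0, /* /d */
    SK_CHARSET_LOCALE,      /* /l */
    SK_CHARSET_UNICODE      /* /u */
} sk_regex_charset;

/* Hypothetical position and width of the field inside the flags word. */
#define SK_CHARSET_SHIFT 5
#define SK_CHARSET_MASK  (3U << SK_CHARSET_SHIFT)

/* Replace the character set field, leaving all other flag bits alone. */
static void sk_set_charset(uint32_t *flags, sk_regex_charset cs)
{
    *flags = (*flags & ~SK_CHARSET_MASK)
           | ((uint32_t)cs << SK_CHARSET_SHIFT);
}

/* Extract the character set field from the flags word. */
static sk_regex_charset sk_get_charset(uint32_t flags)
{
    return (sk_regex_charset)((flags & SK_CHARSET_MASK) >> SK_CHARSET_SHIFT);
}
```

Because the field holds a single enum value, two character sets cannot be set at once, which one-bit-per-modifier flags would allow.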
* op_reg_common.h: Add guard to only expand once | Karl Williamson | 2011-01-16 | 1 | -0/+4
  This is in preparation for adding some in-line functions.
* [perl #78072] use re '/xism'; | Father Chrysostomos | 2010-10-21 | 1 | -0/+2
* Add /d, /l, /u (infixed) regex modifiers | Karl Williamson | 2010-09-22 | 1 | -2/+3
  This patch adds recognition of these modifiers, with appropriate action for d and l. u does nothing useful yet. This allows for the interpolation of a regex into another one without losing the character set semantics that it was compiled with, as for the first time, the semantics is now specified in the stringification as one of these modifiers. To this end, it allocates an unused bit in the structures. The offsets change so as to not disturb other bits.
* op_reg_common.h: Continue refactoring | Karl Williamson | 2010-08-11 | 1 | -8/+34
  The new op_reg_common.h did not have in it all the things that made sense for it to have, including some comment changes that I should have made when I created it. I also realized that the new mechanism of using shifts allowed RXf_PMf_STD_PMMOD_SHIFT to actually control things, rather than be a #define that one had to remember to change if those things changed independently. Finally, I created a check so that adding bits without adding them to RXf_PMf_COMPILETIME will force a compilation error. (This came from the school of hard knocks)
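A compile-time check of this kind can be sketched with the preprocessor. The names below are hypothetical stand-ins, and the sketch assumes the flag bits are contiguous starting at the base shift; the real check in the header may be written differently.

```c
/* Hypothetical base shift that controls where the flag bits start. */
#define SK_STD_SHIFT 0

#define SK_FLAG_M (1U << (SK_STD_SHIFT + 0))
#define SK_FLAG_S (1U << (SK_STD_SHIFT + 1))
#define SK_FLAG_I (1U << (SK_STD_SHIFT + 2))
#define SK_FLAG_X (1U << (SK_STD_SHIFT + 3))

/* One past the last flag bit; bump this when a bit is added. */
#define SK_SHIFT_NEXT (SK_STD_SHIFT + 4)

/* Every flag bit is expected to appear in this mask. */
#define SK_COMPILETIME (SK_FLAG_M | SK_FLAG_S | SK_FLAG_I | SK_FLAG_X)

/* Adding a bit (and bumping SK_SHIFT_NEXT) without also adding it to
 * SK_COMPILETIME makes the two sides differ and stops the build. */
#if SK_COMPILETIME != (((1U << SK_SHIFT_NEXT) - 1U) & ~((1U << SK_STD_SHIFT) - 1U))
#  error "a flag bit was added without adding it to SK_COMPILETIME"
#endif
```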
* op_reg_common.h: Move things around | Karl Williamson | 2010-08-11 | 1 | -6/+13
  Moving the definitions of the duplicate variables makes it easier to read. Unfortunately, the values can't be in terms of the previous ones because defsubs_h.PL doesn't pick them up. So I've made them numeric with a #if to make sure they don't drift off.
* op_reg_common.h: Refactor variable for safety | Karl Williamson | 2010-08-11 | 1 | -1/+3
  This patch changes the variable that tells how many common bits there are to instead be +1 that value, so bits won't get reused. A later commit will renumber the bits in op.h and regexp.h, but for now things are left as-is there, which means the base variables in those two files must subtract one to compensate for the +1
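The idea of a "next free shift" that sits one past the last common bit might be sketched as follows. All names are hypothetical; the sketch assumes each including header starts its private bits at that value so they cannot collide with the common ones.

```c
/* Hypothetical common flag bits shared by two headers. */
#define SK_COMMON_SHIFT      0
#define SK_COMMON_MULTILINE  (1U << (SK_COMMON_SHIFT + 0))
#define SK_COMMON_SINGLELINE (1U << (SK_COMMON_SHIFT + 1))
#define SK_COMMON_FOLD       (1U << (SK_COMMON_SHIFT + 2))

/* One past the last common bit, i.e. the first bit free for private use. */
#define SK_COMMON_SHIFT_NEXT (SK_COMMON_SHIFT + 3)

/* Mask covering every common bit. */
#define SK_COMMON_MASK ((1U << SK_COMMON_SHIFT_NEXT) - 1U)

/* A private flag in one of the including headers starts at the next
 * free shift, so it can never reuse a common bit. */
#define SK_PRIVATE_FIRST (1U << (SK_COMMON_SHIFT_NEXT + 0))
```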
* Refactor common parts of op.h, regexp.h into new .h | Karl Williamson | 2010-07-29 | 1 | -0/+27
  op.h and regexp.h share common elements in their data structures. They have had to manually be kept in sync. This patch makes it easier by putting those common parts into a common header #included by the two. To do this, it seemed easiest to change the symbol definitions to use left shifts to generate the flag bits. But this meant that regcomp.pl and ext/B/defsubs_h.PL had to be taught to recognize those forms of expressions, done in separate commits.