author | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2020-01-01 12:07:02 +0000 |
---|---|---|
committer | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2020-01-01 12:07:02 +0000 |
commit | 470370f98ec277fed819a388af5b64c25619eeac (patch) | |
tree | 103de642550c679660a8bf47423048f856f5b2db | |
parent | fde39af34eb4a8eef2b3a3ce4b586c1763aca69c (diff) | |
download | pcre2-470370f98ec277fed819a388af5b64c25619eeac.tar.gz |
Allow real repetition of assertions.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1202 6239d852-aaf2-0410-a92c-79f79f948069
-rw-r--r-- | ChangeLog | 7 |
-rw-r--r-- | doc/html/pcre2pattern.html | 39 |
-rw-r--r-- | doc/pcre2.txt | 32 |
-rw-r--r-- | doc/pcre2pattern.3 | 41 |
-rw-r--r-- | src/pcre2_compile.c | 17 |
-rw-r--r-- | testdata/testinput1 | 9 |
-rw-r--r-- | testdata/testoutput1 | 21 |
-rw-r--r-- | testdata/testoutput2 | 29 |
8 files changed, 114 insertions, 81 deletions
@@ -32,6 +32,13 @@ now correctly backtracked, so this unnecessary restriction has been removed.
 regex engine. The Perl regex folks are aware of this usage and have made a
 note about it.
 
+9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
+1, believing that repeating an assertion is pointless. However, if a positive
+assertion contains capturing groups, repetition can be useful. In any case, an
+assertion could always be wrapped in a repeated group. The only restriction
+that is now imposed is that an unlimited maximum is changed to one more than
+the minimum.
+
 
 Version 10.34 21-November-2019
 ------------------------------
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index 42d8515..36178b3 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -1901,8 +1901,8 @@ are permitted for groups with the same number, for example:
   (?|(?<AA>aa)|(?<AA>bb))
 </pre>
 The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
-option at compile time, or by the use of (?J) within the pattern, as described
-in the section entitled
+option at compile time, or by the use of (?J) within the pattern, as described
+in the section entitled
 <a href="#internaloptions">"Internal Option Setting"</a>
 above.
 </P>
@@ -1968,7 +1968,7 @@ items:
   an escape such as \d or \pL that matches a single character
   a character class
   a backreference
-  a parenthesized group (including most assertions)
+  a parenthesized group (including lookaround assertions)
   a subroutine call (recursive or otherwise)
 </pre>
 The general repetition quantifier specifies a minimum and maximum number of
@@ -2359,7 +2359,7 @@ of zero.
 For versions of PCRE2 less than 10.25, backreferences of this type used to
 cause the group that they reference to be treated as an
 <a href="#atomicgroup">atomic group.</a>
-This restriction no longer applies, and backtracking into such groups can occur
+This restriction no longer applies, and backtracking into such groups can occur
 as normal.
 <a name="bigassertions"></a></P>
 <br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br>
@@ -2420,26 +2420,13 @@ control passes to the previous backtracking point, thus discarding any captured
 strings within the assertion.
 </P>
 <P>
-For compatibility with Perl, most assertion groups may be repeated; though it
-makes no sense to assert the same thing several times, the side effect of
-capturing may occasionally be useful. However, an assertion that forms the
-condition for a conditional group may not be quantified. In practice, for
-other assertions, there only three cases:
-<br>
-<br>
-(1) If the quantifier is {0}, the assertion is never obeyed during matching.
-However, it may contain internal capture groups that are called from elsewhere
-via the
-<a href="#groupsassubroutines">subroutine mechanism.</a>
-<br>
-<br>
-(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
-were {0,1}. At run time, the rest of the pattern match is tried with and
-without the assertion, the order depending on the greediness of the quantifier.
-<br>
-<br>
-(3) If the minimum repetition is greater than zero, the quantifier is ignored.
-The assertion is obeyed just once when encountered during matching.
+Most assertion groups may be repeated; though it makes no sense to assert the
+same thing several times, the side effect of capturing in positive assertions
+may occasionally be useful. However, an assertion that forms the condition for
+a conditional group may not be quantified. PCRE2 used to restrict the
+repetition of assertions, but from release 10.35 the only restriction is that
+an unlimited maximum repetition is changed to be one more than the minimum. For
+example, {3,} is treated as {3,4}.
 </P>
 <br><b>
 Alphabetic assertion names
@@ -3840,9 +3827,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 29 December 2019
+Last updated: 01 January 2020
 <br>
-Copyright © 1997-2019 University of Cambridge.
+Copyright © 1997-2020 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/pcre2.txt b/doc/pcre2.txt
index 974fafa..127e6ab 100644
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@@ -7729,7 +7729,7 @@ REPETITION
          an escape such as \d or \pL that matches a single character
          a character class
          a backreference
-         a parenthesized group (including most assertions)
+         a parenthesized group (including lookaround assertions)
          a subroutine call (recursive or otherwise)
 
        The general repetition quantifier specifies a minimum and maximum num-
@@ -8162,24 +8162,14 @@ ASSERTIONS
        passes to the previous backtracking point, thus discarding any captured
        strings within the assertion.
 
-       For compatibility with Perl, most assertion groups may be repeated;
-       though it makes no sense to assert the same thing several times, the
-       side effect of capturing may occasionally be useful. However, an asser-
-       tion that forms the condition for a conditional group may not be quan-
-       tified. In practice, for other assertions, there only three cases:
-
-       (1) If the quantifier is {0}, the assertion is never obeyed during
-       matching. However, it may contain internal capture groups that are
-       called from elsewhere via the subroutine mechanism.
-
-       (2) If quantifier is {0,n} where n is greater than zero, it is treated
-       as if it were {0,1}. At run time, the rest of the pattern match is
-       tried with and without the assertion, the order depending on the greed-
-       iness of the quantifier.
-
-       (3) If the minimum repetition is greater than zero, the quantifier is
-       ignored. The assertion is obeyed just once when encountered during
-       matching.
+       Most assertion groups may be repeated; though it makes no sense to as-
+       sert the same thing several times, the side effect of capturing in pos-
+       itive assertions may occasionally be useful. However, an assertion that
+       forms the condition for a conditional group may not be quantified.
+       PCRE2 used to restrict the repetition of assertions, but from release
+       10.35 the only restriction is that an unlimited maximum repetition is
+       changed to be one more than the minimum. For example, {3,} is treated
+       as {3,4}.
 
    Alphabetic assertion names
 
@@ -9490,8 +9480,8 @@ AUTHOR
 
 REVISION
 
-       Last updated: 29 December 2019
-       Copyright (c) 1997-2019 University of Cambridge.
+       Last updated: 01 January 2020
+       Copyright (c) 1997-2020 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3
index 9015679..c613878 100644
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "29 December 2019" "PCRE2 10.35"
+.TH PCRE2PATTERN 3 "01 January 2020" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -1902,8 +1902,8 @@ are permitted for groups with the same number, for example:
   (?|(?<AA>aa)|(?<AA>bb))
 .sp
 The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
-option at compile time, or by the use of (?J) within the pattern, as described
-in the section entitled
+option at compile time, or by the use of (?J) within the pattern, as described
+in the section entitled
 .\" HTML <a href="#internaloptions">
 .\" </a>
 "Internal Option Setting"
@@ -1975,7 +1975,7 @@ items:
   an escape such as \ed or \epL that matches a single character
   a character class
   a backreference
-  a parenthesized group (including most assertions)
+  a parenthesized group (including lookaround assertions)
   a subroutine call (recursive or otherwise)
 .sp
 The general repetition quantifier specifies a minimum and maximum number of
@@ -2362,7 +2362,7 @@ cause the group that they reference to be treated as an
 .\" </a>
 atomic group.
 .\"
-This restriction no longer applies, and backtracking into such groups can occur
+This restriction no longer applies, and backtracking into such groups can occur
 as normal.
 .
 .
@@ -2431,26 +2431,13 @@ the "no" branch of the condition. For other failing negative assertions,
 control passes to the previous backtracking point, thus discarding any captured
 strings within the assertion.
 .P
-For compatibility with Perl, most assertion groups may be repeated; though it
-makes no sense to assert the same thing several times, the side effect of
-capturing may occasionally be useful. However, an assertion that forms the
-condition for a conditional group may not be quantified. In practice, for
-other assertions, there only three cases:
-.sp
-(1) If the quantifier is {0}, the assertion is never obeyed during matching.
-However, it may contain internal capture groups that are called from elsewhere
-via the
-.\" HTML <a href="#groupsassubroutines">
-.\" </a>
-subroutine mechanism.
-.\"
-.sp
-(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
-were {0,1}. At run time, the rest of the pattern match is tried with and
-without the assertion, the order depending on the greediness of the quantifier.
-.sp
-(3) If the minimum repetition is greater than zero, the quantifier is ignored.
-The assertion is obeyed just once when encountered during matching.
+Most assertion groups may be repeated; though it makes no sense to assert the
+same thing several times, the side effect of capturing in positive assertions
+may occasionally be useful. However, an assertion that forms the condition for
+a conditional group may not be quantified. PCRE2 used to restrict the
+repetition of assertions, but from release 10.35 the only restriction is that
+an unlimited maximum repetition is changed to be one more than the minimum. For
+example, {3,} is treated as {3,4}.
 .
 .
 .SS "Alphabetic assertion names"
@@ -3884,6 +3871,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 29 December 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 01 January 2020
+Copyright (c) 1997-2020 University of Cambridge.
 .fi
diff --git a/src/pcre2_compile.c b/src/pcre2_compile.c
index ed4fc74..0350328 100644
--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 
                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2019 University of Cambridge
+          New API code Copyright (c) 2016-2020 University of Cambridge
 
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -7074,15 +7074,18 @@ for (;; pptr++)
             previous[GET(previous, 1)] != OP_ALT)
           goto END_REPEAT;
 
-        /* There is no sense in actually repeating assertions. The only
-        potential use of repetition is in cases when the assertion is optional.
-        Therefore, if the minimum is greater than zero, just ignore the repeat.
-        If the maximum is not zero or one, set it to 1. */
+        /* Perl allows all assertions to be quantified, and when they contain
+        capturing parentheses and/or are optional there are potential uses for
+        this feature. PCRE2 used to force the maximum quantifier to 1 on the
+        invalid grounds that further repetition was never useful. This was
+        always a bit pointless, since an assertion could be wrapped with a
+        repeated group to achieve the effect. General repetition is now
+        permitted, but if the maximum is unlimited it is set to one more than
+        the minimum. */
 
         if (op_previous < OP_ONCE)    /* Assertion */
           {
-          if (repeat_min > 0) goto END_REPEAT;
-          if (repeat_max > 1) repeat_max = 1;
+          if (repeat_max == REPEAT_UNLIMITED) repeat_max = repeat_min + 1;
           }
 
         /* The case of a zero minimum is special because of the need to stick
diff --git a/testdata/testinput1 b/testdata/testinput1
index 109de29..9d7821d 100644
--- a/testdata/testinput1
+++ b/testdata/testinput1
@@ -6393,4 +6393,13 @@ ef) x/x,mark
 /^((\1+)|\d)+133X$/
     111133X
 
+/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
+    The quick brown fox jumps over the lazy dog.
+    Jackdaws love my big sphinx of quartz.
+    Pack my box with five dozen liquor jugs.
+\= Expect no match
+    The quick brown fox jumps over the lazy cat.
+    Hackdaws love my big sphinx of quartz.
+    Pack my fox with five dozen liquor jugs.
+
 # End of testinput1
diff --git a/testdata/testoutput1 b/testdata/testoutput1
index c425ed4..79acf04 100644
--- a/testdata/testoutput1
+++ b/testdata/testoutput1
@@ -10126,4 +10126,25 @@ No match
  1: 11
  2: 11
 
+/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
+    The quick brown fox jumps over the lazy dog.
+ 0: 
+ 1: quick brown fox jumps over the lazy dog.
+ 2: q
+    Jackdaws love my big sphinx of quartz.
+ 0: 
+ 1: Jackdaws love my big sphinx of quartz.
+ 2: J
+    Pack my box with five dozen liquor jugs.
+ 0: 
+ 1: Pack my box with five dozen liquor jugs.
+ 2: P
+\= Expect no match
+    The quick brown fox jumps over the lazy cat.
+No match
+    Hackdaws love my big sphinx of quartz.
+No match
+    Pack my fox with five dozen liquor jugs.
+No match
+
 # End of testinput1
diff --git a/testdata/testoutput2 b/testdata/testoutput2
index 438aefe..3a46a0a 100644
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@@ -10962,6 +10962,12 @@ Matched, but too many substrings
         Assert
         abc
         Ket
+        Assert
+        abc
+        Ket
+        Assert
+        abc
+        Ket
         abc
         Ket
         End
@@ -10973,6 +10979,10 @@ Matched, but too many substrings
         Assert
         abc
         Ket
+        Brazero
+        Assert
+        abc
+        Ket
         abc
         Ket
         End
@@ -10981,9 +10991,15 @@ Matched, but too many substrings
 /(?=abc)++abc/B
 ------------------------------------------------------------------
         Bra
+        Once
         Assert
         abc
         Ket
+        Brazero
+        Assert
+        abc
+        Ket
+        Ket
         abc
         Ket
         End
@@ -16610,6 +16626,19 @@ No match
         Assert
         Any
         Ket
+        Assert
+        Any
+        Ket
+        Assert
+        Any
+        Ket
+        Assert
+        Any
+        Ket
+        Brazero
+        Assert
+        Any
+        Ket
         x
         Ket
         Ket
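The quantifier rule this commit introduces for assertions can be sketched as a standalone C function. This is only an illustration of the rule described in the ChangeLog entry, not the code in src/pcre2_compile.c: the function name and the SKETCH_REPEAT_UNLIMITED marker are hypothetical stand-ins, and the real compiler applies the adjustment inline while processing the parsed pattern.

```c
#include <stdint.h>

/* Hypothetical stand-in for the compiler's internal "no upper bound"
   marker. The real REPEAT_UNLIMITED is defined in src/pcre2_compile.c and
   its value is not assumed here. */
#define SKETCH_REPEAT_UNLIMITED UINT32_MAX

/* Returns the maximum repeat count applied to a quantified assertion under
   the rule from this commit: an unlimited maximum becomes one more than the
   minimum, and any explicit maximum is left unchanged. */
static uint32_t sketch_assertion_repeat_max(uint32_t repeat_min,
                                            uint32_t repeat_max)
{
  if (repeat_max == SKETCH_REPEAT_UNLIMITED) return repeat_min + 1;
  return repeat_max;
}
```

Under this sketch, a quantifier of {3,} on an assertion gets a maximum of 4, which agrees with the documented example that {3,} is treated as {3,4}, while an exact count such as the {26} in the new test pattern passes through unchanged.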