-rw-r--r-- | pcre/AUTHORS | 6
-rw-r--r-- | pcre/ChangeLog | 43
-rw-r--r-- | pcre/LICENCE | 10
-rw-r--r-- | pcre/NEWS | 10
-rw-r--r-- | pcre/configure.ac | 10
-rw-r--r-- | pcre/pcre_compile.c | 18
-rw-r--r-- | pcre/pcre_jit_compile.c | 2
-rw-r--r-- | pcre/pcrecpp.cc | 64
-rw-r--r-- | pcre/pcrecpp_unittest.cc | 34
-rw-r--r-- | pcre/pcregrep.c | 4
-rw-r--r-- | pcre/testdata/testinput1 | 15
-rw-r--r-- | pcre/testdata/testinput2 | 3
-rw-r--r-- | pcre/testdata/testinput4 | 3
-rw-r--r-- | pcre/testdata/testoutput1 | 24
-rw-r--r-- | pcre/testdata/testoutput2 | 4
-rw-r--r-- | pcre/testdata/testoutput4 | 4
16 files changed, 227 insertions, 27 deletions
diff --git a/pcre/AUTHORS b/pcre/AUTHORS
index eb9b1a44b34..23c005a33d6 100644
--- a/pcre/AUTHORS
+++ b/pcre/AUTHORS
@@ -8,7 +8,7 @@ Email domain: cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.
-Copyright (c) 1997-2018 University of Cambridge
+Copyright (c) 1997-2019 University of Cambridge
 All rights reserved
@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
 Email local part: hzmester
 Emain domain: freemail.hu
-Copyright(c) 2010-2018 Zoltan Herczeg
+Copyright(c) 2010-2019 Zoltan Herczeg
 All rights reserved.
@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
 Email local part: hzmester
 Emain domain: freemail.hu
-Copyright(c) 2009-2018 Zoltan Herczeg
+Copyright(c) 2009-2019 Zoltan Herczeg
 All rights reserved.
diff --git a/pcre/ChangeLog b/pcre/ChangeLog
index 7b53195f6a6..e4d2d9fa24c 100644
--- a/pcre/ChangeLog
+++ b/pcre/ChangeLog
@@ -5,6 +5,49 @@ Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
 development is happening in the PCRE2 10.xx series.
+Version 8.43 23-February-2019
+-----------------------------
+
+1. Some time ago the config macro SUPPORT_UTF8 was changed to SUPPORT_UTF
+because it also applies to UTF-16 and UTF-32. However, this change was not made
+in the pcre2cpp files; consequently the C++ wrapper has from then been compiled
+with a bug in it, which would have been picked up by the unit test except that
+it also had its UTF8 code cut out. The bug was in a global replace when moving
+forward after matching an empty string.
+
+2. The C++ wrapper got broken a long time ago (version 7.3, August 2007) when
+(*CR) was invented (assuming it was the first such start-of-pattern option).
+The wrapper could never handle such patterns because it wraps patterns in
+(?:...)\z in order to support end anchoring. I have hacked in some code to fix
+this, that is, move the wrapping till after any existing start-of-pattern
+special settings.
+
+3. "pcre2grep" (sic) was accidentally mentioned in an error message (fix was
+ported from PCRE2).
+
+4. Typo LCC_ALL for LC_ALL fixed in pcregrep.
+
+5. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
+negative class with no characters less than 0x100 followed by a positive class
+with only characters less than 0x100, the first class was incorrectly being
+auto-possessified, causing incorrect match failures.
+
+6. If the only branch in a conditional subpattern was anchored, the whole
+subpattern was treated as anchored, when it should not have been, since the
+assumed empty second branch cannot be anchored. Demonstrated by test patterns
+such as /(?(1)^())b/ or /(?(?=^))b/.
+
+7. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
+a greater than 1 fixed quantifier. This issue was found by Yunho Kim.
+
+8. If a pattern started with a subroutine call that had a quantifier with a
+minimum of zero, an incorrect "match must start with this character" could be
+recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
+be the first character of a match.
+
+9. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.
+
+
 Version 8.42 20-March-2018
 --------------------------
diff --git a/pcre/LICENCE b/pcre/LICENCE
index f6ef7fd7664..760a6666b60 100644
--- a/pcre/LICENCE
+++ b/pcre/LICENCE
@@ -25,7 +25,7 @@ Email domain: cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.
-Copyright (c) 1997-2018 University of Cambridge
+Copyright (c) 1997-2019 University of Cambridge
 All rights reserved.
@@ -34,9 +34,9 @@ PCRE JUST-IN-TIME COMPILATION SUPPORT
 Written by: Zoltan Herczeg
 Email local part: hzmester
-Emain domain: freemail.hu
+Email domain: freemail.hu
-Copyright(c) 2010-2018 Zoltan Herczeg
+Copyright(c) 2010-2019 Zoltan Herczeg
 All rights reserved.
@@ -45,9 +45,9 @@ STACK-LESS JUST-IN-TIME COMPILER
 Written by: Zoltan Herczeg
 Email local part: hzmester
-Emain domain: freemail.hu
+Email domain: freemail.hu
-Copyright(c) 2009-2018 Zoltan Herczeg
+Copyright(c) 2009-2019 Zoltan Herczeg
 All rights reserved.
diff --git a/pcre/NEWS b/pcre/NEWS
index 09b4ad36003..0f184081740 100644
--- a/pcre/NEWS
+++ b/pcre/NEWS
@@ -1,6 +1,16 @@
 News about PCRE releases
 ------------------------
+Note that this library (now called PCRE1) is now being maintained for bug fixes
+only. New projects are advised to use the new PCRE2 libraries.
+
+
+Release 8.43 23-February-2019
+-----------------------------
+
+This is a bug-fix release.
+
+
 Release 8.42 20-March-2018
 --------------------------
diff --git a/pcre/configure.ac b/pcre/configure.ac
index dcdef6a9427..d2e5236cbd6 100644
--- a/pcre/configure.ac
+++ b/pcre/configure.ac
@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.
 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [42])
+m4_define(pcre_minor, [43])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2018-03-20])
+m4_define(pcre_date, [2019-02-23])
 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.
 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:10:2])
-m4_define(libpcre16_version, [2:10:2])
-m4_define(libpcre32_version, [0:10:0])
+m4_define(libpcre_version, [3:11:2])
+m4_define(libpcre16_version, [2:11:2])
+m4_define(libpcre32_version, [0:11:0])
 m4_define(libpcreposix_version, [0:6:0])
 m4_define(libpcrecpp_version, [0:1:0])
diff --git a/pcre/pcre_compile.c b/pcre/pcre_compile.c
index 9b9da46f0d0..734875de2fb 100644
--- a/pcre/pcre_compile.c
+++ b/pcre/pcre_compile.c
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.
 Written by Philip Hazel
- Copyright (c) 1997-2016 University of Cambridge
+ Copyright (c) 1997-2018 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -3300,7 +3300,7 @@ for(;;)
   if ((*xclass_flags & XCL_MAP) == 0)
     {
     /* No bits are set for characters < 256. */
-    if (list[1] == 0) return TRUE;
+    if (list[1] == 0) return (*xclass_flags & XCL_NOT) == 0;
     /* Might be an empty repeat. */
     continue;
     }
@@ -7645,6 +7645,8 @@ for (;; ptr++)
   /* Can't determine a first byte now */
   if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
+  zerofirstchar = firstchar;
+  zerofirstcharflags = firstcharflags;
   continue;
@@ -8685,10 +8687,18 @@ do {
     if (!is_anchored(scode, new_map, cd, atomcount)) return FALSE;
     }
-  /* Positive forward assertions and conditions */
+  /* Positive forward assertion */
-  else if (op == OP_ASSERT || op == OP_COND)
+  else if (op == OP_ASSERT)
+    {
+    if (!is_anchored(scode, bracket_map, cd, atomcount)) return FALSE;
+    }
+
+  /* Condition; not anchored if no second branch */
+
+  else if (op == OP_COND)
     {
+    if (scode[GET(scode,1)] != OP_ALT) return FALSE;
     if (!is_anchored(scode, bracket_map, cd, atomcount)) return FALSE;
     }
diff --git a/pcre/pcre_jit_compile.c b/pcre/pcre_jit_compile.c
index 2bad74b0231..bc5f9c01433 100644
--- a/pcre/pcre_jit_compile.c
+++ b/pcre/pcre_jit_compile.c
@@ -9002,7 +9002,7 @@ if (exact > 1)
 #ifdef SUPPORT_UTF
     && !common->utf
 #endif
-    )
+    && type != OP_ANYNL && type != OP_EXTUNI)
     {
     OP2(SLJIT_ADD, TMP1, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(exact));
     add_jump(compiler, &backtrack->topbacktracks, CMP(SLJIT_GREATER, TMP1, 0, STR_END, 0));
diff --git a/pcre/pcrecpp.cc b/pcre/pcrecpp.cc
index d09c9abc516..77a2fedc4be 100644
--- a/pcre/pcrecpp.cc
+++ b/pcre/pcrecpp.cc
@@ -80,6 +80,24 @@ static const string empty_string;
 // If the user doesn't ask for any options, we just use this one
 static RE_Options default_options;
+// Specials for the start of patterns. See comments where start_options is used
+// below. (PH June 2018)
+static const char *start_options[] = {
+  "(*UTF8)",
+  "(*UTF)",
+  "(*UCP)",
+  "(*NO_START_OPT)",
+  "(*NO_AUTO_POSSESS)",
+  "(*LIMIT_RECURSION=",
+  "(*LIMIT_MATCH=",
+  "(*CRLF)",
+  "(*CR)",
+  "(*BSR_UNICODE)",
+  "(*BSR_ANYCRLF)",
+  "(*ANYCRLF)",
+  "(*ANY)",
+  "" };
+
 void RE::Init(const string& pat, const RE_Options* options) {
   pattern_ = pat;
   if (options == NULL) {
@@ -135,7 +153,49 @@ pcre* RE::Compile(Anchor anchor) {
   } else {
     // Tack a '\z' at the end of RE. Parenthesize it first so that
     // the '\z' applies to all top-level alternatives in the regexp.
-    string wrapped = "(?:"; // A non-counting grouping operator
+
+    /* When this code was written (for PCRE 6.0) it was enough just to
+    parenthesize the entire pattern. Unfortunately, when the feature of
+    starting patterns with (*UTF8) or (*CR) etc. was added to PCRE patterns,
+    this code was never updated. This bug was not noticed till 2018, long after
+    PCRE became obsolescent and its maintainer no longer around. Since PCRE is
+    frozen, I have added a hack to check for all the existing "start of
+    pattern" specials - knowing that no new ones will ever be added. I am not a
+    C++ programmer, so the code style is no doubt crude. It is also
+    inefficient, but is only run when the pattern starts with "(*".
+    PH June 2018. */
+
+    string wrapped = "";
+
+    if (pattern_.c_str()[0] == '(' && pattern_.c_str()[1] == '*') {
+      int kk, klen, kmat;
+      for (;;) { // Loop for any number of leading items
+
+        for (kk = 0; start_options[kk][0] != 0; kk++) {
+          klen = strlen(start_options[kk]);
+          kmat = strncmp(pattern_.c_str(), start_options[kk], klen);
+          if (kmat >= 0) break;
+        }
+        if (kmat != 0) break; // Not found
+
+        // If the item ended in "=" we must copy digits up to ")".
+
+        if (start_options[kk][klen-1] == '=') {
+          while (isdigit(pattern_.c_str()[klen])) klen++;
+          if (pattern_.c_str()[klen] != ')') break; // Syntax error
+          klen++;
+        }
+
+        // Move the item from the pattern to the start of the wrapped string.
+
+        wrapped += pattern_.substr(0, klen);
+        pattern_.erase(0, klen);
+      }
+    }
+
+    // Wrap the rest of the pattern.
+
+    wrapped += "(?:"; // A non-counting grouping operator
     wrapped += pattern_;
     wrapped += ")\\z";
     re = pcre_compile(wrapped.c_str(), pcre_options,
@@ -415,7 +475,7 @@ int RE::GlobalReplace(const StringPiece& rewrite,
         matchend++;
       }
       // We also need to advance more than one char if we're in utf8 mode.
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
       if (options_.utf8()) {
         while (matchend < static_cast<int>(str->length()) &&
                ((*str)[matchend] & 0xc0) == 0x80)
diff --git a/pcre/pcrecpp_unittest.cc b/pcre/pcrecpp_unittest.cc
index 4b15fbef1c3..1fc01a042b3 100644
--- a/pcre/pcrecpp_unittest.cc
+++ b/pcre/pcrecpp_unittest.cc
@@ -309,7 +309,7 @@ static void TestReplace() {
       "@aa",
       "@@@",
       3 },
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
     { "b*",
       "bb",
       "\xE3\x83\x9B\xE3\x83\xBC\xE3\x83\xA0\xE3\x81\xB8", // utf8
@@ -327,7 +327,7 @@ static void TestReplace() {
     { "", NULL, NULL, NULL, NULL, 0 }
   };
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
   const bool support_utf8 = true;
 #else
   const bool support_utf8 = false;
@@ -535,7 +535,7 @@ static void TestQuoteMetaLatin1() {
 }
 static void TestQuoteMetaUtf8() {
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
   TestQuoteMeta("Pl\xc3\xa1\x63ido Domingo", pcrecpp::UTF8());
   TestQuoteMeta("xyz", pcrecpp::UTF8()); // No fancy utf8
   TestQuoteMeta("\xc2\xb0", pcrecpp::UTF8()); // 2-byte utf8 (degree symbol)
@@ -1178,7 +1178,7 @@ int main(int argc, char** argv) {
     CHECK(re.error().empty()); // Must have no error
   }
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
   // Check UTF-8 handling
   {
     printf("Testing UTF-8 handling\n");
@@ -1203,6 +1203,30 @@ int main(int argc, char** argv) {
     RE re_test2("...", pcrecpp::UTF8());
     CHECK(re_test2.FullMatch(utf8_string));
+    // PH added these tests for leading option settings
+
+    RE re_testZ0("(*CR)(*NO_START_OPT).........");
+    CHECK(re_testZ0.FullMatch(utf8_string));
+
+#ifdef SUPPORT_UTF
+    RE re_testZ1("(*UTF8)...");
+    CHECK(re_testZ1.FullMatch(utf8_string));
+
+    RE re_testZ2("(*UTF)...");
+    CHECK(re_testZ2.FullMatch(utf8_string));
+
+#ifdef SUPPORT_UCP
+    RE re_testZ3("(*UCP)(*UTF)...");
+    CHECK(re_testZ3.FullMatch(utf8_string));
+
+    RE re_testZ4("(*UCP)(*LIMIT_MATCH=1000)(*UTF)...");
+    CHECK(re_testZ4.FullMatch(utf8_string));
+
+    RE re_testZ5("(*UCP)(*LIMIT_MATCH=1000)(*ANY)(*UTF)...");
+    CHECK(re_testZ5.FullMatch(utf8_string));
+#endif
+#endif
+
     // Check that '.' matches one byte or UTF-8 character
     // according to the mode.
     string ss;
@@ -1248,7 +1272,7 @@ int main(int argc, char** argv) {
     CHECK(!match_sentence.FullMatch(target));
     CHECK(!match_sentence_re.FullMatch(target));
   }
-#endif /* def SUPPORT_UTF8 */
+#endif /* def SUPPORT_UTF */
   printf("Testing error reporting\n");
diff --git a/pcre/pcregrep.c b/pcre/pcregrep.c
index a406be962d7..5982406862b 100644
--- a/pcre/pcregrep.c
+++ b/pcre/pcregrep.c
@@ -2252,7 +2252,7 @@ if (isdirectory(pathname))
       int fnlength = strlen(pathname) + strlen(nextfile) + 2;
       if (fnlength > 2048)
         {
-        fprintf(stderr, "pcre2grep: recursive filename is too long\n");
+        fprintf(stderr, "pcregrep: recursive filename is too long\n");
         rc = 2;
         break;
         }
@@ -3034,7 +3034,7 @@ LC_ALL environment variable is set, and if so, use it. */
 if (locale == NULL)
   {
   locale = getenv("LC_ALL");
-  locale_from = "LCC_ALL";
+  locale_from = "LC_ALL";
   }
 if (locale == NULL)
diff --git a/pcre/testdata/testinput1 b/pcre/testdata/testinput1
index 5c23f41fa81..02e4f4825fc 100644
--- a/pcre/testdata/testinput1
+++ b/pcre/testdata/testinput1
@@ -5742,4 +5742,19 @@ AbcdCBefgBhiBqz
 /X+(?#comment)?/
     >XXX<
+/ (?<word> \w+ )* \. /xi
+    pokus.
+
+/(?(DEFINE) (?<word> \w+ ) ) (?&word)* \./xi
+    pokus.
+
+/(?(DEFINE) (?<word> \w+ ) ) ( (?&word)* ) \./xi
+    pokus.
+
+/(?&word)* (?(DEFINE) (?<word> \w+ ) ) \./xi
+    pokus.
+
+/(?&word)* \. (?<word> \w+ )/xi
+    pokus.hokus
+
 /-- End of testinput1 --/
diff --git a/pcre/testdata/testinput2 b/pcre/testdata/testinput2
index 8ba4dc4ddab..3528de153eb 100644
--- a/pcre/testdata/testinput2
+++ b/pcre/testdata/testinput2
@@ -4257,4 +4257,7 @@ backtracking verbs. --/
     ab
     aaab
+/(?(?=^))b/
+    abc
+
 /-- End of testinput2 --/
diff --git a/pcre/testdata/testinput4 b/pcre/testdata/testinput4
index 8bdbdac4c26..63368c0a097 100644
--- a/pcre/testdata/testinput4
+++ b/pcre/testdata/testinput4
@@ -727,4 +727,7 @@
 /\C(\W?ſ)'?{{/8
     \\C(\\W?ſ)'?{{
+/[^\x{100}-\x{ffff}]*[\x80-\xff]/8
+    \x{99}\x{99}\x{99}
+
 /-- End of testinput4 --/
diff --git a/pcre/testdata/testoutput1 b/pcre/testdata/testoutput1
index eff8ecc948c..e6147e60b95 100644
--- a/pcre/testdata/testoutput1
+++ b/pcre/testdata/testoutput1
@@ -9446,4 +9446,28 @@ No match
     >XXX<
  0: X
+/ (?<word> \w+ )* \. /xi
+    pokus.
+ 0: pokus.
+ 1: pokus
+
+/(?(DEFINE) (?<word> \w+ ) ) (?&word)* \./xi
+    pokus.
+ 0: pokus.
+
+/(?(DEFINE) (?<word> \w+ ) ) ( (?&word)* ) \./xi
+    pokus.
+ 0: pokus.
+ 1: <unset>
+ 2: pokus
+
+/(?&word)* (?(DEFINE) (?<word> \w+ ) ) \./xi
+    pokus.
+ 0: pokus.
+
+/(?&word)* \. (?<word> \w+ )/xi
+    pokus.hokus
+ 0: pokus.hokus
+ 1: hokus
+
 /-- End of testinput1 --/
diff --git a/pcre/testdata/testoutput2 b/pcre/testdata/testoutput2
index 61ed8d9d4e4..4ccda272010 100644
--- a/pcre/testdata/testoutput2
+++ b/pcre/testdata/testoutput2
@@ -14721,4 +14721,8 @@ No need char
  0: ab
  1: a
+/(?(?=^))b/
+    abc
+ 0: b
+
 /-- End of testinput2 --/
diff --git a/pcre/testdata/testoutput4 b/pcre/testdata/testoutput4
index d43c12392dd..69e812cd357 100644
--- a/pcre/testdata/testoutput4
+++ b/pcre/testdata/testoutput4
@@ -1277,4 +1277,8 @@ No match
     \\C(\\W?ſ)'?{{
 No match
+/[^\x{100}-\x{ffff}]*[\x80-\xff]/8
+    \x{99}\x{99}\x{99}
+ 0: \x{99}\x{99}\x{99}
+
 /-- End of testinput4 --/
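
Note on ChangeLog item 2 and the pcrecpp.cc hunk: the fix moves any leading start-of-pattern items such as (*CR) or (*LIMIT_MATCH=1000) in front of the (?:...)\z wrapper, because PCRE only recognises them at the very start of a pattern. The following is a minimal standalone sketch of that idea, not the code from the patch. The names WrapForEndAnchor and kStartOptions are hypothetical, and unlike RE::Compile it works on a copy of the pattern and does not rely on the table being sorted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Leading "(*...)" items, copied from the start_options table in the patch.
// (kStartOptions is a name made up for this sketch.)
static const char* const kStartOptions[] = {
  "(*UTF8)", "(*UTF)", "(*UCP)", "(*NO_START_OPT)", "(*NO_AUTO_POSSESS)",
  "(*LIMIT_RECURSION=", "(*LIMIT_MATCH=", "(*CRLF)", "(*CR)",
  "(*BSR_UNICODE)", "(*BSR_ANYCRLF)", "(*ANYCRLF)", "(*ANY)"
};

// Hypothetical helper: returns the pattern wrapped as <items>(?:<rest>)\z,
// where <items> are the recognised start-of-pattern settings, in order.
std::string WrapForEndAnchor(std::string pattern) {
  std::string wrapped;
  for (;;) {  // Loop for any number of leading items
    std::size_t len = 0;
    for (const char* opt : kStartOptions) {
      const std::string item(opt);
      if (pattern.compare(0, item.size(), item) != 0) continue;
      len = item.size();
      if (item.back() == '=') {
        // Items ending in '=' take a decimal argument followed by ')'.
        while (len < pattern.size() &&
               std::isdigit(static_cast<unsigned char>(pattern[len]))) {
          ++len;
        }
        if (len < pattern.size() && pattern[len] == ')') {
          ++len;
        } else {
          len = 0;  // Malformed item: leave it inside the group
        }
      }
      break;
    }
    if (len == 0) break;  // No (further) leading item found
    wrapped += pattern.substr(0, len);
    pattern.erase(0, len);
  }
  wrapped += "(?:";  // A non-counting grouping operator
  wrapped += pattern;
  wrapped += ")\\z";
  return wrapped;
}
```

Under these assumptions, a pattern like the one in the re_testZ4 unit test is rewritten so that all three leading items stay outside the group, while a pattern without leading items is wrapped exactly as before the patch.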