author | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2018-07-07 16:10:29 +0000 |
---|---|---|
committer | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2018-07-07 16:10:29 +0000 |
commit | 2f04a0431dbcfd6a3d1e83ab2475667d40bfa6ca | |
tree | 42b2765d206b26205f1f2e2c4c89555aed8ca6d7 /doc/pcre2pattern.3 | |
parent | c75868f77eb2ce2ff277355afcd966e3179e65a8 | |
Update to Unicode 11.0.0
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@958 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/pcre2pattern.3')
-rw-r--r-- | doc/pcre2pattern.3 | 34 |
1 files changed, 21 insertions, 13 deletions
```diff
diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3
index 2b534f2..cd9a99c 100644
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "30 June 2018" "PCRE2 10.32"
+.TH PCRE2PATTERN 3 "07 July 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -788,6 +788,7 @@ Cypriot,
 Cyrillic,
 Deseret,
 Devanagari,
+Dogra,
 Duployan,
 Egyptian_Hieroglyphs,
 Elbasan,
@@ -798,9 +799,11 @@ Gothic,
 Grantha,
 Greek,
 Gujarati,
+Gunjala_Gondi,
 Gurmukhi,
 Han,
 Hangul,
+Hanifi_Rohingya,
 Hanunoo,
 Hatran,
 Hebrew,
@@ -828,11 +831,13 @@ Lisu,
 Lycian,
 Lydian,
 Mahajani,
+Makasar,
 Malayalam,
 Mandaic,
 Manichaean,
 Marchen,
 Masaram_Gondi,
+Medefaidrin,
 Meetei_Mayek,
 Mende_Kikakui,
 Meroitic_Cursive,
@@ -855,6 +860,7 @@ Old_Italic,
 Old_North_Arabian,
 Old_Permic,
 Old_Persian,
+Old_Sogdian,
 Old_South_Arabian,
 Old_Turkic,
 Oriya,
@@ -875,6 +881,7 @@ Shavian,
 Siddham,
 SignWriting,
 Sinhala,
+Sogdian,
 Sora_Sompeng,
 Soyombo,
 Sundanese,
@@ -1003,7 +1010,10 @@ grapheme cluster", and treats the sequence as an atomic group
 Unicode supports various kinds of composite character by giving each character
 a grapheme breaking property, and having rules that use these properties to
 define the boundaries of extended grapheme clusters. The rules are defined in
-Unicode Standard Annex 29, "Unicode Text Segmentation".
+Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0
+abandoned the use of some previous properties that had been used for emojis.
+Instead it introduced various emoji-specific properties. PCRE2 uses only the
+Extended Pictographic property.
 .P
 \eX always matches at least one character. Then it decides whether to add
 additional characters according to the following rules for ending a cluster:
@@ -1018,22 +1028,20 @@ L, V, LV, or LVT character; an LV or V character may be followed by a V or
 T character; an LVT or T character may be follwed only by a T character.
 .P
 4. Do not end before extending characters or spacing marks or the "zero-width
-joiner" characters. Characters with the "mark" property always have the
+joiner" character. Characters with the "mark" property always have the
 "extend" grapheme breaking property.
 .P
 5. Do not end after prepend characters.
 .P
-6. Do not break within emoji modifier sequences (a base character followed by a
-modifier). Extending characters are allowed before the modifier.
+6. Do not break within emoji modifier sequences or emoji zwj sequences. That
+is, do not break between characters with the Extended_Pictographic property.
+Extend and ZWJ characters are allowed between the characters.
 .P
-7. Do not break within emoji zwj sequences (zero-width joiner followed by
-"glue after ZWJ" or "base glue after ZWJ").
-.P
-8. Do not break within emoji flag sequences. That is, do not break between
+7. Do not break within emoji flag sequences. That is, do not break between
 regional indicator (RI) characters if there are an odd number of RI characters
 before the break point.
 .P
-6. Otherwise, end the cluster.
+8. Otherwise, end the cluster.
 .
 .
 .\" HTML <a name="extraprops"></a>
@@ -1112,8 +1120,8 @@ lead to odd effects. For example, consider this pattern:
 .sp
 (?<=\eKfoo)bar
 .sp
-If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting
-offset of 3 succeeds and reports the matching string as "foobar", that is, the
+If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting
+offset of 3 succeeds and reports the matching string as "foobar", that is, the
 start of the reported match is earlier than where the match started.
 .
 .
@@ -3517,6 +3525,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 30 June 2018
+Last updated: 07 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
```