Update to Unicode 11.0.0

git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@958 6239d852-aaf2-0410-a92c-79f79f948069
author: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-07-07 16:10:29 +0000
committer: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-07-07 16:10:29 +0000
commit: 2f04a0431dbcfd6a3d1e83ab2475667d40bfa6ca (patch)
tree: 42b2765d206b26205f1f2e2c4c89555aed8ca6d7 /doc/pcre2pattern.3
parent: c75868f77eb2ce2ff277355afcd966e3179e65a8 (diff)
download: pcre2-2f04a0431dbcfd6a3d1e83ab2475667d40bfa6ca.tar.gz
1 files changed, 21 insertions, 13 deletions
diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3
index 2b534f2..cd9a99c 100644
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "30 June 2018" "PCRE2 10.32"
+.TH PCRE2PATTERN 3 "07 July 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -788,6 +788,7 @@ Cypriot,
 Cyrillic,
 Deseret,
 Devanagari,
+Dogra,
 Duployan,
 Egyptian_Hieroglyphs,
 Elbasan,
@@ -798,9 +799,11 @@ Gothic,
 Grantha,
 Greek,
 Gujarati,
+Gunjala_Gondi,
 Gurmukhi,
 Han,
 Hangul,
+Hanifi_Rohingya,
 Hanunoo,
 Hatran,
 Hebrew,
@@ -828,11 +831,13 @@ Lisu,
 Lycian,
 Lydian,
 Mahajani,
+Makasar,
 Malayalam,
 Mandaic,
 Manichaean,
 Marchen,
 Masaram_Gondi,
+Medefaidrin,
 Meetei_Mayek,
 Mende_Kikakui,
 Meroitic_Cursive,
@@ -855,6 +860,7 @@ Old_Italic,
 Old_North_Arabian,
 Old_Permic,
 Old_Persian,
+Old_Sogdian,
 Old_South_Arabian,
 Old_Turkic,
 Oriya,
@@ -875,6 +881,7 @@ Shavian,
 Siddham,
 SignWriting,
 Sinhala,
+Sogdian,
 Sora_Sompeng,
 Soyombo,
 Sundanese,
@@ -1003,7 +1010,10 @@ grapheme cluster", and treats the sequence as an atomic group
 Unicode supports various kinds of composite character by giving each character
 a grapheme breaking property, and having rules that use these properties to
 define the boundaries of extended grapheme clusters. The rules are defined in
-Unicode Standard Annex 29, "Unicode Text Segmentation".
+Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0 
+abandoned the use of some previous properties that had been used for emojis. 
+Instead it introduced various emoji-specific properties. PCRE2 uses only the
+Extended Pictographic property.
 .P
 \eX always matches at least one character. Then it decides whether to add
 additional characters according to the following rules for ending a cluster:
@@ -1018,22 +1028,20 @@ L, V, LV, or LVT character; an LV or V character may be followed by a V or T
 character; an LVT or T character may be follwed only by a T character.
 .P
 4. Do not end before extending characters or spacing marks or the "zero-width
-joiner" characters. Characters with the "mark" property always have the
+joiner" character. Characters with the "mark" property always have the
 "extend" grapheme breaking property.
 .P
 5. Do not end after prepend characters.
 .P
-6. Do not break within emoji modifier sequences (a base character followed by a
-modifier). Extending characters are allowed before the modifier.
+6. Do not break within emoji modifier sequences or emoji zwj sequences. That
+is, do not break between characters with the Extended_Pictographic property.
+Extend and ZWJ characters are allowed between the characters.
 .P
-7. Do not break within emoji zwj sequences (zero-width joiner followed by
-"glue after ZWJ" or "base glue after ZWJ").
-.P
-8. Do not break within emoji flag sequences. That is, do not break between
+7. Do not break within emoji flag sequences. That is, do not break between
 regional indicator (RI) characters if there are an odd number of RI characters
 before the break point.
 .P
-6. Otherwise, end the cluster.
+8. Otherwise, end the cluster.
 .
 .
 .\" HTML <a name="extraprops"></a>
@@ -1112,8 +1120,8 @@ lead to odd effects. For example, consider this pattern:
 .sp
   (?<=\eKfoo)bar
 .sp
-If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting 
-offset of 3 succeeds and reports the matching string as "foobar", that is, the 
+If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting
+offset of 3 succeeds and reports the matching string as "foobar", that is, the
 start of the reported match is earlier than where the match started.
 .
 .
@@ -3517,6 +3525,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 30 June 2018
+Last updated: 07 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
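
Illustrative note on rule 7 of the revised text (not part of the commit): the regional indicator rule can be sketched in a few lines of Python. This is a minimal sketch that assumes only rule 7 applies and breaks everywhere else; it is not how PCRE2 implements \X. The helper names is_ri, may_break_before and split_on_rule7 are hypothetical. The code point range U+1F1E6..U+1F1FF is the Unicode block of Regional Indicator Symbol Letters A to Z.

```python
# Sketch of rule 7 only: do not break between two regional indicator (RI)
# characters if an odd number of RI characters precedes the break point.
# All function names here are hypothetical, chosen for illustration.

RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # Regional Indicator Symbol Letter A..Z


def is_ri(ch: str) -> bool:
    """True if ch is a regional indicator character."""
    return RI_FIRST <= ord(ch) <= RI_LAST


def may_break_before(text: str, pos: int) -> bool:
    """Return False only when rule 7 forbids a break before text[pos]."""
    if pos <= 0 or pos >= len(text):
        return True
    if not (is_ri(text[pos - 1]) and is_ri(text[pos])):
        return True  # rule 7 does not apply to this position
    # Count the contiguous run of RI characters just before the break point.
    count = 0
    i = pos - 1
    while i >= 0 and is_ri(text[i]):
        count += 1
        i -= 1
    return count % 2 == 0


def split_on_rule7(text: str) -> list:
    """Split text, treating every position as a break unless rule 7 forbids it."""
    if not text:
        return []
    pieces, start = [], 0
    for pos in range(1, len(text)):
        if may_break_before(text, pos):
            pieces.append(text[start:pos])
            start = pos
    pieces.append(text[start:])
    return pieces


if __name__ == "__main__":
    # Four RI characters: the pairs "FR" and "DE".
    flags = "\U0001F1EB\U0001F1F7\U0001F1E9\U0001F1EA"
    print([len(p) for p in split_on_rule7(flags)])
```

With four consecutive RI characters the sketch keeps them as two pairs; with three, the first two stay together and the third stands alone. A real \X implementation must also apply the other rules listed in the diff.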