From 2f04a0431dbcfd6a3d1e83ab2475667d40bfa6ca Mon Sep 17 00:00:00 2001
From: ph10
Date: Sat, 7 Jul 2018 16:10:29 +0000
Subject: Update to Unicode 11.0.0

git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@958 6239d852-aaf2-0410-a92c-79f79f948069
---
doc/html/pcre2pattern.html | 33 ++++++++++++++++++++-------------
1 file changed, 20 insertions(+), 13 deletions(-)
(limited to 'doc/html/pcre2pattern.html')
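
Reviewer note (not part of the commit): the renumbered rule 7 in this patch, "do not break between regional indicator (RI) characters if there are an odd number of RI characters before the break point", can be illustrated with a small sketch. This is not PCRE2 code; the function name `split_regional_indicators` is hypothetical, and the sketch models only the RI rule. Every non-RI character is treated as its own cluster, which is a deliberate simplification and not how \X behaves in general.

```python
from typing import List

# Regional indicator symbols occupy U+1F1E6..U+1F1FF.
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF


def is_ri(ch: str) -> bool:
    """Return True if ch is a regional indicator symbol."""
    return RI_FIRST <= ord(ch) <= RI_LAST


def split_regional_indicators(text: str) -> List[str]:
    """Split text into clusters using only the emoji flag rule.

    Assumption: all other grapheme rules are ignored, so each
    non-RI character becomes a cluster on its own.
    """
    clusters: List[str] = []
    ri_before = 0  # consecutive RI characters immediately before this point
    for ch in text:
        if is_ri(ch) and ri_before % 2 == 1:
            # Odd number of RIs before the break point: do not break.
            clusters[-1] += ch
        else:
            clusters.append(ch)
        ri_before = ri_before + 1 if is_ri(ch) else 0
    return clusters
```

Under these assumptions, a run of five RI characters comes back as two pairs followed by one unpaired RI, because a break is allowed only where the count of preceding RI characters is even.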
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index 9adc426..9d241b7 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -789,6 +789,7 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
+Dogra,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
@@ -799,9 +800,11 @@ Gothic,
Grantha,
Greek,
Gujarati,
+Gunjala_Gondi,
Gurmukhi,
Han,
Hangul,
+Hanifi_Rohingya,
Hanunoo,
Hatran,
Hebrew,
@@ -829,11 +832,13 @@ Lisu,
Lycian,
Lydian,
Mahajani,
+Makasar,
Malayalam,
Mandaic,
Manichaean,
Marchen,
Masaram_Gondi,
+Medefaidrin,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
@@ -856,6 +861,7 @@ Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
+Old_Sogdian,
Old_South_Arabian,
Old_Turkic,
Oriya,
@@ -876,6 +882,7 @@ Shavian,
Siddham,
SignWriting,
Sinhala,
+Sogdian,
Sora_Sompeng,
Soyombo,
Sundanese,
@@ -1006,7 +1013,10 @@ grapheme cluster", and treats the sequence as an atomic group
Unicode supports various kinds of composite character by giving each character
a grapheme breaking property, and having rules that use these properties to
define the boundaries of extended grapheme clusters. The rules are defined in
-Unicode Standard Annex 29, "Unicode Text Segmentation".
+Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0
+abandoned the use of some previous properties that had been used for emojis.
+Instead it introduced various emoji-specific properties. PCRE2 uses only the
+Extended Pictographic property.
\X always matches at least one character. Then it decides whether to add
@@ -1026,27 +1036,24 @@ character; an LVT or T character may be follwed only by a T character.
4. Do not end before extending characters or spacing marks or the "zero-width
-joiner" characters. Characters with the "mark" property always have the
+joiner" character. Characters with the "mark" property always have the
"extend" grapheme breaking property.
5. Do not end after prepend characters.
-6. Do not break within emoji modifier sequences (a base character followed by a
-modifier). Extending characters are allowed before the modifier.
+6. Do not break within emoji modifier sequences or emoji zwj sequences. That
+is, do not break between characters with the Extended_Pictographic property.
+Extend and ZWJ characters are allowed between the characters.
-7. Do not break within emoji zwj sequences (zero-width joiner followed by
-"glue after ZWJ" or "base glue after ZWJ").
-
-
-8. Do not break within emoji flag sequences. That is, do not break between
+7. Do not break within emoji flag sequences. That is, do not break between
regional indicator (RI) characters if there are an odd number of RI characters
before the break point.
-6. Otherwise, end the cluster.
+8. Otherwise, end the cluster.
PCRE2's additional properties
@@ -1119,8 +1126,8 @@ lead to odd effects. For example, consider this pattern:
(?<=\Kfoo)bar
-If the subject is "foobar", a call to pcre2_match() with a starting
-offset of 3 succeeds and reports the matching string as "foobar", that is, the
+If the subject is "foobar", a call to pcre2_match() with a starting
+offset of 3 succeeds and reports the matching string as "foobar", that is, the
start of the reported match is earlier than where the match started.
@@ -3490,7 +3497,7 @@ Cambridge, England.
REVISION
-Last updated: 30 June 2018
+Last updated: 07 July 2018
Copyright © 1997-2018 University of Cambridge.
--
cgit v1.2.1