author ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-07-07 16:10:29 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-07-07 16:10:29 +0000
commit 2f04a0431dbcfd6a3d1e83ab2475667d40bfa6ca
tree 42b2765d206b26205f1f2e2c4c89555aed8ca6d7 /maint/README
parent c75868f77eb2ce2ff277355afcd966e3179e65a8
Update to Unicode 11.0.0
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@958 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'maint/README')
-rw-r--r-- maint/README 20
1 files changed, 13 insertions, 7 deletions
diff --git a/maint/README b/maint/README
index fb9b7ee..d2de188 100644
--- a/maint/README
+++ b/maint/README
@@ -23,7 +23,7 @@ GenerateUtt.py A Python script to generate part of the pcre2_tables.c file
ManyConfigTests A shell script that runs "configure, make, test" a number of
times with different configuration settings.
-MultiStage2.py A Python script that generates the file pcre2_ucd.c from three
+MultiStage2.py A Python script that generates the file pcre2_ucd.c from five
Unicode data tables, which are themselves downloaded from the
Unicode web site. Run this script in the "maint" directory.
The generated file contains the tables for a 2-stage lookup
@@ -37,11 +37,17 @@ pcre2_chartables.c.non-standard
README This file.
-Unicode.tables The files in this directory (CaseFolding.txt,
- DerivedGeneralCategory.txt, GraphemeBreakProperty.txt,
- Scripts.txt and UnicodeData.txt) were downloaded from the
- Unicode web site. They contain information about Unicode
- characters and scripts.
+Unicode.tables The files in this directory were downloaded from the Unicode
+ web site. They contain information about Unicode characters
+ and scripts. The ones used by the MultiStage2.py script are
+ CaseFolding.txt, DerivedGeneralCategory.txt, Scripts.txt,
+ GraphemeBreakProperty.txt, and emoji-data.txt. I've kept
+ UnicodeData.txt (which is no longer used by the script)
+ because it is useful occasionally for manually looking up the
+ details of certain characters. However, note that character
+ names in this file such as "Arabic sign sanah" do NOT mean
+ that the character is in a particular script (in this case,
+ Arabic). Scripts.txt is where to look for script information.
ucptest.c A short C program for testing the Unicode property macros
that do lookups in the pcre2_ucd.c data, mainly useful after
@@ -359,4 +365,4 @@ very sensible; some are rather wacky. Some have been on this list for years.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 20 May 2017
+Last updated: 07 July 2018
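
The updated README text says to look up a character's script in Scripts.txt, not to infer it from the character's name in UnicodeData.txt. The following is a minimal sketch of such a lookup in Python. It is not part of this commit or of MultiStage2.py. The function names (parse_scripts, script_of) and the "Unknown" default are assumptions made for illustration, and the sample lines only imitate the Scripts.txt layout ("code point or range ; script name # comment"); they were written for this sketch, not copied from the downloaded file.

```python
import re

# Illustrative lines in the Scripts.txt layout. They are an assumption made
# for this sketch; read the real Unicode.tables/Scripts.txt for actual data.
SAMPLE = """\
# Illustrative sample only
060C          ; Common # Po       ARABIC COMMA
0620..063F    ; Arabic # Lo  [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE
"""

# A code point, an optional "..last" part, then ";" and the script name.
LINE_RE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Return a list of (first, last, script) tuples parsed from text."""
    ranges = []
    for line in text.splitlines():
        # Drop the trailing comment, which is where the character name lives.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m is None:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        ranges.append((first, last, m.group(3)))
    return ranges


def script_of(cp, ranges, default="Unknown"):
    """Return the script for code point cp, or default if no range matches."""
    for first, last, script in ranges:
        if first <= cp <= last:
            return script
    return default


RANGES = parse_scripts(SAMPLE)
print(script_of(0x060C, RANGES))  # script field of the matching sample line
print(script_of(0x0627, RANGES))  # falls inside the 0620..063F sample range
```

The parser deliberately discards everything after "#", so the character name never influences the result; only the script field does. In the sample, a character whose name begins with "ARABIC" carries the script value Common, which is the kind of mismatch the README warns about.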