Load pcre-3.3 into code/trunk.

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@49 2f5784b3-3f2a-0410-8824-cb99058d5e15
author: nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-02-24 21:39:33 +0000
committer: nigel <nigel@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2007-02-24 21:39:33 +0000
commit: 722283cf906c849b43a73af9527627e0fd2a3e8d
tree: a6d41530464f8772bddde9ff3770c2b29b81f7ce
parent: b82aaed025b2fb55a381b51a3cf13a06c2e8ceff
43 files changed, 2372 insertions, 313 deletions
diff --git a/COPYING b/COPYING
index f305033..34d20db 100644
--- a/COPYING
+++ b/COPYING
@@ -20,7 +20,21 @@ restrictions:
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 2. The origin of this software must not be misrepresented, either by
-   explicit claim or by omission.
+   explicit claim or by omission. In practice, this means that if you use
+   PCRE in software which you distribute to others, commercially or
+   otherwise, you must put a sentence like this
+
+     Regular expression support is provided by the PCRE library package,
+     which is open source software, written by Philip Hazel, and copyright
+     by the University of Cambridge, England.
+
+   somewhere reasonably visible in your documentation and in any relevant
+   files or online help data or similar. A reference to the ftp site for
+   the source, that is, to
+
+     ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/
+
+   should also be given in the documentation.
 
 3. Altered versions must be plainly marked as such, and must not be
    misrepresented as being the original software.
diff --git a/ChangeLog b/ChangeLog
index 5bedd53..47bbbe5 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,38 @@ ChangeLog for PCRE
 ------------------
 
 
+Version 3.3 01-Aug-00
+---------------------
+
+1. If an octal character was given, but the value was greater than \377, it
+was not getting masked to the least significant bits, as documented. This could
+lead to crashes in some systems.
+
+2. Perl 5.6 (if not earlier versions) accepts classes like [a-\d] and treats
+the hyphen as a literal. PCRE used to give an error; it now behaves like Perl.
+
+3. Added the functions pcre_free_substring() and pcre_free_substring_list().
+These just pass their arguments on to (pcre_free)(), but they are provided
+because some uses of PCRE bind it to non-C systems that can call its functions,
+but cannot call free() or pcre_free() directly.
+
+4. Add "make test" as a synonym for "make check". Corrected some comments in
+the Makefile.
+
+5. Add $(DESTDIR)/ in front of all the paths in the "install" target in the
+Makefile.
+
+6. Changed the name of pgrep to pcregrep, because Solaris has introduced a
+command called pgrep for grepping around the active processes.
+
+7. Added the beginnings of support for UTF-8 character strings.
+
+8. Arranged for the Makefile to pass over the settings of CC, CFLAGS, and
+RANLIB to ./ltconfig so that they are used by libtool. I think these are all
+the relevant ones. (AR is not passed because ./ltconfig does its own figuring
+out for the ar command.)
+
+
 Version 3.2 12-May-00
 ---------------------
 
diff --git a/INSTALL b/INSTALL
index d63a78f..0880281 100644
--- a/INSTALL
+++ b/INSTALL
@@ -4,7 +4,7 @@ Basic Installation
    These are generic installation instructions that apply to systems that
 can run the `configure' shell script - Unix systems and any that imitate
 it. They are not specific to PCRE. There are PCRE-specific instructions
-for non-Unix systems in the file NON-UNIX.
+for non-Unix systems in the file NON-UNIX-USE.
 
    The `configure' shell script attempts to guess correct values for
 various system-dependent variables used during compilation.  It uses
diff --git a/LICENCE b/LICENCE
index 8422bd9..34d20db 100644
--- a/LICENCE
+++ b/LICENCE
@@ -20,19 +20,21 @@ restrictions:
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 2. The origin of this software must not be misrepresented, either by
-   explicit claim or by omission. In practice, this means you must put
-   a sentence like this
+   explicit claim or by omission. In practice, this means that if you use
+   PCRE in software which you distribute to others, commercially or
+   otherwise, you must put a sentence like this
 
      Regular expression support is provided by the PCRE library package,
-     which is open source software, copyright by the University of
-     Cambridge.
+     which is open source software, written by Philip Hazel, and copyright
+     by the University of Cambridge, England.
 
    somewhere reasonably visible in your documentation and in any relevant
-   files. A reference to the ftp site for the source should also be given
+   files or online help data or similar. A reference to the ftp site for
+   the source, that is, to
 
      ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/
 
-   in the documentation.
+   should also be given in the documentation.
 
 3. Altered versions must be plainly marked as such, and must not be
    misrepresented as being the original software.
diff --git a/Makefile.in b/Makefile.in
index d15fa3b..94edf49 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -17,7 +17,7 @@
 # given in its arguments, or which it finds out for itself.                 #
 #---------------------------------------------------------------------------#
 
-# BINDIR is the directory in which the pgrep command is installed.
+# BINDIR is the directory in which the pcregrep command is installed.
 # INCDIR is the directory in which the public header file pcre.h is installed.
 # LIBDIR is the directory in which the libraries are installed.
 # MANDIR is the directory in which the man pages are installed.
@@ -35,11 +35,12 @@ MANDIR = @mandir@
 CC = @CC@
 CFLAGS = @CFLAGS@
 RANLIB = @RANLIB@
+UTF8   = @UTF8@
 
-# LIBTOOL defaults to "", which cuts out the building of shared libraries.
-# If "configure" is called with --enable-shared-libraries, then LIBTOOL is
-# set to "./libtool", which causes shared libraries to be built, and LIBSUFFIX
-# is set to "la" instead of "a", which causes the shared libraries to be
+# LIBTOOL defaults to "./libtool", which enables the building of shared
+# libraries. If "configure" is called with --disable-shared-libraries, LIBTOOL
+# is set to "", which stops shared libraries from being built, and LIBSUFFIX
+# is set to "a" instead of "la", which causes the shared libraries not to be
 # installed.
 
 LIBTOOL = @LIBTOOL@
@@ -61,10 +62,11 @@ INSTALL_DATA = ${INSTALL} -m 644
 
 #---------------------------------------------------------------------------#
 # For almost all systems, the command to create a library is "ar cq", but   #
-# there is at least one where it is different, to make this configurable.   #
-# However, I haven't got round to learning how to make "configure" find     #
-# this out for itself. It is necessary to use a command such as             #
-# "make AR='ar -rc'" if you need to vary this.                              #
+# there is at least one where it is different, so this command must be      #
+# configurable. However, I haven't got round to learning how to make        #
+# "configure" find this out for itself. It is necessary to use a command    #
+# such as "make AR='ar -rc'" if you need to vary this. The setting of AR is #
+# *not* passed over to ./ltconfig, because it does its own setting up.      #
 #---------------------------------------------------------------------------#
 
 AR = ar cq
@@ -76,19 +78,19 @@ AR = ar cq
 OBJ = maketables.o get.o study.o pcre.o
 LOBJ = maketables.lo get.lo study.lo pcre.lo
 
-all:            libtool libpcre.$(LIBSUFFIX) libpcreposix.$(LIBSUFFIX) pcretest pgrep
+all:            libtool libpcre.$(LIBSUFFIX) libpcreposix.$(LIBSUFFIX) pcretest pcregrep
 
 libtool:        config.guess config.sub ltconfig ltmain.sh
 		@if test "$(LIBTOOL)" = "./libtool"; then \
 		  echo '--- Building libtool ---'; \
-		  ./ltconfig ./ltmain.sh; \
+		  CC=$(CC) CFLAGS='$(CFLAGS)' RANLIB='$(RANLIB)' ./ltconfig ./ltmain.sh; \
 		  echo '--- Built libtool ---'; fi
 
-pgrep:          libpcre.$(LIBSUFFIX) pgrep.o
+pcregrep:       libpcre.$(LIBSUFFIX) pcregrep.o
 		  @echo ' '
-		  @echo '--- Building pgrep utility'
+		  @echo '--- Building pcregrep utility'
 		  @echo ' '
-		$(LIBTOOL) $(CC) $(CFLAGS) -o pgrep pgrep.o libpcre.$(LIBSUFFIX)
+		$(LIBTOOL) $(CC) $(CFLAGS) -o pcregrep pcregrep.o libpcre.$(LIBSUFFIX)
 
 pcretest:       libpcre.$(LIBSUFFIX) libpcreposix.$(LIBSUFFIX) pcretest.o
 		  @echo ' '
@@ -128,7 +130,7 @@ libpcreposix.la: pcreposix.o
 		./libtool $(CC) -version-info '$(PCREPOSIXLIBVERSION)' -o libpcreposix.la -rpath $(LIBDIR) pcreposix.lo
 
 pcre.o:         chartables.c pcre.c pcre.h internal.h config.h Makefile
-		$(LIBTOOL) $(CC) -c $(CFLAGS) pcre.c
+		$(LIBTOOL) $(CC) -c $(CFLAGS) $(UTF8) pcre.c
 
 pcreposix.o:    pcreposix.c pcreposix.h internal.h pcre.h config.h Makefile
 		$(LIBTOOL) $(CC) -c $(CFLAGS) pcreposix.c
@@ -140,13 +142,13 @@ get.o:          get.c pcre.h internal.h config.h Makefile
 		$(LIBTOOL) $(CC) -c $(CFLAGS) get.c
 
 study.o:        study.c pcre.h internal.h config.h Makefile
-		$(LIBTOOL) $(CC) -c $(CFLAGS) study.c
+		$(LIBTOOL) $(CC) -c $(CFLAGS) $(UTF8) study.c
 
 pcretest.o:     pcretest.c pcre.h config.h Makefile
-		$(CC) -c $(CFLAGS) pcretest.c
+		$(CC) -c $(CFLAGS) $(UTF8) pcretest.c
 
-pgrep.o:        pgrep.c pcre.h Makefile config.h
-		$(CC) -c $(CFLAGS) pgrep.c
+pcregrep.o:     pcregrep.c pcre.h Makefile config.h
+		$(CC) -c $(CFLAGS) $(UTF8) pcregrep.c
 
 # An auxiliary program makes the default character table source
 
@@ -157,30 +159,30 @@ dftables:       dftables.c maketables.c pcre.h internal.h config.h Makefile
 		$(CC) -o dftables $(CFLAGS) dftables.c
 
 install:        all
-		$(LIBTOOL) $(INSTALL_DATA) libpcre.$(LIBSUFFIX) $(LIBDIR)/libpcre.$(LIBSUFFIX)
-		$(LIBTOOL) $(INSTALL_DATA) libpcreposix.$(LIBSUFFIX) $(LIBDIR)/libpcreposix.$(LIBSUFFIX)
-		$(INSTALL_DATA) pcre.h $(INCDIR)/pcre.h
-		$(INSTALL_DATA) pcreposix.h $(INCDIR)/pcreposix.h
-		$(INSTALL_DATA) doc/pcre.3 $(MANDIR)/man3/pcre.3
-		$(INSTALL_DATA) doc/pcreposix.3 $(MANDIR)/man3/pcreposix.3
-		$(INSTALL_DATA) doc/pgrep.1 $(MANDIR)/man1/pgrep.1
+		$(LIBTOOL) $(INSTALL_DATA) libpcre.$(LIBSUFFIX) $(DESTDIR)/$(LIBDIR)/libpcre.$(LIBSUFFIX)
+		$(LIBTOOL) $(INSTALL_DATA) libpcreposix.$(LIBSUFFIX) $(DESTDIR)/$(LIBDIR)/libpcreposix.$(LIBSUFFIX)
+		$(INSTALL_DATA) pcre.h $(DESTDIR)/$(INCDIR)/pcre.h
+		$(INSTALL_DATA) pcreposix.h $(DESTDIR)/$(INCDIR)/pcreposix.h
+		$(INSTALL_DATA) doc/pcre.3 $(DESTDIR)/$(MANDIR)/man3/pcre.3
+		$(INSTALL_DATA) doc/pcreposix.3 $(DESTDIR)/$(MANDIR)/man3/pcreposix.3
+		$(INSTALL_DATA) doc/pcregrep.1 $(DESTDIR)/$(MANDIR)/man1/pcregrep.1
 		@if test "$(LIBTOOL)" = "./libtool"; then \
 		  echo ' '; \
-		  echo '--- Rebuilding pgrep to use installed shared library ---'; \
-		  echo $(CC) $(CFLAGS) -o pgrep pgrep.o -L$(LIBDIR) -lpcre; \
-		  $(CC) $(CFLAGS) -o pgrep pgrep.o -L$(LIBDIR) -lpcre; \
+		  echo '--- Rebuilding pcregrep to use installed shared library ---'; \
+		  echo $(CC) $(CFLAGS) -o pcregrep pcregrep.o -L$(DESTDIR)/$(LIBDIR) -lpcre; \
+		  $(CC) $(CFLAGS) -o pcregrep pcregrep.o -L$(DESTDIR)/$(LIBDIR) -lpcre; \
 		  echo '--- Rebuilding pcretest to use installed shared library ---'; \
-		  echo $(CC) $(CFLAGS) -o pcretest pcretest.o -L$(LIBDIR) -lpcre -lpcreposix; \
-		  $(CC) $(CFLAGS) -o pcretest pcretest.o -L$(LIBDIR) -lpcre -lpcreposix; \
+		  echo $(CC) $(CFLAGS) -o pcretest pcretest.o -L$(DESTDIR)/$(LIBDIR) -lpcre -lpcreposix; \
+		  $(CC) $(CFLAGS) -o pcretest pcretest.o -L$(DESTDIR)/$(LIBDIR) -lpcre -lpcreposix; \
 		fi
-		$(INSTALL)	pgrep $(BINDIR)/pgrep
-		$(INSTALL)	pcre-config $(BINDIR)/pcre-config
+		$(INSTALL)	pcregrep $(DESTDIR)/$(BINDIR)/pcregrep
+		$(INSTALL)	pcre-config $(DESTDIR)/$(BINDIR)/pcre-config
 
 # We deliberately omit dftables and chartables.c from 'make clean'; once made
 # chartables.c shouldn't change, and if people have edited the tables by hand,
 # you don't want to throw them away.
 
-clean:;         -rm -rf *.o *.lo *.a *.la .libs pcretest pgrep testtry
+clean:;         -rm -rf *.o *.lo *.a *.la .libs pcretest pcregrep testtry
 
 # But "make distclean" should get back to a virgin distribution
 
@@ -190,6 +192,8 @@ distclean:      clean
 
 check:          runtest
 
+test:           runtest
+
 runtest:        all
 		./RunTest
 
@@ -198,7 +202,7 @@ runtest:        all
 # This addition for mingw32 was contributed by  Paul Sokolovsky
 # <Paul.Sokolovsky@technologist.com>. I (PH) don't know anything about it!
 
-dll:            _dll libpcre.dll.a pgrep_d pcretest_d
+dll:            _dll libpcre.dll.a pcregrep_d pcretest_d
 
 _dll:
 		$(MAKE) CFLAGS=-DSTATIC pcre.dll
@@ -206,8 +210,8 @@ _dll:
 pcre.dll:       $(OBJ) pcreposix.o pcre.def
 libpcre.dll.a:  pcre.def
 
-pgrep_d:        libpcre.dll.a pgrep.o
-		$(CC) $(CFLAGS) -L. -o pgrep pgrep.o -lpcre.dll
+pcregrep_d:     libpcre.dll.a pcregrep.o
+		$(CC) $(CFLAGS) -L. -o pcregrep pcregrep.o -lpcre.dll
 
 pcretest_d:     libpcre.dll.a pcretest.o
 		$(PURIFY) $(CC) $(CFLAGS) -L. -o pcretest pcretest.o -lpcre.dll
diff --git a/NEWS b/NEWS
index 4c80bd6..56fccdf 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,14 @@
 News about PCRE releases
 ------------------------
 
+Release 3.3 01-Aug-00
+---------------------
+
+There is some support for UTF-8 character strings. This is incomplete and
+experimental. The documentation describes what is and what is not implemented.
+Otherwise, this is just a bug-fixing release.
+
+
 Release 3.0 01-Feb-00
 ---------------------
 
diff --git a/README b/README
index 90aaf4d..d124ee0 100644
--- a/README
+++ b/README
@@ -7,6 +7,15 @@ The latest release of PCRE is always available from
 
 Please read the NEWS file if you are upgrading from a previous release.
 
+PCRE has its own native API, but a set of "wrapper" functions that are based on
+the POSIX API are also supplied in the library libpcreposix. Note that this
+just provides a POSIX calling interface to PCRE: the regular expressions
+themselves still follow Perl syntax and semantics. The header file
+for the POSIX-style functions is called pcreposix.h. The official POSIX name is
+regex.h, but I didn't want to risk possible problems with existing files of
+that name by distributing it that way. To use it with an existing program that
+uses the POSIX API, it will have to be renamed or pointed at by a link.
+
 
 Building PCRE on a Unix system
 ------------------------------
@@ -15,20 +24,29 @@ To build PCRE on a Unix system, run the "configure" command in the PCRE
 distribution directory. This is a standard GNU "autoconf" configuration script,
 for which generic instructions are supplied in INSTALL. On many systems just
 running "./configure" is sufficient, but the usual methods of changing standard
-defaults are available. For example
+defaults are available. For example,
 
 CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
 
 specifies that the C compiler should be run with the flags '-O2 -Wall' instead
 of the default, and that "make install" should install PCRE under /opt/local
-instead of the default /usr/local. The "configure" script builds thre files:
+instead of the default /usr/local.
+
+If you want to make use of the experimental, incomplete support for UTF-8
+character strings in PCRE, you must add --enable-utf8 to the "configure"
+command. Without it, the code for handling UTF-8 is not included in the
+library. (Even when included, it still has to be enabled by an option at run
+time.)
+
+The "configure" script builds four files:
 
 . Makefile is built by copying Makefile.in and making substitutions.
 . config.h is built by copying config.in and making substitutions.
 . pcre-config is built by copying pcre-config.in and making substitutions.
+. RunTest is a script for running tests
 
 Once "configure" has run, you can run "make". It builds two libraries called
-libpcre and libpcreposix, a test program called pcretest, and the pgrep
+libpcre and libpcreposix, a test program called pcretest, and the pcregrep
 command. You can use "make install" to copy these, and the public header file
 pcre.h, to appropriate live directories on your system, in the normal way.
 
@@ -54,11 +72,11 @@ The default distribution builds PCRE as two shared libraries. This support is
 new and experimental and may not work on all systems. It relies on the
 "libtool" scripts - these are distributed with PCRE. It should build a
 "libtool" script and use this to compile and link shared libraries, which are
-placed in a subdirectory called .libs. The programs pcretest and pgrep are
+placed in a subdirectory called .libs. The programs pcretest and pcregrep are
 built to use these uninstalled libraries by means of wrapper scripts. When you
-use "make install" to install shared libraries, pgrep and pcretest are
+use "make install" to install shared libraries, pcregrep and pcretest are
 automatically re-built to use the newly installed libraries. However, only
-pgrep is installed, as pcretest is really just a test program.
+pcregrep is installed, as pcretest is really just a test program.
 
 To build PCRE using static libraries you must use --disable-shared when
 configuring it. For example
@@ -82,8 +100,8 @@ Testing PCRE
 ------------
 
 To test PCRE on a Unix system, run the RunTest script in the pcre directory.
-(This can also be run by "make runtest" or "make check".) For other systems,
-see the instruction in NON-UNIX-USE.
+(This can also be run by "make runtest", "make check", or "make test".) For
+other systems, see the instruction in NON-UNIX-USE.
 
 The script runs the pcretest test program (which is documented in
 doc/pcretest.txt) on each of the testinput files (in the testdata directory) in
@@ -97,12 +115,24 @@ RunTest, for example:
 The first and third test files can also be fed directly into the perltest
 script to check that Perl gives the same results. The third file requires the
 additional features of release 5.005, which is why it is kept separate from the
-main test input, which needs only Perl 5.004. In the long run, when 5.005 is
-widespread, these two test files may get amalgamated.
-
-The second set of tests check pcre_info(), pcre_study(), pcre_copy_substring(),
-pcre_get_substring(), pcre_get_substring_list(), error detection and run-time
-flags that are specific to PCRE, as well as the POSIX wrapper API.
+main test input, which needs only Perl 5.004. In the long run, when 5.005 (or
+higher) is widespread, these two test files may get amalgamated.
+
+The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
+pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
+detection, and run-time flags that are specific to PCRE, as well as the POSIX
+wrapper API. It also uses the debugging flag to check some of the internals of
+pcre_compile().
+
+If you build PCRE with a locale setting that is not the standard C locale, the
+character tables may be different (see next paragraph). In some cases, this may
+cause failures in the second set of tests. For example, in a locale where the
+isprint() function yields TRUE for characters in the range 128-255, the use of
+[:isascii:] inside a character class defines a different set of characters, and
+this shows up in this test as a difference in the compiled code, which is being
+listed for checking. Where the comparison test output contains [\x00-\x7f] the
+test will contain [\x00-\xff], and similarly in some other cases. This is not a
+bug in PCRE.
 
 The fourth set of tests checks pcre_maketables(), the facility for building a
 set of character tables for a specific locale and using them instead of the
@@ -117,14 +147,10 @@ output to say why. If running this test produces instances of the error
 in the comparison output, it means that locale is not available on your system,
 despite being listed by "locale". This does not mean that PCRE is broken.
 
-PCRE has its own native API, but a set of "wrapper" functions that are based on
-the POSIX API are also supplied in the library libpcreposix.a. Note that this
-just provides a POSIX calling interface to PCRE: the regular expressions
-themselves still follow Perl syntax and semantics. The header file
-for the POSIX-style functions is called pcreposix.h. The official POSIX name is
-regex.h, but I didn't want to risk possible problems with existing files of
-that name by distributing it that way. To use it with an existing program that
-uses the POSIX API, it will have to be renamed or pointed at by a link.
+The fifth test checks the experimental, incomplete UTF-8 support. It is not run
+automatically unless PCRE is built with UTF-8 support. This file can be fed
+directly to the perltest8 script, which requires Perl 5.6 or higher. The sixth
+file tests internal UTF-8 features of PCRE that are not relevant to Perl.
 
 
 Character tables
@@ -197,7 +223,7 @@ The distribution should contain the following files:
   NEWS                  important changes in this release
   NON-UNIX-USE          notes on building PCRE on non-Unix systems
   README                this file
-  RunTest               a Unix shell script for running tests
+  RunTest.in            template for a Unix shell script for running tests
   config.guess          ) files used by libtool,
   config.sub            )   used only when building a shared library
   configure             a configuring shell script (built by autoconf)
@@ -211,24 +237,29 @@ The distribution should contain the following files:
   doc/pcreposix.txt     plain text version
   doc/pcretest.txt      documentation of test program
   doc/perltest.txt      documentation of Perl test program
-  doc/pgrep.1           man page source for the pgrep utility
-  doc/pgrep.html        HTML version
-  doc/pgrep.txt         plain text version
+  doc/pcregrep.1        man page source for the pcregrep utility
+  doc/pcregrep.html     HTML version
+  doc/pcregrep.txt      plain text version
   install-sh            a shell script for installing files
   ltconfig              ) files used to build "libtool",
   ltmain.sh             )   used only when building a shared library
   pcretest.c            test program
   perltest              Perl test program
-  pgrep.c               source of a grep utility that uses PCRE
+  perltest8             Perl test program for UTF-8 tests
+  pcregrep.c            source of a grep utility that uses PCRE
   pcre-config.in        source of script which retains PCRE information
   testdata/testinput1   test data, compatible with Perl 5.004 and 5.005
   testdata/testinput2   test data for error messages and non-Perl things
   testdata/testinput3   test data, compatible with Perl 5.005
   testdata/testinput4   test data for locale-specific tests
+  testdata/testinput5   test data for UTF-8 tests compatible with Perl 5.6
+  testdata/testinput6   test data for other UTF-8 tests
   testdata/testoutput1  test results corresponding to testinput1
   testdata/testoutput2  test results corresponding to testinput2
   testdata/testoutput3  test results corresponding to testinput3
   testdata/testoutput4  test results corresponding to testinput4
+  testdata/testoutput5  test results corresponding to testinput5
+  testdata/testoutput6  test results corresponding to testinput6
 
 (C) Auxiliary files for Win32 DLL
 
@@ -236,4 +267,4 @@ The distribution should contain the following files:
   pcre.def
 
 Philip Hazel <ph10@cam.ac.uk>
-February 2000
+August 2000
diff --git a/RunTest b/RunTest.in
index 85eeb62..6e4eb08 100755
--- a/RunTest
+++ b/RunTest.in
@@ -1,5 +1,8 @@
 #! /bin/sh
 
+# This file is generated by configure from RunTest.in. Make any changes
+# to that file.
+
 # Run PCRE tests
 
 cf=diff
@@ -10,6 +13,8 @@ do1=no
 do2=no
 do3=no
 do4=no
+do5=no
+do6=no
 
 while [ $# -gt 0 ] ; do
   case $1 in
@@ -17,16 +22,32 @@ while [ $# -gt 0 ] ; do
     2) do2=yes;;
     3) do3=yes;;
     4) do4=yes;;
+    5) do5=yes;; 
+    6) do6=yes;; 
     *) echo "Unknown test number $1"; exit 1;;
   esac
   shift
 done
 
-if [ $do1 = no -a $do2 = no -a $do3 = no -a $do4 = no ] ; then
+if [ "@UTF8@" = "" ] ; then
+  if [ $do5 = yes ] ; then
+    echo "Can't run test 5 because UTF8 support is not configured"
+    exit 1
+  fi   
+  if [ $do6 = yes ] ; then
+    echo "Can't run test 6 because UTF8 support is not configured"
+    exit 1
+  fi   
+fi    
+
+if [ $do1 = no -a $do2 = no -a $do3 = no -a $do4 = no -a\
+     $do5 = no -a $do6 = no ] ; then
   do1=yes
   do2=yes
   do3=yes
   do4=yes
+  if [ "@UTF8@" != "" ] ; then do5=yes; fi
+  if [ "@UTF8@" != "" ] ; then do6=yes; fi
 fi
 
 # Primary test, Perl-compatible
@@ -66,6 +87,7 @@ if [ $do3 = yes ] ; then
 fi
 
 if [ $do1 = yes -a $do2 = yes -a $do3 = yes ] ; then
+  echo " " 
   echo "The three main tests all ran OK"
   echo " " 
 fi
@@ -79,8 +101,14 @@ if [ $do4 = yes ] ; then
     ./pcretest testdata/testinput4 testtry
     if [ $? = 0 ] ; then
       $cf testtry testdata/testoutput4
-      if [ $? != 0 ] ; then exit 1; fi
+      if [ $? != 0 ] ; then 
+        echo " "
+        echo "Locale test did not run entirely successfully."
+        echo "This usually means that there is a problem with the locale"
+        echo "settings rather than a bug in PCRE."    
+      else
       echo "Locale test ran OK" 
+      fi 
       echo " " 
     else exit 1
     fi
@@ -91,4 +119,30 @@ if [ $do4 = yes ] ; then
   fi
 fi
 
+# Additional tests for UTF8 support
+
+if [ $do5 = yes ] ; then
+  echo "Testing experimental, incomplete UTF8 support (Perl compatible)"
+  ./pcretest testdata/testinput5 testtry 
+  if [ $? = 0 ] ; then
+    $cf testtry testdata/testoutput5
+    if [ $? != 0 ] ; then exit 1; fi
+  else exit 1
+  fi
+  echo "UTF8 test ran OK"
+  echo " "
+fi
+
+if [ $do6 = yes ] ; then
+  echo "Testing API and internals for UTF8 support (not Perl compatible)"
+  ./pcretest testdata/testinput6 testtry 
+  if [ $? = 0 ] ; then
+    $cf testtry testdata/testoutput6
+    if [ $? != 0 ] ; then exit 1; fi
+  else exit 1
+  fi
+  echo "UTF8 internals test ran OK"
+  echo " "
+fi
+
 # End
diff --git a/configure b/configure
index f68c945..1fc9c59 100755
--- a/configure
+++ b/configure
@@ -13,6 +13,8 @@ ac_default_prefix=/usr/local
 # Any additions from configure.in:
 ac_help="$ac_help
   --disable-shared        build PCRE as a static library"
+ac_help="$ac_help
+  --enable-utf8           enable UTF8 support (incomplete)"
 
 # Initialize some variables set by options.
 # The variables have the same names as the options, with
@@ -138,6 +140,12 @@ do
     # The list generated by autoconf has been trimmed to remove many
     # options that are totally irrelevant to PCRE (e.g. relating to X),
     # or are not supported by its Makefile.
+    # The list generated by autoconf has been trimmed to remove many
+    # options that are totally irrelevant to PCRE (e.g. relating to X),
+    # or are not supported by its Makefile.
+    # The list generated by autoconf has been trimmed to remove many
+    # options that are totally irrelevant to PCRE (e.g. relating to X),
+    # or are not supported by its Makefile.
     # This message is too long to be a string in the A/UX 3.1 sh.
     cat << EOF
 Usage: ./configure [options]
@@ -505,8 +513,8 @@ fi
 
 
 PCRE_MAJOR=3
-PCRE_MINOR=2
-PCRE_DATE=12-May-2000
+PCRE_MINOR=3
+PCRE_DATE=01-Aug-2000
 PCRE_VERSION=${PCRE_MAJOR}.${PCRE_MINOR}
 
 
@@ -517,7 +525,7 @@ PCRE_POSIXLIB_VERSION=0:0:0
 # Extract the first word of "gcc", so it can be a program name with args.
 set dummy gcc; ac_word=$2
 echo $ac_n "checking for $ac_word""... $ac_c" 1>&6
-echo "configure:544: checking for $ac_word" >&5
+echo "configure:546: checking for $ac_word" >&5
 if eval "test \"`echo '$''{'ac_cv_prog_CC'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
@@ -547,7 +555,7 @@ if test -z "$CC"; then
   # Extract the first word of "cc", so it can be a program name with args.
 set dummy cc; ac_word=$2
 echo $ac_n "checking for $ac_word""... $ac_c" 1>&6
-echo "configure:574: checking for $ac_word" >&5
+echo "configure:576: checking for $ac_word" >&5
 if eval "test \"`echo '$''{'ac_cv_prog_CC'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
@@ -598,7 +606,7 @@ fi
       # Extract the first word of "cl", so it can be a program name with args.
 set dummy cl; ac_word=$2
 echo $ac_n "checking for $ac_word""... $ac_c" 1>&6
-echo "configure:625: checking for $ac_word" >&5
+echo "configure:627: checking for $ac_word" >&5
 if eval "test \"`echo '$''{'ac_cv_prog_CC'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
@@ -630,7 +638,7 @@ fi
 fi
 
 echo $ac_n "checking whether the C compiler ($CC $CFLAGS $LDFLAGS) works""... $ac_c" 1>&6
-echo "configure:657: checking whether the C compiler ($CC $CFLAGS $LDFLAGS) works" >&5
+echo "configure:659: checking whether the C compiler ($CC $CFLAGS $LDFLAGS) works" >&5
 
 ac_ext=c
 # CFLAGS is not in ac_cpp because -g, -O, etc. are not valid cpp options.
@@ -641,12 +649,12 @@ cross_compiling=$ac_cv_prog_cc_cross
 
 cat > conftest.$ac_ext << EOF
 
-#line 668 "configure"
+#line 670 "configure"
 #include "confdefs.h"
 
 main(){return(0);}
 EOF
-if { (eval echo configure:673: \"$ac_link\") 1>&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext}; then
+if { (eval echo configure:675: \"$ac_link\") 1>&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext}; then
   ac_cv_prog_cc_works=yes
   # If we can't run a trivial program, we are probably using a cross compiler.
   if (./conftest; exit) 2>/dev/null; then
@@ -672,12 +680,12 @@ if test $ac_cv_prog_cc_works = no; then
   { echo "configure: error: installation or configuration problem: C compiler cannot create executables." 1>&2; exit 1; }
 fi
 echo $ac_n "checking whether the C compiler ($CC $CFLAGS $LDFLAGS) is a cross-compiler""... $ac_c" 1>&6
-echo "configure:699: checking whether the C compiler ($CC $CFLAGS $LDFLAGS) is a cross-compiler" >&5
+echo "configure:701: checking whether the C compiler ($CC $CFLAGS $LDFLAGS) is a cross-compiler" >&5
 echo "$ac_t""$ac_cv_prog_cc_cross" 1>&6
 cross_compiling=$ac_cv_prog_cc_cross
 
 echo $ac_n "checking whether we are using GNU C""... $ac_c" 1>&6
-echo "configure:704: checking whether we are using GNU C" >&5
+echo "configure:706: checking whether we are using GNU C" >&5
 if eval "test \"`echo '$''{'ac_cv_prog_gcc'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
@@ -686,7 +694,7 @@ else
   yes;
 #endif
 EOF
-if { ac_try='${CC-cc} -E conftest.c'; { (eval echo configure:713: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }; } | egrep yes >/dev/null 2>&1; then
+if { ac_try='${CC-cc} -E conftest.c'; { (eval echo configure:715: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }; } | egrep yes >/dev/null 2>&1; then
   ac_cv_prog_gcc=yes
 else
   ac_cv_prog_gcc=no
@@ -705,7 +713,7 @@ ac_test_CFLAGS="${CFLAGS+set}"
 ac_save_CFLAGS="$CFLAGS"
 CFLAGS=
 echo $ac_n "checking whether ${CC-cc} accepts -g""... $ac_c" 1>&6
-echo "configure:732: checking whether ${CC-cc} accepts -g" >&5
+echo "configure:734: checking whether ${CC-cc} accepts -g" >&5
 if eval "test \"`echo '$''{'ac_cv_prog_cc_g'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
@@ -739,7 +747,7 @@ fi
 # Extract the first word of "ranlib", so it can be a program name with args.
 set dummy ranlib; ac_word=$2
 echo $ac_n "checking for $ac_word""... $ac_c" 1>&6
-echo "configure:766: checking for $ac_word" >&5
+echo "configure:768: checking for $ac_word" >&5
 if eval "test \"`echo '$''{'ac_cv_prog_RANLIB'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
@@ -769,7 +777,7 @@ fi
 
 
 echo $ac_n "checking how to run the C preprocessor""... $ac_c" 1>&6
-echo "configure:796: checking how to run the C preprocessor" >&5
+echo "configure:798: checking how to run the C preprocessor" >&5
 # On Suns, sometimes $CPP names a directory.
 if test -n "$CPP" && test -d "$CPP"; then
   CPP=
@@ -784,13 +792,13 @@ else
   # On the NeXT, cc -E runs the code through the compiler's parser,
   # not just through cpp.
   cat > conftest.$ac_ext <<EOF
-#line 811 "configure"
+#line 813 "configure"
 #include "confdefs.h"
 #include <assert.h>
 Syntax Error
 EOF
 ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
-{ (eval echo configure:817: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
+{ (eval echo configure:819: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
 ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
 if test -z "$ac_err"; then
   :
@@ -801,13 +809,13 @@ else
   rm -rf conftest*
   CPP="${CC-cc} -E -traditional-cpp"
   cat > conftest.$ac_ext <<EOF
-#line 828 "configure"
+#line 830 "configure"
 #include "confdefs.h"
 #include <assert.h>
 Syntax Error
 EOF
 ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
-{ (eval echo configure:834: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
+{ (eval echo configure:836: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
 ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
 if test -z "$ac_err"; then
   :
@@ -818,13 +826,13 @@ else
   rm -rf conftest*
   CPP="${CC-cc} -nologo -E"
   cat > conftest.$ac_ext <<EOF
-#line 845 "configure"
+#line 847 "configure"
 #include "confdefs.h"
 #include <assert.h>
 Syntax Error
 EOF
 ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
-{ (eval echo configure:851: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
+{ (eval echo configure:853: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
 ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
 if test -z "$ac_err"; then
   :
@@ -849,12 +857,12 @@ fi
 echo "$ac_t""$CPP" 1>&6
 
 echo $ac_n "checking for ANSI C header files""... $ac_c" 1>&6
-echo "configure:876: checking for ANSI C header files" >&5
+echo "configure:878: checking for ANSI C header files" >&5
 if eval "test \"`echo '$''{'ac_cv_header_stdc'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
   cat > conftest.$ac_ext <<EOF
-#line 881 "configure"
+#line 883 "configure"
 #include "confdefs.h"
 #include <stdlib.h>
 #include <stdarg.h>
@@ -862,7 +870,7 @@ else
 #include <float.h>
 EOF
 ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
-{ (eval echo configure:889: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
+{ (eval echo configure:891: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
 ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
 if test -z "$ac_err"; then
   rm -rf conftest*
@@ -879,7 +887,7 @@ rm -f conftest*
 if test $ac_cv_header_stdc = yes; then
   # SunOS 4.x string.h does not declare mem*, contrary to ANSI.
 cat > conftest.$ac_ext <<EOF
-#line 906 "configure"
+#line 908 "configure"
 #include "confdefs.h"
 #include <string.h>
 EOF
@@ -897,7 +905,7 @@ fi
 if test $ac_cv_header_stdc = yes; then
   # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI.
 cat > conftest.$ac_ext <<EOF
-#line 924 "configure"
+#line 926 "configure"
 #include "confdefs.h"
 #include <stdlib.h>
 EOF
@@ -918,7 +926,7 @@ if test "$cross_compiling" = yes; then
   :
 else
   cat > conftest.$ac_ext <<EOF
-#line 945 "configure"
+#line 947 "configure"
 #include "confdefs.h"
 #include <ctype.h>
 #define ISLOWER(c) ('a' <= (c) && (c) <= 'z')
@@ -929,7 +937,7 @@ if (XOR (islower (i), ISLOWER (i)) || toupper (i) != TOUPPER (i)) exit(2);
 exit (0); }
 
 EOF
-if { (eval echo configure:956: \"$ac_link\") 1>&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext} && (./conftest; exit) 2>/dev/null
+if { (eval echo configure:958: \"$ac_link\") 1>&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext} && (./conftest; exit) 2>/dev/null
 then
   :
 else
@@ -956,17 +964,17 @@ for ac_hdr in limits.h
 do
 ac_safe=`echo "$ac_hdr" | sed 'y%./+-%__p_%'`
 echo $ac_n "checking for $ac_hdr""... $ac_c" 1>&6
-echo "configure:983: checking for $ac_hdr" >&5
+echo "configure:985: checking for $ac_hdr" >&5
 if eval "test \"`echo '$''{'ac_cv_header_$ac_safe'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
   cat > conftest.$ac_ext <<EOF
-#line 988 "configure"
+#line 990 "configure"
 #include "confdefs.h"
 #include <$ac_hdr>
 EOF
 ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
-{ (eval echo configure:993: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
+{ (eval echo configure:995: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
 ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
 if test -z "$ac_err"; then
   rm -rf conftest*
@@ -995,12 +1003,12 @@ done
 
 
 echo $ac_n "checking for working const""... $ac_c" 1>&6
-echo "configure:1022: checking for working const" >&5
+echo "configure:1024: checking for working const" >&5
 if eval "test \"`echo '$''{'ac_cv_c_const'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
   cat > conftest.$ac_ext <<EOF
-#line 1027 "configure"
+#line 1029 "configure"
 #include "confdefs.h"
 
 int main() {
@@ -1049,7 +1057,7 @@ ccp = (char const *const *) p;
 
 ; return 0; }
 EOF
-if { (eval echo configure:1076: \"$ac_compile\") 1>&5; (eval $ac_compile) 2>&5; }; then
+if { (eval echo configure:1078: \"$ac_compile\") 1>&5; (eval $ac_compile) 2>&5; }; then
   rm -rf conftest*
   ac_cv_c_const=yes
 else
@@ -1070,12 +1078,12 @@ EOF
 fi
 
 echo $ac_n "checking for size_t""... $ac_c" 1>&6
-echo "configure:1097: checking for size_t" >&5
+echo "configure:1099: checking for size_t" >&5
 if eval "test \"`echo '$''{'ac_cv_type_size_t'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
   cat > conftest.$ac_ext <<EOF
-#line 1102 "configure"
+#line 1104 "configure"
 #include "confdefs.h"
 #include <sys/types.h>
 #if STDC_HEADERS
@@ -1107,12 +1115,12 @@ fi
 for ac_func in bcopy memmove strerror
 do
 echo $ac_n "checking for $ac_func""... $ac_c" 1>&6
-echo "configure:1134: checking for $ac_func" >&5
+echo "configure:1136: checking for $ac_func" >&5
 if eval "test \"`echo '$''{'ac_cv_func_$ac_func'+set}'`\" = set"; then
   echo $ac_n "(cached) $ac_c" 1>&6
 else
   cat > conftest.$ac_ext <<EOF
-#line 1139 "configure"
+#line 1141 "configure"
 #include "confdefs.h"
 /* System header to define __stub macros and hopefully few prototypes,
     which can conflict with char $ac_func(); below.  */
@@ -1135,7 +1143,7 @@ $ac_func();
 
 ; return 0; }
 EOF
-if { (eval echo configure:1162: \"$ac_link\") 1>&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext}; then
+if { (eval echo configure:1164: \"$ac_link\") 1>&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext}; then
   rm -rf conftest*
   eval "ac_cv_func_$ac_func=yes"
 else
@@ -1175,6 +1183,18 @@ fi
 
 
 
+# Check whether --enable-utf8 or --disable-utf8 was given.
+if test "${enable_utf8+set}" = set; then
+  enableval="$enable_utf8"
+  if test "$enableval" = "yes"; then
+  UTF8=-DSUPPORT_UTF8
+fi
+
+fi
+
+
+
+
 
 
 
@@ -1286,7 +1306,7 @@ done
 
 ac_given_srcdir=$srcdir
 
-trap 'rm -fr `echo "Makefile pcre.h:pcre.in pcre-config config.h:config.in" | sed "s/:[^ ]*//g"` conftest*; exit 1' 1 2 15
+trap 'rm -fr `echo "Makefile pcre.h:pcre.in pcre-config:pcre-config.in RunTest:RunTest.in config.h:config.in" | sed "s/:[^ ]*//g"` conftest*; exit 1' 1 2 15
 EOF
 cat >> $CONFIG_STATUS <<EOF
 
@@ -1325,6 +1345,7 @@ s%@HAVE_MEMMOVE@%$HAVE_MEMMOVE%g
 s%@HAVE_STRERROR@%$HAVE_STRERROR%g
 s%@LIBTOOL@%$LIBTOOL%g
 s%@LIBSUFFIX@%$LIBSUFFIX%g
+s%@UTF8@%$UTF8%g
 s%@PCRE_MAJOR@%$PCRE_MAJOR%g
 s%@PCRE_MINOR@%$PCRE_MINOR%g
 s%@PCRE_DATE@%$PCRE_DATE%g
@@ -1372,7 +1393,7 @@ EOF
 
 cat >> $CONFIG_STATUS <<EOF
 
-CONFIG_FILES=\${CONFIG_FILES-"Makefile pcre.h:pcre.in pcre-config"}
+CONFIG_FILES=\${CONFIG_FILES-"Makefile pcre.h:pcre.in pcre-config:pcre-config.in RunTest:RunTest.in"}
 EOF
 cat >> $CONFIG_STATUS <<\EOF
 for ac_file in .. $CONFIG_FILES; do if test "x$ac_file" != x..; then
@@ -1538,7 +1559,7 @@ cat >> $CONFIG_STATUS <<EOF
 
 EOF
 cat >> $CONFIG_STATUS <<\EOF
-chmod a+x pcre-config
+chmod a+x RunTest pcre-config
 exit 0
 EOF
 chmod +x $CONFIG_STATUS
diff --git a/configure.in b/configure.in
index 0b15310..ff51f48 100644
--- a/configure.in
+++ b/configure.in
@@ -17,8 +17,8 @@ dnl digits for minor numbers less than 10. There are unlikely to be
 dnl that many releases anyway.
 
 PCRE_MAJOR=3
-PCRE_MINOR=2
-PCRE_DATE=12-May-2000
+PCRE_MINOR=3
+PCRE_DATE=01-Aug-2000
 PCRE_VERSION=${PCRE_MAJOR}.${PCRE_MINOR}
 
 dnl Provide versioning information for libtool shared libraries that
@@ -58,12 +58,22 @@ if test "$enableval" = "no"; then
 fi
 )
 
+dnl Handle --enable-utf8
+
+AC_ARG_ENABLE(utf8,
+[  --enable-utf8           enable UTF8 support (incomplete)],
+if test "$enableval" = "yes"; then
+  UTF8=-DSUPPORT_UTF8
+fi
+)
+
 dnl "Export" these variables
 
 AC_SUBST(HAVE_MEMMOVE)
 AC_SUBST(HAVE_STRERROR)
 AC_SUBST(LIBTOOL)
 AC_SUBST(LIBSUFFIX)
+AC_SUBST(UTF8)
 AC_SUBST(PCRE_MAJOR)
 AC_SUBST(PCRE_MINOR)
 AC_SUBST(PCRE_DATE)
@@ -72,4 +82,4 @@ AC_SUBST(PCRE_LIB_VERSION)
 AC_SUBST(PCRE_POSIXLIB_VERSION)
 
 dnl This must be last; it determines what files are written
-AC_OUTPUT(Makefile pcre.h:pcre.in pcre-config,[chmod a+x pcre-config])
+AC_OUTPUT(Makefile pcre.h:pcre.in pcre-config:pcre-config.in RunTest:RunTest.in,[chmod a+x RunTest pcre-config])
diff --git a/doc/Tech.Notes b/doc/Tech.Notes
index 03904db..7b96e5b 100644
--- a/doc/Tech.Notes
+++ b/doc/Tech.Notes
@@ -202,9 +202,10 @@ Forward assertions are just like other subpatterns, but starting with one of
 the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
 OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
 is OP_REVERSE, followed by a two byte count of the number of characters to move
-back the pointer in the subject string. A separate count is present in each
-alternative of a lookbehind assertion, allowing them to have different fixed
-lengths.
+back the pointer in the subject string. When operating in UTF-8 mode, the count
+is a character count rather than a byte count. A separate count is present in
+each alternative of a lookbehind assertion, allowing them to have different
+fixed lengths.
 
 
 Once-only subpatterns
@@ -239,4 +240,4 @@ the compiled data.
 
 
 Philip Hazel
-February 2000
+August 2000
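
Note on the Tech.Notes change above: because the OP_REVERSE count is a character count in UTF-8 mode, moving the subject pointer back means skipping a variable number of bytes per character. The following is only a minimal sketch of that idea under the assumption of valid UTF-8 input; the function name utf8_step_back is hypothetical and this is not PCRE's own code.

```c
/* Hypothetical helper, not part of PCRE: move `p` back by `nchars` UTF-8
   characters without going before `start`. Returns the new position, or a
   null pointer if fewer than `nchars` characters precede `p`.
   Assumes the subject is valid UTF-8. */
static const unsigned char *
utf8_step_back(const unsigned char *start, const unsigned char *p, int nchars)
{
    while (nchars-- > 0) {
        if (p <= start)
            return 0;               /* not enough characters to move back over */
        p--;
        /* Continuation bytes have the bit pattern 10xxxxxx; keep stepping
           back until the first byte of the character is reached. */
        while (p > start && (*p & 0xC0) == 0x80)
            p--;
    }
    return p;
}
```

In byte mode the same move would simply be `p - count`; the loop is what a character count costs.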
diff --git a/doc/pcre.3 b/doc/pcre.3
index 4334be2..748417b 100644
--- a/doc/pcre.3
+++ b/doc/pcre.3
@@ -44,6 +44,12 @@ pcre - Perl-compatible regular expressions.
 .B int *\fIovector\fR, int \fIstringcount\fR, "const char ***\fIlistptr\fR);"
 .PP
 .br
+.B void pcre_free_substring(const char *\fIstringptr\fR);
+.PP
+.br
+.B void pcre_free_substring_list(const char **\fIstringptr\fR);
+.PP
+.br
 .B const unsigned char *pcre_maketables(void);
 .PP
 .br
@@ -70,7 +76,9 @@ pcre - Perl-compatible regular expressions.
 The PCRE library is a set of functions that implement regular expression
 pattern matching using the same syntax and semantics as Perl 5, with just a few
 differences (see below). The current implementation corresponds to Perl 5.005,
-with some additional features from the Perl development release.
+with some additional features from later versions. This includes some
+experimental, incomplete support for UTF-8 encoded strings. Details of exactly
+what is and what is not supported are given below.
 
 PCRE has its own native API, which is described in this document. There is also
 a set of wrapper functions that correspond to the POSIX regular expression API.
@@ -84,12 +92,16 @@ contain the major and minor release numbers for the library. Applications can
 use these to include support for different releases.
 
 The functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and \fBpcre_exec()\fR
-are used for compiling and matching regular expressions, while
-\fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
+are used for compiling and matching regular expressions.
+
+The functions \fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
 \fBpcre_get_substring_list()\fR are convenience functions for extracting
-captured substrings from a matched subject string. The function
-\fBpcre_maketables()\fR is used (optionally) to build a set of character tables
-in the current locale for passing to \fBpcre_compile()\fR.
+captured substrings from a matched subject string; \fBpcre_free_substring()\fR
+and \fBpcre_free_substring_list()\fR are also provided, to free the memory used
+for extracted strings.
+
+The function \fBpcre_maketables()\fR is used (optionally) to build a set of
+character tables in the current locale for passing to \fBpcre_compile()\fR.
 
 The function \fBpcre_fullinfo()\fR is used to find out information about a
 compiled pattern; \fBpcre_info()\fR is an obsolete version which returns only
@@ -223,6 +235,14 @@ This option inverts the "greediness" of the quantifiers so that they are not
 greedy by default, but become greedy if followed by "?". It is not compatible
 with Perl. It can also be set by a (?U) option setting within the pattern.
 
+  PCRE_UTF8
+
+This option causes PCRE to regard both the pattern and the subject as strings
+of UTF-8 characters instead of just byte strings. However, it is available only
+if PCRE has been built to include UTF-8 support. If not, the use of this option
+provokes an error. Support for UTF-8 is new, experimental, and incomplete.
+Details of exactly what it entails are given below.
+
 
 .SH STUDYING A PATTERN
 When a pattern is going to be used several times, it is worth spending more
@@ -558,7 +578,7 @@ extract a single substring, whose number is given as \fIstringnumber\fR. A
 value of zero extracts the substring that matched the entire pattern, while
 higher values extract the captured substrings. For \fBpcre_copy_substring()\fR,
 the string is placed in \fIbuffer\fR, whose length is given by
-\fIbuffersize\fR, while for \fBpcre_get_substring()\fR a new block of store is
+\fIbuffersize\fR, while for \fBpcre_get_substring()\fR a new block of memory is
 obtained via \fBpcre_malloc\fR, and its address is returned via
 \fIstringptr\fR. The yield of the function is the length of the string, not
 including the terminating zero, or one of
@@ -590,6 +610,15 @@ string. This can be distinguished from a genuine zero-length substring by
 inspecting the appropriate offset in \fIovector\fR, which is negative for unset
 substrings.
 
+The two convenience functions \fBpcre_free_substring()\fR and
+\fBpcre_free_substring_list()\fR can be used to free the memory returned by
+a previous call of \fBpcre_get_substring()\fR or
+\fBpcre_get_substring_list()\fR, respectively. They do nothing more than call
+the function pointed to by \fBpcre_free\fR, which of course could be called
+directly from a C program. However, PCRE is used in some situations where it is
+linked via a special interface to another programming language which cannot use
+\fBpcre_free\fR directly; it is for these cases that the functions are
+provided.
 
 
 .SH LIMITATIONS
@@ -691,8 +720,14 @@ The syntax and semantics of the regular expressions supported by PCRE are
 described below. Regular expressions are also described in the Perl
 documentation and in a number of other books, some of which have copious
 examples. Jeffrey Friedl's "Mastering Regular Expressions", published by
-O'Reilly (ISBN 1-56592-257), covers them in great detail. The description
-here is intended as reference documentation.
+O'Reilly (ISBN 1-56592-257), covers them in great detail.
+
+The description here is intended as reference documentation. The basic
+operation of PCRE is on strings of bytes. However, there are the beginnings of
+some support for UTF-8 character strings. To use this support you must
+configure PCRE to include it, and then call \fBpcre_compile()\fR with the
+PCRE_UTF8 option. How this affects the pattern matching is described in the
+final section of this document.
 
 A regular expression is a pattern that is matched against a subject string from
 left to right. Most characters stand for themselves in a pattern, and match the
@@ -1311,7 +1346,7 @@ example, the pattern
 
   (a|b\\1)+
 
-matches any number of "a"s and also "aba", "ababaa" etc. At each iteration of
+matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
 the subpattern, the back reference matches the character string corresponding
 to the previous iteration. In order for this to work, the pattern must be such
 that the first iteration does not need to match the back reference. This can be
@@ -1685,6 +1720,77 @@ with the pattern above. The former gives a failure almost instantly when
 applied to a whole line of "a" characters, whereas the latter takes an
 appreciable time with strings longer than about 20 characters.
 
+
+.SH UTF-8 SUPPORT
+Starting at release 3.3, PCRE has some support for character strings encoded
+in the UTF-8 format. This is incomplete, and is regarded as experimental. In
+order to use it, you must configure PCRE to include UTF-8 support in the code,
+and, in addition, you must call \fBpcre_compile()\fR with the PCRE_UTF8 option
+flag. When you do this, both the pattern and any subject strings that are
+matched against it are treated as UTF-8 strings instead of just strings of
+bytes, but only in the cases that are mentioned below.
+
+If you compile PCRE with UTF-8 support, but do not use it at run time, the
+library will be a bit bigger, but the additional run time overhead is limited
+to testing the PCRE_UTF8 flag in several places, so should not be very large.
+
+PCRE assumes that the strings it is given contain valid UTF-8 codes. It does
+not diagnose invalid UTF-8 strings. If you pass invalid UTF-8 strings to PCRE,
+the results are undefined.
+
+Running with PCRE_UTF8 set causes these changes in the way PCRE works:
+
+1. In a pattern, the escape sequence \\x{...}, where the contents of the braces
+are a string of hexadecimal digits, is interpreted as a UTF-8 character whose
+code number is the given hexadecimal number, for example: \\x{1234}. This
+inserts from one to six literal bytes into the pattern, using the UTF-8
+encoding. If a non-hexadecimal digit appears between the braces, the item is
+not recognized.
+
+2. The original hexadecimal escape sequence, \\xhh, generates a two-byte UTF-8
+character if its value is greater than 127.
+
+3. Repeat quantifiers are NOT correctly handled if they follow a multibyte
+character. For example, \\x{100}* and \\xc3+ do not work. At present, if you
+want to repeat such characters, you must enclose them in non-capturing
+parentheses, for example (?:\\x{100}).
+
+4. The dot metacharacter matches one UTF-8 character instead of a single byte.
+
+5. Unlike literal UTF-8 characters, the dot metacharacter followed by a
+repeat quantifier does operate correctly on UTF-8 characters instead of
+single bytes.
+
+6. Although the \\x{...} escape is permitted in a character class, characters
+whose values are greater than 255 cannot be included in a class.
+
+7. A class is matched against a UTF-8 character instead of just a single byte,
+but it can match only characters whose values are less than 256. Characters
+with greater values always fail to match a class.
+
+8. Repeated classes work correctly on multiple characters.
+
+9. Classes containing just a single character whose value is greater than 127
+(but less than 256), for example, [\\x80] or [^\\x{93}], do not work because
+these are optimized into single byte matches. In the first case, of course,
+the class brackets are just redundant.
+
+10. Lookbehind assertions move backwards in the subject by a fixed number of
+characters instead of a fixed number of bytes. Simple cases have been tested
+to work correctly, but there may be hidden gotchas herein.
+
+11. The character types such as \\d and \\w do not work correctly with UTF-8
+characters. They continue to test a single byte.
+
+12. Anything not explicitly mentioned here continues to work in bytes rather
+than in characters.
+
+The following UTF-8 features of Perl 5.6 are not implemented:
+
+1. The escape sequence \\C to match a single byte.
+
+2. The use of Unicode tables and properties and escapes \\p, \\P, and \\X.
+
 .SH AUTHOR
 Philip Hazel <ph10@cam.ac.uk>
 .br
@@ -1696,6 +1802,8 @@ Cambridge CB2 3QG, England.
 .br
 Phone: +44 1223 334714
 
-Last updated: 27 January 2000
+Last updated: 28 August 2000,
+.br
+  the 250th anniversary of the death of J.S. Bach.
 .br
 Copyright (c) 1997-2000 University of Cambridge.
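
Note on the new pcre_free_substring() and pcre_free_substring_list() functions documented above: the text says they do nothing more than call the function pointed to by pcre_free. A minimal, self-contained sketch of that arrangement follows. All names beginning with demo_ and the counting_free function are hypothetical stand-ins, chosen so the sketch builds without the PCRE headers; it is not the library's source.

```c
#include <stdlib.h>

/* Stand-in for PCRE's replaceable deallocator pointer (the real global is
   pcre_free); a different name is used so this builds without PCRE. */
static void (*demo_free)(void *) = free;

/* Hypothetical equivalents of the two wrappers: each just forwards the
   block to whatever function the pointer currently refers to. */
static void demo_free_substring(const char *stringptr)
{
    demo_free((void *)stringptr);
}

static void demo_free_substring_list(const char **stringptr)
{
    demo_free((void *)stringptr);
}

/* Example replacement deallocator that counts calls before releasing. */
static int demo_free_calls = 0;

static void counting_free(void *block)
{
    demo_free_calls++;
    free(block);
}
```

A binding for another language can then export the two wrappers as ordinary functions and never needs to dereference the function pointer itself, which is the case the documentation gives as the reason for adding them.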
diff --git a/doc/pcre.html b/doc/pcre.html
index 7eba9c3..4598698 100644
--- a/doc/pcre.html
+++ b/doc/pcre.html
@@ -37,7 +37,8 @@ conversion went wrong.
 <LI><A NAME="TOC27" HREF="#SEC27">COMMENTS</A>
 <LI><A NAME="TOC28" HREF="#SEC28">RECURSIVE PATTERNS</A>
 <LI><A NAME="TOC29" HREF="#SEC29">PERFORMANCE</A>
-<LI><A NAME="TOC30" HREF="#SEC30">AUTHOR</A>
+<LI><A NAME="TOC30" HREF="#SEC30">UTF-8 SUPPORT</A>
+<LI><A NAME="TOC31" HREF="#SEC31">AUTHOR</A>
 </UL>
 <LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
 <P>
@@ -76,6 +77,12 @@ pcre - Perl-compatible regular expressions.
 <B>int *<I>ovector</I>, int <I>stringcount</I>, const char ***<I>listptr</I>);</B>
 </P>
 <P>
+<B>void pcre_free_substring(const char *<I>stringptr</I>);</B>
+</P>
+<P>
+<B>void pcre_free_substring_list(const char **<I>stringptr</I>);</B>
+</P>
+<P>
 <B>const unsigned char *pcre_maketables(void);</B>
 </P>
 <P>
@@ -100,7 +107,9 @@ pcre - Perl-compatible regular expressions.
 The PCRE library is a set of functions that implement regular expression
 pattern matching using the same syntax and semantics as Perl 5, with just a few
 differences (see below). The current implementation corresponds to Perl 5.005,
-with some additional features from the Perl development release.
+with some additional features from later versions. This includes some
+experimental, incomplete support for UTF-8 encoded strings. Details of exactly
+what is and what is not supported are given below.
 </P>
 <P>
 PCRE has its own native API, which is described in this document. There is also
@@ -117,12 +126,18 @@ use these to include support for different releases.
 </P>
 <P>
 The functions <B>pcre_compile()</B>, <B>pcre_study()</B>, and <B>pcre_exec()</B>
-are used for compiling and matching regular expressions, while
-<B>pcre_copy_substring()</B>, <B>pcre_get_substring()</B>, and
+are used for compiling and matching regular expressions.
+</P>
+<P>
+The functions <B>pcre_copy_substring()</B>, <B>pcre_get_substring()</B>, and
 <B>pcre_get_substring_list()</B> are convenience functions for extracting
-captured substrings from a matched subject string. The function
-<B>pcre_maketables()</B> is used (optionally) to build a set of character tables
-in the current locale for passing to <B>pcre_compile()</B>.
+captured substrings from a matched subject string; <B>pcre_free_substring()</B>
+and <B>pcre_free_substring_list()</B> are also provided, to free the memory used
+for extracted strings.
+</P>
+<P>
+The function <B>pcre_maketables()</B> is used (optionally) to build a set of
+character tables in the current locale for passing to <B>pcre_compile()</B>.
 </P>
 <P>
 The function <B>pcre_fullinfo()</B> is used to find out information about a
@@ -297,6 +312,18 @@ This option inverts the "greediness" of the quantifiers so that they are not
 greedy by default, but become greedy if followed by "?". It is not compatible
 with Perl. It can also be set by a (?U) option setting within the pattern.
 </P>
+<P>
+<PRE>
+  PCRE_UTF8
+</PRE>
+</P>
+<P>
+This option causes PCRE to regard both the pattern and the subject as strings
+of UTF-8 characters instead of just byte strings. However, it is available only
+if PCRE has been built to include UTF-8 support. If not, the use of this option
+provokes an error. Support for UTF-8 is new, experimental, and incomplete.
+Details of exactly what it entails are given below.
+</P>
 <LI><A NAME="SEC6" HREF="#TOC1">STUDYING A PATTERN</A>
 <P>
 When a pattern is going to be used several times, it is worth spending more
@@ -743,7 +770,7 @@ extract a single substring, whose number is given as <I>stringnumber</I>. A
 value of zero extracts the substring that matched the entire pattern, while
 higher values extract the captured substrings. For <B>pcre_copy_substring()</B>,
 the string is placed in <I>buffer</I>, whose length is given by
-<I>buffersize</I>, while for <B>pcre_get_substring()</B> a new block of store is
+<I>buffersize</I>, while for <B>pcre_get_substring()</B> a new block of memory is
 obtained via <B>pcre_malloc</B>, and its address is returned via
 <I>stringptr</I>. The yield of the function is the length of the string, not
 including the terminating zero, or one of
@@ -789,6 +816,17 @@ string. This can be distinguished from a genuine zero-length substring by
 inspecting the appropriate offset in <I>ovector</I>, which is negative for unset
 substrings.
 </P>
+<P>
+The two convenience functions <B>pcre_free_substring()</B> and
+<B>pcre_free_substring_list()</B> can be used to free the memory returned by
+a previous call of <B>pcre_get_substring()</B> or
+<B>pcre_get_substring_list()</B>, respectively. They do nothing more than call
+the function pointed to by <B>pcre_free</B>, which of course could be called
+directly from a C program. However, PCRE is used in some situations where it is
+linked via a special interface to another programming language which cannot use
+<B>pcre_free</B> directly; it is for these cases that the functions are
+provided.
+</P>
 <LI><A NAME="SEC11" HREF="#TOC1">LIMITATIONS</A>
 <P>
 There are some size limitations in PCRE but it is hoped that they will never in
@@ -908,8 +946,15 @@ The syntax and semantics of the regular expressions supported by PCRE are
 described below. Regular expressions are also described in the Perl
 documentation and in a number of other books, some of which have copious
 examples. Jeffrey Friedl's "Mastering Regular Expressions", published by
-O'Reilly (ISBN 1-56592-257), covers them in great detail. The description
-here is intended as reference documentation.
+O'Reilly (ISBN 1-56592-257), covers them in great detail.
+</P>
+<P>
+The description here is intended as reference documentation. The basic
+operation of PCRE is on strings of bytes. However, there are the beginnings of
+some support for UTF-8 character strings. To use this support you must
+configure PCRE to include it, and then call <B>pcre_compile()</B> with the
+PCRE_UTF8 option. How this affects the pattern matching is described in the
+final section of this document.
 </P>
 <P>
 A regular expression is a pattern that is matched against a subject string from
@@ -1718,7 +1763,7 @@ example, the pattern
 </PRE>
 </P>
 <P>
-matches any number of "a"s and also "aba", "ababaa" etc. At each iteration of
+matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
 the subpattern, the back reference matches the character string corresponding
 to the previous iteration. In order for this to work, the pattern must be such
 that the first iteration does not need to match the back reference. This can be
@@ -2240,7 +2285,96 @@ with the pattern above. The former gives a failure almost instantly when
 applied to a whole line of "a" characters, whereas the latter takes an
 appreciable time with strings longer than about 20 characters.
 </P>
-<LI><A NAME="SEC30" HREF="#TOC1">AUTHOR</A>
+<LI><A NAME="SEC30" HREF="#TOC1">UTF-8 SUPPORT</A>
+<P>
+Starting at release 3.3, PCRE has some support for character strings encoded
+in the UTF-8 format. This is incomplete, and is regarded as experimental. In
+order to use it, you must configure PCRE to include UTF-8 support in the code,
+and, in addition, you must call <B>pcre_compile()</B> with the PCRE_UTF8 option
+flag. When you do this, both the pattern and any subject strings that are
+matched against it are treated as UTF-8 strings instead of just strings of
+bytes, but only in the cases that are mentioned below.
+</P>
+<P>
+If you compile PCRE with UTF-8 support, but do not use it at run time, the
+library will be a bit bigger, but the additional run time overhead is limited
+to testing the PCRE_UTF8 flag in several places, so should not be very large.
+</P>
+<P>
+PCRE assumes that the strings it is given contain valid UTF-8 codes. It does
+not diagnose invalid UTF-8 strings. If you pass invalid UTF-8 strings to PCRE,
+the results are undefined.
+</P>
+<P>
+Running with PCRE_UTF8 set causes these changes in the way PCRE works:
+</P>
+<P>
+1. In a pattern, the escape sequence \x{...}, where the contents of the braces
+are a string of hexadecimal digits, is interpreted as a UTF-8 character whose
+code number is the given hexadecimal number, for example: \x{1234}. This
+inserts from one to six literal bytes into the pattern, using the UTF-8
+encoding. If a non-hexadecimal digit appears between the braces, the item is
+not recognized.
+</P>
+<P>
+2. The original hexadecimal escape sequence, \xhh, generates a two-byte UTF-8
+character if its value is greater than 127.
+</P>
+<P>
+3. Repeat quantifiers are NOT correctly handled if they follow a multibyte
+character. For example, \x{100}* and \xc3+ do not work. At present, if you
+want to repeat such characters, you must enclose them in non-capturing
+parentheses, for example (?:\x{100}).
+</P>
+<P>
+4. The dot metacharacter matches one UTF-8 character instead of a single byte.
+</P>
+<P>
+5. Unlike literal UTF-8 characters, the dot metacharacter followed by a
+repeat quantifier does operate correctly on UTF-8 characters instead of
+single bytes.
+</P>
+<P>
+6. Although the \x{...} escape is permitted in a character class, characters
+whose values are greater than 255 cannot be included in a class.
+</P>
+<P>
+7. A class is matched against a UTF-8 character instead of just a single byte,
+but it can match only characters whose values are less than 256. Characters
+with greater values always fail to match a class.
+</P>
+<P>
+8. Repeated classes work correctly on multiple characters.
+</P>
+<P>
+9. Classes containing just a single character whose value is greater than 127
+(but less than 256), for example, [\x80] or [^\x{93}], do not work because
+these are optimized into single byte matches. In the first case, of course,
+the class brackets are just redundant.
+</P>
+<P>
+10. Lookbehind assertions move backwards in the subject by a fixed number of
+characters instead of a fixed number of bytes. Simple cases have been tested
+to work correctly, but there may be hidden gotchas herein.
+</P>
+<P>
+11. The character types such as \d and \w do not work correctly with UTF-8
+characters. They continue to test a single byte.
+</P>
+<P>
+12. Anything not explicitly mentioned here continues to work in bytes rather
+than in characters.
+</P>
+<P>
+The following UTF-8 features of Perl 5.6 are not implemented:
+</P>
+<P>
+1. The escape sequence \C to match a single byte.
+</P>
+<P>
+2. The use of Unicode tables and properties and escapes \p, \P, and \X.
+</P>
+<LI><A NAME="SEC31" HREF="#TOC1">AUTHOR</A>
 <P>
 Philip Hazel &#60;ph10@cam.ac.uk&#62;
 <BR>
@@ -2253,6 +2387,8 @@ Cambridge CB2 3QG, England.
 Phone: +44 1223 334714
 </P>
 <P>
-Last updated: 27 January 2000
+Last updated: 28 August 2000,
+<BR>
+  the 250th anniversary of the death of J.S. Bach.
 <BR>
 Copyright (c) 1997-2000 University of Cambridge.
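
Note on the UTF-8 SUPPORT section added above: items 1 and 2 say that \x{...} inserts from one to six literal bytes, and that \xhh with a value above 127 becomes a two-byte character. The sketch below shows how a code number maps to those bytes under the original six-byte UTF-8 scheme. The function name utf8_encode is hypothetical and the code is an illustration, not taken from the PCRE source.

```c
/* Hypothetical helper, not PCRE's own code: encode code point `cp` using the
   original (up to six byte) UTF-8 scheme. Writes the bytes to `out`, which
   must have room for 6 bytes, and returns the number of bytes written
   (0 if `cp` is above 0x7FFFFFFF). */
static int utf8_encode(unsigned long cp, unsigned char *out)
{
    /* Largest code point that fits in 1, 2, ... 6 bytes. */
    static const unsigned long limit[6] = {
        0x7FUL, 0x7FFUL, 0xFFFFUL, 0x1FFFFFUL, 0x3FFFFFFUL, 0x7FFFFFFFUL
    };
    /* Marker bits for the first byte of a 1..6 byte sequence. */
    static const unsigned char lead[6] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    int n, i;

    for (n = 0; n < 6; n++)
        if (cp <= limit[n])
            break;
    if (n == 6)
        return 0;

    /* Fill the continuation bytes from the end, six payload bits each. */
    for (i = n; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (unsigned char)(lead[n] | cp);
    return n + 1;
}
```

With this mapping \x{1234} becomes the three bytes E1 88 B4, and \xc3 becomes the two bytes C3 83, which is why a quantifier that binds only to the last byte does not repeat the whole character.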
diff --git a/doc/pcre.txt b/doc/pcre.txt
index b8106e4..29cc490 100644
--- a/doc/pcre.txt
+++ b/doc/pcre.txt
@@ -28,6 +28,10 @@ SYNOPSIS
      int pcre_get_substring_list(const char *subject,
           int *ovector, int stringcount, const char ***listptr);
 
+     void pcre_free_substring(const char *stringptr);
+
+     void pcre_free_substring_list(const char **stringptr);
+
      const unsigned char *pcre_maketables(void);
 
      int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
@@ -48,9 +52,12 @@ DESCRIPTION
      The PCRE library is a set of functions that implement  regu-
      lar  expression  pattern  matching using the same syntax and
      semantics as Perl  5,  with  just  a  few  differences  (see
+
      below).  The  current  implementation  corresponds  to  Perl
-     5.005, with some additional features from the Perl  develop-
-     ment release.
+     5.005, with some additional features  from  later  versions.
+     This  includes  some  experimental,  incomplete  support for
+     UTF-8 encoded strings. Details of exactly what is  and  what
+     is not supported are given below.
 
      PCRE has its own native API,  which  is  described  in  this
      document.  There  is  also  a  set of wrapper functions that
@@ -67,13 +74,18 @@ DESCRIPTION
      releases.
 
      The functions pcre_compile(), pcre_study(), and  pcre_exec()
-     are  used  for  compiling  and matching regular expressions,
-     while   pcre_copy_substring(),   pcre_get_substring(),   and
-     pcre_get_substring_list()   are  convenience  functions  for
+     are used for compiling and matching regular expressions.
+
+     The functions  pcre_copy_substring(),  pcre_get_substring(),
+     and  pcre_get_substring_list() are convenience functions for
      extracting  captured  substrings  from  a  matched   subject
-     string.  The function pcre_maketables() is used (optionally)
-     to build a set of character tables in the current locale for
-     passing to pcre_compile().
+     string; pcre_free_substring() and pcre_free_substring_list()
+     are also provided, to free the  memory  used  for  extracted
+     strings.
+
+     The function pcre_maketables() is used (optionally) to build
+     a  set of character tables in the current locale for passing
+     to pcre_compile().
 
      The function pcre_fullinfo() is used to find out information
      about a compiled pattern; pcre_info() is an obsolete version
@@ -92,10 +104,19 @@ DESCRIPTION
 
 
 MULTI-THREADING
-     The PCRE functions can be used in  multi-threading  applica-
-     tions, with the proviso that the memory management functions
-     pointed to by pcre_malloc and pcre_free are  shared  by  all
-     threads.
+     The  PCRE  functions  can   be   used   in   multi-threading
+
+
+
+
+
+SunOS 5.8                 Last change:                          2
+
+
+
+     applications,  with  the  proviso that the memory management
+     functions pointed to by pcre_malloc and pcre_free are shared
+     by all threads.
 
      The compiled form of a regular  expression  is  not  altered
      during  matching, so the same compiled pattern can safely be
@@ -103,7 +124,6 @@ MULTI-THREADING
 
 
 
-
 COMPILING A PATTERN
      The function pcre_compile() is called to compile  a  pattern
      into  an internal form. The pattern is a C string terminated
@@ -235,12 +255,23 @@ COMPILING A PATTERN
      followed by "?". It is not compatible with Perl. It can also
      be set by a (?U) option setting within the pattern.
 
+       PCRE_UTF8
+
+     This option causes PCRE to regard both the pattern  and  the
+     subject  as strings of UTF-8 characters instead of just byte
+     strings. However, it is available  only  if  PCRE  has  been
+     built  to  include  UTF-8  support.  If not, the use of this
+     option provokes an error. Support for UTF-8 is new,  experi-
+     mental,  and incomplete.  Details of exactly what it entails
+     are given below.
+
 
 
 STUDYING A PATTERN
      When a pattern is going to be  used  several  times,  it  is
      worth  spending  more time analyzing it in order to speed up
      the time taken for matching. The function pcre_study() takes
+
      a  pointer  to a compiled pattern as its first argument, and
      returns a  pointer  to  a  pcre_extra  block  (another  void
      typedef)  containing  additional  information about the pat-
@@ -344,9 +375,9 @@ INFORMATION ABOUT A PATTERN
 
        PCRE_INFO_BACKREFMAX
 
-     Return the number of the highest back reference in the  pat-
-     tern.  The  fourth argument should point to an int variable.
-     Zero is returned if there are no back references.
+     Return the number of  the  highest  back  reference  in  the
+     pattern.  The  fourth  argument should point to an int vari-
+     able. Zero is returned if there are no back references.
 
        PCRE_INFO_FIRSTCHAR
 
@@ -605,6 +636,15 @@ MATCHING A PATTERN
 
 EXTRACTING CAPTURED SUBSTRINGS
      Captured substrings can be accessed directly  by  using  the
+
+
+
+
+
+SunOS 5.8                 Last change:                         12
+
+
+
      offsets returned by pcre_exec() in ovector. For convenience,
      the functions  pcre_copy_substring(),  pcre_get_substring(),
      and  pcre_get_substring_list()  are  provided for extracting
@@ -631,7 +671,7 @@ EXTRACTING CAPTURED SUBSTRINGS
      the entire pattern, while higher values extract the captured
      substrings. For pcre_copy_substring(), the string is  placed
      in  buffer,  whose  length is given by buffersize, while for
-     pcre_get_substring() a new block of store  is  obtained  via
+     pcre_get_substring() a new block of memory is  obtained  via
      pcre_malloc,  and its address is returned via stringptr. The
      yield of the function is  the  length  of  the  string,  not
      including the terminating zero, or one of
@@ -665,6 +705,16 @@ EXTRACTING CAPTURED SUBSTRINGS
      inspecting the appropriate offset in ovector, which is nega-
      tive for unset substrings.
 
+     The  two  convenience  functions  pcre_free_substring()  and
+     pcre_free_substring_list()  can  be  used to free the memory
+     returned by  a  previous  call  of  pcre_get_substring()  or
+     pcre_get_substring_list(),  respectively.  They  do  nothing
+     more than call the function pointed to by  pcre_free,  which
+     of  course  could  be called directly from a C program. How-
+     ever, PCRE is used in some situations where it is linked via
+     a  special  interface  to another programming language which
+     cannot use pcre_free directly; it is for  these  cases  that
+     the functions are provided.
 
 
 
@@ -733,6 +783,7 @@ DIFFERENCES FROM PERL
      (?p{code})  constructions. However, there is some experimen-
      tal support for recursive patterns using the  non-Perl  item
      (?R).
+
      8. There are at the time of writing some  oddities  in  Perl
      5.005_02  concerned  with  the  settings of captured strings
      when part of a pattern is repeated.  For  example,  matching
@@ -785,11 +836,17 @@ REGULAR EXPRESSION DETAILS
      The syntax and semantics of  the  regular  expressions  sup-
      ported  by PCRE are described below. Regular expressions are
      also described in the Perl documentation and in a number  of
-
      other  books,  some  of which have copious examples. Jeffrey
      Friedl's  "Mastering  Regular  Expressions",  published   by
-     O'Reilly  (ISBN  1-56592-257),  covers them in great detail.
+     O'Reilly (ISBN 1-56592-257), covers them in great detail.
+
      The description here is intended as reference documentation.
+     The basic operation of PCRE is on strings of bytes. However,
+     there are the beginnings of some support for UTF-8 character
+     strings.  To  use  this  support  you must configure PCRE to
+     include it, and then call pcre_compile() with the  PCRE_UTF8
+     option.  How  this affects the pattern matching is described
+     in the final section of this document.
 
      A regular expression is a pattern that is matched against  a
      subject string from left to right. Most characters stand for
@@ -1004,6 +1061,7 @@ CIRCUMFLEX AND DOLLAR
      Outside a character class, in the default matching mode, the
      circumflex  character  is an assertion which is true only if
      the current matching point is at the start  of  the  subject
+
      string.  If  the startoffset argument of pcre_exec() is non-
      zero, circumflex can never match. Inside a character  class,
      circumflex has an entirely different meaning (see below).
@@ -1056,6 +1114,7 @@ FULL STOP (PERIOD, DOT)
      Outside a character class, a dot in the pattern matches  any
      one character in the subject, including a non-printing char-
      acter, but not (by default)  newline.   If  the  PCRE_DOTALL
+
      option  is set, dots match newlines as well. The handling of
      dot is entirely independent of the  handling  of  circumflex
      and  dollar,  the  only  relationship  being  that they both
@@ -1517,18 +1576,19 @@ BACK REFERENCES
      A back reference that occurs inside the parentheses to which
      it  refers  fails when the subpattern is first used, so, for
      example, (a\1) never matches.  However, such references  can
-     be  useful  inside  repeated  subpatterns.  For example, the
-     pattern
+     be useful inside repeated subpatterns. For example, the pat-
+     tern
 
        (a|b\1)+
 
-     matches any number of "a"s and also "aba", "ababaa" etc.  At
+     matches any number of "a"s and also "aba", "ababbaa" etc. At
      each iteration of the subpattern, the back reference matches
-     the character string corresponding to  the  previous  itera-
-     tion.  In  order  for this to work, the pattern must be such
-     that the first iteration does not need  to  match  the  back
-     reference.  This  can  be  done using alternation, as in the
-     example above, or by a quantifier with a minimum of zero.
+     the  character  string   corresponding   to   the   previous
+     iteration.  In  order  for this to work, the pattern must be
+     such that the first iteration does not  need  to  match  the
+     back  reference.  This  can be done using alternation, as in
+     the example above, or by a  quantifier  with  a  minimum  of
+     zero.
 
 
 
@@ -1681,9 +1741,9 @@ ONCE-ONLY SUBPATTERNS
 
      This kind of parenthesis "locks up" the  part of the pattern
      it  contains once it has matched, and a failure further into
-     the pattern is prevented from backtracking  into  it.  Back-
-     tracking  past  it to previous items, however, works as nor-
-     mal.
+     the  pattern  is  prevented  from  backtracking   into   it.
+     Backtracking  past  it  to previous items, however, works as
+     normal.
 
      An alternative description is that a subpattern of this type
      matches  the  string  of  characters that an identical stan-
@@ -1941,9 +2001,9 @@ PERFORMANCE
      repeat can match 0, 1, 2, 3, or 4 times,  and  for  each  of
      those  cases other than 0, the + repeats can match different
      numbers of times.) When the remainder of the pattern is such
-     that  the entire match is going to fail, PCRE has in princi-
-     ple to try every possible variation, and this  can  take  an
-     extremely long time.
+     that  the  entire  match  is  going  to  fail,  PCRE  has in
+     principle to try every possible variation, and this can take
+     an extremely long time.
 
      An optimization catches some of the more simple  cases  such
      as
@@ -1966,6 +2026,93 @@ PERFORMANCE
 
 
 
+UTF-8 SUPPORT
+     Starting at release 3.3, PCRE has some support for character
+     strings encoded in the UTF-8 format. This is incomplete, and
+     is regarded as experimental. In order to use  it,  you  must
+     configure PCRE to include UTF-8 support in the code, and, in
+     addition, you must call pcre_compile()  with  the  PCRE_UTF8
+     option flag. When you do this, both the pattern and any sub-
+     ject strings that are matched  against  it  are  treated  as
+     UTF-8  strings instead of just strings of bytes, but only in
+     the cases that are mentioned below.
+
+     If you compile PCRE with UTF-8 support, but do not use it at
+     run  time,  the  library will be a bit bigger, but the addi-
+     tional run time overhead is limited to testing the PCRE_UTF8
+     flag in several places, so should not be very large.
+
+     PCRE assumes that the strings  it  is  given  contain  valid
+     UTF-8  codes. It does not diagnose invalid UTF-8 strings. If
+     you pass invalid UTF-8 strings  to  PCRE,  the  results  are
+     undefined.
+
+     Running with PCRE_UTF8 set causes these changes in  the  way
+     PCRE works:
+
+     1. In a pattern, the escape sequence \x{...}, where the con-
+     tents  of  the  braces is a string of hexadecimal digits, is
+     interpreted as a UTF-8 character whose code  number  is  the
+     given   hexadecimal  number,  for  example:  \x{1234}.  This
+     inserts from one to six  literal  bytes  into  the  pattern,
+     using the UTF-8 encoding. If a non-hexadecimal digit appears
+     between the braces, the item is not recognized.
+
+     2. The original hexadecimal escape sequence, \xhh, generates
+     a two-byte UTF-8 character if its value is greater than 127.
+
+     3. Repeat quantifiers are NOT correctly handled if they fol-
+     low  a  multibyte character. For example, \x{100}* and \xc3+
+     do not work. If you want to repeat such characters, you must
+     enclose  them  in  non-capturing  parentheses,  for  example
+     (?:\x{100}), at present.
+
+     4. The dot metacharacter matches one UTF-8 character instead
+     of a single byte.
+
+     5. Unlike literal UTF-8 characters,  the  dot  metacharacter
+     followed  by  a  repeat quantifier does operate correctly on
+     UTF-8 characters instead of single bytes.
+
+     6. Although the \x{...} escape is permitted in  a  character
+     class,  characters  whose values are greater than 255 cannot
+     be included in a class.
+
+     7. A class is matched against a UTF-8 character  instead  of
+     just  a  single byte, but it can match only characters whose
+     values are less than 256.  Characters  with  greater  values
+     always fail to match a class.
+
+     8. Repeated classes work correctly on multiple characters.
+
+     9. Classes containing just a single character whose value is
+     greater than 127 (but less than 256), for example, [\x80] or
+     [^\x{93}], do not work because these are optimized into sin-
+     gle  byte  matches.  In the first case, of course, the class
+     brackets are just redundant.
+
+     10. Lookbehind assertions move backwards in the subject by a
+     fixed  number  of  characters  instead  of a fixed number of
+     bytes. Simple cases have been tested to work correctly,  but
+     there may be hidden gotchas herein.
+
+     11. The character types such  as  \d  and  \w  do  not  work
+     correctly  with  UTF-8  characters.  They continue to test a
+     single byte.
+
+     12. Anything not explicitly mentioned here continues to work
+     in bytes rather than in characters.
+
+     The following UTF-8 features of  Perl  5.6  are  not  imple-
+     mented:
+
+     1. The escape sequence \C to match a single byte.
+
+     2. The use of Unicode tables and properties and escapes  \p,
+     \P, and \X.
+
+
+
 AUTHOR
      Philip Hazel <ph10@cam.ac.uk>
      University Computing Service,
@@ -1973,5 +2120,6 @@ AUTHOR
      Cambridge CB2 3QG, England.
      Phone: +44 1223 334714
 
-     Last updated: 27 January 2000
+     Last updated: 28 August 2000,
+       the 250th anniversary of the death of J.S. Bach.
      Copyright (c) 1997-2000 University of Cambridge.
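
The UTF-8 SUPPORT text above says that lookbehind assertions step back by a fixed number of characters rather than bytes. The following is only a rough sketch of what that stepping could look like, assuming valid UTF-8 input as the documentation requires; the function name utf8_back is hypothetical and is not taken from the PCRE source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: move back n characters from p in a
   valid UTF-8 string that begins at start. Returns the new position, or NULL
   if fewer than n characters precede p. */
const unsigned char *utf8_back(const unsigned char *start,
  const unsigned char *p, int n)
{
  assert(start != NULL && p != NULL && p >= start && n >= 0);
  while (n-- > 0)
    {
    if (p == start) return NULL;   /* ran out of subject */
    p--;
    /* Continuation bytes have the form 10xxxxxx; skip back over them to
       reach the first byte of the character. */
    while (p > start && (*p & 0xc0) == 0x80) p--;
    }
  return p;
}
```

For the four-byte subject 'a', 0xc4, 0x80, 'b' (that is, "a", U+0100, "b"), stepping back two characters from the end lands on the 0xc4 byte, which is three bytes back rather than two.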
diff --git a/doc/pgrep.1 b/doc/pcregrep.1
index d9e9b57..41b9051 100644
--- a/doc/pgrep.1
+++ b/doc/pcregrep.1
@@ -1,20 +1,20 @@
-.TH PGREP 1
+.TH PCREGREP 1
 .SH NAME
-pgrep - a grep with Perl-compatible regular expressions.
+pcregrep - a grep with Perl-compatible regular expressions.
 .SH SYNOPSIS
-.B pgrep [-Vchilnsvx] pattern [file] ...
+.B pcregrep [-Vchilnsvx] pattern [file] ...
 
 
 .SH DESCRIPTION
-\fBpgrep\fR searches files for character patterns, in the same way as other
+\fBpcregrep\fR searches files for character patterns, in the same way as other
 grep commands do, but it uses the PCRE regular expression library to support
 patterns that are compatible with the regular expressions of Perl 5. See
 \fBpcre(3)\fR for a full description of syntax and semantics.
 
-If no files are specified, \fBpgrep\fR reads the standard input. By default,
+If no files are specified, \fBpcregrep\fR reads the standard input. By default,
 each line that matches the pattern is copied to the standard output, and if
 there is more than one file, the file name is printed before each line of
-output. However, there are options that can change how \fBpgrep\fR behaves.
+output. However, there are options that can change how \fBpcregrep\fR behaves.
 
 Lines are limited to BUFSIZ characters. BUFSIZ is defined in \fB<stdio.h>\fR.
 The newline character is removed from the end of each line before it is matched
@@ -73,4 +73,4 @@ for syntax errors or inacessible files (even if matches were found).
 .SH AUTHOR
 Philip Hazel <ph10@cam.ac.uk>
 .br
-Copyright (c) 1997-1999 University of Cambridge.
+Copyright (c) 1997-2000 University of Cambridge.
diff --git a/doc/pgrep.html b/doc/pcregrep.html
index 54efed6..77da7c4 100644
--- a/doc/pgrep.html
+++ b/doc/pcregrep.html
@@ -1,9 +1,9 @@
 <HTML>
 <HEAD>
-<TITLE>pgrep specification</TITLE>
+<TITLE>pcregrep specification</TITLE>
 </HEAD>
 <body bgcolor="#FFFFFF" text="#00005A">
-<H1>pgrep specification</H1>
+<H1>pcregrep specification</H1>
 This HTML document has been generated automatically from the original man page.
 If there is any nonsense in it, please consult the man page in case the
 conversion went wrong.
@@ -18,24 +18,24 @@ conversion went wrong.
 </UL>
 <LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
 <P>
-pgrep - a grep with Perl-compatible regular expressions.
+pcregrep - a grep with Perl-compatible regular expressions.
 </P>
 <LI><A NAME="SEC2" HREF="#TOC1">SYNOPSIS</A>
 <P>
-<B>pgrep [-Vchilnsvx] pattern [file] ...</B>
+<B>pcregrep [-Vchilnsvx] pattern [file] ...</B>
 </P>
 <LI><A NAME="SEC3" HREF="#TOC1">DESCRIPTION</A>
 <P>
-<B>pgrep</B> searches files for character patterns, in the same way as other
+<B>pcregrep</B> searches files for character patterns, in the same way as other
 grep commands do, but it uses the PCRE regular expression library to support
 patterns that are compatible with the regular expressions of Perl 5. See
 <B>pcre(3)</B> for a full description of syntax and semantics.
 </P>
 <P>
-If no files are specified, <B>pgrep</B> reads the standard input. By default,
+If no files are specified, <B>pcregrep</B> reads the standard input. By default,
 each line that matches the pattern is copied to the standard output, and if
 there is more than one file, the file name is printed before each line of
-output. However, there are options that can change how <B>pgrep</B> behaves.
+output. However, there are options that can change how <B>pcregrep</B> behaves.
 </P>
 <P>
 Lines are limited to BUFSIZ characters. BUFSIZ is defined in <B>&#60;stdio.h&#62;</B>.
@@ -102,4 +102,4 @@ for syntax errors or inacessible files (even if matches were found).
 <P>
 Philip Hazel &#60;ph10@cam.ac.uk&#62;
 <BR>
-Copyright (c) 1997-1999 University of Cambridge.
+Copyright (c) 1997-2000 University of Cambridge.
diff --git a/doc/pgrep.txt b/doc/pcregrep.txt
index bcd08c0..3483f9e 100644
--- a/doc/pgrep.txt
+++ b/doc/pcregrep.txt
@@ -1,25 +1,26 @@
 NAME
-     pgrep - a grep with Perl-compatible regular expressions.
+     pcregrep - a grep with Perl-compatible regular expressions.
 
 
 
 SYNOPSIS
-     pgrep [-Vchilnsvx] pattern [file] ...
+     pcregrep [-Vchilnsvx] pattern [file] ...
 
 
 
 DESCRIPTION
-     pgrep searches files for character patterns, in the same way
-     as  other  grep  commands  do,  but it uses the PCRE regular
+     pcregrep searches files for character patterns, in the  same
+     way  as other grep commands do, but it uses the PCRE regular
      expression library to support patterns that  are  compatible
      with  the  regular  expressions of Perl 5. See pcre(3) for a
      full description of syntax and semantics.
 
-     If no files are specified, pgrep reads the  standard  input.
-     By  default, each line that matches the pattern is copied to
-     the standard output, and if there is more than one file, the
-     file  name  is  printed before each line of output. However,
-     there are options that can change how pgrep behaves.
+     If no files  are  specified,  pcregrep  reads  the  standard
+     input.  By  default,  each  line that matches the pattern is
+     copied to the standard output, and if there is more than one
+     file,  the  file name is printed before each line of output.
+     However, there are options  that  can  change  how  pcregrep
+     behaves.
 
      Lines are limited to BUFSIZ characters. BUFSIZ is defined in
      <stdio.h>.  The newline character is removed from the end of
@@ -82,5 +83,5 @@ DIAGNOSTICS
 
 AUTHOR
      Philip Hazel <ph10@cam.ac.uk>
-     Copyright (c) 1997-1999 University of Cambridge.
+     Copyright (c) 1997-2000 University of Cambridge.
 
diff --git a/doc/pcreposix.3 b/doc/pcreposix.3
index 1be5d9a..41716ea 100644
--- a/doc/pcreposix.3
+++ b/doc/pcreposix.3
@@ -77,6 +77,14 @@ to the native function.
 The PCRE_MULTILINE option is set when the expression is passed for compilation
 to the native function.
 
+In the absence of these flags, no options are passed to the native function.
+This means that the regex is compiled with PCRE default semantics. In
+particular, the way it handles newline characters in the subject string is the
+Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
+\fIsome\fR of the effects specified for REG_NEWLINE. It does not affect the way
+newlines are matched by . (they aren't) or a negative class such as [^a] (they
+are).
+
 The yield of \fBregcomp()\fR is zero on success, and non-zero otherwise. The
 \fIpreg\fR structure is filled in on success, and one member of the structure
 is publicized: \fIre_nsub\fR contains the number of capturing subpatterns in
@@ -138,4 +146,4 @@ Cambridge CB2 3QG, England.
 .br
 Phone: +44 1223 334714
 
-Copyright (c) 1997-1999 University of Cambridge.
+Copyright (c) 1997-2000 University of Cambridge.
diff --git a/doc/pcreposix.html b/doc/pcreposix.html
index 121d90f..9c89478 100644
--- a/doc/pcreposix.html
+++ b/doc/pcreposix.html
@@ -107,6 +107,15 @@ The PCRE_MULTILINE option is set when the expression is passed for compilation
 to the native function.
 </P>
 <P>
+In the absence of these flags, no options are passed to the native function.
+This means that the regex is compiled with PCRE default semantics. In
+particular, the way it handles newline characters in the subject string is the
+Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
+<I>some</I> of the effects specified for REG_NEWLINE. It does not affect the way
+newlines are matched by . (they aren't) or a negative class such as [^a] (they
+are).
+</P>
+<P>
 The yield of <B>regcomp()</B> is zero on success, and non-zero otherwise. The
 <I>preg</I> structure is filled in on success, and one member of the structure
 is publicized: <I>re_nsub</I> contains the number of capturing subpatterns in
@@ -179,4 +188,4 @@ Cambridge CB2 3QG, England.
 Phone: +44 1223 334714
 </P>
 <P>
-Copyright (c) 1997-1999 University of Cambridge.
+Copyright (c) 1997-2000 University of Cambridge.
diff --git a/doc/pcreposix.txt b/doc/pcreposix.txt
index 4a7036f..2d76f7c 100644
--- a/doc/pcreposix.txt
+++ b/doc/pcreposix.txt
@@ -80,6 +80,15 @@ COMPILING A PATTERN
      The PCRE_MULTILINE option is  set  when  the  expression  is
      passed for compilation to the native function.
 
+     In the absence of these flags, no options are passed to  the
+     native  function.  This means that the regex is compiled with
+     PCRE default semantics. In particular, the  way  it  handles
+     newline  characters  in  the subject string is the Perl way,
+     not the POSIX way. Note that setting PCRE_MULTILINE has only
+     some  of  the effects specified for REG_NEWLINE. It does not
+     affect the way newlines are matched by . (they aren't) or  a
+     negative class such as [^a] (they are).
+
      The yield of regcomp() is zero on success, and non-zero oth-
      erwise.  The preg structure is filled in on success, and one
      member of the structure is publicized: re_nsub contains  the
@@ -147,4 +156,4 @@ AUTHOR
      Cambridge CB2 3QG, England.
      Phone: +44 1223 334714
 
-     Copyright (c) 1997-1999 University of Cambridge.
+     Copyright (c) 1997-2000 University of Cambridge.
diff --git a/doc/pcretest.txt b/doc/pcretest.txt
index 0e6783a..add2979 100644
--- a/doc/pcretest.txt
+++ b/doc/pcretest.txt
@@ -43,6 +43,10 @@ backslash, because
 is interpreted as the first line of a pattern that starts with "abc/", causing
 pcretest to read the next line as a continuation of the regular expression.
 
+
+PATTERN MODIFIERS
+-----------------
+
 The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
 PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For
 example:
@@ -103,37 +107,48 @@ compiled, and the results used when the expression is matched.
 The /M modifier causes the size of memory block used to hold the compiled
 pattern to be output.
 
-Finally, the /P modifier causes pcretest to call PCRE via the POSIX wrapper API
-rather than its native API. When this is done, all other modifiers except /i,
-/m, and /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is
-set if /m is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always,
-and PCRE_DOTALL unless REG_NEWLINE is set.
+The /P modifier causes pcretest to call PCRE via the POSIX wrapper API rather
+than its native API. When this is done, all other modifiers except /i, /m, and
+/+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is set if /m
+is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and
+PCRE_DOTALL unless REG_NEWLINE is set.
+
+The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option set.
+This turns on the (currently incomplete) support for UTF-8 character handling
+in PCRE, provided that it was compiled with this support enabled. This modifier
+also causes any non-printing characters in output strings to be printed using
+the \x{hh...} notation if they are valid UTF-8 sequences.
+
+
+DATA LINES
+----------
 
 Before each data line is passed to pcre_exec(), leading and trailing whitespace
 is removed, and it is then scanned for \ escapes. The following are recognized:
 
-  \a     alarm (= BEL)
-  \b     backspace
-  \e     escape
-  \f     formfeed
-  \n     newline
-  \r     carriage return
-  \t     tab
-  \v     vertical tab
-  \nnn   octal character (up to 3 octal digits)
-  \xhh   hexadecimal character (up to 2 hex digits)
-
-  \A     pass the PCRE_ANCHORED option to pcre_exec()
-  \B     pass the PCRE_NOTBOL option to pcre_exec()
-  \Cdd   call pcre_copy_substring() for substring dd after a successful match
-           (any decimal number less than 32)
-  \Gdd   call pcre_get_substring() for substring dd after a successful match
-           (any decimal number less than 32)
-  \L     call pcre_get_substringlist() after a successful match
-  \N     pass the PCRE_NOTEMPTY option to pcre_exec()
-  \Odd   set the size of the output vector passed to pcre_exec() to dd
-           (any number of decimal digits)
-  \Z     pass the PCRE_NOTEOL option to pcre_exec()
+  \a         alarm (= BEL)
+  \b         backspace
+  \e         escape
+  \f         formfeed
+  \n         newline
+  \r         carriage return
+  \t         tab
+  \v         vertical tab
+  \nnn       octal character (up to 3 octal digits)
+  \xhh       hexadecimal character (up to 2 hex digits)
+  \x{hh...}  hexadecimal UTF-8 character
+
+  \A         pass the PCRE_ANCHORED option to pcre_exec()
+  \B         pass the PCRE_NOTBOL option to pcre_exec()
+  \Cdd       call pcre_copy_substring() for substring dd after a successful
+               match (any decimal number less than 32)
+  \Gdd       call pcre_get_substring() for substring dd after a successful
+               match (any decimal number less than 32)
+  \L         call pcre_get_substring_list() after a successful match
+  \N         pass the PCRE_NOTEMPTY option to pcre_exec()
+  \Odd       set the size of the output vector passed to pcre_exec() to dd
+               (any number of decimal digits)
+  \Z         pass the PCRE_NOTEOL option to pcre_exec()
 
 A backslash followed by anything else just escapes the anything else. If the
 very last character is a backslash, it is ignored. This gives a way of passing
@@ -143,6 +158,15 @@ If /P was present on the regex, causing the POSIX wrapper API to be used, only
 \B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
 regexec() respectively.
 
+The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
+of the /8 modifier on the pattern. It is recognized always. There may be any
+number of hexadecimal digits inside the braces. The result is from one to six
+bytes, encoded according to the UTF-8 rules.
+
+
+OUTPUT FROM PCRETEST
+--------------------
+
 When a match succeeds, pcretest outputs the list of captured substrings that
 pcre_exec() returns, starting with number 0 for the string that matched the
 whole pattern. Here is an example of an interactive pcretest run.
@@ -158,8 +182,9 @@ whole pattern. Here is an example of an interactive pcretest run.
   No match
 
 If the strings contain any non-printing characters, they are output as \0x
-escapes. If the pattern has the /+ modifier, then the output for substring 0 is
-followed by the the rest of the subject string, identified by "0+" like this:
+escapes, or as \x{...} escapes if the /8 modifier was present on the pattern.
+If the pattern has the /+ modifier, then the output for substring 0 is followed
+by the rest of the subject string, identified by "0+" like this:
 
     re> /cat/+
   data> cataract
@@ -190,6 +215,10 @@ Note that while patterns can be continued over several lines (a plain ">"
 prompt is used for continuations), data lines may not. However newlines can be
 included in data by means of the \n escape.
 
+
+COMMAND LINE OPTIONS
+--------------------
+
 If the -p option is given to pcretest, it is equivalent to adding /P to each
 regular expression: the POSIX wrapper API is used to call PCRE. None of the
 following flags has any effect in this case.
@@ -208,10 +237,10 @@ a synonym for -m.
 
 If the -t option is given, each compile, study, and match is run 20000 times
 while being timed, and the resulting time per compile or match is output in
-milliseconds. Do not set -t with -s, because you will then get the size output
+milliseconds. Do not set -t with -m, because you will then get the size output
 20000 times and the timing will be distorted. If you want to change the number
 of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
 pcretest.c
 
 Philip Hazel <ph10@cam.ac.uk>
-January 2000
+August 2000
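
The pcretest text above says that \x{hh...} yields from one to six bytes, encoded according to the UTF-8 rules. As an illustration only, the sketch below encodes a code number with the original 31-bit UTF-8 scheme, which is the scheme that permits six-byte sequences. The name utf8_encode is hypothetical and this is not the code used in pcretest.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: encode code point cp as UTF-8 into
   out, which must have room for at least 6 bytes. Returns the number of bytes
   written (1 to 6), or 0 if cp is greater than 0x7fffffff. */
int utf8_encode(unsigned long cp, unsigned char *out)
{
  /* Largest code point representable in 1, 2, ... 6 bytes. */
  static const unsigned long limit[6] =
    { 0x7fUL, 0x7ffUL, 0xffffUL, 0x1fffffUL, 0x3ffffffUL, 0x7fffffffUL };
  /* Marker bits of the leading byte for each sequence length. */
  static const unsigned char lead[6] = { 0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };
  int n, i;

  assert(out != NULL);
  for (n = 0; n < 6; n++)
    if (cp <= limit[n]) break;
  if (n == 6) return 0;

  /* Fill the continuation bytes from the end, six payload bits each. */
  for (i = n; i > 0; i--)
    {
    out[i] = (unsigned char)(0x80 | (cp & 0x3f));
    cp >>= 6;
    }
  out[0] = (unsigned char)(lead[n] | cp);
  return n + 1;
}
```

Under this scheme \x{1234} becomes the three bytes e1 88 b4, and a value such as 0xc3 becomes the two bytes c3 83, which matches the statement in pcre.txt that \xhh generates a two-byte character when its value is greater than 127.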
diff --git a/doc/perltest.txt b/doc/perltest.txt
index 6c38ebe..5a40401 100644
--- a/doc/perltest.txt
+++ b/doc/perltest.txt
@@ -13,11 +13,17 @@ for perltest as well as for pcretest, and the special upper case modifiers such
 as /A that pcretest recognizes are not used in these files. The output should
 be identical, apart from the initial identifying banner.
 
+For testing UTF-8 features, an alternative form of perltest, called perltest8,
+is supplied. This requires Perl 5.6 or higher. It recognizes the special
+modifier /8 that pcretest uses to invoke UTF-8 functionality. The testinput5
+file can be fed to perltest8.
+
 The testinput2 and testinput4 files are not suitable for feeding to perltest,
 since they do make use of the special upper case modifiers and escapes that
 pcretest uses to test some features of PCRE. The first of these files also
 contains malformed regular expressions, in order to check that PCRE diagnoses
-them correctly.
+them correctly. Similarly, testinput6 tests UTF-8 features that do not relate
+to Perl.
 
 Philip Hazel <ph10@cam.ac.uk>
-January 2000
+August 2000
diff --git a/get.c b/get.c
index 035668e..42e9bd4 100644
--- a/get.c
+++ b/get.c
@@ -9,7 +9,7 @@ the file Tech.Notes for some information on the internals.
 
 Written by: Philip Hazel <ph10@cam.ac.uk>
 
-           Copyright (c) 1997-1999 University of Cambridge
+           Copyright (c) 1997-2000 University of Cambridge
 
 -----------------------------------------------------------------------------
 Permission is granted to anyone to use this software for any purpose on any
@@ -144,6 +144,25 @@ return 0;
 
 
 /*************************************************
+*   Free store obtained by get_substring_list    *
+*************************************************/
+
+/* This function exists for the benefit of people calling PCRE from non-C
+programs that can call its functions, but not free() or (pcre_free)() directly.
+
+Argument:   the result of a previous pcre_get_substring_list()
+Returns:    nothing
+*/
+
+void
+pcre_free_substring_list(const char **pointer)
+{
+(pcre_free)((void *)pointer);
+}
+
+
+
+/*************************************************
 *      Copy captured string to new store         *
 *************************************************/
 
@@ -186,4 +205,23 @@ substring[yield] = 0;
 return yield;
 }
 
+
+
+/*************************************************
+*       Free store obtained by get_substring     *
+*************************************************/
+
+/* This function exists for the benefit of people calling PCRE from non-C
+programs that can call its functions, but not free() or (pcre_free)() directly.
+
+Argument:   the result of a previous pcre_get_substring()
+Returns:    nothing
+*/
+
+void
+pcre_free_substring(const char *pointer)
+{
+(pcre_free)((void *)pointer);
+}
+
 /* End of get.c */
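Reviewer sketch (not part of the patch): the two new functions are thin wrappers around the replaceable (pcre_free) pointer, so that callers who cannot reach free() can still release store. The pattern can be shown without PCRE itself; every demo_ name below is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replaceable allocator hooks, in the spirit of pcre_malloc and pcre_free. */
static void *(*demo_malloc)(size_t) = malloc;
static void  (*demo_free)(void *) = free;

/* Copy subject[start..end) into newly obtained store. Returns the length of
   the copy, or -1 on a bad range or allocation failure. */
static int demo_get_substring(const char *subject, int start, int end,
  const char **out)
{
  int len = end - start;
  char *copy;
  assert(subject != NULL && out != NULL);
  if (start < 0 || len < 0) return -1;
  copy = (char *)(demo_malloc)((size_t)len + 1);
  if (copy == NULL) return -1;
  memcpy(copy, subject + start, (size_t)len);
  copy[len] = 0;
  *out = copy;
  return len;
}

/* Wrapper that lets a caller release the store without calling free()
   or the function pointer directly. */
static void demo_free_substring(const char *pointer)
{
  (demo_free)((void *)pointer);
}
```

Routing the release through the same hook that allocated the store keeps the two sides consistent when an application swaps in its own allocator.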
diff --git a/internal.h b/internal.h
index b4b750f..ea0d905 100644
--- a/internal.h
+++ b/internal.h
@@ -105,7 +105,7 @@ time, run time or study time, respectively. */
 
 #define PUBLIC_OPTIONS \
   (PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
-   PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY)
+   PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8)
 
 #define PUBLIC_EXEC_OPTIONS \
   (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY)
@@ -274,6 +274,9 @@ just to accommodate the POSIX wrapper. */
 #define ERR29 "(?p must be followed by )"
 #define ERR30 "unknown POSIX class name"
 #define ERR31 "POSIX collating elements are not supported"
+#define ERR32 "this version of PCRE is not compiled with PCRE_UTF8 support"
+#define ERR33 "characters with values > 255 are not yet supported in classes"
+#define ERR34 "character value in \\x{...} sequence is too large"
 
 /* All character handling must be done as unsigned characters. Otherwise there
 are problems with top-bit-set characters and functions such as isspace().
@@ -330,6 +333,7 @@ typedef struct match_data {
   BOOL   offset_overflow;       /* Set if too many extractions */
   BOOL   notbol;                /* NOTBOL flag */
   BOOL   noteol;                /* NOTEOL flag */
+  BOOL   utf8;                  /* UTF8 flag */
   BOOL   endonly;               /* Dollar not before final \n */
   BOOL   notempty;              /* Empty string match not wanted */
   const uschar *start_pattern;  /* For use when recursing */
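Reviewer sketch (not part of the patch): PUBLIC_OPTIONS and PUBLIC_EXEC_OPTIONS are bit masks of the options accepted at compile time and at run time. One plausible use of such a mask is to reject any bit outside it, as sketched below. The four bit values are the ones shown in this patch's pcre.in hunk; the DEMO_ names and the shortened masks are hypothetical.

```c
#include <assert.h>

/* Bit values as they appear in the pcre.in hunk of this patch. */
#define DEMO_NOTEOL    0x0100
#define DEMO_UNGREEDY  0x0200
#define DEMO_NOTEMPTY  0x0400
#define DEMO_UTF8      0x0800

/* Shortened, hypothetical stand-ins for the two public masks. */
#define DEMO_COMPILE_OPTIONS (DEMO_UNGREEDY|DEMO_UTF8)
#define DEMO_EXEC_OPTIONS    (DEMO_NOTEOL|DEMO_NOTEMPTY)

/* Return 1 if every bit set in options is also in allowed, else 0. */
static int demo_options_ok(int options, int allowed)
{
  assert(allowed != 0);
  return (options & ~allowed) == 0;
}
```

With this shape, adding PCRE_UTF8 to the compile-time mask is what makes the new option bit acceptable there, while it remains outside the run-time mask.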
diff --git a/pcre.c b/pcre.c
index e3fdde9..e428eda 100644
--- a/pcre.c
+++ b/pcre.c
@@ -66,6 +66,16 @@ not be set greater than 200. */
 #define BRASTACK_SIZE 200
 
 
+/* The number of bytes in a literal character string above which we can't add
+any more is different when UTF-8 characters may be encountered. */
+
+#ifdef SUPPORT_UTF8
+#define MAXLIT 250
+#else
+#define MAXLIT 255
+#endif
+
+
 /* Min and max values for the common repeats; for the maxima, 0 => infinity */
 
 static const char rep_min[] = { 0, 0, 1, 1, 0, 0 };
@@ -176,6 +186,64 @@ void  (*pcre_free)(void *) = free;
 
 
 
+/*************************************************
+*    Macros and tables for character handling    *
+*************************************************/
+
+/* When UTF-8 encoding is being used, a character is no longer just a single
+byte. The macros for character handling generate simple sequences when used in
+byte-mode, and more complicated ones for UTF-8 characters. */
+
+#ifndef SUPPORT_UTF8
+#define GETCHARINC(c, eptr) c = *eptr++;
+#define GETCHARLEN(c, eptr, len) c = *eptr;
+#define BACKCHAR(eptr)
+
+#else   /* SUPPORT_UTF8 */
+
+/* Get the next UTF-8 character, advancing the pointer */
+
+#define GETCHARINC(c, eptr) \
+  c = *eptr++; \
+  if (md->utf8 && (c & 0xc0) == 0xc0) \
+    { \
+    int a = utf8_table4[c & 0x3f];  /* Number of additional bytes */ \
+    int s = 6 - a;                  /* Amount to shift next byte */  \
+    c &= utf8_table3[a];            /* Low order bits from first byte */ \
+    while (a-- > 0) \
+      { \
+      c |= (*eptr++ & 0x3f) << s; \
+      s += 6; \
+      } \
+    }
+
+/* Get the next UTF-8 character, not advancing the pointer, setting length */
+
+#define GETCHARLEN(c, eptr, len) \
+  c = *eptr; \
+  len = 1; \
+  if (md->utf8 && (c & 0xc0) == 0xc0) \
+    { \
+    int i; \
+    int a = utf8_table4[c & 0x3f];  /* Number of additional bytes */ \
+    int s = 6 - a;                  /* Amount to shift next byte */  \
+    c &= utf8_table3[a];            /* Low order bits from first byte */ \
+    for (i = 1; i <= a; i++) \
+      { \
+      c |= (eptr[i] & 0x3f) << s; \
+      s += 6; \
+      } \
+    len += a; \
+    }
+
+/* If the pointer is not at the start of a character, move it back until
+it is. */
+
+#define BACKCHAR(eptr) while((*eptr & 0xc0) == 0x80) eptr--;
+
+#endif
+
+
 
 /*************************************************
 *             Default character tables           *
@@ -191,6 +259,66 @@ tables. */
 
 
 
+#ifdef SUPPORT_UTF8
+/*************************************************
+*           Tables for UTF-8 support             *
+*************************************************/
+
+/* These are the breakpoints for different numbers of bytes in a UTF-8
+character. */
+
+static int utf8_table1[] = { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff};
+
+/* These are the indicator bits and the mask for the data bits to set in the
+first byte of a character, indexed by the number of additional bytes. */
+
+static int utf8_table2[] = { 0,    0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
+static int utf8_table3[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
+
+/* Table of the number of extra bytes, indexed by the first byte masked
+with 0x3f. The highest index for a valid UTF-8 lead byte is in fact
+0x3d. */
+
+static uschar utf8_table4[] = {
+  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+  2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
+  3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5 };
+
+
+/*************************************************
+*       Convert character value to UTF-8         *
+*************************************************/
+
+/* This function takes an integer value in the range 0 - 0x7fffffff
+and encodes it as a UTF-8 character in 1 to 6 bytes.
+
+Arguments:
+  cvalue     the character value
+  buffer     pointer to buffer for result - at least 6 bytes long
+
+Returns:     number of bytes placed in the buffer
+*/
+
+static int
+ord2utf8(int cvalue, uschar *buffer)
+{
+register int i, j;
+for (i = 0; i < sizeof(utf8_table1)/sizeof(int); i++)
+  if (cvalue <= utf8_table1[i]) break;
+*buffer++ = utf8_table2[i] | (cvalue & utf8_table3[i]);
+cvalue >>= 6 - i;
+for (j = 0; j < i; j++)
+  {
+  *buffer++ = 0x80 | (cvalue & 0x3f);
+  cvalue >>= 6;
+  }
+return i + 1;
+}
+#endif
+
+
+
 /*************************************************
 *          Return version string                 *
 *************************************************/
@@ -349,9 +477,9 @@ while (length-- > 0)
 
 /* This function is called when a \ has been encountered. It either returns a
 positive value for a simple escape such as \n, or a negative value which
-encodes one of the more complicated things such as \d. On entry, ptr is
-pointing at the \. On exit, it is on the final character of the escape
-sequence.
+encodes one of the more complicated things such as \d. When UTF-8 is enabled,
+a positive value greater than 255 may be returned. On entry, ptr is pointing at
+the \. On exit, it is on the final character of the escape sequence.
 
 Arguments:
   ptrptr     points to the pattern position pointer
@@ -373,7 +501,9 @@ check_escape(const uschar **ptrptr, const char **errorptr, int bracount,
 const uschar *ptr = *ptrptr;
 int c, i;
 
-c = *(++ptr) & 255;   /* Ensure > 0 on signed-char systems */
+/* If backslash is at the end of the pattern, it's an error. */
+
+c = *(++ptr);
 if (c == 0) *errorptr = ERR1;
 
 /* Digits or letters may have special meaning; all others are literals. */
@@ -433,18 +563,46 @@ else
       }
 
     /* \0 always starts an octal number, but we may drop through to here with a
-    larger first octal digit */
+    larger first octal digit. */
 
     case '0':
     c -= '0';
     while(i++ < 2 && (cd->ctypes[ptr[1]] & ctype_digit) != 0 &&
       ptr[1] != '8' && ptr[1] != '9')
         c = c * 8 + *(++ptr) - '0';
+    c &= 255;     /* Take least significant 8 bits */
     break;
 
-    /* Special escapes not starting with a digit are straightforward */
+    /* \x is complicated when UTF-8 is enabled. \x{ddd} is a character number
+    which can be greater than 0xff, but only if the ddd are hex digits. */
 
     case 'x':
+#ifdef SUPPORT_UTF8
+    if (ptr[1] == '{' && (options & PCRE_UTF8) != 0)
+      {
+      const uschar *pt = ptr + 2;
+      register int count = 0;
+      c = 0;
+      while ((cd->ctypes[*pt] & ctype_xdigit) != 0)
+        {
+        count++;
+        c = c * 16 + cd->lcc[*pt] -
+          (((cd->ctypes[*pt] & ctype_digit) != 0)? '0' : 'W');
+        pt++;
+        }
+      if (*pt == '}')
+        {
+        if (c < 0 || count > 8) *errorptr = ERR34;
+        ptr = pt;
+        break;
+        }
+      /* If the sequence of hex digits does not end with '}', then we don't
+      recognize this construct; fall through to the normal \x handling. */
+      }
+#endif
+
+    /* Read a single character given as up to two hex digits */
+
     c = 0;
     while (i++ < 2 && (cd->ctypes[ptr[1]] & ctype_xdigit) != 0)
       {
@@ -454,6 +612,8 @@ else
       }
     break;
 
+    /* Other special escapes not starting with a digit are straightforward */
+
     case 'c':
     c = *(++ptr);
     if (c == 0)
@@ -591,12 +751,13 @@ if the length is fixed. This is needed for dealing with backward assertions.
 
 Arguments:
   code     points to the start of the pattern (the bracket)
+  options  the compiling options
 
 Returns:   the fixed length, or -1 if there is no fixed length
 */
 
 static int
-find_fixedlength(uschar *code)
+find_fixedlength(uschar *code, int options)
 {
 int length = -1;
 
@@ -617,7 +778,7 @@ for (;;)
     case OP_BRA:
     case OP_ONCE:
     case OP_COND:
-    d = find_fixedlength(cc);
+    d = find_fixedlength(cc, options);
     if (d < 0) return -1;
     branchlength += d;
     do cc += (cc[1] << 8) + cc[2]; while (*cc == OP_ALT);
@@ -671,10 +832,17 @@ for (;;)
     cc++;
     break;
 
-    /* Handle char strings */
+    /* Handle char strings. In UTF-8 mode we must count characters, not bytes.
+    This requires a scan of the string, unfortunately. We assume valid UTF-8
+    strings, so all we do is reduce the length by one for each byte whose top
+    two bits are 10 (that is, 10xxxxxx continuation bytes). */
 
     case OP_CHARS:
     branchlength += *(++cc);
+#ifdef SUPPORT_UTF8
+    for (d = 1; d <= *cc; d++)
+      if ((cc[d] & 0xc0) == 0x80) branchlength--;
+#endif
     cc += *cc + 1;
     break;
 
@@ -1054,7 +1222,17 @@ for (;; ptr++)
             goto FAILED;
             }
           }
-        /* Fall through if single character */
+
+        /* Fall through if single character, but don't at present allow
+        chars > 255 in UTF-8 mode. */
+
+#ifdef SUPPORT_UTF8
+        if (c > 255)
+          {
+          *errorptr = ERR33;
+          goto FAILED;
+          }
+#endif
         }
 
       /* A single character may be followed by '-' to form a range. However,
@@ -1074,17 +1252,29 @@ for (;; ptr++)
           }
 
         /* The second part of a range can be a single-character escape, but
-        not any of the other escapes. */
+        not any of the other escapes. Perl 5.6 treats a hyphen as a literal
+        in such circumstances. */
 
         if (d == '\\')
           {
+          const uschar *oldptr = ptr;
           d = check_escape(&ptr, errorptr, *brackets, options, TRUE, cd);
+
+#ifdef SUPPORT_UTF8
+          if (d > 255)
+            {
+            *errorptr = ERR33;
+            goto FAILED;
+            }
+#endif
+          /* \b is backspace; any other special means the '-' was literal */
+
           if (d < 0)
             {
             if (d == -ESC_b) d = '\b'; else
               {
-              *errorptr = ERR7;
-              goto FAILED;
+              ptr = oldptr - 2;
+              goto SINGLE_CHARACTER;  /* A few lines below */
               }
             }
           }
@@ -1112,6 +1302,8 @@ for (;; ptr++)
       /* Handle a lone single character - we can get here for a normal
       non-escape char, or after \ that introduces a single character. */
 
+      SINGLE_CHARACTER:
+
       class [c/8] |= (1 << (c&7));
       if ((options & PCRE_CASELESS) != 0)
         {
@@ -1829,6 +2021,20 @@ for (;; ptr++)
         tempptr = ptr;
         c = check_escape(&ptr, errorptr, *brackets, options, FALSE, cd);
         if (c < 0) { ptr = tempptr; break; }
+
+        /* If a character is > 127 in UTF-8 mode, we have to turn it into
+        two or more bytes in the UTF-8 encoding. */
+
+#ifdef SUPPORT_UTF8
+        if (c > 127 && (options & PCRE_UTF8) != 0)
+          {
+          uschar buffer[8];
+          int len = ord2utf8(c, buffer);
+          for (c = 0; c < len; c++) *code++ = buffer[c];
+          length += len;
+          continue;
+          }
+#endif
         }
 
       /* Ordinary character or single-char escape */
@@ -1839,7 +2045,7 @@ for (;; ptr++)
 
     /* This "while" is the end of the "do" above. */
 
-    while (length < 255 && (cd->ctypes[c = *(++ptr)] & ctype_meta) == 0);
+    while (length < MAXLIT && (cd->ctypes[c = *(++ptr)] & ctype_meta) == 0);
 
     /* Update the last character and the count of literals */
 
@@ -1851,7 +2057,7 @@ for (;; ptr++)
     the next state. */
 
     previous[1] = length;
-    if (length < 255) ptr--;
+    if (length < MAXLIT) ptr--;
     break;
     }
   }                   /* end of big loop */
@@ -1989,7 +2195,7 @@ for (;;)
   if (lookbehind)
     {
     *code = OP_END;
-    length = find_fixedlength(last_branch);
+    length = find_fixedlength(last_branch, options);
     DPRINTF(("fixed length = %d\n", length));
     if (length < 0)
       {
@@ -2280,6 +2486,16 @@ uschar bralenstack[BRASTACK_SIZE];
 uschar *code_base, *code_end;
 #endif
 
+/* Can't support UTF8 unless PCRE has been compiled to include the code. */
+
+#ifndef SUPPORT_UTF8
+if ((options & PCRE_UTF8) != 0)
+  {
+  *errorptr = ERR32;
+  return NULL;
+  }
+#endif
+
 /* We can't pass back an error message if errorptr is NULL; I guess the best we
 can do is just return NULL. */
 
@@ -2775,6 +2991,16 @@ while ((c = *(++ptr)) != 0)
           &compile_block);
         if (*errorptr != NULL) goto PCRE_ERROR_RETURN;
         if (c < 0) { ptr = saveptr; break; }
+
+#ifdef SUPPORT_UTF8
+        if (c > 127 && (options & PCRE_UTF8) != 0)
+          {
+          int i;
+          for (i = 0; i < sizeof(utf8_table1)/sizeof(int); i++)
+            if (c <= utf8_table1[i]) break;
+          runlength += i;
+          }
+#endif
         }
 
       /* Ordinary character or single-char escape */
@@ -2784,7 +3010,7 @@ while ((c = *(++ptr)) != 0)
 
     /* This "while" is the end of the "do" above. */
 
-    while (runlength < 255 &&
+    while (runlength < MAXLIT &&
       (compile_block.ctypes[c = *(++ptr)] & ctype_meta) == 0);
 
     ptr--;
@@ -3429,10 +3655,21 @@ for (;;)
 
     /* Move the subject pointer back. This occurs only at the start of
     each branch of a lookbehind assertion. If we are too close to the start to
-    move back, this match function fails. */
+    move back, this match function fails. When working with UTF-8 we move
+    back a number of characters, not bytes. */
 
     case OP_REVERSE:
+#ifdef SUPPORT_UTF8
+    c = (ecode[1] << 8) + ecode[2];
+    for (i = 0; i < c; i++)
+      {
+      eptr--;
+      BACKCHAR(eptr)
+      }
+#else
     eptr -= (ecode[1] << 8) + ecode[2];
+#endif
+
     if (eptr < md->start_subject) return FALSE;
     ecode += 3;
     break;
@@ -3752,6 +3989,10 @@ for (;;)
     if ((ims & PCRE_DOTALL) == 0 && eptr < md->end_subject && *eptr == '\n')
       return FALSE;
     if (eptr++ >= md->end_subject) return FALSE;
+#ifdef SUPPORT_UTF8
+    if (md->utf8)
+      while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+#endif
     ecode++;
     break;
 
@@ -3953,7 +4194,13 @@ for (;;)
       for (i = 1; i <= min; i++)
         {
         if (eptr >= md->end_subject) return FALSE;
-        c = *eptr++;
+        GETCHARINC(c, eptr)         /* Get character; increment eptr */
+
+#ifdef SUPPORT_UTF8
+        /* We do not yet support class members > 255 */
+        if (c > 255) return FALSE;
+#endif
+
         if ((data[c/8] & (1 << (c&7))) != 0) continue;
         return FALSE;
         }
@@ -3973,7 +4220,12 @@ for (;;)
           if (match(eptr, ecode, offset_top, md, ims, eptrb, 0))
             return TRUE;
           if (i >= max || eptr >= md->end_subject) return FALSE;
-          c = *eptr++;
+          GETCHARINC(c, eptr)       /* Get character; increment eptr */
+
+#ifdef SUPPORT_UTF8
+          /* We do not yet support class members > 255 */
+          if (c > 255) return FALSE;
+#endif
           if ((data[c/8] & (1 << (c&7))) != 0) continue;
           return FALSE;
           }
@@ -3985,17 +4237,29 @@ for (;;)
       else
         {
         const uschar *pp = eptr;
-        for (i = min; i < max; eptr++, i++)
+        int len = 1;
+        for (i = min; i < max; i++)
           {
           if (eptr >= md->end_subject) break;
-          c = *eptr;
-          if ((data[c/8] & (1 << (c&7))) != 0) continue;
-          break;
+          GETCHARLEN(c, eptr, len)  /* Get character, set length if UTF-8 */
+
+#ifdef SUPPORT_UTF8
+          /* We do not yet support class members > 255 */
+          if (c > 255) break;
+#endif
+          if ((data[c/8] & (1 << (c&7))) == 0) break;
+          eptr += len;
           }
 
         while (eptr >= pp)
+          {
           if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0))
             return TRUE;
+
+#ifdef SUPPORT_UTF8
+          BACKCHAR(eptr)
+#endif
+          }
         return FALSE;
         }
       }
@@ -4315,13 +4579,29 @@ for (;;)
 
     /* First, ensure the minimum number of matches are present. Use inline
     code for maximizing the speed, and do the type test once at the start
-    (i.e. keep it out of the loop). Also test that there are at least the
-    minimum number of characters before we start. */
+    (i.e. keep it out of the loop). We can also test that there are at least
+    the minimum number of bytes before we start, except when doing '.' in
+    UTF8 mode. Leave the test in place in all cases; in the special case we
+    have to test after each character. */
 
     if (min > md->end_subject - eptr) return FALSE;
     if (min > 0) switch(ctype)
       {
       case OP_ANY:
+#ifdef SUPPORT_UTF8
+      if (md->utf8)
+        {
+        for (i = 1; i <= min; i++)
+          {
+          if (eptr >= md->end_subject ||
+             (*eptr++ == '\n' && (ims & PCRE_DOTALL) == 0))
+            return FALSE;
+          while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+          }
+        break;
+        }
+#endif
+      /* Non-UTF8 can be faster */
       if ((ims & PCRE_DOTALL) == 0)
         { for (i = 1; i <= min; i++) if (*eptr++ == '\n') return FALSE; }
       else eptr += min;
@@ -4379,6 +4659,10 @@ for (;;)
           {
           case OP_ANY:
           if ((ims & PCRE_DOTALL) == 0 && c == '\n') return FALSE;
+#ifdef SUPPORT_UTF8
+          if (md->utf8)
+            while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+#endif
           break;
 
           case OP_NOT_DIGIT:
@@ -4418,6 +4702,33 @@ for (;;)
       switch(ctype)
         {
         case OP_ANY:
+
+        /* Special code is required for UTF8, but when the maximum is unlimited
+        we don't need it. */
+
+#ifdef SUPPORT_UTF8
+        if (md->utf8 && max < INT_MAX)
+          {
+          if ((ims & PCRE_DOTALL) == 0)
+            {
+            for (i = min; i < max; i++)
+              {
+              if (eptr >= md->end_subject || *eptr++ == '\n') break;
+              while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+              }
+            }
+          else
+            {
+            for (i = min; i < max; i++)
+              {
+              eptr++;
+              while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+              }
+            }
+          break;
+          }
+#endif
+        /* Non-UTF8 can be faster */
         if ((ims & PCRE_DOTALL) == 0)
           {
           for (i = min; i < max; i++)
@@ -4490,8 +4801,14 @@ for (;;)
         }
 
       while (eptr >= pp)
+        {
         if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0))
           return TRUE;
+#ifdef SUPPORT_UTF8
+        if (md->utf8)
+          while (eptr > pp && (*eptr & 0xc0) == 0x80) eptr--;
+#endif
+        }
       return FALSE;
       }
     /* Control never gets here */
@@ -4572,6 +4889,7 @@ match_block.end_subject = match_block.start_subject + length;
 end_subject = match_block.end_subject;
 
 match_block.endonly = (re->options & PCRE_DOLLAR_ENDONLY) != 0;
+match_block.utf8 = (re->options & PCRE_UTF8) != 0;
 
 match_block.notbol = (options & PCRE_NOTBOL) != 0;
 match_block.noteol = (options & PCRE_NOTEOL) != 0;
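Reviewer sketch (not part of the patch): several hunks above rely on the rule that UTF-8 continuation bytes have the form 10xxxxxx, both to count characters rather than bytes and to step back to the start of a character. The block below illustrates that rule together with an encoder. It uses the standard UTF-8 layout, with the most significant bits in the lead byte, and is not a copy of ord2utf8 from the patch; all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Encode a code point in the range 0 - 0x7fffffff using the standard UTF-8
   layout. Returns the number of bytes written (1 to 6); buf must hold at
   least 6 bytes. */
static int demo_utf8_encode(unsigned int cp, unsigned char *buf)
{
  static const unsigned int limit[] =
    { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff };
  static const unsigned char lead[] = { 0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };
  int extra = 0, j;

  assert(buf != NULL && cp <= 0x7fffffff);
  while (extra < 5 && cp > limit[extra]) extra++;

  /* Fill the continuation bytes from the end, six bits at a time. */
  for (j = extra; j > 0; j--)
    {
    buf[j] = (unsigned char)(0x80 | (cp & 0x3f));
    cp >>= 6;
    }
  buf[0] = (unsigned char)(lead[extra] | cp);
  return extra + 1;
}

/* Count characters in valid UTF-8 by skipping 10xxxxxx bytes. */
static size_t demo_utf8_count(const unsigned char *s, size_t nbytes)
{
  size_t i, n = 0;
  for (i = 0; i < nbytes; i++)
    if ((s[i] & 0xc0) != 0x80) n++;
  return n;
}

/* Move p back to the first byte of the character it points into, never
   going before start. */
static const unsigned char *demo_utf8_back(const unsigned char *p,
  const unsigned char *start)
{
  while (p > start && (*p & 0xc0) == 0x80) p--;
  return p;
}
```

Unlike the BACKCHAR macro, demo_utf8_back takes an explicit lower bound, so it cannot run off the front of the buffer on malformed input.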
diff --git a/pcre.in b/pcre.in
index 8a53167..1dffb02 100644
--- a/pcre.in
+++ b/pcre.in
@@ -50,6 +50,7 @@ extern "C" {
 #define PCRE_NOTEOL          0x0100
 #define PCRE_UNGREEDY        0x0200
 #define PCRE_NOTEMPTY        0x0400
+#define PCRE_UTF8            0x0800
 
 /* Exec-time and get-time error codes */
 
@@ -88,14 +89,16 @@ PCRE_DL_IMPORT extern void  (*pcre_free)(void *);
 /* Functions */
 
 extern pcre *pcre_compile(const char *, int, const char **, int *,
-  const unsigned char *);
-extern int pcre_copy_substring(const char *, int *, int, int, char *, int);
-extern int pcre_exec(const pcre *, const pcre_extra *, const char *,
-  int, int, int, int *, int);
-extern int pcre_get_substring(const char *, int *, int, int, const char **);
-extern int pcre_get_substring_list(const char *, int *, int, const char ***);
-extern int pcre_info(const pcre *, int *, int *);
-extern int pcre_fullinfo(const pcre *, const pcre_extra *, int, void *);
+              const unsigned char *);
+extern int  pcre_copy_substring(const char *, int *, int, int, char *, int);
+extern int  pcre_exec(const pcre *, const pcre_extra *, const char *,
+              int, int, int, int *, int);
+extern void pcre_free_substring(const char *);
+extern void pcre_free_substring_list(const char **);
+extern int  pcre_get_substring(const char *, int *, int, int, const char **);
+extern int  pcre_get_substring_list(const char *, int *, int, const char ***);
+extern int  pcre_info(const pcre *, int *, int *);
+extern int  pcre_fullinfo(const pcre *, const pcre_extra *, int, void *);
 extern unsigned const char *pcre_maketables(void);
 extern pcre_extra *pcre_study(const pcre *, int, const char **);
 extern const char *pcre_version(void);
diff --git a/pgrep.c b/pcregrep.c
index ad1b87e..e8c934e 100644
--- a/pgrep.c
+++ b/pcregrep.c
@@ -1,7 +1,10 @@
 /*************************************************
-*               PCRE grep program                *
+*               pcregrep program                 *
 *************************************************/
 
+/* This is a grep program that uses the PCRE regular expression library to do
+its pattern matching. */
+
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>
@@ -59,7 +62,7 @@ return sys_errlist[n];
 *************************************************/
 
 static int
-pgrep(FILE *in, char *name)
+pcregrep(FILE *in, char *name)
 {
 int rc = 1;
 int linenumber = 0;
@@ -119,7 +122,7 @@ return rc;
 static int
 usage(int rc)
 {
-fprintf(stderr, "Usage: pgrep [-Vchilnsvx] pattern [file] ...\n");
+fprintf(stderr, "Usage: pcregrep [-Vchilnsvx] pattern [file] ...\n");
 return rc;
 }
 
@@ -165,7 +168,7 @@ for (i = 1; i < argc; i++)
       break;
 
       default:
-      fprintf(stderr, "pgrep: unknown option %c\n", s[-1]);
+      fprintf(stderr, "pcregrep: unknown option %c\n", s[-1]);
       return usage(2);
       }
     }
@@ -180,7 +183,7 @@ if (i >= argc) return usage(0);
 pattern = pcre_compile(argv[i++], options, &error, &errptr, NULL);
 if (pattern == NULL)
   {
-  fprintf(stderr, "pgrep: error in regex at offset %d: %s\n", errptr, error);
+  fprintf(stderr, "pcregrep: error in regex at offset %d: %s\n", errptr, error);
   return 2;
   }
 
@@ -189,13 +192,13 @@ if (pattern == NULL)
 hints = pcre_study(pattern, 0, &error);
 if (error != NULL)
   {
-  fprintf(stderr, "pgrep: error while studing regex: %s\n", error);
+  fprintf(stderr, "pcregrep: error while studying regex: %s\n", error);
   return 2;
   }
 
 /* If there are no further arguments, do the business on stdin and exit */
 
-if (i >= argc) return pgrep(stdin, NULL);
+if (i >= argc) return pcregrep(stdin, NULL);
 
 /* Otherwise, work through the remaining arguments as files. If there is only
 one, don't give its name on the output. */
@@ -213,7 +216,7 @@ for (; i < argc; i++)
     }
   else
     {
-    int frc = pgrep(in, filenames? argv[i] : NULL);
+    int frc = pcregrep(in, filenames? argv[i] : NULL);
     if (frc == 0 && rc == 1) rc = 0;
     fclose(in);
     }
diff --git a/pcreposix.c b/pcreposix.c
index 7c66cce..71d02ef 100644
--- a/pcreposix.c
+++ b/pcreposix.c
@@ -80,7 +80,10 @@ static int eint[] = {
   REG_BADPAT,  /* "assertion expected after (?(" */
   REG_BADPAT,  /* "(?p must be followed by )" */
   REG_ECTYPE,  /* "unknown POSIX class name" */
-  REG_BADPAT   /* "POSIX collating elements are not supported" */
+  REG_BADPAT,  /* "POSIX collating elements are not supported" */
+  REG_INVARG,  /* "this version of PCRE is not compiled with PCRE_UTF8 support" */
+  REG_BADPAT,  /* "characters with values > 255 are not yet supported in classes" */
+  REG_BADPAT   /* "character value in \x{...} sequence is too large" */
 };
 
 /* Table of texts corresponding to POSIX error codes */
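Reviewer sketch (not part of the patch): the eint table maps each PCRE compile error, by position, to a POSIX error code, so it has to grow whenever new error texts such as ERR32 to ERR34 are added. A bounds-checked lookup over such a table might look like the following; the enum values, the shortened table and the fallback code are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the POSIX wrapper's error codes. */
enum { DEMO_REG_BADPAT = 1, DEMO_REG_ECTYPE, DEMO_REG_INVARG, DEMO_REG_ASSERT };

/* Shortened table: one entry per compile error text, in the same order. */
static const int demo_eint[] = {
  DEMO_REG_ECTYPE,  /* unknown POSIX class name */
  DEMO_REG_BADPAT,  /* collating elements not supported */
  DEMO_REG_INVARG,  /* built without UTF-8 support */
  DEMO_REG_BADPAT,  /* class character > 255 */
  DEMO_REG_BADPAT   /* \x{...} value too large */
};

/* Map an error index to a code, falling back to DEMO_REG_ASSERT when the
   index lies outside the table (for example, if the table was not extended
   after a new error text was added). */
static int demo_map_error(int index)
{
  size_t n = sizeof(demo_eint) / sizeof(demo_eint[0]);
  assert(n > 0);
  if (index < 0 || (size_t)index >= n) return DEMO_REG_ASSERT;
  return demo_eint[index];
}
```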
diff --git a/pcretest.c b/pcretest.c
index bbe9bdd..ee5df5f 100644
--- a/pcretest.c
+++ b/pcretest.c
@@ -38,6 +38,113 @@ static size_t gotten_store;
 
 
 
+static int utf8_table1[] = {
+  0x0000007f, 0x000007ff, 0x0000ffff, 0x001fffff, 0x03ffffff, 0x7fffffff};
+
+static int utf8_table2[] = {
+  0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
+
+static int utf8_table3[] = {
+  0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
+
+
+/*************************************************
+*       Convert character value to UTF-8         *
+*************************************************/
+
+/* This function takes an integer value in the range 0 - 0x7fffffff
+and encodes it as a UTF-8 character in 0 to 6 bytes.
+
+Arguments:
+  cvalue     the character value
+  buffer     pointer to buffer for result - at least 6 bytes long
+
+Returns:     number of bytes placed in the buffer
+             -1 if input character is negative
+             0 if input character is positive but too big (only when
+             int is longer than 32 bits)
+*/
+
+static int
+ord2utf8(int cvalue, unsigned char *buffer)
+{
+register int i, j;
+for (i = 0; i < sizeof(utf8_table1)/sizeof(int); i++)
+  if (cvalue <= utf8_table1[i]) break;
+if (i >= sizeof(utf8_table1)/sizeof(int)) return 0;
+if (cvalue < 0) return -1;
+*buffer++ = utf8_table2[i] | (cvalue & utf8_table3[i]);
+cvalue >>= 6 - i;
+for (j = 0; j < i; j++)
+  {
+  *buffer++ = 0x80 | (cvalue & 0x3f);
+  cvalue >>= 6;
+  }
+return i + 1;
+}
+
+
+/*************************************************
+*            Convert UTF-8 string to value       *
+*************************************************/
+
+/* This function takes one or more bytes that represent a UTF-8 character,
+and returns the value of the character.
+
+Arguments:
+  buffer   a pointer to the byte vector
+  vptr     a pointer to an int to receive the value
+
+Returns:   >  0 => the number of bytes consumed
+           -6 to 0 => malformed UTF-8 character at offset = (-return)
+*/
+
+int
+utf82ord(unsigned char *buffer, int *vptr)
+{
+int c = *buffer++;
+int d = c;
+int i, j, s;
+
+for (i = -1; i < 6; i++)               /* i is number of additional bytes */
+  {
+  if ((d & 0x80) == 0) break;
+  d <<= 1;
+  }
+
+if (i == -1) { *vptr = c; return 1; }  /* ascii character */
+if (i == 0 || i == 6) return 0;        /* invalid UTF-8 */
+
+/* i now has a value in the range 1-5 */
+
+d = c & utf8_table3[i];
+s = 6 - i;
+
+for (j = 0; j < i; j++)
+  {
+  c = *buffer++;
+  if ((c & 0xc0) != 0x80) return -(j+1);
+  d |= (c & 0x3f) << s;
+  s += 6;
+  }
+
+/* Check that encoding was the correct unique one */
+
+for (j = 0; j < sizeof(utf8_table1)/sizeof(int); j++)
+  if (d <= utf8_table1[j]) break;
+if (j != i) return -(i+1);
+
+/* Valid value */
+
+*vptr = d;
+return i+1;
+}
+
+
+
+
+
+
 /* Debugging function to print the internal form of the regex. This is the same
 code as contained in pcre.c under the DEBUG macro. */
 
@@ -265,14 +372,31 @@ for(;;)
 
 
 
-/* Character string printing function. */
+/* Character string printing function. A "normal" and a UTF-8 version. */
 
-static void pchars(unsigned char *p, int length)
+static void pchars(unsigned char *p, int length, int utf8)
 {
 int c;
 while (length-- > 0)
+  {
+  if (utf8)
+    {
+    int rc = utf82ord(p, &c);
+    if (rc > 0)
+      {
+      length -= rc - 1;
+      p += rc;
+      if (c < 256 && isprint(c)) fprintf(outfile, "%c", c);
+        else fprintf(outfile, "\\x{%02x}", c);
+      continue;
+      }
+    }
+
+  /* Not UTF-8, or malformed UTF-8 */
+
   if (isprint(c = *(p++))) fprintf(outfile, "%c", c);
     else fprintf(outfile, "\\x%02x", c);
+  }
 }
 
 
@@ -403,6 +527,7 @@ while (!done)
   int do_g = 0;
   int do_showinfo = showinfo;
   int do_showrest = 0;
+  int utf8 = 0;
   int erroroffset, len, delimiter;
 
   if (infile == stdin) printf("  re> ");
@@ -494,6 +619,7 @@ while (!done)
       case 'S': do_study = 1; break;
       case 'U': options |= PCRE_UNGREEDY; break;
       case 'X': options |= PCRE_EXTRA; break;
+      case '8': options |= PCRE_UTF8; utf8 = 1; break;
 
       case 'L':
       ppp = pp;
@@ -633,7 +759,7 @@ while (!done)
       if (backrefmax > 0)
         fprintf(outfile, "Max back reference = %d\n", backrefmax);
       if (options == 0) fprintf(outfile, "No options\n");
-        else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s\n",
+        else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s%s\n",
           ((options & PCRE_ANCHORED) != 0)? " anchored" : "",
           ((options & PCRE_CASELESS) != 0)? " caseless" : "",
           ((options & PCRE_EXTENDED) != 0)? " extended" : "",
@@ -641,7 +767,8 @@ while (!done)
           ((options & PCRE_DOTALL) != 0)? " dotall" : "",
           ((options & PCRE_DOLLAR_ENDONLY) != 0)? " dollar_endonly" : "",
           ((options & PCRE_EXTRA) != 0)? " extra" : "",
-          ((options & PCRE_UNGREEDY) != 0)? " ungreedy" : "");
+          ((options & PCRE_UNGREEDY) != 0)? " ungreedy" : "",
+          ((options & PCRE_UTF8) != 0)? " utf8" : "");
 
       if (((((real_pcre *)re)->options) & PCRE_ICHANGED) != 0)
         fprintf(outfile, "Case state changes\n");
@@ -796,6 +923,30 @@ while (!done)
         break;
 
         case 'x':
+
+        /* Handle \x{..} specially - new Perl thing for utf8 */
+
+        if (*p == '{')
+          {
+          unsigned char *pt = p;
+          c = 0;
+          while (isxdigit(*(++pt)))
+            c = c * 16 + tolower(*pt) - ((isdigit(*pt))? '0' : 'W');
+          if (*pt == '}')
+            {
+            unsigned char buffer[8];
+            int ii, utn;
+            utn = ord2utf8(c, buffer);
+            for (ii = 0; ii < utn - 1; ii++) *q++ = buffer[ii];
+            c = buffer[ii];   /* Last byte */
+            p = pt + 1;
+            break;
+            }
+          /* Not correct form; fall through */
+          }
+
+        /* Ordinary \x */
+
         c = 0;
         while (i++ < 2 && isxdigit(*p))
           {
@@ -876,12 +1027,12 @@ while (!done)
             {
             fprintf(outfile, "%2d: ", (int)i);
             pchars(dbuffer + pmatch[i].rm_so,
-              pmatch[i].rm_eo - pmatch[i].rm_so);
+              pmatch[i].rm_eo - pmatch[i].rm_so, utf8);
             fprintf(outfile, "\n");
             if (i == 0 && do_showrest)
               {
               fprintf(outfile, " 0+ ");
-              pchars(dbuffer + pmatch[i].rm_eo, len - pmatch[i].rm_eo);
+              pchars(dbuffer + pmatch[i].rm_eo, len - pmatch[i].rm_eo, utf8);
               fprintf(outfile, "\n");
               }
             }
@@ -931,14 +1082,14 @@ while (!done)
           else
             {
             fprintf(outfile, "%2d: ", i/2);
-            pchars(bptr + offsets[i], offsets[i+1] - offsets[i]);
+            pchars(bptr + offsets[i], offsets[i+1] - offsets[i], utf8);
             fprintf(outfile, "\n");
             if (i == 0)
               {
               if (do_showrest)
                 {
                 fprintf(outfile, " 0+ ");
-                pchars(bptr + offsets[i+1], len - offsets[i+1]);
+                pchars(bptr + offsets[i+1], len - offsets[i+1], utf8);
                 fprintf(outfile, "\n");
                 }
               }
@@ -971,7 +1122,8 @@ while (!done)
             else
               {
               fprintf(outfile, "%2dG %s (%d)\n", i, substring, rc);
-              free((void *)substring);
+              /* free((void *)substring); */
+              pcre_free_substring(substring);
               }
             }
           }
@@ -989,7 +1141,8 @@ while (!done)
               fprintf(outfile, "%2dL %s\n", i, stringlist[i]);
             if (stringlist[i] != NULL)
               fprintf(outfile, "string list not terminated by NULL\n");
-            free((void *)stringlist);
+            /* free((void *)stringlist); */
+            pcre_free_substring_list(stringlist);
             }
           }
         }
diff --git a/perltest b/perltest
index 1e96c79..e6f7974 100755
--- a/perltest
+++ b/perltest
@@ -9,7 +9,7 @@
 sub pchars {
 my($t) = "";
 
-foreach $c (split(//, @_[0]))
+foreach $c (split(//, $_[0]))
   {
   if (ord $c >= 32 && ord $c < 127) { $t .= $c; }
     else { $t .= sprintf("\\x%02x", ord $c); }
diff --git a/perltest8 b/perltest8
new file mode 100755
index 0000000..2fe522d
--- /dev/null
+++ b/perltest8
@@ -0,0 +1,208 @@
+#! /usr/bin/perl
+
+# Program for testing regular expressions with perl to check that PCRE handles
+# them the same. This is the version that supports /8 for UTF-8 testing. It
+# requires at least Perl 5.6.
+
+
+# Function for turning a string into a string of printing chars. There are
+# currently problems with UTF-8 strings; this fudges round them.
+
+sub pchars {
+my($t) = "";
+
+if ($utf8)
+  {
+  use utf8;
+  @p = unpack('U*', $_[0]);
+  foreach $c (@p)
+    {
+    if ($c >= 32 && $c < 127) { $t .= chr $c; }
+      else { $t .= sprintf("\\x{%02x}", $c); }
+    }
+  }
+
+else
+  {
+  foreach $c (split(//, $_[0]))
+    {
+    if (ord $c >= 32 && ord $c < 127) { $t .= $c; }
+      else { $t .= sprintf("\\x%02x", ord $c); }
+    }
+  }
+
+$t;
+}
+
+
+
+# Read lines from named file or stdin and write to named file or stdout; lines
+# consist of a regular expression, in delimiters and optionally followed by
+# options, followed by a set of test data, terminated by an empty line.
+
+# Sort out the input and output files
+
+if (@ARGV > 0)
+  {
+  open(INFILE, "<$ARGV[0]") || die "Failed to open $ARGV[0]\n";
+  $infile = "INFILE";
+  }
+else { $infile = "STDIN"; }
+
+if (@ARGV > 1)
+  {
+  open(OUTFILE, ">$ARGV[1]") || die "Failed to open $ARGV[1]\n";
+  $outfile = "OUTFILE";
+  }
+else { $outfile = "STDOUT"; }
+
+printf($outfile "Perl $] Regular Expressions\n\n");
+
+# Main loop
+
+NEXT_RE:
+for (;;)
+  {
+  printf "  re> " if $infile eq "STDIN";
+  last if ! ($_ = <$infile>);
+  printf $outfile "$_" if $infile ne "STDIN";
+  next if ($_ eq "");
+
+  $pattern = $_;
+
+  while ($pattern !~ /^\s*(.).*\1/s)
+    {
+    printf "    > " if $infile eq "STDIN";
+    last if ! ($_ = <$infile>);
+    printf $outfile "$_" if $infile ne "STDIN";
+    $pattern .= $_;
+    }
+
+   chomp($pattern);
+   $pattern =~ s/\s+$//;
+
+  # The private /+ modifier means "print $' afterwards".
+
+  $showrest = ($pattern =~ s/\+(?=[a-z]*$)//);
+
+  # The private /8 modifier means "operate in UTF-8". Currently, Perl
+  # has bugs that we try to work around using this flag.
+
+  $utf8 = ($pattern =~ s/8(?=[a-z]*$)//);
+
+  # Check that the pattern is valid
+
+  if ($utf8)
+    {
+    use utf8;
+    eval "\$_ =~ ${pattern}";
+    }
+  else
+    {
+    eval "\$_ =~ ${pattern}";
+    }
+
+  if ($@)
+    {
+    printf $outfile "Error: $@";
+    next NEXT_RE;
+    }
+
+  # If the /g modifier is present, we want to put a loop round the matching;
+  # otherwise just a single "if".
+
+  $cmd = ($pattern =~ /g[a-z]*$/)? "while" : "if";
+
+  # If the pattern is actually the null string, Perl uses the most recently
+  # executed (and successfully compiled) regex instead. This is a
+  # nasty trap for the unwary! The PCRE test suite does contain null strings
+  # in places - if they are allowed through here all sorts of weird and
+  # unexpected effects happen. To avoid this, we replace such patterns with
+  # a non-null pattern that has the same effect.
+
+  $pattern = "/(?#)/$2" if ($pattern =~ /^(.)\1(.*)$/);
+
+  # Read data lines and test them
+
+  for (;;)
+    {
+    printf "data> " if $infile eq "STDIN";
+    last NEXT_RE if ! ($_ = <$infile>);
+    chomp;
+    printf $outfile "$_\n" if $infile ne "STDIN";
+
+    s/\s+$//;
+    s/^\s+//;
+
+    last if ($_ eq "");
+
+    $x = eval "\"$_\"";   # To get escapes processed
+
+    # Empty array for holding results, then do the matching.
+
+    @subs = ();
+
+    $pushes = "push \@subs,\$&;" .
+         "push \@subs,\$1;" .
+         "push \@subs,\$2;" .
+         "push \@subs,\$3;" .
+         "push \@subs,\$4;" .
+         "push \@subs,\$5;" .
+         "push \@subs,\$6;" .
+         "push \@subs,\$7;" .
+         "push \@subs,\$8;" .
+         "push \@subs,\$9;" .
+         "push \@subs,\$10;" .
+         "push \@subs,\$11;" .
+         "push \@subs,\$12;" .
+         "push \@subs,\$13;" .
+         "push \@subs,\$14;" .
+         "push \@subs,\$15;" .
+         "push \@subs,\$16;" .
+         "push \@subs,\$'; }";
+
+    if ($utf8)
+      {
+      use utf8;
+      eval "${cmd} (\$x =~ ${pattern}) {" . $pushes;
+      }
+    else
+      {
+      eval "${cmd} (\$x =~ ${pattern}) {" . $pushes;
+      }
+
+    if ($@)
+      {
+      printf $outfile "Error: $@\n";
+      next NEXT_RE;
+      }
+    elsif (scalar(@subs) == 0)
+      {
+      printf $outfile "No match\n";
+      }
+    else
+      {
+      while (scalar(@subs) != 0)
+        {
+        printf $outfile (" 0: %s\n", &pchars($subs[0]));
+        printf $outfile (" 0+ %s\n", &pchars($subs[17])) if $showrest;
+        $last_printed = 0;
+        for ($i = 1; $i <= 16; $i++)
+          {
+          if (defined $subs[$i])
+            {
+            while ($last_printed++ < $i-1)
+              { printf $outfile ("%2d: <unset>\n", $last_printed); }
+            printf $outfile ("%2d: %s\n", $i, &pchars($subs[$i]));
+            $last_printed = $i;
+            }
+          }
+        splice(@subs, 0, 18);
+        }
+      }
+    }
+  }
+
+printf $outfile "\n";
+
+# End
diff --git a/testdata/testinput1 b/testdata/testinput1
index d72a2c5..806a0b1 100644
--- a/testdata/testinput1
+++ b/testdata/testinput1
@@ -1898,4 +1898,24 @@
 //g
     abc
 
-/ End of test input /
+/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
+  <TR BGCOLOR='#DBE9E9'><TD align=left valign=top>43.<a href='joblist.cfm?JobID=94 6735&Keyword='>Word Processor<BR>(N-1286)</a></TD><TD align=left valign=top>Lega lstaff.com</TD><TD align=left valign=top>CA - Statewide</TD></TR>
+
+/a[^a]b/
+    acb
+    a\nb
+    
+/a.b/
+    acb
+    *** Failers 
+    a\nb   
+    
+/a[^a]b/s
+    acb
+    a\nb  
+    
+/a.b/s
+    acb
+    a\nb  
+
+/ End of testinput1 /
diff --git a/testdata/testinput2 b/testdata/testinput2
index 1d9504c..db9cd02 100644
--- a/testdata/testinput2
+++ b/testdata/testinput2
@@ -40,8 +40,6 @@
 
 /[\B]/
 
-/[a-\w]/
-
 /[z-a]/
 
 /^*/
@@ -707,4 +705,6 @@
     Ab
     AB        
 
-/ End of test input /
+/[\200-\410]/
+
+/ End of testinput2 /
diff --git a/testdata/testinput3 b/testdata/testinput3
index 67d39f3..d3bd74f 100644
--- a/testdata/testinput3
+++ b/testdata/testinput3
@@ -1707,4 +1707,18 @@
 /a*/g
     abbab
 
-/ End of test input /       
+/^[a-\d]/
+    abcde
+    -things
+    0digit
+    *** Failers
+    bcdef    
+
+/^[\d-a]/
+    abcde
+    -things
+    0digit
+    *** Failers
+    bcdef    
+
+/ End of testinput3 /       
diff --git a/testdata/testinput4 b/testdata/testinput4
index c23b52a..f287896 100644
--- a/testdata/testinput4
+++ b/testdata/testinput4
@@ -62,3 +62,4 @@
     *** Failers 
     �cole
 
+/ End of testinput4 /
diff --git a/testdata/testinput5 b/testdata/testinput5
new file mode 100644
index 0000000..d66cfbd
--- /dev/null
+++ b/testdata/testinput5
@@ -0,0 +1,118 @@
+/-- Because of problems with Perl 5.6 in handling UTF-8 vs non UTF-8 --/
+/-- strings automatically, do not use the \x{} construct except with --/
+/-- patterns that have the /8 option set, and don't use them without! --/
+
+/a.b/8
+    acb
+    a\x7fb
+    a\x{100}b 
+    *** Failers
+    a\nb  
+
+/a(.{3})b/8
+    a\x{4000}xyb 
+    a\x{4000}\x7fyb 
+    a\x{4000}\x{100}yb 
+    *** Failers
+    a\x{4000}b 
+    ac\ncb 
+
+/a(.*?)(.)/
+    a\xc0\x88b
+
+/a(.*?)(.)/8
+    a\x{100}b
+
+/a(.*)(.)/
+    a\xc0\x88b
+
+/a(.*)(.)/8
+    a\x{100}b
+
+/a(.)(.)/
+    a\xc0\x92bcd
+
+/a(.)(.)/8
+    a\x{240}bcd
+
+/a(.?)(.)/
+    a\xc0\x92bcd
+
+/a(.?)(.)/8
+    a\x{240}bcd
+
+/a(.??)(.)/
+    a\xc0\x92bcd
+
+/a(.??)(.)/8
+    a\x{240}bcd
+
+/a(.{3})b/8
+    a\x{1234}xyb 
+    a\x{1234}\x{4321}yb 
+    a\x{1234}\x{4321}\x{3412}b 
+    *** Failers
+    a\x{1234}b 
+    ac\ncb 
+
+/a(.{3,})b/8
+    a\x{1234}xyb 
+    a\x{1234}\x{4321}yb 
+    a\x{1234}\x{4321}\x{3412}b 
+    axxxxbcdefghijb 
+    a\x{1234}\x{4321}\x{3412}\x{3421}b 
+    *** Failers
+    a\x{1234}b 
+
+/a(.{3,}?)b/8
+    a\x{1234}xyb 
+    a\x{1234}\x{4321}yb 
+    a\x{1234}\x{4321}\x{3412}b 
+    axxxxbcdefghijb 
+    a\x{1234}\x{4321}\x{3412}\x{3421}b 
+    *** Failers
+    a\x{1234}b 
+
+/a(.{3,5})b/8
+    a\x{1234}xyb 
+    a\x{1234}\x{4321}yb 
+    a\x{1234}\x{4321}\x{3412}b 
+    axxxxbcdefghijb 
+    a\x{1234}\x{4321}\x{3412}\x{3421}b 
+    axbxxbcdefghijb 
+    axxxxxbcdefghijb 
+    *** Failers
+    a\x{1234}b 
+    axxxxxxbcdefghijb 
+
+/a(.{3,5}?)b/8
+    a\x{1234}xyb 
+    a\x{1234}\x{4321}yb 
+    a\x{1234}\x{4321}\x{3412}b 
+    axxxxbcdefghijb 
+    a\x{1234}\x{4321}\x{3412}\x{3421}b 
+    axbxxbcdefghijb 
+    axxxxxbcdefghijb 
+    *** Failers
+    a\x{1234}b 
+    axxxxxxbcdefghijb 
+
+/^[a\x{c0}]/8
+    *** Failers
+    \x{100}
+
+/(?<=aXb)cd/8
+    aXbcd
+
+/(?<=a\x{100}b)cd/8
+    a\x{100}bcd
+
+/(?<=a\x{100000}b)cd/8
+    a\x{100000}bcd
+    
+/(?:\x{100}){3}b/8
+    \x{100}\x{100}\x{100}b
+    *** Failers 
+    \x{100}\x{100}b
+
+/ End of testinput5 /
diff --git a/testdata/testinput6 b/testdata/testinput6
new file mode 100644
index 0000000..1ccaa0d
--- /dev/null
+++ b/testdata/testinput6
@@ -0,0 +1,52 @@
+/\x{100}/8DM
+
+/\x{1000}/8DM
+
+/\x{10000}/8DM
+
+/\x{100000}/8DM
+
+/\x{1000000}/8DM
+
+/\x{4000000}/8DM
+
+/\x{7fffFFFF}/8DM
+
+/[\x{ff}]/8DM
+
+/[\x{100}]/8DM
+
+/\x{ffffffff}/8
+
+/\x{100000000}/8
+
+/^\x{100}a\x{1234}/8
+    \x{100}a\x{1234}bcd
+
+/\x80/8D
+
+/\xff/8D
+
+/-- These tests are here rather than in testinput5 because Perl 5.6 has --/
+/-- some problems with UTF-8 support, in the area of \x{..} where the   --/
+/-- value is < 255. It grumbles about invalid UTF-8 strings.            --/
+
+/^[a\x{c0}]b/8
+    \x{c0}b
+    
+/^([a\x{c0}]*?)aa/8
+    a\x{c0}aaaa/ 
+
+/^([a\x{c0}]*?)aa/8
+    a\x{c0}aaaa/ 
+    a\x{c0}a\x{c0}aaa/ 
+
+/^([a\x{c0}]*)aa/8
+    a\x{c0}aaaa/ 
+    a\x{c0}a\x{c0}aaa/ 
+
+/^([a\x{c0}]*)a\x{c0}/8
+    a\x{c0}aaaa/ 
+    a\x{c0}a\x{c0}aaa/ 
+
+/ End of testinput6 /
diff --git a/testdata/testoutput1 b/testdata/testoutput1
index 3ead056..145487d 100644
--- a/testdata/testoutput1
+++ b/testdata/testoutput1
@@ -1,4 +1,4 @@
-PCRE version 3.2 12-May-2000
+PCRE version 3.3 01-Aug-2000
 
 /the quick brown fox/
     the quick brown fox
@@ -2920,5 +2920,46 @@ No match
  0: 
  0: 
 
-/ End of test input /
+/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
+  <TR BGCOLOR='#DBE9E9'><TD align=left valign=top>43.<a href='joblist.cfm?JobID=94 6735&Keyword='>Word Processor<BR>(N-1286)</a></TD><TD align=left valign=top>Lega lstaff.com</TD><TD align=left valign=top>CA - Statewide</TD></TR>
+ 0: <TR BGCOLOR='#DBE9E9'><TD align=left valign=top>43.<a href='joblist.cfm?JobID=94 6735&Keyword='>Word Processor<BR>(N-1286)</a></TD><TD align=left valign=top>Lega lstaff.com</TD><TD align=left valign=top>CA - Statewide</TD></TR>
+ 1:  BGCOLOR='#DBE9E9'
+ 2:  align=left valign=top
+ 3: 43.
+ 4: <a href='joblist.cfm?JobID=94 6735&Keyword='>Word Processor<BR>(N-1286)
+ 5: 
+ 6: 
+ 7: <unset>
+ 8:  align=left valign=top
+ 9: Lega lstaff.com
+10:  align=left valign=top
+11: CA - Statewide
+
+/a[^a]b/
+    acb
+ 0: acb
+    a\nb
+ 0: a\x0ab
+    
+/a.b/
+    acb
+ 0: acb
+    *** Failers 
+No match
+    a\nb   
+No match
+    
+/a[^a]b/s
+    acb
+ 0: acb
+    a\nb  
+ 0: a\x0ab
+    
+/a.b/s
+    acb
+ 0: acb
+    a\nb  
+ 0: a\x0ab
+
+/ End of testinput1 /
 
diff --git a/testdata/testoutput2 b/testdata/testoutput2
index ba8cf0e..a34394c 100644
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@@ -1,4 +1,4 @@
-PCRE version 3.2 12-May-2000
+PCRE version 3.3 01-Aug-2000
 
 /(a)b|/
 Capturing subpattern count = 1
@@ -94,9 +94,6 @@ Failed: missing terminating ] for character class at offset 5
 /[\B]/
 Failed: invalid escape sequence in character class at offset 2
 
-/[a-\w]/
-Failed: invalid escape sequence in character class at offset 4
-
 /[z-a]/
 Failed: range out of order in character class at offset 3
 
@@ -2064,7 +2061,10 @@ No match
     AB        
 No match
 
-/ End of test input /
+/[\200-\410]/
+Failed: range out of order in character class at offset 9
+
+/ End of testinput2 /
 Capturing subpattern count = 0
 No options
 First char = ' '
diff --git a/testdata/testoutput3 b/testdata/testoutput3
index 0269f87..784cf09 100644
--- a/testdata/testoutput3
+++ b/testdata/testoutput3
@@ -1,4 +1,4 @@
-PCRE version 3.2 12-May-2000
+PCRE version 3.3 01-Aug-2000
 
 /(?<!bar)foo/
     foo
@@ -2963,5 +2963,29 @@ No match
  0: 
  0: 
 
-/ End of test input /       
+/^[a-\d]/
+    abcde
+ 0: a
+    -things
+ 0: -
+    0digit
+ 0: 0
+    *** Failers
+No match
+    bcdef    
+No match
+
+/^[\d-a]/
+    abcde
+ 0: a
+    -things
+ 0: -
+    0digit
+ 0: 0
+    *** Failers
+No match
+    bcdef    
+No match
+
+/ End of testinput3 /       
 
diff --git a/testdata/testoutput4 b/testdata/testoutput4
index d285224..b6d2be2 100644
--- a/testdata/testoutput4
+++ b/testdata/testoutput4
@@ -1,4 +1,4 @@
-PCRE version 3.2 12-May-2000
+PCRE version 3.3 01-Aug-2000
 
 /^[\w]+/
     *** Failers
@@ -112,4 +112,5 @@ No match
     �cole
 No match
 
+/ End of testinput4 /
 
diff --git a/testdata/testoutput5 b/testdata/testoutput5
new file mode 100644
index 0000000..83bf1d8
--- /dev/null
+++ b/testdata/testoutput5
@@ -0,0 +1,242 @@
+PCRE version 3.3 01-Aug-2000
+
+/-- Because of problems with Perl 5.6 in handling UTF-8 vs non UTF-8 --/
+/-- strings automatically, do not use the \x{} construct except with --/
+No match
+/-- patterns that have the /8 option set, and don't use them without! --/
+No match
+
+/a.b/8
+    acb
+ 0: acb
+    a\x7fb
+ 0: a\x{7f}b
+    a\x{100}b 
+ 0: a\x{100}b
+    *** Failers
+No match
+    a\nb  
+No match
+
+/a(.{3})b/8
+    a\x{4000}xyb 
+ 0: a\x{4000}xyb
+ 1: \x{4000}xy
+    a\x{4000}\x7fyb 
+ 0: a\x{4000}\x{7f}yb
+ 1: \x{4000}\x{7f}y
+    a\x{4000}\x{100}yb 
+ 0: a\x{4000}\x{100}yb
+ 1: \x{4000}\x{100}y
+    *** Failers
+No match
+    a\x{4000}b 
+No match
+    ac\ncb 
+No match
+
+/a(.*?)(.)/
+    a\xc0\x88b
+ 0: a\xc0
+ 1: 
+ 2: \xc0
+
+/a(.*?)(.)/8
+    a\x{100}b
+ 0: a\x{100}
+ 1: 
+ 2: \x{100}
+
+/a(.*)(.)/
+    a\xc0\x88b
+ 0: a\xc0\x88b
+ 1: \xc0\x88
+ 2: b
+
+/a(.*)(.)/8
+    a\x{100}b
+ 0: a\x{100}b
+ 1: \x{100}
+ 2: b
+
+/a(.)(.)/
+    a\xc0\x92bcd
+ 0: a\xc0\x92
+ 1: \xc0
+ 2: \x92
+
+/a(.)(.)/8
+    a\x{240}bcd
+ 0: a\x{240}b
+ 1: \x{240}
+ 2: b
+
+/a(.?)(.)/
+    a\xc0\x92bcd
+ 0: a\xc0\x92
+ 1: \xc0
+ 2: \x92
+
+/a(.?)(.)/8
+    a\x{240}bcd
+ 0: a\x{240}b
+ 1: \x{240}
+ 2: b
+
+/a(.??)(.)/
+    a\xc0\x92bcd
+ 0: a\xc0
+ 1: 
+ 2: \xc0
+
+/a(.??)(.)/8
+    a\x{240}bcd
+ 0: a\x{240}
+ 1: 
+ 2: \x{240}
+
+/a(.{3})b/8
+    a\x{1234}xyb 
+ 0: a\x{1234}xyb
+ 1: \x{1234}xy
+    a\x{1234}\x{4321}yb 
+ 0: a\x{1234}\x{4321}yb
+ 1: \x{1234}\x{4321}y
+    a\x{1234}\x{4321}\x{3412}b 
+ 0: a\x{1234}\x{4321}\x{3412}b
+ 1: \x{1234}\x{4321}\x{3412}
+    *** Failers
+No match
+    a\x{1234}b 
+No match
+    ac\ncb 
+No match
+
+/a(.{3,})b/8
+    a\x{1234}xyb 
+ 0: a\x{1234}xyb
+ 1: \x{1234}xy
+    a\x{1234}\x{4321}yb 
+ 0: a\x{1234}\x{4321}yb
+ 1: \x{1234}\x{4321}y
+    a\x{1234}\x{4321}\x{3412}b 
+ 0: a\x{1234}\x{4321}\x{3412}b
+ 1: \x{1234}\x{4321}\x{3412}
+    axxxxbcdefghijb 
+ 0: axxxxbcdefghijb
+ 1: xxxxbcdefghij
+    a\x{1234}\x{4321}\x{3412}\x{3421}b 
+ 0: a\x{1234}\x{4321}\x{3412}\x{3421}b
+ 1: \x{1234}\x{4321}\x{3412}\x{3421}
+    *** Failers
+No match
+    a\x{1234}b 
+No match
+
+/a(.{3,}?)b/8
+    a\x{1234}xyb 
+ 0: a\x{1234}xyb
+ 1: \x{1234}xy
+    a\x{1234}\x{4321}yb 
+ 0: a\x{1234}\x{4321}yb
+ 1: \x{1234}\x{4321}y
+    a\x{1234}\x{4321}\x{3412}b 
+ 0: a\x{1234}\x{4321}\x{3412}b
+ 1: \x{1234}\x{4321}\x{3412}
+    axxxxbcdefghijb 
+ 0: axxxxb
+ 1: xxxx
+    a\x{1234}\x{4321}\x{3412}\x{3421}b 
+ 0: a\x{1234}\x{4321}\x{3412}\x{3421}b
+ 1: \x{1234}\x{4321}\x{3412}\x{3421}
+    *** Failers
+No match
+    a\x{1234}b 
+No match
+
+/a(.{3,5})b/8
+    a\x{1234}xyb 
+ 0: a\x{1234}xyb
+ 1: \x{1234}xy
+    a\x{1234}\x{4321}yb 
+ 0: a\x{1234}\x{4321}yb
+ 1: \x{1234}\x{4321}y
+    a\x{1234}\x{4321}\x{3412}b 
+ 0: a\x{1234}\x{4321}\x{3412}b
+ 1: \x{1234}\x{4321}\x{3412}
+    axxxxbcdefghijb 
+ 0: axxxxb
+ 1: xxxx
+    a\x{1234}\x{4321}\x{3412}\x{3421}b 
+ 0: a\x{1234}\x{4321}\x{3412}\x{3421}b
+ 1: \x{1234}\x{4321}\x{3412}\x{3421}
+    axbxxbcdefghijb 
+ 0: axbxxb
+ 1: xbxx
+    axxxxxbcdefghijb 
+ 0: axxxxxb
+ 1: xxxxx
+    *** Failers
+No match
+    a\x{1234}b 
+No match
+    axxxxxxbcdefghijb 
+No match
+
+/a(.{3,5}?)b/8
+    a\x{1234}xyb 
+ 0: a\x{1234}xyb
+ 1: \x{1234}xy
+    a\x{1234}\x{4321}yb 
+ 0: a\x{1234}\x{4321}yb
+ 1: \x{1234}\x{4321}y
+    a\x{1234}\x{4321}\x{3412}b 
+ 0: a\x{1234}\x{4321}\x{3412}b
+ 1: \x{1234}\x{4321}\x{3412}
+    axxxxbcdefghijb 
+ 0: axxxxb
+ 1: xxxx
+    a\x{1234}\x{4321}\x{3412}\x{3421}b 
+ 0: a\x{1234}\x{4321}\x{3412}\x{3421}b
+ 1: \x{1234}\x{4321}\x{3412}\x{3421}
+    axbxxbcdefghijb 
+ 0: axbxxb
+ 1: xbxx
+    axxxxxbcdefghijb 
+ 0: axxxxxb
+ 1: xxxxx
+    *** Failers
+No match
+    a\x{1234}b 
+No match
+    axxxxxxbcdefghijb 
+No match
+
+/^[a\x{c0}]/8
+    *** Failers
+No match
+    \x{100}
+No match
+
+/(?<=aXb)cd/8
+    aXbcd
+ 0: cd
+
+/(?<=a\x{100}b)cd/8
+    a\x{100}bcd
+ 0: cd
+
+/(?<=a\x{100000}b)cd/8
+    a\x{100000}bcd
+ 0: cd
+    
+/(?:\x{100}){3}b/8
+    \x{100}\x{100}\x{100}b
+ 0: \x{100}\x{100}\x{100}b
+    *** Failers 
+No match
+    \x{100}\x{100}b
+No match
+
+/ End of testinput5 /
+
diff --git a/testdata/testoutput6 b/testdata/testoutput6
new file mode 100644
index 0000000..0fae289
--- /dev/null
+++ b/testdata/testoutput6
@@ -0,0 +1,185 @@
+PCRE version 3.3 01-Aug-2000
+
+/\x{100}/8DM
+Memory allocation (code space): 11
+------------------------------------------------------------------
+  0   7 Bra 0
+  3   2 \xc0\x88
+  7   7 Ket
+ 10     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 192
+Need char = 136
+
+/\x{1000}/8DM
+Memory allocation (code space): 12
+------------------------------------------------------------------
+  0   8 Bra 0
+  3   3 \xe0\x80\x84
+  8   8 Ket
+ 11     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 224
+Need char = 132
+
+/\x{10000}/8DM
+Memory allocation (code space): 13
+------------------------------------------------------------------
+  0   9 Bra 0
+  3   4 \xf0\x80\x80\x82
+  9   9 Ket
+ 12     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 240
+Need char = 130
+
+/\x{100000}/8DM
+Memory allocation (code space): 13
+------------------------------------------------------------------
+  0   9 Bra 0
+  3   4 \xf0\x80\x80\xa0
+  9   9 Ket
+ 12     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 240
+Need char = 160
+
+/\x{1000000}/8DM
+Memory allocation (code space): 14
+------------------------------------------------------------------
+  0  10 Bra 0
+  3   5 \xf8\x80\x80\x80\x90
+ 10  10 Ket
+ 13     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 248
+Need char = 144
+
+/\x{4000000}/8DM
+Memory allocation (code space): 15
+------------------------------------------------------------------
+  0  11 Bra 0
+  3   6 \xfc\x80\x80\x80\x80\x82
+ 11  11 Ket
+ 14     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 252
+Need char = 130
+
+/\x{7fffFFFF}/8DM
+Memory allocation (code space): 15
+------------------------------------------------------------------
+  0  11 Bra 0
+  3   6 \xfd\xbf\xbf\xbf\xbf\xbf
+ 11  11 Ket
+ 14     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 253
+Need char = 191
+
+/[\x{ff}]/8DM
+Memory allocation (code space): 40
+------------------------------------------------------------------
+  0   6 Bra 0
+  3   1 \xff
+  6   6 Ket
+  9     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 255
+No need char
+
+/[\x{100}]/8DM
+Memory allocation (code space): 40
+Failed: characters with values > 255 are not yet supported in classes at offset 7
+
+/\x{ffffffff}/8
+Failed: character value in \x{...} sequence is too large at offset 11
+
+/\x{100000000}/8
+Failed: character value in \x{...} sequence is too large at offset 12
+
+/^\x{100}a\x{1234}/8
+    \x{100}a\x{1234}bcd
+ 0: \x{100}a\x{1234}
+
+/\x80/8D
+------------------------------------------------------------------
+  0   7 Bra 0
+  3   2 \xc0\x84
+  7   7 Ket
+ 10     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 192
+Need char = 132
+
+/\xff/8D
+------------------------------------------------------------------
+  0   7 Bra 0
+  3   2 \xdf\x87
+  7   7 Ket
+ 10     End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: utf8
+First char = 223
+Need char = 135
+
+/-- These tests are here rather than in testinput5 because Perl 5.6 has --/
+/-- some problems with UTF-8 support, in the area of \x{..} where the   --/
+No match
+/-- value is < 255. It grumbles about invalid UTF-8 strings.            --/
+No match
+
+/^[a\x{c0}]b/8
+    \x{c0}b
+ 0: \x{c0}b
+    
+/^([a\x{c0}]*?)aa/8
+    a\x{c0}aaaa/ 
+ 0: a\x{c0}aa
+ 1: a\x{c0}
+
+/^([a\x{c0}]*?)aa/8
+    a\x{c0}aaaa/ 
+ 0: a\x{c0}aa
+ 1: a\x{c0}
+    a\x{c0}a\x{c0}aaa/ 
+ 0: a\x{c0}a\x{c0}aa
+ 1: a\x{c0}a\x{c0}
+
+/^([a\x{c0}]*)aa/8
+    a\x{c0}aaaa/ 
+ 0: a\x{c0}aaaa
+ 1: a\x{c0}aa
+    a\x{c0}a\x{c0}aaa/ 
+ 0: a\x{c0}a\x{c0}aaa
+ 1: a\x{c0}a\x{c0}a
+
+/^([a\x{c0}]*)a\x{c0}/8
+    a\x{c0}aaaa/ 
+ 0: a\x{c0}
+ 1: 
+    a\x{c0}a\x{c0}aaa/ 
+ 0: a\x{c0}a\x{c0}
+ 1: a\x{c0}
+
+/ End of testinput6 /
+
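
Note on the `\x{..}` handling added to pcretest in the hunk at `@@ -796,6 +923,30 @@`: the digit loop converts each hex character with `tolower(*pt) - ((isdigit(*pt))? '0' : 'W')`. The constant `'W'` works because, in ASCII, `'W'` is `'a' - 10`, so `'a'`..`'f'` map to 10..15. Below is a minimal standalone sketch of that parsing step only. The function name `parse_braced_hex` and its signature are hypothetical and are not part of pcretest; the sketch assumes an ASCII character set and does no range checking on the value.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of pcretest.
   s must point at the '{' of a "{hex...}" sequence.
   On success, stores the parsed value in *value, sets *end to the
   character after the closing '}', and returns 1.
   On malformed input (no '{' or no closing '}'), returns 0 and leaves
   *value and *end untouched.
   Assumes ASCII; the value wraps silently if there are too many digits. */
static int parse_braced_hex(const unsigned char *s, unsigned long *value,
                            const unsigned char **end)
{
  const unsigned char *pt = s;
  unsigned long c = 0;

  if (*pt != '{') return 0;

  while (isxdigit(*(++pt)))
    {
    /* 'W' is 'a' - 10 in ASCII, so 'a'..'f' become 10..15 */
    c = c * 16 + tolower(*pt) - (isdigit(*pt) ? '0' : 'W');
    }

  if (*pt != '}') return 0;

  *value = c;
  *end = pt + 1;
  return 1;
}
```

As in the patch, an unterminated sequence such as `{12` is reported as malformed so the caller can fall back to ordinary `\x` handling. Turning the resulting value into bytes is a separate step (the patch calls `ord2utf8` for that) and is not sketched here.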