-rw-r--r--  AUTHORS              2
-rw-r--r--  COPYING              2
-rw-r--r--  ChangeLog            8
-rw-r--r--  LICENCE              2
-rw-r--r--  NEWS                12
-rw-r--r--  NON-UNIX-USE         5
-rw-r--r--  README             129
-rw-r--r--  doc/pcrestack.3      4
-rw-r--r--  maint/README       249
9 files changed, 330 insertions, 83 deletions
diff --git a/AUTHORS b/AUTHORS
index 8061f5f..36e4aaf 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -6,7 +6,7 @@ Email local part: ph10
Email domain: cam.ac.uk
University of Cambridge Computing Service,
-Cambridge, England. Phone: +44 1223 334714.
+Cambridge, England.
Copyright (c) 1997-2007 University of Cambridge
All rights reserved
diff --git a/COPYING b/COPYING
index d61389d..4baa7d8 100644
--- a/COPYING
+++ b/COPYING
@@ -20,7 +20,7 @@ Email local part: ph10
Email domain: cam.ac.uk
University of Cambridge Computing Service,
-Cambridge, England. Phone: +44 1223 334714.
+Cambridge, England.
Copyright (c) 1997-2007 University of Cambridge
All rights reserved.
diff --git a/ChangeLog b/ChangeLog
index 8aebb41..d23d4cd 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,11 +1,11 @@
ChangeLog for PCRE
------------------
-Version 7.1 05-Mar-07
+Version 7.1 12-Mar-07
---------------------
1. Applied Bob Rossi and Daniel G's patches to convert the build system to one
- that is more "standard", making use of automake and other autotools. There
+ that is more "standard", making use of automake and other Autotools. There
is some re-arrangement of the files and adjustment of comments consequent
on this.
@@ -111,10 +111,10 @@ Version 7.1 05-Mar-07
failed for link sizes other than 2. Rather than cut the whole test out,
I have added a new /Z option to pcretest that replaces the length and
offset values with spaces. This is now used to make test 3 independent
- of link size.
+ of link size. (Test 2 will be tidied up later.)
14. If erroroffset was passed as NULL to pcre_compile, it provoked a
- segmentation fault instead of giving the appropriate error message.
+ segmentation fault instead of returning the appropriate error message.
Version 7.0 19-Dec-06
diff --git a/LICENCE b/LICENCE
index d61389d..4baa7d8 100644
--- a/LICENCE
+++ b/LICENCE
@@ -20,7 +20,7 @@ Email local part: ph10
Email domain: cam.ac.uk
University of Cambridge Computing Service,
-Cambridge, England. Phone: +44 1223 334714.
+Cambridge, England.
Copyright (c) 1997-2007 University of Cambridge
All rights reserved.
diff --git a/NEWS b/NEWS
index 92768ea..da936fe 100644
--- a/NEWS
+++ b/NEWS
@@ -1,7 +1,17 @@
News about PCRE releases
------------------------
-Release 7.0 23-Nov-06
+
+Release 7.1 12-Mar-07
+---------------------
+
+There are no new features in this release. A few bugs are fixed (see ChangeLog
+for details), but the major change is a complete re-implementation of the build
+system. This now has full Autotools support and so is now "standard" in some
+sense. It should help with compiling PCRE in a wide variety of environments.
+
+
+Release 7.0 19-Dec-06
---------------------
This release has a new major number because there have been some internal
diff --git a/NON-UNIX-USE b/NON-UNIX-USE
index df0b255..7c57ff8 100644
--- a/NON-UNIX-USE
+++ b/NON-UNIX-USE
@@ -3,8 +3,7 @@ Compiling PCRE on non-Unix systems
I (Philip Hazel) have no knowledge of Windows or VMS systems and how their
libraries work. The items in the PCRE distribution and Makefile that relate to
-anything other than Unix-like systems have been contributed by PCRE users and
-are untested by me.
+anything other than Unix-like systems are untested by me.
There are some other comments and files in the Contrib directory on the ftp
site that you may find useful, although a lot of them are now out-of-date. See
@@ -30,7 +29,7 @@ The following are generic comments about building the PCRE C library "by hand".
An alternative approach is not to edit config.h, but to use -D on the
compiler command line to make any changes that you need.
-(2) Copy or rename the file pcre.h.generic to pcre.h.
+(2) Copy or rename the file pcre.h.generic as pcre.h.
(3) Compile dftables.c as a stand-alone program, and then run it with
the single argument "pcre_chartables.c". This generates a set of standard
diff --git a/README b/README
index f0ce818..91fa161 100644
--- a/README
+++ b/README
@@ -16,10 +16,10 @@ The contents of this README file are:
Documentation for PCRE
Contributions by users of PCRE
Building PCRE on non-Unix systems
- Building PCRE on a Unix-like system
- Retrieving configuration information on a Unix-like system
+ Building PCRE on Unix-like systems
+ Retrieving configuration information on Unix-like systems
Shared libraries on Unix-like systems
- Cross-compiling on a Unix-like system
+ Cross-compiling on Unix-like systems
Using HP's ANSI C++ compiler (aCC)
Making new tarballs
Testing PCRE
@@ -53,20 +53,20 @@ ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
up the POSIX functions of the same name from the other library.
One way of avoiding this confusion is to compile PCRE with the addition of
--Dregcomp=PCREregcomp (and similarly for the other functions) to the compiler
-flags (CFLAGS if you are using "configure" -- see below). This has the effect
-of renaming the functions so that the names no longer clash. Of course, you
-have to do the same thing for your applications, or write them using the new
-names.
+-Dregcomp=PCREregcomp (and similarly for the other POSIX functions) to the
+compiler flags (CFLAGS if you are using "configure" -- see below). This has the
+effect of renaming the functions so that the names no longer clash. Of course,
+you have to do the same thing for your applications, or write them using the
+new names.
Documentation for PCRE
----------------------
-If you install PCRE in the normal way, you will end up with an installed set of
-man pages whose names all start with "pcre". The one that is just called "pcre"
-lists all the others. In addition to these man pages, the PCRE documentation is
-supplied in two other forms:
+If you install PCRE in the normal way on a Unix-like system, you will end up
+with a set of man pages whose names all start with "pcre". The one that is just
+called "pcre" lists all the others. In addition to these man pages, the PCRE
+documentation is supplied in two other forms:
1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
doc/pcretest.txt in the source distribution. The first of these is a
@@ -78,8 +78,8 @@ supplied in two other forms:
<prefix> is the installation prefix (defaulting to /usr/local).
2. A set of files containing all the documentation in HTML form, hyperlinked
- in various ways, and rooted in a file called index.html, is installed in
- the directory <prefix>/share/doc/pcre/html.
+ in various ways, and rooted in a file called index.html, is distributed in
+ doc/html and installed in <prefix>/share/doc/pcre/html.
Contributions by users of PCRE
@@ -89,28 +89,28 @@ You can find contributions from PCRE users in the directory
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
-where there is also a README file giving brief descriptions of what they are.
-Some are complete in themselves; others are pointers to URLs containing
-relevant files. Some of this material is likely to be well out-of-date. In
-particular, several of the contributions provide support for compiling PCRE on
-various flavours of Windows (I myself do not use Windows), but it is hoped that
-more Windows support will find its way into the standard distribution.
+There is a README file giving brief descriptions of what they are. Some are
+complete in themselves; others are pointers to URLs containing relevant files.
+Some of this material is likely to be well out-of-date. In particular, several
+of the contributions provide support for compiling PCRE on various flavours of
+Windows (I myself do not use Windows), but nowadays there is more Windows
+support in the standard distribution.
Building PCRE on non-Unix systems
---------------------------------
-For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
-the system supports the use of "configure" and "make" you may be able to build
-PCRE in the same way as for Unix-like systems.
+For a non-Unix system, please read the comments in the file NON-UNIX-USE,
+though if your system supports the use of "configure" and "make" you may be
+able to build PCRE in the same way as for Unix-like systems.
PCRE has been compiled on many different operating systems. It should be
straightforward to build PCRE on any system that has a Standard C compiler and
library, because it uses only Standard C functions.
-Building PCRE on a Unix-like system
------------------------------------
+Building PCRE on Unix-like systems
+----------------------------------
If you are using HP's ANSI C++ compiler (aCC), please see the special note
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
@@ -119,7 +119,7 @@ To build PCRE on a Unix-like system, first run the "configure" command from the
PCRE distribution directory, with your current directory set to the directory
where you want the files to be created. This command is a standard GNU
"autoconf" configuration script, for which generic instructions are supplied in
-INSTALL.
+the file INSTALL.
Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
@@ -191,8 +191,8 @@ library. You can read more about them in the pcrebuild man page.
--with-match-limit=500000
on the "configure" command. This is just the default; individual calls to
- pcre_exec() can supply their own value. There is discussion on the pcreapi
- man page.
+ pcre_exec() can supply their own value. There is more discussion on the
+ pcreapi man page.
. There is a separate counter that limits the depth of recursive function calls
during a matching process. This also has a default of ten million, which is
@@ -207,23 +207,21 @@ library. You can read more about them in the pcrebuild man page.
. The default maximum compiled pattern size is around 64K. You can increase
this by adding --with-link-size=3 to the "configure" command. You can
increase it even more by setting --with-link-size=4, but this is unlikely
- ever to be necessary. If you build PCRE with an increased link size, test 2
- (and 5 if you are using UTF-8) will fail. Part of the output of these tests
- is a representation of the compiled pattern, and this changes with the link
- size.
+ ever to be necessary.
. You can build PCRE so that its internal match() function that is called from
- pcre_exec() does not call itself recursively. Instead, it uses blocks of data
- from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
- to save data that would otherwise be saved on the stack. To build PCRE like
- this, use
+ pcre_exec() does not call itself recursively. Instead, it uses memory blocks
+ obtained from the heap via the special functions pcre_stack_malloc() and
+ pcre_stack_free() to save data that would otherwise be saved on the stack. To
+ build PCRE like this, use
--disable-stack-for-recursion
on the "configure" command. PCRE runs more slowly in this mode, but it may be
necessary in environments with limited stack sizes. This applies only to the
pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
- use deeply nested recursion.
+ use deeply nested recursion. There is a discussion about stack sizes in the
+ pcrestack man page.
The "configure" script builds the following files for the basic C library:
@@ -236,9 +234,10 @@ The "configure" script builds the following files for the basic C library:
. RunTest is a script for running tests on the basic C library
. RunGrepTest is a script for running tests on the pcregrep command
-Versions of config.h and pcre.h are distributed in the PCRE tarballs. These are
-provided for the benefit of those who have to compile PCRE without the benefit
-of "configure". If you use "configure", the distributed copies are replaced.
+Versions of config.h and pcre.h are distributed in the PCRE tarballs under
+the names config.h.generic and pcre.h.generic. These are provided for the
+benefit of those who have to build PCRE without the benefit of "configure". If
+you use "configure", the .generic versions are not used.
If a C++ compiler is found, the following files are also built:
@@ -253,9 +252,10 @@ contains compiler output from tests that "configure" runs.
Once "configure" has run, you can run "make". It builds two libraries, called
libpcre and libpcreposix, a test program called pcretest, a demonstration
program called pcredemo, and the pcregrep command. If a C++ compiler was found
-on your system, it also builds the C++ wrapper library, which is called
+on your system, "make" also builds the C++ wrapper library, which is called
libpcrecpp, and some test programs called pcrecpp_unittest,
-pcre_scanner_unittest, and pcre_stringpiece_unittest.
+pcre_scanner_unittest, and pcre_stringpiece_unittest. Building the C++ wrapper
+can be disabled by adding --disable-cpp to the "configure" command.
The command "make check" runs all the appropriate tests. Details of the PCRE
tests are given below in a separate section of this document.
@@ -276,7 +276,7 @@ system. The following are installed (file names are all relative to the
Configuration information (lib/pkgconfig):
libpcre.pc
- libpcrecpp.ps (if C++ support is enabled)
+ libpcrecpp.pc (if C++ support is enabled)
Header files (include):
pcre.h
@@ -315,8 +315,8 @@ This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.
-Retrieving configuration information on a Unix-like system
-----------------------------------------------------------
+Retrieving configuration information on Unix-like systems
+---------------------------------------------------------
Running "make install" installs the command pcre-config, which can be used to
recall information about the PCRE configuration and installation. For example:
@@ -355,7 +355,7 @@ built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
-installed themselves. However, the versions left in the source directory still
+installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.
To build PCRE using static libraries only you must use --disable-shared when
@@ -367,8 +367,8 @@ Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.
-Cross-compiling on a Unix-like system
--------------------------------------
+Cross-compiling on Unix-like systems
+------------------------------------
You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE for some other host. However, during the building
@@ -385,7 +385,7 @@ Using HP's ANSI C++ compiler (aCC)
----------------------------------
Unless C++ support is disabled by specifying the "--disable-cpp" option of the
-"configure" script, you *must* include the "-AA" option in the CXXFLAGS
+"configure" script, you must include the "-AA" option in the CXXFLAGS
environment variable in order for the C++ components to compile correctly.
Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
@@ -409,17 +409,17 @@ the .txt and HTML forms of the documentation from the man pages.
Testing PCRE
------------
-To test PCRE on a Unix system, run the RunTest script that is created by the
-configuring process. There is also a script called RunGrepTest that tests the
-options of the pcregrep command. If the C++ wrapper library is build, three
-test programs called pcrecpp_unittest, pcre_scanner_unittest, and
+To test the basic PCRE library on a Unix system, run the RunTest script that is
+created by the configuring process. There is also a script called RunGrepTest
+that tests the options of the pcregrep command. If the C++ wrapper library is
+built, three test programs called pcrecpp_unittest, pcre_scanner_unittest, and
pcre_stringpiece_unittest are also built.
Both the scripts and all the program tests are run if you obey "make check" or
"make test". For other systems, see the instructions in NON-UNIX-USE.
The RunTest script runs the pcretest test program (which is documented in its
-own man page) on each of the testinput files (in the testdata directory) in
+own man page) on each of the testinput files in the testdata directory in
turn, and compares the output with the contents of the corresponding testoutput
files. A file called testtry is used to hold the main output from pcretest
(testsavedregex is also used as a working file). To run pcretest on just one of
@@ -435,7 +435,7 @@ version.
The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
-wrapper API. It also uses the debugging flag to check some of the internals of
+wrapper API. It also uses the debugging flags to check some of the internals of
pcre_compile().
If you build PCRE with a locale setting that is not the standard C locale, the
@@ -470,8 +470,8 @@ commented in the script, can be be used.)
The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
features of PCRE that are not relevant to Perl.
-The sixth and test checks the support for Unicode character properties. It it
-not run automatically unless PCRE is built with Unicode property support. To to
+The sixth test checks the support for Unicode character properties. It is not
+run automatically unless PCRE is built with Unicode property support. To do
this you must set --enable-unicode-properties when running "configure".
The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
@@ -483,12 +483,12 @@ automatically unless PCRE is build with the relevant support.
Character tables
----------------
-PCRE uses four tables for manipulating and identifying characters whose values
-are less than 256. The final argument of the pcre_compile() function is a
-pointer to a block of memory containing the concatenated tables. A call to
-pcre_maketables() can be used to generate a set of tables in the current
-locale. If the final argument for pcre_compile() is passed as NULL, a set of
-default tables that is built into the binary is used.
+For speed, PCRE uses four tables for manipulating and identifying characters
+whose code point values are less than 256. The final argument of the
+pcre_compile() function is a pointer to a block of memory containing the
+concatenated tables. A call to pcre_maketables() can be used to generate a set
+of tables in the current locale. If the final argument for pcre_compile() is
+passed as NULL, a set of default tables that is built into the binary is used.
The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
@@ -497,8 +497,7 @@ such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
sources. This means that the default C locale which is set for your system will
control the contents of these default tables. You can change the default tables
by editing chartables.c and then re-building PCRE. If you do this, you should
-probably also edit Makefile to ensure that the file doesn't ever get
-re-generated.
+take care to ensure that the file does not get automatically re-generated.
The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
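As a rough illustration of the first two tables described in the README text above, the following Python sketch builds a 256-byte lower-casing table and a 256-byte case-flipping table for the plain "C" locale. It is not the output of dftables, whose exact layout is not shown in this diff; the function name build_case_tables is hypothetical and exists only for this example.

```python
def build_case_tables():
    """Build two 256-byte tables for the "C" locale (illustrative only).

    lower[c] is the lower-case form of byte c; flip[c] is c with its
    case swapped. Bytes that are not ASCII letters map to themselves.
    """
    lower = bytearray(range(256))
    flip = bytearray(range(256))
    for upper_code in range(ord("A"), ord("Z") + 1):
        lower_code = upper_code + 32  # ASCII 'a' - 'A' == 32
        lower[upper_code] = lower_code
        flip[upper_code] = lower_code
        flip[lower_code] = upper_code
    return bytes(lower), bytes(flip)


if __name__ == "__main__":
    lower, flip = build_case_tables()
    print(chr(lower[ord("Q")]), chr(flip[ord("q")]))
```

A table lookup of this kind replaces a function call per character, which is presumably the speed benefit the README mentions.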
diff --git a/doc/pcrestack.3 b/doc/pcrestack.3
index d732cdc..72952c6 100644
--- a/doc/pcrestack.3
+++ b/doc/pcrestack.3
@@ -52,7 +52,7 @@ frame for each matched character. For a long string, a lot of stack is
required. Consider now this rewritten pattern, which matches exactly the same
strings:
.sp
- ([^<]++|<(?!inet))
+ ([^<]++|<(?!inet))+
.sp
This uses very much less stack, because runs of characters that do not contain
"<" are "swallowed" in one item inside the parentheses. Recursion happens only
@@ -129,6 +129,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 06 March 2007
+Last updated: 12 March 2007
Copyright (c) 1997-2007 University of Cambridge.
.fi
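The effect of the trailing "+" added to the pcrestack pattern above can be checked with a small Python sketch. This uses Python's re module rather than PCRE, so it only illustrates which text is matched, not PCRE's stack usage. Possessive quantifiers ("++") need Python 3.11 or later; on older versions the helper falls back to a plain "+", which is assumed to match the same text for this subject. The subject string and the helper names are invented for the example.

```python
import re


def compile_pattern(pattern):
    """Compile a pattern, falling back to '+' where '++' is unsupported."""
    try:
        return re.compile(pattern)
    except re.error:
        # Python before 3.11 rejects possessive quantifiers. A greedy '+'
        # matches the same text for the subject used below.
        return re.compile(pattern.replace("++", "+"))


SUBJECT = "abc<b>def<inet"

# Pattern as it stood before this commit: the group is matched only once.
before_fix = compile_pattern(r"([^<]++|<(?!inet))")
# Pattern after this commit: the group repeats until "<inet" is reached.
after_fix = compile_pattern(r"([^<]++|<(?!inet))+")


def matched_prefix(regex, subject=SUBJECT):
    """Return the text matched at the start of the subject, or None."""
    m = regex.match(subject)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(matched_prefix(before_fix))
    print(matched_prefix(after_fix))
```

Without the trailing "+", the match stops after the first run of non-"<" characters; with it, the match continues up to the point where "<inet" begins.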
diff --git a/maint/README b/maint/README
index e2466df..73d76a2 100644
--- a/maint/README
+++ b/maint/README
@@ -1,6 +1,20 @@
+MAINTENANCE README FOR PCRE
+---------------------------
+
The files in the "maint" directory of the PCRE source contain data, scripts,
-and programs that are used for the maintenance of PCRE, but do not form part of
-the PCRE distribution tarballs.
+and programs that are used for the maintenance of PCRE, but which do not form
+part of the PCRE distribution tarballs. This document describes these files and
+also contains some notes for maintainers. Its contents are:
+
+ Files in the maint directory
+ Updating to a new Unicode release
+ Preparing for a PCRE release
+ Making a PCRE release
+  Future ideas (wish list)
+
+
+Files in the maint directory
+----------------------------
Builducptable A Perl script that creates the contents of the ucptable.h file
from two Unicode data files, which themselves are downloaded
@@ -15,7 +29,7 @@ Unicode.tables The files in this directory, Scripts.txt and UnicodeData.txt,
ucptest.c A short C program for testing the Unicode property functions in
pcre_ucp_searchfuncs.c, mainly useful after rebuilding the
- Unicode property table. Compile and run this in the "main"
+ Unicode property table. Compile and run this in the "maint"
directory.
ucptestdata A directory containing two files, testinput1 and testoutput1,
@@ -29,10 +43,235 @@ utf8.c A short, freestanding C program for converting a Unicode code
them as a UTF-8 character and outputs the equivalent code point
in hex.
+
+Updating to a new Unicode release
+---------------------------------
+
When there is a new release of Unicode, the files in Unicode.tables must be
refreshed from the web site, and the Builducptable script can then be run to
generate a new version of ucptable.h. The ucptest program can be used to check
that the resulting table works properly, using the data files in ucptestdata to
check a number of test characters.
-
-****
+
+
+Preparing for a PCRE release
+----------------------------
+
+This section contains a checklist of things that I consult before building a
+distribution for a new release.
+
+. Ensure that the version number and version date are correct in configure.ac.
+
+. Run ./autogen.sh to ensure everything is up-to-date.
+
+. Compile and test with many different config options, and combinations of
+ options:
+
+ * Totally standard ./configure with no options
+ * --disable-shared
+ * --disable-static
+ * --enable-utf8
+ * --enable-unicode-properties
+ * --disable-cpp
+ * --with-link-size=3 (occasionally check with 4 as well)
+ * --disable-stack-for-recursion
+ * --enable-newline-is-any
+
+ I've never automated this, but perhaps I should. The newline testing could be
+ enhanced; at present, some tests fail unless plain LF is a newline.
+
+. Run perltest.pl on the test data for tests 1 and 4. The output should match
+ the PCRE test output, apart from the version identification at the top. The
+ other tests are not Perl-compatible (they use various special PCRE options).
+
+. Test on a number of different operating systems. In particular, at the moment
+ I can test on Solaris, using Sun's cc compiler (as a change from gcc). Adding
+ -xarch=v9 to the cc options does a 64-bit test, but it also needs -S 64 for
+ pcretest to increase the stack size for test 2. I also test on FreeBSD and
+ Linux (where I develop).
+
+. Test with valgrind by running "RunTest valgrind". There is also "RunGrepTest
+ valgrind", though that takes quite a long time.
+
+. It can also be useful to test with Electric Fence, though the fact that it
+ grumbles for missing free() calls can be a nuisance. (A missing free() in
+ pcretest is hardly a big problem.) To build with EF, use:
+
+ LIBS='/usr/lib/libefence.a -lpthread' with ./configure.
+
+ Then all normal runs use it to check for buffer overflow. Also run everything
+ with:
+
+ EF_PROTECT_BELOW=1 <whatever>
+
+ because there have been problems with lookbehinds that looked too far.
+
+. Test with the emulated memmove() function by undefining HAVE_MEMMOVE and
+ HAVE_BCOPY in config.h.
+
+. Documentation: check AUTHORS, COPYING, ChangeLog (check date), INSTALL,
+ LICENCE, NEWS (check date), NON-UNIX-USE, and README. Many of these won't
+ need changing, but over the long term things do change.
+
+. Man pages: Check all man pages for \ not followed by e or f or " because
+ that indicates a markup error.
+
+
+Making a PCRE release
+---------------------
+
+Run PrepareRelease and commit the files that it changes (by removing trailing
+spaces). Then run "make dist" to create the tarballs and the zipball.
+
+Don't forget to update Freshmeat when the new release is out, and to tell
+webmaster@pcre.org and the mailing list.
+
+
+Future ideas (wish list)
+------------------------
+
+This section records a list of ideas so that they do not get forgotten. They
+vary enormously in their usefulness and potential for implementation. Some are
+very sensible; some are rather wacky. Some have been on this list for years;
+others are relatively new.
+
+. Optimization
+
+ There are always ideas for new optimizations so as to speed up pattern
+ matching. Most of them try to save work by recognizing a non-match without
+ having to scan all the possibilities. These are some that I've recorded:
+
+ * /((A{0,5}){0,5}){0,5}(something complex)/ on a non-matching string is very
+ slow, though Perl is fast. Can we speed up somehow? Convert to {0,125}?
+ OTOH, this is pathological - the user could easily fix it.
+
+ * Turn ={4} into ==== ? (for speed). I once did an experiment, and it seems
+ to have little effect, and maybe makes things worse.
+
+ * "Ends with literal string" - note that a single character doesn't gain much
+ over the existing "required byte" (reqbyte) feature that just saves one
+ byte.
+
+ * These probably need to go in study():
+
+ o Remember an initial string rather than just 1 char?
+
+ o A required byte from alternatives - not just the last char, but an
+ earlier one if common to all alternatives.
+
+ o Minimum length of subject needed.
+
+ o Friedl contains other ideas.
+
+. If Perl gets to a consistent state over the settings of capturing sub-
+ patterns inside repeats, see if we can match it. One example of the
+ difference is the matching of /(main(O)?)+/ against mainOmain, where PCRE
+ leaves $2 set. In Perl, it's unset. Changing this in PCRE will be very hard
+ because I think it needs much more state to be remembered.
+
+. Perl 6 will be a revolution. Is it a revolution too far for PCRE?
+
+. Unicode
+
+ * Note that in Perl, \s matches \pZ and similarly for \d, \w and the POSIX
+ character classes. For the moment, I've chosen not to support this for
+ backward compatibility, for speed, and because it would be messy to
+ implement.
+
+ * A different approach to Unicode might be to use a typedef to do everything
+ in unsigned shorts instead of unsigned chars. Actually, we'd have to have a
+ new typedef to distinguish data from bits of compiled pattern that are in
+ bytes, I think. There would need to be conversion functions in and out. I
+ don't think this is particularly trivial - and anyway, Unicode now has
+ characters that need more than 16 bits, so is this at all sensible?
+
+ * There has been a request for direct support of 16-bit characters and
+ UTF-16. However, since Unicode is moving beyond purely 16-bit characters,
+ is this worth it at all? One possible way of handling 16-bit characters
+ would be to "load" them in the same way that UTF-8 characters are loaded.
+
+. Allow errorptr and erroroffset to be NULL. I don't like this idea.
+
+. Line endings:
+
+ * Option to use NUL as a line terminator in subject strings. This could now
+ be done relatively easily since the extension to support LF, CR, and CRLF.
+ If this is done, a suitable option for pcregrep is also required.
+
+. Option to provide the pattern with a length instead of with a NUL terminator.
+ This probably affects quite a few places in the code.
+
+. Catch SIGSEGV for stack overflows?
+
+. "Cut" as described in Jeffrey Friedl's book, p364: \v and \V. The definitions
+ aren't yet clear enough for me. \v flushes saved states so that no
+ backtracking to anything earlier can happen; \V says "no more bumpalong", but
+ does it fail the current match? As described in the book, these aren't really
+ "cut" as in Prolog, are they? NOTE: (a) PCRE once had "cut", but it was
+ removed when atomic groups were introduced. (b) Perl 5.10 has some (*PRUNE)
+ features -- see below.
+
+. A feature to suspend a match via a callout was once requested.
+
+. Option to convert results into character offsets and character lengths.
+
+. Option for pcregrep to scan only the start of a file. I am not keen - this is
+ the job of "head".
+
+. A (non-Unix) user wanted pcregrep options to (a) list a file name just once,
+ preceded by a blank line, instead of adding it to every matched line, and (b)
+ support --outputfile=name.
+
+. Consider making UTF-8 and UCP the default for PCRE n.0 for some n > 7.
+
+. Add a user pointer to pcre_malloc/free functions -- some option would be
+ needed to retain backward compatibility.
+
+. Define a union for the results from pcre_fullinfo().
+
+. Provide a "random access to the subject" facility so that the way in which it
+ is stored is independent of PCRE. For efficiency, it probably isn't possible
+ to switch this dynamically. It would have to be specified when PCRE was
+ compiled. PCRE would then call a function every time it wanted a character.
+
+. There are new (*PRUNE) facilities in Perl 5.10, some of which it might be
+ relatively easy to implement.
+
+. Also in Perl 5.10 are relative subroutine references (?&-1) and (?&+1) which
+ I didn't know about when I added some 5.10 features for PCRE 7.0. What about
+ (?(-1)... as a condition? That's an obvious extension, even if Perl 5.10
+ doesn't have it.
+
+. Wild thought: the ability to compile from PCRE's internal byte code to a real
+ FSM and a very fast (third) matcher to process the result. There would be
+ even more restrictions than for pcre_dfa_exec(), however. This is not easy.
+
+. Should pcretest have some private locale data, to avoid relying on the
+ available locales for the test data, since different OS have different ideas?
+ This won't be as thorough a test, but perhaps that doesn't really matter.
+
+. pcregrep: add -rs for a sorted recurse? Having to store file names and sort
+ them will of course slow it down.
+
+. Re-arrange test 2: take out the link-size dependent stuff for a separate test
+ that is run only when the link size *is* 2; leave in some non-numbered
+ debugging tests using the new /Z feature.
+
+. Stan Switzer's goto replacement for longjmp, which is apparently very slow on
+ OS-X. This is used when stack recursion is disabled. It would be worth doing
+ some timing tests on other OS.
+
+. Someone suggested --disable-callout to save code space when callouts are
+ never wanted. This seems rather marginal.
+
+. Automate some of the testing before release into a script that compiles with
+ different options and runs the tests in each case.
+
+. How about distributing a fixed pcre_chartables.c file and abandoning the
+ on-the-fly generation using dftables. This will make cross-compiling easier,
+ and in any case, locales are going out of fashion.
+
+Philip Hazel
+Email local part: ph10
+Email domain: cam.ac.uk
+Last updated: 12 March 2007
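The maint/README checklist above lists configure options to test before a release, and the wish list suggests automating this. A minimal Python sketch of the first step, generating the command lines, might look like the following. It is a dry run that only prints commands. The option names come from the checklist; the function name, the limit of two combined options, and the decision to skip --disable-shared together with --disable-static are assumptions made for this example.

```python
import itertools

BASE = ["./configure"]

# Options taken from the release checklist in maint/README.
OPTIONS = [
    "--disable-shared",
    "--disable-static",
    "--enable-utf8",
    "--enable-unicode-properties",
    "--disable-cpp",
    "--with-link-size=3",
    "--disable-stack-for-recursion",
    "--enable-newline-is-any",
]


def configure_commands(options=OPTIONS, max_combined=2):
    """Return configure command lines as lists of arguments.

    The first entry is a plain ./configure; the rest use every single
    option and every combination of up to max_combined options.
    """
    commands = [list(BASE)]
    for size in range(1, max_combined + 1):
        for combo in itertools.combinations(options, size):
            # Assumption: disabling both library kinds at once is not a
            # useful configuration to test, so it is skipped.
            if "--disable-shared" in combo and "--disable-static" in combo:
                continue
            commands.append(BASE + list(combo))
    return commands


if __name__ == "__main__":
    for cmd in configure_commands():
        print(" ".join(cmd), "&& make && make check")
```

Printing the commands rather than running them keeps the sketch safe to try; a real script would also need a clean build directory for each configuration.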