Allow fixed-length subroutine calls in lookbehinds.

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@454 2f5784b3-3f2a-0410-8824-cb99058d5e15
author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2009-09-22 09:42:11 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2009-09-22 09:42:11 +0000
commit: 13ec83b84a6939e47ebabc1836caec7d94836896 (patch)
tree: 4590c85bd69ba6b50d8a741a3469a023edfc03fc /maint/README
parent: 20dd865c5c8f10036cda34b9870351b702399c08 (diff)
download: pcre-13ec83b84a6939e47ebabc1836caec7d94836896.tar.gz
1 files changed, 59 insertions, 24 deletions
diff --git a/maint/README b/maint/README
index 5168936..aa904d6 100644
--- a/maint/README
+++ b/maint/README
@@ -36,6 +36,8 @@ MultiStage2.py   A Python script that generates the file pcre_ucd.c from three
                  The generated file contains the tables for a 2-stage lookup
                  of Unicode properties.  
 
+README           This file.
+
 Unicode.tables   The files in this directory, DerivedGeneralCategory.txt, 
                  Scripts.txt and UnicodeData.txt, were downloaded from the
                  Unicode web site. They contain information about Unicode
@@ -62,16 +64,15 @@ Updating to a new Unicode release
 ---------------------------------
 
 When there is a new release of Unicode, the files in Unicode.tables must be
-refreshed from the web site. If the new version of Unicode adds new character 
+refreshed from the web site. If the new version of Unicode adds new character
 scripts, the source file ucp.h and both the MultiStage2.py and the
-GenerateUtt.py scripts must be edited to add the new names. Then the
-MultiStage2.py script can then be run to generate a new version of pcre_ucd.c
-and the GenerateUtt.py can be run to generate the tricky tables for inclusion
-in pcre_tables.c.
+GenerateUtt.py scripts must be edited to add the new names. Then MultiStage2.py
+can be run to generate a new version of pcre_ucd.c, and GenerateUtt.py can be
+run to generate the tricky tables for inclusion in pcre_tables.c.
 
-The ucptest program can then be compiled and used to check that the new tables
-in pcre_ucd.c work properly, using the data files in ucptestdata to check a
-number of test characters.
+The ucptest program can be compiled and used to check that the new tables in
+pcre_ucd.c work properly, using the data files in ucptestdata to check a number
+of test characters.
 
 
 Preparing for a PCRE release
@@ -80,8 +81,7 @@ Preparing for a PCRE release
 This section contains a checklist of things that I consult before building a
 distribution for a new release.
 
-. Ensure that the version number and version date are correct in configure.ac,
-  ChangeLog, and NEWS.
+. Ensure that the version number and version date are correct in configure.ac.
   
 . If new build options have been added, ensure that they are added to the CMake
   files as well as to the autoconf files. 
@@ -91,9 +91,11 @@ distribution for a new release.
 . Compile and test with many different config options, and combinations of
   options. The maint/ManyConfigTests script now encapsulates this testing.
 
-. Run perltest.pl on the test data for tests 1 and 4. The output should match
-  the PCRE test output, apart from the version identification at the top. The
-  other tests are not Perl-compatible (they use various special PCRE options).
+. Run perltest.pl on the test data for tests 1, 4, 6, and 11. The first two can 
+  be run with Perl 5.8 or 5.10; the last two require Perl 5.10. The output
+  should match the PCRE test output, apart from the version identification at
+  the start of each test. The other tests are not Perl-compatible (they use
+  various PCRE-specific features or options).
 
 . Test with valgrind by running "RunTest valgrind". There is also "RunGrepTest
   valgrind", though that takes quite a long time.
@@ -116,9 +118,9 @@ distribution for a new release.
   used" warnings for the modules in which there is no call to memmove(). These
   can be ignored.
 
-. Documentation: check AUTHORS, COPYING, ChangeLog (check date), INSTALL,
-  LICENCE, NEWS (check date), NON-UNIX-USE, and README. Many of these won't
-  need changing, but over the long term things do change.
+. Documentation: check AUTHORS, COPYING, ChangeLog (check version and date), 
+  INSTALL, LICENCE, NEWS (check version and date), NON-UNIX-USE, and README.
+  Many of these won't need changing, but over the long term things do change.
 
 . Man pages: Check all man pages for \ not followed by e or f or " because
   that indicates a markup error.
@@ -138,7 +140,7 @@ spaces). Then run "make distcheck" to create the tarballs and the zipball.
 Double-check with "svn status", then create an SVN tagged copy:
 
   svn copy svn://vcs.exim.org/pcre/code/trunk \
-           svn://vcs.exim.org/pcre/code/tags/pcre-7.x 
+           svn://vcs.exim.org/pcre/code/tags/pcre-8.xx 
 
 Don't forget to update Freshmeat when the new release is out, and to tell
 webmaster@pcre.org and the mailing list.
@@ -166,7 +168,7 @@ others are relatively new.
     to have little effect, and maybe makes things worse.
 
   * "Ends with literal string" - note that a single character doesn't gain much
-    over the existing "required byte" (reqbyte) feature that just saves one
+    over the existing "required byte" (reqbyte) feature that just remembers one
     byte.
 
   * These probably need to go in study():
@@ -176,9 +178,14 @@ others are relatively new.
     o A required byte from alternatives - not just the last char, but an
       earlier one if common to all alternatives.
 
-    o Minimum length of subject needed.
+    o Minimum length of subject needed (see also next . bullet).
 
     o Friedl contains other ideas.
+    
+. There was a request for a way of finding the minimum subject length that can
+  match a given pattern. (If this were available, it could be usefully added
+  to study() - see above.) This is easy for simple cases, but I haven't figured 
+  out how to handle recursion.   
 
 . If Perl gets to a consistent state over the settings of capturing sub-
   patterns inside repeats, see if we can match it. One example of the
@@ -213,10 +220,10 @@ others are relatively new.
 
   * Option to use NUL as a line terminator in subject strings. This could now
     be done relatively easily since the extension to support LF, CR, and CRLF.
-    If this is done, a suitable option for pcregrep is also required.
+    If it is done, a suitable option for pcregrep is also required.
 
 . Option to provide the pattern with a length instead of with a NUL terminator.
-  This probably affects quite a few places in the code.
+  This affects quite a few places in the code and is not trivial.
 
 . Catch SIGSEGV for stack overflows?
 
@@ -231,7 +238,7 @@ others are relatively new.
   preceded by a blank line, instead of adding it to every matched line, and (b)
   support --outputfile=name.
 
-. Consider making UTF-8 and UCP the default for PCRE n.0 for some n > 7.
+. Consider making UTF-8 and UCP the default for PCRE n.0 for some n > 8.
 
 . Add a user pointer to pcre_malloc/free functions -- some option would be
   needed to retain backward compatibility.
@@ -268,9 +275,37 @@ others are relatively new.
 . Callouts with arguments: (?Cn:ARG) for instance.
 
 . A user is going to supply a patch to generalize the API for user-specific 
-  memory allocation so that it is more flexible in threaded environments.
+  memory allocation so that it is more flexible in threaded environments. This 
+  was promised a long time ago, and never appeared...
+  
+. Write a function that generates random matching strings for a compiled regex.
+
+. Write a wrapper to maintain a structure with specified runtime parameters, 
+  such as recurse limit, and pass these to PCRE each time it is called. Also 
+  maybe malloc and free. A user sent a prototype.
+  
+. Pcregrep: an option to specify the output line separator, either as a string 
+  or select from a fixed list. This is not dead easy, because at the moment it 
+  outputs whatever is in the input file.
+  
+. Improve the code for duplicate checking in pcre_dfa_exec(). An incomplete, 
+  non-thread-safe patch showed that this can help performance for patterns 
+  where there are many alternatives. However, a simple thread-safe 
+  implementation that I tried made things worse in many simple cases, so this 
+  is not an obviously good thing.
+  
+. Make the longest lookbehind available via pcre_fullinfo(). This is not 
+  straightforward because lookbehinds can be nested inside lookbehinds. This 
+  case will have to be identified, and the amounts added. This should then give 
+  the maximum possible lookbehind length. The reason for wanting this is to 
+  help when implementing multi-segment matching using pcre_exec() with partial
+  matching and overlapping segments.
+  
+. PCRE cannot at present distinguish between subpatterns with different names,
+  but the same number (created by the use of ?|). In order to do so, a way of 
+  remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
 
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 26 August 2008
+Last updated: 20 September 2009
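
The release checklist in maint/README says to check all man pages for a backslash that is not followed by e, f, or ", because that indicates a markup error. The snippet below is a minimal sketch of how that check might be automated. It is not part of this commit or of the PCRE maintenance scripts; the function name find_suspect_backslashes and the reporting format are assumptions made for illustration.

```python
import re

# A backslash that is NOT immediately followed by e, f, or a double quote.
# This mirrors the rule stated in the checklist; it makes no attempt to
# understand any other nroff escapes.
SUSPECT = re.compile(r'\\(?![ef"])')


def find_suspect_backslashes(text):
    """Return a list of (line_number, column) pairs, both 1-based,
    one for each backslash that breaks the checklist rule."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECT.finditer(line):
            hits.append((lineno, match.start() + 1))
    return hits
```

One way to use it is to read each man page source in turn, pass its text to find_suspect_backslashes(), and print the file name alongside each reported position. A backslash at the very end of a line is also reported, since nothing follows it.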