author ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2010-06-03 19:18:24 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2010-06-03 19:18:24 +0000
commit c8b8f5074c8e0f3ccf5621bf55a5b13b8c32043f
tree 1c305bfeea11677c8369a04f363841e5ccc2d7fa /maint
parent fb40fb6ad1eff9249f36732b6628ef6285ea9a39
Prepare for release candidate.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@535 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'maint')
-rw-r--r-- maint/README 122
1 files changed, 62 insertions, 60 deletions
diff --git a/maint/README b/maint/README
index 82062ea..f6c9102 100644
--- a/maint/README
+++ b/maint/README
@@ -25,20 +25,20 @@ Builducptable A Perl script that creates the contents of the ucptable.h file
GenerateUtt.py A Python script to generate part of the pcre_tables.c file
that contains Unicode script names in a long string with
- offsets, which is tedious to maintain by hand.
+ offsets, which is tedious to maintain by hand.
ManyConfigTests A shell script that runs "configure, make, test" a number of
times with different configuration settings.
-
+
MultiStage2.py A Python script that generates the file pcre_ucd.c from three
Unicode data tables, which are themselves downloaded from the
- Unicode web site. Run this script in the "maint" directory.
+ Unicode web site. Run this script in the "maint" directory.
The generated file contains the tables for a 2-stage lookup
- of Unicode properties.
+ of Unicode properties.
README This file.
-Unicode.tables The files in this directory, DerivedGeneralCategory.txt,
+Unicode.tables The files in this directory, DerivedGeneralCategory.txt,
Scripts.txt and UnicodeData.txt, were downloaded from the
Unicode web site. They contain information about Unicode
characters and scripts.
@@ -71,9 +71,9 @@ can be run to generate a new version of pcre_ucd.c, and GenerateUtt.py can be
run to generate the tricky tables for inclusion in pcre_tables.c.
If MultiStage2.py gives the error "ValueError: list.index(x): x not in list",
-the cause is usually a missing (or misspelt) name in the list of scripts. I
-couldn't find a straightforward list of scripts on the Unicode site, but
-there's a useful Wikipedia page that lists them, and notes the Unicode version
+the cause is usually a missing (or misspelt) name in the list of scripts. I
+couldn't find a straightforward list of scripts on the Unicode site, but
+there's a useful Wikipedia page that lists them, and notes the Unicode version
in which they were introduced:
http://en.wikipedia.org/wiki/Unicode_scripts#Table_of_Unicode_scripts
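
The "x not in list" error described in this hunk is what Python's list.index() raises when asked for a value the list does not contain. The sketch below is not taken from MultiStage2.py: the short script_names list and the script_index() helper are hypothetical, and only show how a lookup could report which script name is missing.

```python
# Hypothetical excerpt: a real list of Unicode script names is much longer.
script_names = ["Arabic", "Common", "Greek", "Latin"]


def script_index(name, names=script_names):
    """Return the position of name in names, naming the culprit on failure."""
    try:
        return names.index(name)
    except ValueError:
        # list.index() does not always say which list was searched, so
        # re-raise with the missing script name spelled out.
        raise ValueError("script %r is not in the list of script names" % name)
```

With a wrapper like this, a script that is present in Scripts.txt but missing from the list would be named in the error message rather than left to guesswork.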
@@ -83,7 +83,7 @@ pcre_ucd.c work properly, using the data files in ucptestdata to check a number
of test characters. The source file ucptest.c must be updated whenever new
Unicode script names are added.
-Note also that both the pcresyntax.3 and pcrepattern.3 man pages contain lists
+Note also that both the pcresyntax.3 and pcrepattern.3 man pages contain lists
of Unicode script names.
@@ -94,20 +94,20 @@ This section contains a checklist of things that I consult before building a
distribution for a new release.
. Ensure that the version number and version date are correct in configure.ac.
-
+
. If new build options have been added, ensure that they are added to the CMake
- files as well as to the autoconf files.
+ files as well as to the autoconf files.
. Run ./autogen.sh to ensure everything is up-to-date.
. Compile and test with many different config options, and combinations of
options. The maint/ManyConfigTests script now encapsulates this testing.
-. Run perltest.pl on the test data for tests 1, 4, 6, and 11. The first two can
- be run with Perl 5.8 or 5.10; the last two require Perl 5.10. The output
- should match the PCRE test output, apart from the version identification at
- the start of each test. The other tests are not Perl-compatible (they use
- various PCRE-specific features or options).
+. Run perltest.pl on the test data for tests 1, 4, 6, and 11. The first two can
+ be run with Perl 5.8 or >= 5.10; the last two require Perl >= 5.10. The
+ output should match the PCRE test output, apart from the version
+ identification at the start of each test. The other tests are not
+ Perl-compatible (they use various PCRE-specific features or options).
. Test with valgrind by running "RunTest valgrind". There is also "RunGrepTest
valgrind", though that takes quite a long time.
@@ -130,14 +130,14 @@ distribution for a new release.
used" warnings for the modules in which there is no call to memmove(). These
can be ignored.
-. Documentation: check AUTHORS, COPYING, ChangeLog (check version and date),
+. Documentation: check AUTHORS, COPYING, ChangeLog (check version and date),
INSTALL, LICENCE, NEWS (check version and date), NON-UNIX-USE, and README.
Many of these won't need changing, but over the long term things do change.
. Man pages: Check all man pages for \ not followed by e or f or " because
- that indicates a markup error. However, there is one exception: pcredemo.3,
+ that indicates a markup error. However, there is one exception: pcredemo.3,
which is created from the pcredemo.c program. It contains three instances
- of \\n.
+ of \\n.
. When the release is built, test it on a number of different operating
systems if possible, and using different compilers as well. For example,
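
The man page check in this hunk (a backslash not followed by e, f, or a double quote) can be automated. This is a minimal sketch, assuming the pages are read as plain text; the names SUSPECT and suspect_lines() are made up for illustration and are not part of the PCRE maintenance scripts.

```python
import re

# A backslash that is NOT followed by e, f, or a double quote.
SUSPECT = re.compile(r'\\(?![ef"])')


def suspect_lines(text):
    """Return (line_number, line) pairs that contain a suspect backslash."""
    return [(n, line)
            for n, line in enumerate(text.splitlines(), 1)
            if SUSPECT.search(line)]
```

A sequence such as \\n is also reported, because the first backslash is followed by another backslash. That matches the exception the checklist notes for pcredemo.3, whose hits can be ignored.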
@@ -154,10 +154,10 @@ spaces). Then run "make distcheck" to create the tarballs and the zipball.
Double-check with "svn status", then create an SVN tagged copy:
svn copy svn://vcs.exim.org/pcre/code/trunk \
- svn://vcs.exim.org/pcre/code/tags/pcre-8.xx
+ svn://vcs.exim.org/pcre/code/tags/pcre-8.xx
Don't forget to update Freshmeat when the new release is out, and to tell
-webmaster@pcre.org and the mailing list. Also, update the list of version
+webmaster@pcre.org and the mailing list. Also, update the list of version
numbers in Bugzilla (edit products).
@@ -186,7 +186,7 @@ others are relatively new.
over the existing "required byte" (reqbyte) feature that just remembers one
byte.
- * These probably need to go in study():
+ * These probably need to go in pcre_study():
o Remember an initial string rather than just 1 char?
@@ -194,7 +194,14 @@ others are relatively new.
earlier one if common to all alternatives.
o Friedl contains other ideas.
-
+
+ * pcre_study() does not set initial byte flags for Unicode property types
+ such as \p; I don't know how much benefit there would be for, for example,
+ setting the bits for 0-9 and all bytes >= xC0 when a pattern starts with
+ \p{N}.
+
+ * There is scope for more "auto-possessifying" in connection with \p and \P.
+
. If Perl gets to a consistent state over the settings of capturing sub-
patterns inside repeats, see if we can match it. One example of the
difference is the matching of /(main(O)?)+/ against mainOmain, where PCRE
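
The pcre_study() item added in this hunk suggests setting the starting-byte bits for 0-9 and for all bytes >= xC0 when a pattern starts with \p{N}. The sketch below illustrates that idea only. It assumes a 32-byte map with one bit per byte value, and its function names are hypothetical rather than PCRE code.

```python
def starting_bytes_for_pN():
    """Build a 32-byte bitmap of the bytes that could start a \\p{N} match in UTF-8."""
    bits = bytearray(32)

    def set_bit(c):
        # One bit per byte value: byte c lives in bits[c // 8], bit (c & 7).
        bits[c // 8] |= 1 << (c & 7)

    for c in range(ord("0"), ord("9") + 1):  # the ASCII digits 0-9
        set_bit(c)
    for c in range(0xC0, 0x100):             # all bytes >= 0xC0
        set_bit(c)
    return bits


def may_start(bits, byte):
    """Return True if a match could begin at this byte value."""
    return bool(bits[byte // 8] & (1 << (byte & 7)))
```

The map is deliberately loose for bytes >= 0xC0: it admits every such byte rather than only those that begin a numeric character, which is why the benefit is uncertain, as the note says.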
@@ -205,11 +212,6 @@ others are relatively new.
. Unicode
- * Note that in Perl, \s matches \pZ and similarly for \d, \w and the POSIX
- character classes. For the moment, I've chosen not to support this for
- backward compatibility, for speed, and because it would be messy to
- implement.
-
* A different approach to Unicode might be to use a typedef to do everything
in unsigned shorts instead of unsigned chars. Actually, we'd have to have a
new typedef to distinguish data from bits of compiled pattern that are in
@@ -271,54 +273,54 @@ others are relatively new.
. Someone suggested --disable-callout to save code space when callouts are
never wanted. This seems rather marginal.
-
-. Check names that consist entirely of digits: PCRE allows, but do Perl and
- Python, etc?
-
-. A user suggested a parameter to limit the length of string matched, for
- example if the parameter is N, the current match should fail if the matched
- substring exceeds N. This could apply to both match functions. The value
+
+. Check names that consist entirely of digits: PCRE allows, but do Perl and
+ Python, etc?
+
+. A user suggested a parameter to limit the length of string matched, for
+ example if the parameter is N, the current match should fail if the matched
+ substring exceeds N. This could apply to both match functions. The value
could be a new field in the extra block.
-
+
. Callouts with arguments: (?Cn:ARG) for instance.
-. A user is going to supply a patch to generalize the API for user-specific
+. A user is going to supply a patch to generalize the API for user-specific
memory allocation so that it is more flexible in threaded environments. This
was promised a long time ago, and never appeared...
-
+
. Write a function that generates random matching strings for a compiled regex.
-. Write a wrapper to maintain a structure with specified runtime parameters,
- such as recurse limit, and pass these to PCRE each time it is called. Also
+. Write a wrapper to maintain a structure with specified runtime parameters,
+ such as recurse limit, and pass these to PCRE each time it is called. Also
maybe malloc and free. A user sent a prototype.
-
-. Pcregrep: an option to specify the output line separator, either as a string
- or select from a fixed list. This is not dead easy, because at the moment it
+
+. Pcregrep: an option to specify the output line separator, either as a string
+ or select from a fixed list. This is not dead easy, because at the moment it
outputs whatever is in the input file.
-
-. Improve the code for duplicate checking in pcre_dfa_exec(). An incomplete,
- non-thread-safe patch showed that this can help performance for patterns
- where there are many alternatives. However, a simple thread-safe
- implementation that I tried made things worse in many simple cases, so this
+
+. Improve the code for duplicate checking in pcre_dfa_exec(). An incomplete,
+ non-thread-safe patch showed that this can help performance for patterns
+ where there are many alternatives. However, a simple thread-safe
+ implementation that I tried made things worse in many simple cases, so this
is not an obviously good thing.
-
-. Make the longest lookbehind available via pcre_fullinfo(). This is not
- straightforward because lookbehinds can be nested inside lookbehinds. This
- case will have to be identified, and the amounts added. This should then give
- the maximum possible lookbehind length. The reason for wanting this is to
+
+. Make the longest lookbehind available via pcre_fullinfo(). This is not
+ straightforward because lookbehinds can be nested inside lookbehinds. This
+ case will have to be identified, and the amounts added. This should then give
+ the maximum possible lookbehind length. The reason for wanting this is to
help when implementing multi-segment matching using pcre_exec() with partial
matching and overlapping segments.
-
+
. PCRE cannot at present distinguish between subpatterns with different names,
- but the same number (created by the use of ?|). In order to do so, a way of
+ but the same number (created by the use of ?|). In order to do so, a way of
remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
-
-. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
+ Now that (*MARK) has been implemented, it can perhaps be used as a way round
+ this problem.
+
+. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
"something" and then the #ifdef appears only in one place, in "something".
-
-. Support for (*MARK) and arguments for (*PRUNE) and friends.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 10 March 2010
+Last updated: 03 June 2010