author | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2010-06-03 19:18:24 +0000 |
---|---|---|
committer | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2010-06-03 19:18:24 +0000 |
commit | c8b8f5074c8e0f3ccf5621bf55a5b13b8c32043f (patch) | |
tree | 1c305bfeea11677c8369a04f363841e5ccc2d7fa /maint | |
parent | fb40fb6ad1eff9249f36732b6628ef6285ea9a39 (diff) | |
Prepare for release candidate.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@535 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'maint')
-rw-r--r-- | maint/README | 122 |
1 files changed, 62 insertions, 60 deletions
diff --git a/maint/README b/maint/README
index 82062ea..f6c9102 100644
--- a/maint/README
+++ b/maint/README
@@ -25,20 +25,20 @@ Builducptable    A Perl script that creates the contents of the ucptable.h file
 
 GenerateUtt.py   A Python script to generate part of the pcre_tables.c file
                  that contains Unicode script names in a long string with
-                 offsets, which is tedious to maintain by hand.
+                 offsets, which is tedious to maintain by hand.
 
 ManyConfigTests  A shell script that runs "configure, make, test" a number of
                  times with different configuration settings.
-
+
 MultiStage2.py   A Python script that generates the file pcre_ucd.c from three
                  Unicode data tables, which are themselves downloaded from the
-                 Unicode web site. Run this script in the "maint" directory.
+                 Unicode web site. Run this script in the "maint" directory.
                  The generated file contains the tables for a 2-stage lookup
-                 of Unicode properties.
+                 of Unicode properties.
 
 README           This file.
 
-Unicode.tables   The files in this directory, DerivedGeneralCategory.txt,
+Unicode.tables   The files in this directory, DerivedGeneralCategory.txt,
                  Scripts.txt and UnicodeData.txt, were downloaded from the
                  Unicode web site. They contain information about Unicode
                  characters and scripts.
@@ -71,9 +71,9 @@ can be run to generate a new version of pcre_ucd.c, and GenerateUtt.py can be
 run to generate the tricky tables for inclusion in pcre_tables.c.
 
 If MultiStage2.py gives the error "ValueError: list.index(x): x not in list",
-the cause is usually a missing (or misspelt) name in the list of scripts. I
-couldn't find a straightforward list of scripts on the Unicode site, but
-there's a useful Wikipedia page that list them, and notes the Unicode version
+the cause is usually a missing (or misspelt) name in the list of scripts. I
+couldn't find a straightforward list of scripts on the Unicode site, but
+there's a useful Wikipedia page that list them, and notes the Unicode version
 in which they were introduced:
 
 http://en.wikipedia.org/wiki/Unicode_scripts#Table_of_Unicode_scripts
@@ -83,7 +83,7 @@ pcre_ucd.c work properly, using the data files in ucptestdata to check a number
 of test characters. The source file ucptest.c must be updated whenever new
 Unicode script names are added.
 
-Note also that both the pcresyntax.3 and pcrepattern.3 man pages contain lists
+Note also that both the pcresyntax.3 and pcrepattern.3 man pages contain lists
 of Unicode script names.
 
 
@@ -94,20 +94,20 @@ This section contains a checklist of things that I consult before building a
 distribution for a new release.
 
 . Ensure that the version number and version date are correct in configure.ac.
-
+
 . If new build options have been added, ensure that they are added to the CMake
-  files as well as to the autoconf files.
+  files as well as to the autoconf files.
 
 . Run ./autogen.sh to ensure everything is up-to-date.
 
 . Compile and test with many different config options, and combinations of
   options. The maint/ManyConfigTests script now encapsulates this testing.
 
-. Run perltest.pl on the test data for tests 1, 4, 6, and 11. The first two can
-  be run with Perl 5.8 or 5.10; the last two require Perl 5.10. The output
-  should match the PCRE test output, apart from the version identification at
-  the start of each test. The other tests are not Perl-compatible (they use
-  various PCRE-specific features or options).
+. Run perltest.pl on the test data for tests 1, 4, 6, and 11. The first two can
+  be run with Perl 5.8 or >= 5.10; the last two require Perl >= 5.10. The
+  output should match the PCRE test output, apart from the version
+  identification at the start of each test. The other tests are not
+  Perl-compatible (they use various PCRE-specific features or options).
 
 . Test with valgrind by running "RunTest valgrind". There is also
   "RunGrepTest valgrind", though that takes quite a long time.
@@ -130,14 +130,14 @@ distribution for a new release.
   used" warnings for the modules in which there is no call to memmove(). These
   can be ignored.
 
-. Documentation: check AUTHORS, COPYING, ChangeLog (check version and date),
+. Documentation: check AUTHORS, COPYING, ChangeLog (check version and date),
   INSTALL, LICENCE, NEWS (check version and date), NON-UNIX-USE, and README.
   Many of these won't need changing, but over the long term things do change.
 
 . Man pages: Check all man pages for \ not followed by e or f or " because
-  that indicates a markup error. However, there is one exception: pcredemo.3,
+  that indicates a markup error. However, there is one exception: pcredemo.3,
   which is created from the pcredemo.c program. It contains three instances
-  of \\n.
+  of \\n.
 
 . When the release is built, test it on a number of different operating
   systems if possible, and using different compilers as well. For example,
@@ -154,10 +154,10 @@ spaces). Then run "make distcheck" to create the tarballs and the zipball.
 Double-check with "svn status", then create an SVN tagged copy:
 
 svn copy svn://vcs.exim.org/pcre/code/trunk \
-         svn://vcs.exim.org/pcre/code/tags/pcre-8.xx
+         svn://vcs.exim.org/pcre/code/tags/pcre-8.xx
 
 Don't forget to update Freshmeat when the new release is out, and to tell
-webmaster@pcre.org and the mailing list. Also, update the list of version
+webmaster@pcre.org and the mailing list. Also, update the list of version
 numbers in Bugzilla (edit products).
 
 
@@ -186,7 +186,7 @@ others are relatively new.
     over the existing "required byte" (reqbyte) feature that just remembers one
     byte.
 
-  * These probably need to go in study():
+  * These probably need to go in pcre_study():
 
     o Remember an initial string rather than just 1 char?
 
@@ -194,7 +194,14 @@ others are relatively new.
       earlier one if common to all alternatives.
 
     o Friedl contains other ideas.
-
+
+  * pcre_study() does not set initial byte flags for Unicode property types
+    such as \p; I don't know how much benefit there would be for, for example,
+    setting the bits for 0-9 and all bytes >= xC0 when a pattern starts with
+    \p{N}.
+
+  * There is scope for more "auto-possessifying" in connection with \p and \P.
+
 . If Perl gets to a consistent state over the settings of capturing sub-
   patterns inside repeats, see if we can match it. One example of the
   difference is the matching of /(main(O)?)+/ against mainOmain, where PCRE
@@ -205,11 +212,6 @@ others are relatively new.
 
 . Unicode
 
-  * Note that in Perl, \s matches \pZ and similarly for \d, \w and the POSIX
-    character classes. For the moment, I've chosen not to support this for
-    backward compatibility, for speed, and because it would be messy to
-    implement.
-
   * A different approach to Unicode might be to use a typedef to do everything
     in unsigned shorts instead of unsigned chars. Actually, we'd have to have a
     new typedef to distinguish data from bits of compiled pattern that are in
@@ -271,54 +273,54 @@ others are relatively new.
 
 . Someone suggested --disable-callout to save code space when callouts are
   never wanted. This seems rather marginal.
-
-. Check names that consist entirely of digits: PCRE allows, but do Perl and
-  Python, etc?
-
-. A user suggested a parameter to limit the length of string matched, for
-  example if the parameter is N, the current match should fail if the matched
-  substring exceeds N. This could apply to both match functions. The value
+
+. Check names that consist entirely of digits: PCRE allows, but do Perl and
+  Python, etc?
+
+. A user suggested a parameter to limit the length of string matched, for
+  example if the parameter is N, the current match should fail if the matched
+  substring exceeds N. This could apply to both match functions. The value
   could be a new field in the extra block.
-
+
 . Callouts with arguments: (?Cn:ARG) for instance.
 
-. A user is going to supply a patch to generalize the API for user-specific
+. A user is going to supply a patch to generalize the API for user-specific
   memory allocation so that it is more flexible in threaded environments. This
   was promised a long time ago, and never appeared...
-
+
 . Write a function that generates random matching strings for a compiled regex.
 
-. Write a wrapper to maintain a structure with specified runtime parameters,
-  such as recurse limit, and pass these to PCRE each time it is called. Also
+. Write a wrapper to maintain a structure with specified runtime parameters,
+  such as recurse limit, and pass these to PCRE each time it is called. Also
   maybe malloc and free. A user sent a prototype.
-
-. Pcregrep: an option to specify the output line separator, either as a string
-  or select from a fixed list. This is not dead easy, because at the moment it
+
+. Pcregrep: an option to specify the output line separator, either as a string
+  or select from a fixed list. This is not dead easy, because at the moment it
   outputs whatever is in the input file.
-
-. Improve the code for duplicate checking in pcre_dfa_exec(). An incomplete,
-  non-thread-safe patch showed that this can help performance for patterns
-  where there are many alternatives. However, a simple thread-safe
-  implementation that I tried made things worse in many simple cases, so this
+
+. Improve the code for duplicate checking in pcre_dfa_exec(). An incomplete,
+  non-thread-safe patch showed that this can help performance for patterns
+  where there are many alternatives. However, a simple thread-safe
+  implementation that I tried made things worse in many simple cases, so this
   is not an obviously good thing.
-
-. Make the longest lookbehind available via pcre_fullinfo(). This is not
-  straightforward because lookbehinds can be nested inside lookbehinds. This
-  case will have to be identified, and the amounts added. This should then give
-  the maximum possible lookbehind length. The reason for wanting this is to
+
+. Make the longest lookbehind available via pcre_fullinfo(). This is not
+  straightforward because lookbehinds can be nested inside lookbehinds. This
+  case will have to be identified, and the amounts added. This should then give
+  the maximum possible lookbehind length. The reason for wanting this is to
   help when implementing multi-segment matching using pcre_exec() with partial
   matching and overlapping segments.
-
+
 . PCRE cannot at present distinguish between subpatterns with different names,
-  but the same number (created by the use of ?|). In order to do so, a way of
+  but the same number (created by the use of ?|). In order to do so, a way of
   remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
-
-. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
+  Now that (*MARK) has been implemented, it can perhaps be used as a way round
+  this problem.
+
+. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
   "something" and the the #ifdef appears only in one place, in "something".
-
-. Support for (*MARK) and arguments for (*PRUNE) and friends.
 
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 10 March 2010
+Last updated: 03 June 2010
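
The man-page item in the release checklist above (look for a \ that is not followed by e, f, or ") could be automated roughly as follows. This is only a minimal sketch in Python that applies the rule literally; the function name suspect_escapes is hypothetical and not part of the PCRE tree. It would still report the known \\n instances in pcredemo.3, which the checklist says are a legitimate exception.

```python
import re

# A backslash followed by any character other than e, f, or a double quote,
# or a backslash that is the last character on its line. The following
# character is consumed, so a doubled backslash is reported only once.
SUSPECT = re.compile(r'\\(?:[^ef"]|$)')


def suspect_escapes(text):
    """Return a list of (line_number, column, line) for each suspect backslash.

    Line numbers and columns are 1-based. The rule is the one stated in the
    checklist, applied literally; it knows nothing else about roff syntax.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECT.finditer(line):
            hits.append((lineno, match.start() + 1, line))
    return hits
```

One way to use it would be to read each man page into a string, pass it to suspect_escapes(), and print any hits as file:line:column so they can be inspected by hand.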