Return an error code when pcre2_get_error_message() does not recognize an error code, and add a pcre2test facility for testing this.

git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@526 6239d852-aaf2-0410-a92c-79f79f948069
author: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2016-06-17 11:30:27 +0000
committer: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2016-06-17 11:30:27 +0000
commit: a01b0686cbabaa07150096d3bd6372663b523580 (patch)
tree: b1fed3a8965fb26c2a9c05562c16d1e38089255d /doc/html
parent: fe564b54796c9ab092801d60eb56791fe6417589 (diff)
download: pcre2-a01b0686cbabaa07150096d3bd6372663b523580.tar.gz
3 files changed, 107 insertions, 60 deletions
diff --git a/doc/html/pcre2_get_error_message.html b/doc/html/pcre2_get_error_message.html
index 5d42291..26c80fe 100644
--- a/doc/html/pcre2_get_error_message.html
+++ b/doc/html/pcre2_get_error_message.html
@@ -35,7 +35,10 @@ errors are negative numbers. The arguments are:
   <i>bufflen</i>     the length of the buffer (code units)
 </pre>
 The function returns the length of the message, excluding the trailing zero, or
-a negative error code if the buffer is too small.
+the negative error code PCRE2_ERROR_NOMEMORY if the buffer is too small. In
+this case, the returned message is truncated (but still with a trailing zero).
+If <i>errorcode</i> does not contain a recognized error code number, the
+negative value PCRE2_ERROR_BADDATA is returned.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the
diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html
index db4e7c1..a397966 100644
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@@ -43,16 +43,17 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC28" href="#SEC28">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
 <li><a name="TOC29" href="#SEC29">OTHER INFORMATION ABOUT A MATCH</a>
 <li><a name="TOC30" href="#SEC30">ERROR RETURNS FROM <b>pcre2_match()</b></a>
-<li><a name="TOC31" href="#SEC31">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
-<li><a name="TOC32" href="#SEC32">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
-<li><a name="TOC33" href="#SEC33">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
-<li><a name="TOC34" href="#SEC34">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
-<li><a name="TOC35" href="#SEC35">DUPLICATE SUBPATTERN NAMES</a>
-<li><a name="TOC36" href="#SEC36">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
-<li><a name="TOC37" href="#SEC37">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
-<li><a name="TOC38" href="#SEC38">SEE ALSO</a>
-<li><a name="TOC39" href="#SEC39">AUTHOR</a>
-<li><a name="TOC40" href="#SEC40">REVISION</a>
+<li><a name="TOC31" href="#SEC31">OBTAINING A TEXTUAL ERROR MESSAGE</a>
+<li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
+<li><a name="TOC33" href="#SEC33">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
+<li><a name="TOC34" href="#SEC34">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
+<li><a name="TOC35" href="#SEC35">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
+<li><a name="TOC36" href="#SEC36">DUPLICATE SUBPATTERN NAMES</a>
+<li><a name="TOC37" href="#SEC37">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
+<li><a name="TOC38" href="#SEC38">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
+<li><a name="TOC39" href="#SEC39">SEE ALSO</a>
+<li><a name="TOC40" href="#SEC40">AUTHOR</a>
+<li><a name="TOC41" href="#SEC41">REVISION</a>
 </ul>
 <P>
 <b>#include &#60;pcre2.h&#62;</b>
@@ -1063,7 +1064,7 @@ The <b>pcre2_compile()</b> function compiles a pattern into an internal form.
 The pattern is defined by a pointer to a string of code units and a length. If
 the pattern is zero-terminated, the length can be specified as
 PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that
-contains the compiled pattern and related data.
+contains the compiled pattern and related data, or NULL if an error occurred.
 </P>
 <P>
 If the compile context argument <i>ccontext</i> is NULL, memory for the compiled
@@ -1085,8 +1086,9 @@ to acquire a private copy of shared compiled code.
 <P>
 NOTE: When one of the matching functions is called, pointers to the compiled
 pattern and the subject string are set in the match data block so that they can
-be referenced by the extraction functions. After running a match, you must not
-free a compiled pattern (or a subject string) until after all operations on the
+be referenced by the substring extraction functions. After running a match, you
+must not free a compiled pattern (or a subject string) until after all
+operations on the
 <a href="#matchdatablock">match data block</a>
 have taken place.
 </P>
@@ -1113,13 +1115,20 @@ newline setting) can be provided in a compile context (as described
 </P>
 <P>
 If <i>errorcode</i> or <i>erroroffset</i> is NULL, <b>pcre2_compile()</b> returns
-NULL immediately. Otherwise, if compilation of a pattern fails,
-<b>pcre2_compile()</b> returns NULL, having set these variables to an error code
-and an offset (number of code units) within the pattern, respectively. The
-<b>pcre2_get_error_message()</b> function provides a textual message for each
-error code. Compilation errors are positive numbers, but UTF formatting errors
-are negative numbers. For an invalid UTF-8 or UTF-16 string, the offset is that
-of the first code unit of the failing character.
+NULL immediately. Otherwise, the variables to which these point are set to an
+error code and an offset (number of code units) within the pattern,
+respectively, when <b>pcre2_compile()</b> returns NULL because a compilation
+error has occurred. The values are not defined when compilation is successful
+and <b>pcre2_compile()</b> returns a non-NULL value.
+</P>
+<P>
+The <b>pcre2_get_error_message()</b> function (see "Obtaining a textual error
+message"
+<a href="#geterrormessage">below)</a>
+provides a textual message for each error code. Compilation errors have
+positive error codes; UTF formatting error codes are negative. For an invalid
+UTF-8 or UTF-16 string, the offset is that of the first code unit of the
+failing character.
 </P>
 <P>
 Some errors are not detected until the whole pattern has been scanned; in these
@@ -1488,13 +1497,16 @@ page.
 </P>
 <br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br>
 <P>
-There are over 80 positive error codes that <b>pcre2_compile()</b> may return if
-it finds an error in the pattern. There are also some negative error codes that
-are used for invalid UTF strings. These are the same as given by
-<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described in the
+There are over 80 positive error codes that <b>pcre2_compile()</b> may return
+(via <i>errorcode</i>) if it finds an error in the pattern. There are also some
+negative error codes that are used for invalid UTF strings. These are the same
+as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described
+in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
-page. The <b>pcre2_get_error_message()</b> function can be called to obtain a
-textual error message from any error code.
+page. The <b>pcre2_get_error_message()</b> function (see "Obtaining a textual
+error message"
+<a href="#geterrormessage">below)</a>
+can be called to obtain a textual error message from any error code.
 <a name="jitcompiling"></a></P>
 <br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
 <P>
@@ -2416,11 +2428,13 @@ page.
 <br><a name="SEC30" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
 <P>
 If <b>pcre2_match()</b> fails, it returns a negative number. This can be
-converted to a text string by calling <b>pcre2_get_error_message()</b>. Negative
-error codes are also returned by other functions, and are documented with them.
-The codes are given names in the header file. If UTF checking is in force and
-an invalid UTF subject string is detected, one of a number of UTF-specific
-negative error codes is returned. Details are given in the
+converted to a text string by calling the <b>pcre2_get_error_message()</b>
+function (see "Obtaining a textual error message"
+<a href="#geterrormessage">below).</a>
+Negative error codes are also returned by other functions, and are documented
+with them. The codes are given names in the header file. If UTF checking is in
+force and an invalid UTF subject string is detected, one of a number of
+UTF-specific negative error codes is returned. Details are given in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
 page. The following are the other errors that may be returned by
 <b>pcre2_match()</b>:
@@ -2521,8 +2535,29 @@ is attempted.
   PCRE2_ERROR_RECURSIONLIMIT
 </pre>
 The internal recursion limit was reached.
+<a name="geterrormessage"></a></P>
+<br><a name="SEC31" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
+<P>
+<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
+<b>  PCRE2_SIZE <i>bufflen</i>);</b>
+</P>
+<P>
+A text message for an error code from any PCRE2 function (compile, match, or 
+auxiliary) can be obtained by calling <b>pcre2_get_error_message()</b>. The code 
+is passed as the first argument, with the remaining two arguments specifying a 
+code unit buffer and its length, into which the text message is placed. Note 
+that the message is returned in code units of the appropriate width for the 
+library that is being used. 
+</P>
+<P>
+The returned message is terminated with a trailing zero, and the function
+returns the number of code units used, excluding the trailing zero. If the
+error number is unknown, the negative error code PCRE2_ERROR_BADDATA is
+returned. If the buffer is too small, the message is truncated (but still with
+a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
+None of the messages are very long; a buffer size of 120 code units is ample.
 <a name="extractbynumber"></a></P>
-<br><a name="SEC31" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
+<br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
 <P>
 <b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
 <b>  uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
@@ -2619,7 +2654,7 @@ The substring did not participate in the match. For example, if the pattern is
 (abc)|(def) and the subject is "def", and the ovector contains at least two
 capturing slots, substring number 1 is unset.
 </P>
-<br><a name="SEC32" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
+<br><a name="SEC33" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
 <P>
 <b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
 <b>"  PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
@@ -2658,7 +2693,7 @@ can be distinguished from a genuine zero-length substring by inspecting the
 appropriate offset in the ovector, which contain PCRE2_UNSET for unset
 substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
 <a name="extractbyname"></a></P>
-<br><a name="SEC33" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
+<br><a name="SEC34" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
 <P>
 <b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
 <b>  PCRE2_SPTR <i>name</i>);</b>
@@ -2718,7 +2753,7 @@ names are not included in the compiled code. The matching process uses only
 numbers. For this reason, the use of different names for subpatterns of the
 same number causes an error at compile time.
 </P>
-<br><a name="SEC34" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
+<br><a name="SEC35" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
 <P>
 <b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@@ -2921,9 +2956,11 @@ started, which can happen if \K is used in an assertion).
 </P>
 <P>
 As for all PCRE2 errors, a text message that describes the error can be
-obtained by calling <b>pcre2_get_error_message()</b>.
+obtained by calling the <b>pcre2_get_error_message()</b> function (see
+"Obtaining a textual error message"
+<a href="#geterrormessage">above).</a>
 </P>
-<br><a name="SEC35" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
+<br><a name="SEC36" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
 <P>
 <b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
 <b>  PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
@@ -2968,7 +3005,7 @@ in the section entitled <i>Information about a pattern</i>. Given all the
 relevant entries for the name, you can extract each of their numbers, and hence
 the captured data.
 </P>
-<br><a name="SEC36" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
+<br><a name="SEC37" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
 <P>
 The traditional matching function uses a similar algorithm to Perl, which stops
 when it finds the first match at a given point in the subject. If you want to
@@ -2986,7 +3023,7 @@ substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
 other alternatives. Ultimately, when it runs out of matches,
 <b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
 <a name="dfamatch"></a></P>
-<br><a name="SEC37" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
+<br><a name="SEC38" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
 <P>
 <b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@@ -3181,13 +3218,13 @@ some plausibility checks are made on the contents of the workspace, which
 should contain data about the previous partial match. If any of these checks
 fail, this error is given.
 </P>
-<br><a name="SEC38" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC39" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo(3)</b>,
 <b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
 <b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
 </P>
-<br><a name="SEC39" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC40" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@@ -3196,9 +3233,9 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC40" href="#TOC1">REVISION</a><br>
+<br><a name="SEC41" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 05 June 2016
+Last updated: 17 June 2016
 <br>
 Copyright &copy; 1997-2016 University of Cambridge.
 <br>
diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html
index bbe8fa5..148c4b3 100644
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@@ -98,7 +98,7 @@ further data is read.
 </P>
 <P>
 For maximum portability, therefore, it is safest to avoid non-printing
-characters in <b>pcre2test</b> input files. There is a facility for specifying 
+characters in <b>pcre2test</b> input files. There is a facility for specifying
 some or all of a pattern's characters as hexadecimal pairs, thus making it
 possible to include binary zeroes in a pattern for testing purposes. Subject
 lines are processed for backslash escapes, which makes it possible to include
@@ -179,6 +179,13 @@ using the <b>pcre2_dfa_match()</b> function instead of the default
 <b>pcre2_match()</b>.
 </P>
 <P>
+<b>-error</b> <i>number[,number,...]</i>
+Call <b>pcre2_get_error_message()</b> for each of the error numbers in the
+comma-separated list, display the resulting messages on the standard output,
+then exit with zero exit code. The numbers may be positive or negative. This is
+a convenience facility for PCRE2 maintainers.
+</P>
+<P>
 <b>-help</b>
 Output a brief summary these options and then exit.
 </P>
@@ -572,7 +579,7 @@ about the pattern:
       null_context              compile with a NULL context
       parens_nest_limit=&#60;n&#62;     set maximum parentheses depth
       posix                     use the POSIX API
-      posix_nosub               use the POSIX API with REG_NOSUB 
+      posix_nosub               use the POSIX API with REG_NOSUB
       push                      push compiled pattern onto the stack
       pushcopy                  push a copy onto the stack
       stackguard=&#60;number&#62;       test the stackguard feature
@@ -662,22 +669,22 @@ default values).
 Specifying pattern characters in hexadecimal
 </b><br>
 <P>
-The <b>hex</b> modifier specifies that the characters of the pattern, except for 
+The <b>hex</b> modifier specifies that the characters of the pattern, except for
 substrings enclosed in single or double quotes, are to be interpreted as pairs
 of hexadecimal digits. This feature is provided as a way of creating patterns
 that contain binary zeros and other non-printing characters. White space is
-permitted between pairs of digits. For example, this pattern contains three 
+permitted between pairs of digits. For example, this pattern contains three
 characters:
 <pre>
   /ab 32 59/hex
 </pre>
-Parts of such a pattern are taken literally if quoted. This pattern contains 
+Parts of such a pattern are taken literally if quoted. This pattern contains
 nine characters, only two of which are specified in hexadecimal:
 <pre>
   /ab "literal" 32/hex
 </pre>
 Either single or double quotes may be used. There is no way of including
-the delimiter within a substring. 
+the delimiter within a substring.
 </P>
 <P>
 By default, <b>pcre2test</b> passes patterns as zero-terminated strings to
@@ -935,8 +942,8 @@ line to contain a new pattern (or a command) instead of a subject line. This
 facility is used when saving compiled patterns to a file, as described in the
 section entitled "Saving and restoring compiled patterns"
 <a href="#saverestore">below. If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled</a>
-pattern is stacked, leaving the original as current, ready to match the 
-following input lines. This provides a way of testing the 
+pattern is stacked, leaving the original as current, ready to match the
+following input lines. This provides a way of testing the
 <b>pcre2_code_copy()</b> function.
 The <b>push</b> and <b>pushcopy </b> modifiers are incompatible with compilation
 modifiers such as <b>global</b> that act at match time. Any that are specified
@@ -962,7 +969,7 @@ for a description of their effects.
       anchored                  set PCRE2_ANCHORED
       dfa_restart               set PCRE2_DFA_RESTART
       dfa_shortest              set PCRE2_DFA_SHORTEST
-      no_jit                    set PCRE2_NO_JIT 
+      no_jit                    set PCRE2_NO_JIT
       no_utf_check              set PCRE2_NO_UTF_CHECK
       notbol                    set PCRE2_NOTBOL
       notempty                  set PCRE2_NOTEMPTY
@@ -1023,7 +1030,7 @@ pattern.
       substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
       zero_terminate             pass the subject as zero-terminated
 </pre>
-The effects of these modifiers are described in the following sections. When 
+The effects of these modifiers are described in the following sections. When
 matching via the POSIX wrapper API, the <b>aftertext</b>, <b>allaftertext</b>,
 and <b>ovector</b> subject modifiers work as described below. All other
 modifiers are either ignored, with a warning message, or cause an error.
@@ -1537,8 +1544,8 @@ item to be tested. For example:
 This output indicates that callout number 0 occurred for a match attempt
 starting at the fourth character of the subject string, when the pointer was at
 the seventh character, and when the next pattern item was \d. Just
-one circumflex is output if the start and current positions are the same, or if 
-the current position precedes the start position, which can happen if the 
+one circumflex is output if the start and current positions are the same, or if
+the current position precedes the start position, which can happen if the
 callout is in a lookbehind assertion.
 </P>
 <P>
@@ -1636,7 +1643,7 @@ the <b>pushcopy</b> modifier causes a copy of the compiled pattern to be
 stacked, leaving the original available for immediate matching. By using
 <b>push</b> and/or <b>pushcopy</b>, a number of patterns can be compiled and
 retained. These modifiers are incompatible with <b>posix</b>, and control
-modifiers that act at match time are ignored (with a message) for the stacked 
+modifiers that act at match time are ignored (with a message) for the stacked
 patterns. The <b>jitverify</b> modifier applies only at compile time.
 </P>
 <P>
@@ -1677,8 +1684,8 @@ If <b>jitverify</b> is used with #pop, it does not automatically imply
 <b>jit</b>, which is different behaviour from when it is used on a pattern.
 </P>
 <P>
-The #popcopy command is analagous to the <b>pushcopy</b> modifier in that it 
-makes current a copy of the topmost stack pattern, leaving the original still 
+The #popcopy command is analagous to the <b>pushcopy</b> modifier in that it
+makes current a copy of the topmost stack pattern, leaving the original still
 on the stack.
 </P>
 <br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
@@ -1698,7 +1705,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 05 June 2016
+Last updated: 17 June 2016
 <br>
 Copyright &copy; 1997-2016 University of Cambridge.
 <br>
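
The return-value contract documented in this patch (message length on success, PCRE2_ERROR_BADDATA for an unrecognized code, PCRE2_ERROR_NOMEMORY with a truncated but still zero-terminated message when the buffer is too small) can be illustrated with a small self-contained sketch. This is not PCRE2's implementation: `demo_get_error_message()`, `demo_outcome()`, the two-entry message table, and the `DEMO_ERROR_*` values are hypothetical stand-ins chosen for illustration, and treating a zero-length buffer as "too small" is an assumption. Real code would call `pcre2_get_error_message()` and take the constants from pcre2.h.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder values for this sketch only; real programs should use
   PCRE2_ERROR_BADDATA and PCRE2_ERROR_NOMEMORY from pcre2.h. */
#define DEMO_ERROR_BADDATA  (-29)
#define DEMO_ERROR_NOMEMORY (-48)

/* Hypothetical two-entry message table; the real library knows many more. */
struct demo_msg { int code; const char *text; };
static const struct demo_msg demo_table[] = {
  { -1, "no match" },
  { -2, "partial match" },
};

/* Stand-in that follows the documented contract:
   - unknown code            -> DEMO_ERROR_BADDATA, buffer untouched
   - buffer too small        -> truncated, zero-terminated message,
                                DEMO_ERROR_NOMEMORY
   - otherwise               -> message length, excluding the trailing zero */
static int demo_get_error_message(int errorcode, char *buffer, size_t bufflen)
{
  const char *text = NULL;
  size_t i, len;

  for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
    if (demo_table[i].code == errorcode) {
      text = demo_table[i].text;
      break;
    }
  }
  if (text == NULL) return DEMO_ERROR_BADDATA;

  len = strlen(text);
  if (len >= bufflen) {
    /* Assumption: a zero-length buffer cannot hold even the terminator. */
    if (bufflen > 0) {
      memcpy(buffer, text, bufflen - 1);
      buffer[bufflen - 1] = 0;
    }
    return DEMO_ERROR_NOMEMORY;
  }

  memcpy(buffer, text, len + 1);   /* copy the message and its trailing zero */
  return (int)len;
}

/* Caller-side classification of the three possible outcomes. */
static const char *demo_outcome(int rc)
{
  if (rc == DEMO_ERROR_BADDATA)  return "unknown error code";
  if (rc == DEMO_ERROR_NOMEMORY) return "message truncated";
  return "complete message";
}
```

A caller would typically pass a buffer of about 120 code units, as the patched documentation suggests, and check the return value before relying on the text: a negative result means either that the buffer contents are unchanged (unknown code) or that they hold only a prefix of the message (buffer too small).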