summaryrefslogtreecommitdiff
path: root/doc/pcre2convert.3
diff options
context:
space:
mode:
authorph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069>2017-07-12 16:34:49 +0000
committerph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069>2017-07-12 16:34:49 +0000
commit44a7756fbd9de2ba246cb9a2f8c20c85efbaac2e (patch)
tree7159c7d8fd446086b5bd32fa37ca7f93071d362b /doc/pcre2convert.3
parent4165460e9b0d5df7eba3a88b59b95a7ae95d9bb9 (diff)
downloadpcre2-44a7756fbd9de2ba246cb9a2f8c20c85efbaac2e.tar.gz
Document experimental pattern conversion functions and remove unimplemented
features. git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@840 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/pcre2convert.3')
-rw-r--r--doc/pcre2convert.3163
1 files changed, 163 insertions, 0 deletions
diff --git a/doc/pcre2convert.3 b/doc/pcre2convert.3
new file mode 100644
index 0000000..3dadf6e
--- /dev/null
+++ b/doc/pcre2convert.3
@@ -0,0 +1,163 @@
+.TH PCRE2CONVERT 3 "12 July 2017" "PCRE2 10.30"
+.SH NAME
+PCRE2 - Perl-compatible regular expressions (revised API)
+.SH "EXPERIMENTAL PATTERN CONVERSION FUNCTIONS"
+.rs
+.sp
+This document describes a set of functions that can be used to convert
+"foreign" patterns into PCRE2 regular expressions. This facility is currently
+experimental, and may be changed in future releases. Two kinds of pattern,
+globs and POSIX patterns, are supported.
+.
+.
+.SH "THE CONVERT CONTEXT"
+.rs
+.sp
+.nf
+.B pcre2_convert_context *pcre2_convert_context_create(
+.B " pcre2_general_context *\fIgcontext\fP);"
+.sp
+.B pcre2_convert_context *pcre2_convert_context_copy(
+.B " pcre2_convert_context *\fIcvcontext\fP);"
+.sp
+.B void pcre2_convert_context_free(pcre2_convert_context *\fIcvcontext\fP);
+.sp
+.B int pcre2_set_glob_escape(pcre2_convert_context *\fIcvcontext\fP,
+.B " uint32_t \fIescape_char\fP);"
+.sp
+.B int pcre2_set_glob_separator(pcre2_convert_context *\fIcvcontext\fP,
+.B " uint32_t \fIseparator_char\fP);"
+.fi
+.sp
+A convert context is used to hold parameters that affect the way that pattern
+conversion works. Like all PCRE2 contexts, you need to use a context only if
+you want to override the defaults. There are the usual create, copy, and free
+functions. If custom memory management functions are set in a general context
+that is passed to \fBpcre2_convert_context_create()\fP, they are used for all
+memory management within the conversion functions.
+.P
+There are only two parameters in the convert context at present. Both apply
+only to glob conversions. The escape character defaults to grave accent under
+Windows, otherwise backslash. It can be set to zero, meaning no escape
+character, or to any punctuation character with a code point less than 256.
+The separator character defaults to backslash under Windows, otherwise forward
+slash. It can be set to forward slash, backslash, or dot.
+.P
+The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if
+their second argument is invalid.
+.
+.
+.SH "THE CONVERSION FUNCTION"
+.rs
+.sp
+.nf
+.B int pcre2_pattern_convert(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP,
+.B " uint32_t \fIoptions\fP, PCRE2_UCHAR **\fIbuffer\fP,"
+.B " PCRE2_SIZE *\fIblength\fP, pcre2_convert_context *\fIcvcontext\fP);"
+.sp
+.B void pcre2_converted_pattern_free(PCRE2_UCHAR *\fIconverted_pattern\fP);
+.fi
+.sp
+The first two arguments of \fBpcre2_pattern_convert()\fP define the foreign
+pattern that is to be converted. The length may be given as
+PCRE2_ZERO_TERMINATED. The \fBoptions\fP argument defines how the pattern is to
+be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set.
+PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid.
+One or more of the glob options, or one of the following POSIX options must be
+set to define the type of conversion that is required:
+.sp
+ PCRE2_CONVERT_GLOB
+ PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
+ PCRE2_CONVERT_GLOB_NO_STARSTAR
+ PCRE2_CONVERT_POSIX_BASIC
+ PCRE2_CONVERT_POSIX_EXTENDED
+.sp
+Details of the conversions are given below. The \fBbuffer\fP and \fBblength\fP
+arguments define how the output is handled:
+.P
+If \fBbuffer\fP is NULL, the function just returns the length of the converted
+pattern via \fBblength\fP. This is one less than the length of buffer needed,
+because a terminating zero is always added to the output.
+.P
+If \fBbuffer\fP points to a NULL pointer, an output buffer is obtained using
+the allocator in the context or \fBmalloc()\fP if no context is supplied. A
+pointer to this buffer is placed in the variable to which \fBbuffer\fP points.
+When no longer needed the output buffer must be freed by calling
+\fBpcre2_converted_pattern_free()\fP.
+.P
+If \fBbuffer\fP points to a non-NULL pointer, \fBblength\fP must be set to the
+actual length of the buffer provided (in code units).
+.P
+In all cases, after successful conversion, the variable pointed to by
+\fBblength\fP is updated to the length actually used (in code units), excluding
+the terminating zero that is always added.
+.P
+If an error occurs, the length (via \fBblength\fP) is set to the offset
+within the input pattern where the error was detected. Only gross syntax errors
+are caught; there are plenty of errors that will get passed on for
+\fBpcre2_compile()\fP to discover.
+.P
+The return from \fBpcre2_pattern_convert()\fP is zero on success or a non-zero
+PCRE2 error code. Note that PCRE2 error codes may be positive or negative:
+\fBpcre2_compile()\fP uses mostly positive codes and \fBpcre2_match()\fP
+negative ones; \fBpcre2_convert()\fP uses existing codes of both kinds. A
+textual error message can be obtained by calling
+\fBpcre2_get_error_message()\fP.
+.
+.
+.SH "CONVERTING GLOBS"
+.rs
+.sp
+Globs are used to match file names, and consequently have the concept of a
+"path separator", which defaults to backslash under Windows and forward slash
+otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not
+permitted to match separator characters, but the double-star (**) feature
+(which does match separators) is supported.
+.P
+PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to
+match separator characters. PCRE2_GLOB_NO_STARSTAR matches globs with the
+double-star feature disabled. These options may be given together.
+.
+.
+.SH "CONVERTING POSIX PATTERNS"
+.rs
+.sp
+POSIX defines two kinds of regular expression pattern: basic and extended.
+These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or
+PCRE2_CONVERT_POSIX_EXTENDED, respectively.
+.P
+In POSIX patterns, backslash is not special in a character class. Unmatched
+closing parentheses are treated as literals.
+.P
+In basic patterns, ? + | {} and () must be escaped to be recognized
+as metacharacters outside a character class. If the first character in the
+pattern is * it is treated as a literal. ^ is a metacharacter only at the start
+of a branch.
+.P
+In extended patterns, a backslash not in a character class always
+makes the next character literal, whatever it is. There are no backreferences.
+.P
+Note: POSIX mandates that the longest possible match at the first matching
+position must be found. This is not what \fBpcre2_match()\fP does; it yields
+the first match that is found. An application can use \fBpcre2_dfa_match()\fP
+to find the longest match, but that does not support backreferences (but then
+neither do POSIX extended patterns).
+.
+.
+.SH AUTHOR
+.rs
+.sp
+.nf
+Philip Hazel
+University Computing Service
+Cambridge, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 12 July 2017
+Copyright (c) 1997-2017 University of Cambridge.
+.fi