From e874d29e595fc2c60c30c14f3e7e9ab3ff0fe60a Mon Sep 17 00:00:00 2001
From: Shmuel Zeigerman
Date: Tue, 7 Nov 2017 18:53:40 +0200
Subject: Changes toward a new release.

---
 manual.html | 439 +++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 285 insertions(+), 154 deletions(-)

diff --git a/manual.html b/manual.html
index 9b7da69..8acf6b2 100644
--- a/manual.html
+++ b/manual.html
@@ -3,7 +3,7 @@
-
+
 Lrexlib Reference Manual
@@ -14,60 +14,70 @@

Table of Contents


Lrexlib builds into shared libraries called by default rex_posix.so, -rex_pcre.so, rex_gnu.so, rex_tre.so and rex_onig.so, which can be used with -require.

+rex_pcre.so, rex_pcre2.so, rex_gnu.so, rex_tre.so and rex_onig.so, +which can be used with require.


-

Notes

+

Notes

  1. Most functions and methods in Lrexlib have mandatory and optional arguments. There are no dependencies between arguments in Lrexlib's functions and @@ -82,8 +92,9 @@ MyFunc (arg1, arg2, [arg3], [arg4])

  2. Throughout this document (unless it causes ambiguity), the identifier rex -is used in place of either rex_posix, rex_pcre, rex_gnu, rex_onig or -rex_tre, which are the default namespaces for the corresponding libraries.

    +is used in place of either rex_posix, rex_pcre, rex_pcre2, rex_gnu, +rex_onig or rex_tre, which are the default namespaces for the corresponding +libraries.

  3. All functions that take a regular expression pattern as an argument will generate an error if that pattern is found invalid by the regex library.

    @@ -108,73 +119,60 @@ a length that excludes the NUL.

    the parameter is not supplied or nil is:

      -
    • REG_EXTENDED for POSIX and TRE
    • -
    • 0 for PCRE
    • -
    • ONIG_OPTION_NONE for Oniguruma
    • -
    • SYNTAX_POSIX_EXTENDED for GNU
    • +
    • REG_EXTENDED for POSIX and TRE
    • +
    • 0 for PCRE and PCRE2
    • +
    • ONIG_OPTION_NONE for Oniguruma
    • +
    • SYNTAX_POSIX_EXTENDED for GNU
    -

    PCRE, Oniguruma: cf may also be supplied as a string, whose -characters stand for compilation flags. Combinations of the following +

    PCRE, PCRE2, Oniguruma: cf may also be supplied as a string, +whose characters stand for compilation flags. Combinations of the following characters (case sensitive) are supported:


    Character

    -

    PCRE flag

    -

    Oniguruma flag

    -
    Character | PCRE flag | PCRE2 flag | Oniguruma flag

    i

    -

    PCRE_CASELESS

    -

    ONIG_OPTION_IGNORECASE

    -

    m

    -

    PCRE_MULTILINE

    -

    ONIG_OPTION_NEGATE_SINGLELINE

    -

    s

    -

    PCRE_DOTALL

    -

    ONIG_OPTION_MULTILINE

    -

    x

    -

    PCRE_EXTENDED

    -

    ONIG_OPTION_EXTEND

    -

    U

    -

    PCRE_UNGREEDY

    -

    n/a

    -
    i | PCRE_CASELESS | PCRE2_CASELESS | ONIG_OPTION_IGNORECASE
    m | PCRE_MULTILINE | PCRE2_MULTILINE | ONIG_OPTION_NEGATE_SINGLELINE
    s | PCRE_DOTALL | PCRE2_DOTALL | ONIG_OPTION_MULTILINE
    x | PCRE_EXTENDED | PCRE2_EXTENDED | ONIG_OPTION_EXTEND
    U | PCRE_UNGREEDY | PCRE2_UNGREEDY | n/a

    X

    -

    PCRE_EXTRA

    -

    n/a

    -
    X | PCRE_EXTRA | n/a | n/a
    @@ -186,10 +184,9 @@ characters (case sensitive) are supported:

    the parameter is not supplied or nil, is:

      -
    • 0 for standard POSIX regex library
    • -
    • REG_STARTEND for those POSIX regex libraries that support it, -e.g. Spencer's.
    • -
    • 0 for PCRE, Oniguruma and TRE
    • +
    • 0 for standard POSIX regex library
    • +
    • REG_STARTEND for those POSIX regex libraries that support it, e.g. Spencer's
    • +
    • 0 for PCRE, PCRE2, Oniguruma and TRE
  4. @@ -204,9 +201,9 @@ is discarded, e.g. rex.count(&quo

-

Functions and methods common to all bindings

+

Functions and methods common to all bindings

-

match

+

match

rex.match (subj, patt, [init], [cf], [ef], [larg...])

or

r:match (subj, [init], [ef])

@@ -287,7 +284,7 @@ substring is returned.

-

find

+

find

rex.find (subj, patt, [init], [cf], [ef], [larg...])

or

r:find (subj, [init], [ef])

@@ -369,7 +366,7 @@ the match.

-

gmatch

+

gmatch

rex.gmatch (subj, patt, [cf], [ef], [larg...])

The function is intended for use in the generic for Lua construct. It returns an iterator for repeated matching of the pattern patt in @@ -427,7 +424,7 @@ till the subject fails to match.


-

gsub

+

gsub

rex.gsub (subj, patt, repl, [n], [cf], [ef], [larg...])

This function searches for all matches of the pattern patt in the string subj and replaces them according to the parameters repl and n (see details @@ -593,7 +590,7 @@ next match; n will not be called again;


-

split

+

split

rex.split (subj, sep, [cf], [ef], [larg...])

The function is intended for use in the generic for Lua construct. It is used for splitting a subject string subj into parts (sections). @@ -664,7 +661,7 @@ subject.


-

count

+

count

rex.count (subj, patt, [cf], [ef], [larg...])

This function counts matches of the pattern patt in the string subj.

@@ -721,7 +718,7 @@ subject.


-

flags

+

flags

rex.flags ([tb])

This function returns a table containing the numeric values of the constants defined by the used regex library, with the keys being the (string) names of the @@ -768,8 +765,8 @@ The keys in the tb table are formed from the names of the correspon constants in the used library. They are formed as follows:

  • POSIX, TRE: prefix REG_ is omitted, e.g. REG_ICASE becomes "ICASE".
  • -
  • PCRE: prefix PCRE_ is omitted, e.g. PCRE_CASELESS becomes -"CASELESS".
  • +
  • PCRE: prefix PCRE_ is omitted, e.g. PCRE_CASELESS becomes "CASELESS".
  • +
  • PCRE2: prefix PCRE2_ is omitted, e.g. PCRE2_CASELESS becomes "CASELESS".
  • Oniguruma: names of constants are converted to strings with no alteration, but for ONIG_OPTION_xxx constants, alias strings are created additionally, e.g., the value of ONIG_OPTION_IGNORECASE constant becomes accessible via @@ -786,7 +783,7 @@ RE_SYNTAX_GREP becomes SYNTAX_GREP in Lua.

  • -

    new

    +

    new

    rex.new (patt, [cf], [larg...])

    The function compiles regular expression patt into a regular expression object whose internal representation is corresponding to the library used. The returned @@ -838,7 +835,7 @@ any.


    -

    tfind

    +

    tfind

    r:tfind (subj, [init], [ef])

    The method searches for the first match of the compiled regexp r in the string subj, starting from offset init, subject to execution flags ef.

    @@ -890,9 +887,9 @@ string subj, starting from offset init, subject to execution f
  • Substring matches ("captures" in Lua terminology) are returned as a third result, in a table. This table contains false in the positions where the corresponding sub-pattern did not participate in the match.
      -
    1. PCRE, Oniguruma: if named subpatterns are used then the table -also contains substring matches keyed by their correspondent subpattern -names (strings).
    +
    1. PCRE, PCRE2, Oniguruma: if named subpatterns are used then +the table also contains substring matches keyed by their corresponding +subpattern names (strings).
  • @@ -906,7 +903,7 @@ names (strings).

    -

    exec

    +

    exec

    r:exec (subj, [init], [ef])

    The method searches for the first match of the compiled regexp r in the string subj, starting from offset init, subject to execution flags ef.

    @@ -959,9 +956,9 @@ string subj, starting from offset init, subject to execution f returned as a third result, in a table. This table contains false in the positions where the corresponding sub-pattern did not participate in the match.
      -
    1. PCRE, Oniguruma: if named subpatterns are used then the table -also contains substring matches keyed by their correspondent subpattern -names (strings).
    +
    1. PCRE, PCRE2, Oniguruma: if named subpatterns are used then +the table also contains substring matches keyed by their corresponding +subpattern names (strings).
    @@ -980,27 +977,27 @@ names (strings).

    -

    PCRE-only functions and methods

    +

    PCRE-only functions and methods

    -

    new

    +

    new

    rex.new (patt, [cf], [lo])

    The locale (lo) can be either a string (e.g., "French_France.1252"), or a -userdata obtained from a call to maketables. The default value, used when the -parameter is not supplied or nil, is the built-in PCRE set of character +userdata obtained from a call to maketables. The default value, used when +the parameter is not supplied or nil, is the built-in PCRE set of character tables.


    -

    fullinfo

    +

    fullinfo

    [See pcre_fullinfo in the PCRE docs.]

    r:fullinfo ()

    This function returns a table containing information about the compiled pattern. The keys are strings formed in the following way: PCRE_INFO_CAPTURECOUNT -> "CAPTURECOUNT". The values are numbers.

    -

    +
    -

    dfa_exec

    +

    dfa_exec

    [PCRE 6.0 and later. See pcre_dfa_exec in the PCRE docs.]

    r:dfa_exec (subj, [init], [ef], [ovecsize], [wscount])

    The method matches a compiled regular expression r against a given subject @@ -1074,10 +1071,10 @@ first.

    If there are 3 matches found starting at offset 10 and ending at offsets 15, 20 and 25 then the function returns the following: 10, { 25,20,15 }, 3.
    -

    +
-

maketables

+

maketables

[See pcre_maketables in the PCRE docs.]

rex_pcre.maketables ()

Creates a set of character tables corresponding to the current locale and @@ -1086,7 +1083,7 @@ function accepting the locale parameter.


-

config

+

config

[PCRE 4.0 and later. See pcre_config in the PCRE docs.]

rex_pcre.config ([tb])

This function returns a table containing the values of the configuration @@ -1095,8 +1092,8 @@ keyed by their names (strings). If the table argument tb is supplied th is used as the output table, else a new table is created.


-
-

version

+
+

version

[See pcre_version in the PCRE docs.]

rex_pcre.version ()

This function returns a string containing the version of the used PCRE library @@ -1104,10 +1101,144 @@ and its release date.


+
+

PCRE2-only functions and methods

+
+

new

+

rex.new (patt, [cf], [lo])

+

The locale (lo) can be either a string (e.g., "French_France.1252"), or a +userdata obtained from a call to maketables. The default value, used when +the parameter is not supplied or nil, is the built-in PCRE2 set of character +tables.

+
+
+
+

patterninfo

+

[See pcre2_pattern_info in the PCRE2 docs.]

+

r:patterninfo ()

+

This function returns a table containing information about the compiled pattern. +The keys are strings formed in the following way: +PCRE2_INFO_CAPTURECOUNT -> "CAPTURECOUNT". The values are numbers.

+
+
+
+

dfa_exec

+

[See pcre2_dfa_match in the PCRE2 docs.]

+

r:dfa_exec (subj, [init], [ef], [ovecsize], [wscount])

+

The method matches a compiled regular expression r against a given subject +string subj, using a DFA matching algorithm.

+
Parameter | Description | Type | Default Value
r | regex object produced by new | userdata | n/a
subj | subject | string | n/a
[init] | start offset in the subject (can be negative) | number | 1
[ef] | execution flags (bitwise OR) | number | ef
[ovecsize] | size of the array for result offsets | number | 100
[wscount] | number of elements in the working space array | number | 50
+
+
+
Returns on success (either full or partial match):
+
    +
  1. The start point of the matches found (a number).
  +
  2. A table containing the end points of the matches found, the longer matches +first.
  +
  3. The return value of the underlying pcre2_dfa_match call (a number).
  +
+
+
Returns on failure (no match):
+
    +
  1. nil
  +
+
+
Example:
+
If there are 3 matches found starting at offset 10 and ending at offsets 15, 20 +and 25 then the function returns the following: 10, { 25,20,15 }, 3.
+
+
+
+
+

jit_compile

+

[See pcre2_jit_compile in the PCRE2 docs.]

+

r:jit_compile ([options])

+

Parameter options is a number (a bitwise OR of separate options; +it defaults to PCRE2_JIT_COMPLETE).

+

The method returns true on success, or false and an error message string on failure.

+
+
+
+

maketables

+

[See pcre2_maketables in the PCRE2 docs.]

+

rex_pcre2.maketables ()

+

Creates a set of character tables corresponding to the current locale and +returns it as a userdata. The returned value can be passed to any Lrexlib +function accepting the locale parameter.

+
+
+
+

config

+

[See pcre2_config in the PCRE2 docs.]

+

rex_pcre2.config ([tb])

+

This function returns a table containing the values of the configuration +parameters used at PCRE2 library build-time. Those parameters (numbers) are +keyed by their names (strings). If the table argument tb is supplied then it +is used as the output table, else a new table is created.

+
+
+
+

version

+

[See pcre2_config(PCRE2_CONFIG_VERSION) in the PCRE2 docs.]

+

rex_pcre2.version ()

+

This function returns a string containing the version of the used PCRE2 library +and its release date.

+
+
+
-

GNU-only functions and methods

-
-

new

+

GNU-only functions and methods

+
+

new

rex.new (patt, [cf], [tr])

If the compilation flags (cf) are not supplied or nil, the default syntax is SYNTAX_POSIX_EXTENDED. Note that this is not the same as passing a value @@ -1119,9 +1250,9 @@ translated when it is being matched.

-

Oniguruma-only functions and methods

-
-

new

+

Oniguruma-only functions and methods

+
+

new

rex.new (patt, [cf], [enc], [syn])

The encoding parameter (enc) must be one of the predefined strings that are formed from the ONIG_ENCODING_xxx identifiers defined in oniguruma.h, by means @@ -1140,7 +1271,7 @@ last setdefaultsyntax "syntax" string set, an error is raised.

-

setdefaultsyntax

+

setdefaultsyntax

rex_onig.setdefaultsyntax (syntax)

This function sets the default syntax for the Oniguruma library, according to the value of the string syntax. The specified syntax will be further used for @@ -1156,8 +1287,8 @@ argument is passed to those functions explicitly.


-
-

version

+
+

version

[See onig_version in the Oniguruma docs.]

rex_onig.version ()

This function returns a string containing the version of the used Oniguruma @@ -1165,7 +1296,7 @@ library.


-

capturecount

+

capturecount

[See onig_number_of_captures in the Oniguruma docs.]

r:capturecount ()

Returns the number of captures in the pattern.

@@ -1173,13 +1304,13 @@ library.


-

TRE-only functions and methods

-
-

new

+

TRE-only functions and methods

+
+

new

rex.new (patt, [cf])

-

atfind

+

atfind

r:atfind (subj, params, [init], [ef])

The method searches for the first match of the compiled regexp r in the string subj, starting from offset init, subject to execution flags ef.

@@ -1260,7 +1391,7 @@ in the following fields: cost,
-

aexec

+

aexec

r:aexec (subj, params, [init], [ef])

The method searches for the first match of the compiled regexp r in the string subj, starting from offset init, subject to execution flags ef.

@@ -1342,21 +1473,21 @@ the match, in the following fields: cost,
-

have_approx

+

have_approx

r:have_approx ()

The method returns true if the compiled pattern uses approximate matching, and false if not.


-

have_backrefs

+

have_backrefs

r:have_backrefs ()

The method returns true if the compiled pattern has back references, and false if not.


-
-

config

+
+

config

[See tre_config in the TRE docs.]

rex_tre.config ([tb])

This function returns a table containing the values of the configuration @@ -1366,7 +1497,7 @@ is used as the output table, else a new table is created.


-

rex_tre.version

+

rex_tre.version

[See tre_version in the TRE docs.]

rex_tre.version ()

This function returns a string containing the version of the used TRE library.

@@ -1374,7 +1505,7 @@ is used as the output table, else a new table is created.


-

Incompatibilities with previous versions

+

Incompatibilities with previous versions

Incompatibilities between versions 2.8 and 2.7:

    @@ -1403,7 +1534,7 @@ position.

    Incompatibilities between versions 2.1 and 2.0:

      -
    1. match, find, tfind, exec, dfa_exec: only one value (a nil) is +
    2. match, find, tfind, exec, dfa_exec: only one value (a nil) is returned when the subject does not match the pattern. Any other failure generates an error.
    @@ -1427,7 +1558,7 @@ subpatterns
-- cgit v1.2.1