summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorShmuel Zeigerman <solomuz0@gmail.com>2017-11-07 18:51:08 +0200
committerShmuel Zeigerman <solomuz0@gmail.com>2017-11-07 18:51:08 +0200
commit2eec2a42538a5efcaf895e42cf2e63ba0ea7c144 (patch)
treea91ad2031f971a4a65aefb42b5dcc7da089b5a8e
parentbf444a65399ba8c26f00d540454a5d29d9a68ad1 (diff)
downloadlrexlib-2eec2a42538a5efcaf895e42cf2e63ba0ea7c144.tar.gz
Minor changes in code, PCRE2 additions to the manual.
-rw-r--r--LICENSE4
-rw-r--r--README.rst7
-rw-r--r--doc/manual.txt206
-rw-r--r--src/pcre2/lpcre2.c5
4 files changed, 180 insertions, 42 deletions
diff --git a/LICENSE b/LICENSE
index 39332d2..ac0dd4a 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,8 +1,8 @@
License of Lrexlib release
--------------------------
-Copyright (C) Reuben Thomas 2000-2015
-Copyright (C) Shmuel Zeigerman 2004-2015
+Copyright (C) Reuben Thomas 2000-2017
+Copyright (C) Shmuel Zeigerman 2004-2017
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated
diff --git a/README.rst b/README.rst
index 3b3767f..2b58ff7 100644
--- a/README.rst
+++ b/README.rst
@@ -5,16 +5,17 @@ Lrexlib
| and Shmuel Zeigerman (shmuz@013net.net)
**Lrexlib** provides bindings of five regular expression library APIs
-(POSIX_, PCRE_, GNU_, TRE_ and Oniguruma_) to Lua_ >= 5.1.
+(POSIX_, PCRE_, PCRE2_, GNU_, TRE_ and Oniguruma_) to Lua_ >= 5.1.
The bindings for TRE and Oniguruma are not currently complete.
-**Lrexlib** is copyright Reuben Thomas 2000-2015 and copyright Shmuel
-Zeigerman 2004-2015, and is released under the same license as Lua,
+**Lrexlib** is copyright Reuben Thomas 2000-2017 and copyright Shmuel
+Zeigerman 2004-2017, and is released under the same license as Lua,
the MIT_ license (otherwise known as the revised BSD license). There
is no warranty.
.. _POSIX: http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html
.. _PCRE: http://www.pcre.org/pcre.txt
+.. _PCRE2: http://www.pcre.org/pcre2.txt
.. _GNU: ftp://ftp.gnu.org/old-gnu/regex/
.. _Oniguruma: https://github.com/kkos/oniguruma
.. _TRE: http://laurikari.net/tre/documentation/
diff --git a/doc/manual.txt b/doc/manual.txt
index 38f81f7..545372c 100644
--- a/doc/manual.txt
+++ b/doc/manual.txt
@@ -8,8 +8,8 @@ Lrexlib Reference Manual
------------------------------------------------------------
**Lrexlib** builds into shared libraries called by default *rex_posix.so*,
-*rex_pcre.so*, *rex_gnu.so*, *rex_tre.so* and *rex_onig.so*, which can be used with
-*require*.
+*rex_pcre.so*, *rex_pcre2.so*, *rex_gnu.so*, *rex_tre.so* and *rex_onig.so*,
+which can be used with *require*.
------------------------------------------------------------
@@ -28,8 +28,9 @@ Notes
MyFunc (arg1, arg2, [arg3], [arg4])
3. Throughout this document (unless it causes ambiguity), the identifier **rex**
- is used in place of either *rex_posix*, *rex_pcre*, *rex_gnu*, *rex_onig* or
- *rex_tre*, which are the default namespaces for the corresponding libraries.
+ is used in place of either *rex_posix*, *rex_pcre*, *rex_pcre2*, *rex_gnu*,
+ *rex_onig* or *rex_tre*, which are the default namespaces for the corresponding
+ libraries.
4. All functions that take a regular expression pattern as an argument will
generate an error if that pattern is found invalid by the regex library.
@@ -53,35 +54,34 @@ Notes
7. The default value for *compilation flags* (*cf*) that Lrexlib uses when
the parameter is not supplied or ``nil`` is:
- * REG_EXTENDED for POSIX and TRE
- * 0 for PCRE
- * ONIG_OPTION_NONE for Oniguruma
- * SYNTAX_POSIX_EXTENDED for GNU
+ * ``REG_EXTENDED`` for POSIX and TRE
+ * ``0`` for PCRE and PCRE2
+ * ``ONIG_OPTION_NONE`` for Oniguruma
+ * ``SYNTAX_POSIX_EXTENDED`` for GNU
- **PCRE**, **Oniguruma**: *cf* may also be supplied as a string, whose
- characters stand for compilation flags. Combinations of the following
+ **PCRE**, **PCRE2**, **Oniguruma**: *cf* may also be supplied as a string,
+ whose characters stand for compilation flags. Combinations of the following
characters (case sensitive) are supported:
- =============== ================== ==============================
- **Character** **PCRE flag** **Oniguruma flag**
- =============== ================== ==============================
- **i** PCRE_CASELESS ONIG_OPTION_IGNORECASE
- **m** PCRE_MULTILINE ONIG_OPTION_NEGATE_SINGLELINE
- **s** PCRE_DOTALL ONIG_OPTION_MULTILINE
- **x** PCRE_EXTENDED ONIG_OPTION_EXTEND
- **U** PCRE_UNGREEDY n/a
- **X** PCRE_EXTRA n/a
- =============== ================== ==============================
+ =============== ================== ================== ==============================
+ **Character** **PCRE flag** **PCRE2 flag** **Oniguruma flag**
+ =============== ================== ================== ==============================
+ **i** PCRE_CASELESS PCRE2_CASELESS ONIG_OPTION_IGNORECASE
+ **m** PCRE_MULTILINE PCRE2_MULTILINE ONIG_OPTION_NEGATE_SINGLELINE
+ **s** PCRE_DOTALL PCRE2_DOTALL ONIG_OPTION_MULTILINE
+ **x** PCRE_EXTENDED PCRE2_EXTENDED ONIG_OPTION_EXTEND
+ **U** PCRE_UNGREEDY PCRE2_UNGREEDY n/a
+ **X** PCRE_EXTRA n/a n/a
+ =============== ================== ================== ==============================
.. _ef:
8. The default value for *execution flags* (*ef*) that Lrexlib uses when
the parameter is not supplied or ``nil``, is:
- * 0 for standard POSIX regex library
- * REG_STARTEND for those POSIX regex libraries that support it,
- e.g. Spencer's.
- * 0 for PCRE, Oniguruma and TRE
+ * ``0`` for standard POSIX regex library
+ * ``REG_STARTEND`` for those POSIX regex libraries that support it, e.g. Spencer's
+ * ``0`` for PCRE, PCRE2, Oniguruma and TRE
.. _larg:
@@ -432,8 +432,8 @@ The keys in the `tb` table are formed from the names of the corresponding
constants in the used library. They are formed as follows:
* **POSIX**, **TRE**: prefix REG\_ is omitted, e.g. REG_ICASE becomes ``"ICASE"``.
-* **PCRE:** prefix PCRE\_ is omitted, e.g. PCRE_CASELESS becomes
- ``"CASELESS"``.
+* **PCRE:** prefix PCRE\_ is omitted, e.g. PCRE_CASELESS becomes ``"CASELESS"``.
+* **PCRE2:** prefix PCRE2\_ is omitted, e.g. PCRE2_CASELESS becomes ``"CASELESS"``.
* **Oniguruma:** names of constants are converted to strings with no alteration,
but for ONIG_OPTION_xxx constants, alias strings are created additionally,
e.g., the value of ONIG_OPTION_IGNORECASE constant becomes accessible via
@@ -504,9 +504,9 @@ string *subj*, starting from offset *init*, subject to execution flags *ef*.
result, in a table. This table contains ``false`` in the positions where the
corresponding sub-pattern did not participate in the match.
- 1. **PCRE**, **Oniguruma**: if *named subpatterns* are used then the table
- also contains substring matches keyed by their correspondent subpattern
- names (strings).
+ 1. **PCRE**, **PCRE2**, **Oniguruma**: if *named subpatterns* are used then
+ the table also contains substring matches keyed by their correspondent
+ subpattern names (strings).
**Returns on failure:**
1. ``nil``
@@ -542,9 +542,9 @@ string *subj*, starting from offset *init*, subject to execution flags *ef*.
positions where the corresponding sub-pattern did not participate in the
match.
- 1. **PCRE**, **Oniguruma**: if *named subpatterns* are used then the table
- also contains substring matches keyed by their correspondent subpattern
- names (strings).
+ 1. **PCRE**, **PCRE2**, **Oniguruma**: if *named subpatterns* are used then
+ the table also contains substring matches keyed by their correspondent
+ subpattern names (strings).
**Returns on failure:**
1. ``nil``
@@ -565,10 +565,12 @@ new
:funcdef:`rex.new (patt, [cf], [lo])`
The locale (*lo*) can be either a string (e.g., "French_France.1252"), or a
-userdata obtained from a call to maketables_. The default value, used when the
-parameter is not supplied or ``nil``, is the built-in PCRE set of character
+userdata obtained from a call to maketables__. The default value, used when
+the parameter is not supplied or ``nil``, is the built-in PCRE set of character
tables.
+__ maketables_pcre_
+
------------------------------------------------------------
fullinfo
@@ -584,6 +586,8 @@ The keys are strings formed in the following way:
------------------------------------------------------------
+.. _dfa_exec_pcre:
+
dfa_exec
--------
@@ -627,6 +631,8 @@ string *subj*, using a DFA matching algorithm.
------------------------------------------------------------
+.. _maketables_pcre:
+
maketables
----------
@@ -668,6 +674,134 @@ and its release date.
------------------------------------------------------------
+PCRE2-only functions and methods
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+new
+---
+
+:funcdef:`rex.new (patt, [cf], [lo])`
+
+The locale (*lo*) can be either a string (e.g., "French_France.1252"), or a
+userdata obtained from a call to maketables__. The default value, used when
+the parameter is not supplied or ``nil``, is the built-in PCRE2 set of character
+tables.
+
+__ maketables_pcre2_
+
+------------------------------------------------------------
+
+patterninfo
+-----------
+
+[See *pcre2_patterninfo* in the PCRE2 docs.]
+
+:funcdef:`r:patterninfo ()`
+
+This function returns a table containing information about the compiled pattern.
+The keys are strings formed in the following way:
+``PCRE2_INFO_CAPTURECOUNT`` -> ``"CAPTURECOUNT"``. The values are numbers.
+
+------------------------------------------------------------
+
+dfa_exec
+--------
+
+[See *pcre2_dfa_exec* in the PCRE2 docs.]
+
+:funcdef:`r:dfa_exec (subj, [init], [ef], [ovecsize], [wscount])`
+
+The method matches a compiled regular expression *r* against a given subject
+string *subj*, using a DFA matching algorithm.
+
+ +----------+-------------------------------------+--------+-------------+
+ |Parameter | Description | Type |Default Value|
+ +==========+=====================================+========+=============+
+ | r |regex object produced by new |userdata| n/a |
+ +----------+-------------------------------------+--------+-------------+
+ | subj |subject | string | n/a |
+ +----------+-------------------------------------+--------+-------------+
+ | [init] |start offset in the subject | number | 1 |
+ | |(can be negative) | | |
+ +----------+-------------------------------------+--------+-------------+
+ | [ef] |execution flags (bitwise OR) | number | ef_ |
+ +----------+-------------------------------------+--------+-------------+
+ |[ovecsize]|size of the array for result offsets | number | 100 |
+ +----------+-------------------------------------+--------+-------------+
+ |[wscount] |number of elements in the working | number | 50 |
+ | |space array | | |
+ +----------+-------------------------------------+--------+-------------+
+
+**Returns on success (either full or partial match):**
+ 1. The start point of the matches found (a number).
+ 2. A table containing the end points of the matches found, the longer matches
+ first.
+ 3. The return value of the underlying *pcre_dfa_exec* call (a number).
+
+**Returns on failure (no match):**
+ 1. ``nil``
+
+**Example:**
+ If there are 3 matches found starting at offset 10 and ending at offsets 15, 20
+ and 25 then the function returns the following: 10, { 25,20,15 }, 3.
+
+------------------------------------------------------------
+
+jit_compile
+-----------
+
+[See *pcre2_jit_compile* in the PCRE2 docs.]
+
+:funcdef:`r:jit_compile ([options])`
+
+Parameter *options* is a number (a bitwise OR of separate options;
+it defaults to ``PCRE2_JIT_COMPLETE``).
+
+The method returns ``true`` on success or ``false`` + error message string on failure.
+
+------------------------------------------------------------
+
+.. _maketables_pcre2:
+
+maketables
+----------
+
+[See *pcre2_maketables* in the PCRE2 docs.]
+
+:funcdef:`rex_pcre2.maketables ()`
+
+Creates a set of character tables corresponding to the current locale and
+returns it as a userdata. The returned value can be passed to any Lrexlib
+function accepting the *locale* parameter.
+
+------------------------------------------------------------
+
+config
+------
+
+[See *pcre2_config* in the PCRE2 docs.]
+
+:funcdef:`rex_pcre2.config ([tb])`
+
+This function returns a table containing the values of the configuration
+parameters used at PCRE2 library build-time. Those parameters (numbers) are
+keyed by their names (strings). If the table argument *tb* is supplied then it
+is used as the output table, else a new table is created.
+
+------------------------------------------------------------
+
+version
+-------
+
+[See *pcre2_config(PCRE2_CONFIG_VERSION)* in the PCRE2 docs.]
+
+:funcdef:`rex_pcre2.version ()`
+
+This function returns a string containing the version of the used PCRE2 library
+and its release date.
+
+------------------------------------------------------------
+
GNU-only functions and methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -919,10 +1053,12 @@ Incompatibilities with previous versions
**Incompatibilities between versions 2.1 and 2.0:**
- 1. match_, find_, tfind_, exec_, dfa_exec_: only one value (a ``nil``) is
+ 1. match_, find_, tfind_, exec_, dfa_exec__: only one value (a ``nil``) is
returned when the subject does not match the pattern. Any other failure
generates an error.
+__ dfa_exec_pcre_
+
**Incompatibilities between versions 2.0 and 1.19:**
1. Lua 5.1 is required
diff --git a/src/pcre2/lpcre2.c b/src/pcre2/lpcre2.c
index 9acff39..09ced9c 100644
--- a/src/pcre2/lpcre2.c
+++ b/src/pcre2/lpcre2.c
@@ -6,8 +6,8 @@
#include <locale.h>
#include <ctype.h>
#include <stdint.h>
-
#include <pcre2.h>
+
#include "lua.h"
#include "lauxlib.h"
#include "../common.h"
@@ -202,9 +202,10 @@ static int compile_regex (lua_State *L, const TArgComp *argC, TPcre2 **pud) {
TPcre2 *ud;
ud = (TPcre2*)lua_newuserdata (L, sizeof (TPcre2));
+ memset (ud, 0, sizeof (TPcre2)); /* initialize all members to 0 */
lua_pushvalue (L, ALG_ENVIRONINDEX);
lua_setmetatable (L, -2);
- memset (ud, 0, sizeof (TPcre2)); /* initialize all members to 0 */
+
ud->ccontext = pcre2_compile_context_create(NULL);
if (ud->ccontext == NULL)
return luaL_error (L, "malloc failed");