author | Björn Gustavsson <bjorn@erlang.org> | 2023-03-16 12:34:26 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-03-16 12:34:26 +0100 |
commit | 0f1216309945ac39d155577615f6c732bd6937a6 (patch) | |
tree | f5d16053a32151cc555aa9589d67565c643d1830 /system/doc | |
parent | 70183699d46a5adf858478bd33ddd04ccf03deed (diff) | |
parent | 2884ebed1e513c97d84d7f715a68c84f113a29ef (diff) | |
download | erlang-0f1216309945ac39d155577615f6c732bd6937a6.tar.gz |
Merge pull request #7017 from bjorng/bjorn/upcoming-re-incompatibility/OTP-18511
Add note about a new regular expression engine in OTP 27
Diffstat (limited to 'system/doc')
-rw-r--r-- | system/doc/general_info/upcoming_incompatibilities.xml | 51 |
1 files changed, 51 insertions, 0 deletions
diff --git a/system/doc/general_info/upcoming_incompatibilities.xml b/system/doc/general_info/upcoming_incompatibilities.xml
index 2d3b6c1c56..b9108e94b7 100644
--- a/system/doc/general_info/upcoming_incompatibilities.xml
+++ b/system/doc/general_info/upcoming_incompatibilities.xml
@@ -68,6 +68,57 @@
 warnings about all occurrences of <c>maybe</c> without quotes.
 </p>
 </section>
+
+ <section>
+ <marker id="new_re_engine"/>
+ <title>The re module will use a different regular expression engine</title>
+
+ <p>The functionality of module <seeerl
+ marker="stdlib:re"><c>re</c></seeerl> is currently provided by
+ the PCRE library, which is no longer actively
+ maintained. Therefore, in OTP 27, we will switch to a different
+ regular expression library.</p>
+
+ <p>The source code for PCRE used by the <c>re</c> module has
+ been modified by the OTP team to ensure that a regular
+ expression match would yield when matching huge input binaries
+ and/or when using demanding (back-tracking) regular
+ expressions. Because of the those modifications, moving to a new
+ version of PCRE has always been a time-consuming process because
+ all of the modifications had to be applied by hand again to the
+ updated PCRE source code.</p>
+
+ <p>Most likely, the new regular expression library will be <url
+ href="https://github.com/google/re2">RE2</url>. RE2 guarantees
+ that the match time is linear in the length of input string, and
+ it also eschews recursion to avoid stack overflow. That should
+ make it possible to use RE2 without modifying its source
+ code. For more information about why RE2 is a good choice, see
+ <url
+ href="https://github.com/google/re2/wiki/WhyRE2">WhyRE2</url>.</p>
+
+ <p>Some of implications of this change are:</p>
+
+ <list>
+ <item><p>We expect that the functions in the <c>re</c> module
+ will continue to be supported, although some of the options are likely
+ to be dis-continued.</p></item>
+
+ <item><p>It is likely that only pattern matching of UTF8-encoded binaries will be
+ supported (not Latin1-encoded binaries).</p></item>
+
+ <item><p>In order to guarantee the linear-time performance,
+ RE2 does not support all the constructs in regular expression
+ patterns that PCRE do. For example, backreferences and look-around
+ assertions are not supported. See <url
+ href="https://github.com/google/re2/wiki/Syntax">Syntax</url>
+ for a description of what RE2 supports.</p></item>
+
+ <item><p>Compiling a regular expression is likely to be
+ slower, and thus more can be gained by explicitly compiling
+ the regular expression before matching with it.</p></item>
+ </list>
+ </section>
 </section>

 <section>
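
The last list item in the added note recommends compiling a regular expression once before matching with it. A minimal sketch of that pattern with the existing `re:compile/1` and `re:run/3` functions follows. The module name `re_precompile_sketch` and the function `count_matches/2` are hypothetical and not part of this commit, and the pattern in the usage note deliberately avoids backreferences and look-around assertions, which the note says RE2 does not support.

```erlang
-module(re_precompile_sketch).   % hypothetical module name, for illustration only
-export([count_matches/2]).

%% Count how many subjects match Pattern.
%% The pattern is compiled once; the compiled form MP is then reused
%% for every call to re:run/3 instead of recompiling per subject.
count_matches(Pattern, Subjects) ->
    {ok, MP} = re:compile(Pattern),
    length([S || S <- Subjects,
                 %% {capture, none} makes re:run/3 return match | nomatch
                 re:run(S, MP, [{capture, none}]) =:= match]).
```

For example, `re_precompile_sketch:count_matches("^[a-z]+[0-9]*$", [<<"abc1">>, <<"ABC">>, <<"x">>])` returns `2`, because the match is case-sensitive and `<<"ABC">>` is rejected.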