author Björn Gustavsson <bjorn@erlang.org> 2023-03-16 12:34:26 +0100
committer GitHub <noreply@github.com> 2023-03-16 12:34:26 +0100
commit 0f1216309945ac39d155577615f6c732bd6937a6
tree f5d16053a32151cc555aa9589d67565c643d1830 /system/doc
parent 70183699d46a5adf858478bd33ddd04ccf03deed
parent 2884ebed1e513c97d84d7f715a68c84f113a29ef
Merge pull request #7017 from bjorng/bjorn/upcoming-re-incompatibility/OTP-18511
Add note about a new regular expression engine in OTP 27
Diffstat (limited to 'system/doc')
-rw-r--r-- system/doc/general_info/upcoming_incompatibilities.xml 51
1 files changed, 51 insertions, 0 deletions
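The note's last point (compiling is likely to become slower with RE2) argues for compiling a pattern once and reusing the compiled form. The sketch below shows that style with the current `re` API, using `re:compile/2` and `re:run/3` with the `{capture, none}` option, which makes `re:run/3` return just `match` or `nomatch`. The module name `re_compile_once` and the function `count_matches/2` are hypothetical, chosen only for illustration; which `re` options survive the switch to the new engine is not yet known.

```erlang
-module(re_compile_once).
-export([count_matches/2]).

%% Hypothetical helper: count how many subjects match Pattern.
%% The pattern is compiled once and the compiled form (MP) is reused
%% for every subject, instead of passing the raw pattern to re:run/3
%% on each call and paying the compilation cost repeatedly.
count_matches(Pattern, Subjects) ->
    {ok, MP} = re:compile(Pattern, [unicode]),
    %% {capture, none} makes re:run/3 return only match | nomatch.
    length([S || S <- Subjects,
                 re:run(S, MP, [{capture, none}]) =:= match]).
```

For example, `re_compile_once:count_matches(<<"ab+c">>, [<<"abbc">>, <<"xyz">>, <<"ac">>])` returns `1`, since only `<<"abbc">>` contains at least one `b` between `a` and `c`.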
diff --git a/system/doc/general_info/upcoming_incompatibilities.xml b/system/doc/general_info/upcoming_incompatibilities.xml
index 2d3b6c1c56..b9108e94b7 100644
--- a/system/doc/general_info/upcoming_incompatibilities.xml
+++ b/system/doc/general_info/upcoming_incompatibilities.xml
@@ -68,6 +68,57 @@
warnings about all occurrences of <c>maybe</c> without quotes.
</p>
</section>
+
+ <section>
+ <marker id="new_re_engine"/>
+ <title>The re module will use a different regular expression engine</title>
+
+ <p>The functionality of module <seeerl
+ marker="stdlib:re"><c>re</c></seeerl> is currently provided by
+ the PCRE library, which is no longer actively
+ maintained. Therefore, in OTP 27, we will switch to a different
+ regular expression library.</p>
+
+ <p>The source code for PCRE used by the <c>re</c> module has
+ been modified by the OTP team to ensure that a regular
+ expression match would yield when matching huge input binaries
+ and/or when using demanding (backtracking) regular
+ expressions. Because of those modifications, moving to a new
+ version of PCRE has always been a time-consuming process because
+ all of the modifications had to be applied by hand again to the
+ updated PCRE source code.</p>
+
+ <p>Most likely, the new regular expression library will be <url
+ href="https://github.com/google/re2">RE2</url>. RE2 guarantees
+ that the match time is linear in the length of the input string, and
+ it also eschews recursion to avoid stack overflow. That should
+ make it possible to use RE2 without modifying its source
+ code. For more information about why RE2 is a good choice, see
+ <url
+ href="https://github.com/google/re2/wiki/WhyRE2">WhyRE2</url>.</p>
+
+ <p>Some of the implications of this change are:</p>
+
+ <list>
+ <item><p>We expect that the functions in the <c>re</c> module
+ will continue to be supported, although some of the options are likely
+ to be discontinued.</p></item>
+
+ <item><p>It is likely that only pattern matching of UTF-8-encoded binaries will be
+ supported (not Latin-1-encoded binaries).</p></item>
+
+ <item><p>In order to guarantee the linear-time performance,
+ RE2 does not support all the constructs in regular expression
+ patterns that PCRE does. For example, backreferences and look-around
+ assertions are not supported. See <url
+ href="https://github.com/google/re2/wiki/Syntax">Syntax</url>
+ for a description of what RE2 supports.</p></item>
+
+ <item><p>Compiling a regular expression is likely to be
+ slower, so there is more to gain from compiling it once (using
+ <c>re:compile/2</c>) and reusing it for many matches.</p></item>
+ </list>
+ </section>
</section>
<section>