author: Yves Orton <demerphq@gmail.com> 2022-02-11 06:30:45 +0100
committer: Hugo van der Sanden <hv@crypt.org> 2022-02-18 15:08:24 +0000
commit: c45b45416a61f2f56dbe348f33fb1bb07a1d5444
tree: 3af45bebe39f1185787116521a5e2f1afcd813c6 /pod
parent: bddb8c791b4cd7f67db06213e35b0f2351c0fea8
regcomp.c,re.pm: Remove "offsets" debugging code
This code was added by Mark Jason Dominus to aid a regex debugger he wrote for ActiveState. The basic premise is that every opcode in a regex can be attributed back to a contiguous sequence of characters that make up the pattern. This assumption has not been true ever since the "jump" TRIE optimizations were added to the engine. I spoke to MJD many years ago about whether it was ok to remove this from the regex engine and he said he had no objections.

An example of a pattern that cannot be handled correctly by this logic is /(?: a x+ | b y+ | c z+ )/x, where the (?:a ... | b ... | c ...) parts will now be handled by the TRIE logic and not by the BRANCH/EXACT opcodes that would have handled them in the past. The offset debug output cannot handle this type of transformation, and produces nonsense output that mentions opcodes that have been optimized away from the final program.

The regex compiler is complicated enough without having to maintain this logic. There are essentially no tests for it, and the few tests that do cover it do so as a byproduct of testing other things. Although the offsets logic is only used for debugging, supporting it does have a cost to non-debug logic, as various internal routines include parameters related to it that are otherwise unused.

Note that this output is only usable or visible by enabling special flags in re.pm. There is no formal API to access it short of parsing the output of the debug mode of the regex engine, which has changed multiple times over the past years.
Diffstat (limited to 'pod')
-rw-r--r-- pod/perlreguts.pod 19
1 files changed, 5 insertions, 14 deletions
diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod
index e58aa42535..2aae739d9b 100644
--- a/pod/perlreguts.pod
+++ b/pod/perlreguts.pod
@@ -828,13 +828,10 @@ regex engine. Since it is specific to perl it is only of curiosity
value to other engine implementations.
typedef struct regexp_internal {
- union {
- U32 *offsets;
- U32 proglen;
- } u;
regnode *regstclass;
struct reg_data *data;
struct reg_code_blocks *code_blocks;
+ U32 proglen;
int name_list_idx;
regnode program[1];
} regexp_internal;
@@ -843,16 +840,6 @@ Description of the attributes is as follows:
=over 5
-=item C<offsets>
-
-Offsets holds a mapping of offset in the C<program>
-to offset in the C<precomp> string. This is only used by ActiveState's
-visual regex debugger.
-
-=item C<proglen>
-
-Stores the length of the compiled program in units of regops.
-
=item C<regstclass>
Special regop that is used by C<re_intuit_start()> to check if a pattern
@@ -905,6 +892,10 @@ pattern. It is made up of the following structures.
struct reg_code_block *cb; /* array of reg_code_block's */
};
+=item C<proglen>
+
+Stores the length of the compiled program in units of regops.
+
=item C<name_list_idx>
This is the index into the data array where an AV is stored that contains