author | Yves Orton <demerphq@gmail.com> | 2022-02-11 06:30:45 +0100 |
---|---|---|
committer | Hugo van der Sanden <hv@crypt.org> | 2022-02-18 15:08:24 +0000 |
commit | c45b45416a61f2f56dbe348f33fb1bb07a1d5444 (patch) | |
tree | 3af45bebe39f1185787116521a5e2f1afcd813c6 /pod | |
parent | bddb8c791b4cd7f67db06213e35b0f2351c0fea8 (diff) | |
download | perl-c45b45416a61f2f56dbe348f33fb1bb07a1d5444.tar.gz |
regcomp.c,re.pm: Remove "offsets" debugging code
This code was added by Mark Jason Dominus to aid a regex debugger
he wrote for ActiveState. The basic premise is that every opcode
in a regex can be attributed back to a contiguous sequence of characters
that make up the pattern. This assumption has not been true ever since
the "jump" TRIE optimizations were added to the engine.
I spoke to MJD many years ago about whether it was ok to remove this
from the regex engine and he said he had no objections.
An example of a pattern that cannot be handled correctly by this logic is
/(?: a x+ | b y+ | c z+ )/x
where the
(?:a ... | b ... | c ...)
parts will now be handled by the TRIE logic and not by the BRANCH/EXACT
opcodes that would have handled them in the past. The offset debug output
cannot handle this type of transformation, and produces nonsense output
that mentions opcodes that have been optimized away from the final program.
The regex compiler is complicated enough without having to maintain this
logic. There are essentially no tests for it, and the few tests that do
cover it do so as a byproduct of testing other things. Although the offsets
logic is only used for debugging, supporting it does have a cost to non-debug
logic, as various internal routines include parameters related to it that
are otherwise unused.
Note that this output is only usable or visible by enabling special flags
in re.pm. There is no formal API to access it short of parsing the
output of the debug mode of the regex engine, which has changed multiple
times over the past years.
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perlreguts.pod | 19 |
1 files changed, 5 insertions, 14 deletions
diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod
index e58aa42535..2aae739d9b 100644
--- a/pod/perlreguts.pod
+++ b/pod/perlreguts.pod
@@ -828,13 +828,10 @@ regex engine. Since it is specific to perl it is only of curiosity
 value to other engine implementations.
 
     typedef struct regexp_internal {
-            union {
-                U32 *offsets;
-                U32 proglen;
-            } u;
             regnode *regstclass;
             struct reg_data *data;
             struct reg_code_blocks *code_blocks;
+            U32 proglen;
             int name_list_idx;
             regnode program[1];
     } regexp_internal;
@@ -843,16 +840,6 @@ Description of the attributes is as follows:
 
 =over 5
 
-=item C<offsets>
-
-Offsets holds a mapping of offset in the C<program>
-to offset in the C<precomp> string. This is only used by ActiveState's
-visual regex debugger.
-
-=item C<proglen>
-
-Stores the length of the compiled program in units of regops.
-
 =item C<regstclass>
 
 Special regop that is used by C<re_intuit_start()> to check if a pattern
@@ -905,6 +892,10 @@ pattern. It is made up of the following structures.
         struct reg_code_block *cb; /* array of reg_code_block's */
     };
 
+=item C<proglen>
+
+Stores the length of the compiled program in units of regops.
+
 =item C<name_list_idx>
 
 This is the index into the data array where an AV is stored that contains
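The struct change documented in the diff can be sketched in isolation. This is only an illustration under stated assumptions: `U32` and `regnode` are simplified stand-ins for perl's internal types, the `_old` and `_new` suffixes are added here to tell the two layouts apart, and the two helper functions are hypothetical and not part of perl's source.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for perl's internal types; assumptions for illustration only. */
typedef uint32_t U32;
typedef struct regnode { U32 raw; } regnode;
struct reg_data;          /* left opaque in this sketch */
struct reg_code_blocks;   /* left opaque in this sketch */

/* Layout before the commit: offsets and proglen overlap in a union,
 * so the two members occupy the same storage. */
typedef struct regexp_internal_old {
    union {
        U32 *offsets;
        U32 proglen;
    } u;
    regnode *regstclass;
    struct reg_data *data;
    struct reg_code_blocks *code_blocks;
    int name_list_idx;
    regnode program[1];
} regexp_internal_old;

/* Layout after the commit: proglen is an ordinary member. */
typedef struct regexp_internal_new {
    regnode *regstclass;
    struct reg_data *data;
    struct reg_code_blocks *code_blocks;
    U32 proglen;
    int name_list_idx;
    regnode program[1];
} regexp_internal_new;

/* Hypothetical helper: returns 1 if both union members of the old
 * layout start at the same address, which C guarantees for unions. */
int old_union_members_overlap(void) {
    regexp_internal_old ri;
    return (void *)&ri.u.offsets == (void *)&ri.u.proglen;
}

/* Hypothetical helper: stores a length in the new layout and reads it back. */
U32 roundtrip_proglen(U32 len) {
    regexp_internal_new ri = {0};
    assert(old_union_members_overlap());
    ri.proglen = len;
    return ri.proglen;
}
```

Calling `roundtrip_proglen(42)` from a test program returns the stored length through direct member access, with no union involved in the new layout.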