| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RT #130650 heap-use-after-free in S_free_codeblocks
When compiling qr/(?{...})/, a reg_code_blocks structure is allocated
and various SVs are attached to it. Initially this is set to be freed
via a destructor on the savestack, in case of early dying. Later the
structure is attached to the compiling regex, and a boolean flag in the
structure, 'attached', is set to true to show that the destructor no
longer needs to free the struct.
However, it is possible to get three orders of destruction:
1) allocate, push destructor, die early
2) allocate, push destructor, attach to regex, die
2) allocate, push destructor, attach to regex, succeed
In 2, the regex is freed (via the savestack) before the destructor is
called. In 3, the destructor is called, then later the regex is freed.
It turns out perl can't currently handle case 2:
qr'(?{})\6'
Fix this by turning the 'attached' boolean field into an integer refcount,
then keep a count of whether the struct is referenced from the savestack
and/or the regex. Since it normally has a value of 1 or 2, it's similar
to a boolean flag, but crucially it no longer just indicates that the
regex has a pointer to it ('attached'), but that at least one of the
savestack and regex have a pointer to it. So order of freeing no longer
matters.
I also updated S_free_codeblocks() so that it nulls out SV pointers in
the reg_code_blocks struct before freeing them. This is is generally good
practice to avoid double frees, although is probably not needed at the
moment.
|
|
|
|
| |
Except under cpan/ and dist/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[perl #129140] attempting double-free
Thus fixes some leaks and double frees in regexes which contain code
blocks.
During compilation, an array of struct reg_code_block's is malloced.
Initially this is just attached to the RExC_state_t struct local var in
Perl_re_op_compile(). Later it may be attached to a pattern. The difficulty
is ensuring that the array is free()d (and the ref counts contained within
decremented) should compilation croak early, while avoiding double frees
once the array has been attached to a regex.
The current mechanism of making the array the PVX of an SV is a bit flaky,
as the array can be realloced(), and code can be re-entered when utf8 is
detected mid-compilation.
This commit changes the array into separately malloced head and body.
The body contains the actual array, and can be realloced. The head
contains a pointer to the array, plus size and an 'attached' boolean.
This indicates whether the struct has been attached to a regex, and is
effectively a 1-bit ref count.
Whenever a head is allocated, SAVEDESTRUCTOR_X() is used to call
S_free_codeblocks() to free the head and body on scope exit. This function
skips the freeing if 'attached' is true, and this flag is set only at the
point where the head gets attached to the regex.
In one way this complicates the code, since the num_code_blocks field is now
not always available (it's only there is a head has been allocated), but
mainly its simplifies, since all the book-keeping is now done in the two
new static functions S_alloc_code_blocks() and S_free_codeblocks()
|
|
|
|
|
| |
This was first proposed in the thread starting at
http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most ops that execute a regex, such as match and subst, are of type PMOP.
A PMOP allows the actual regex to be attached directly to that op, due
to its extra fields.
OP_SPLIT is different; it is just a plain LISTOP, but it always has an
OP_PUSHRE as its first child, which *is* a PMOP and which has the regex
attached.
At runtime, pp_pushre()'s only job is to push itself (i.e. the current
PL_op) onto the stack. Later pp_split() pops this to get access to the
regex it wants to execute.
This is a bit unpleasant, because we're pushing an OP* onto the stack,
which is supposed to be an array of SV*'s. As a bit of a hack, on
DEBUGGING builds we push a PVLV with the PL_op address embedded instead,
but this still isn't very satisfactory.
Now that regexes are first-class SVs, we could push a REGEXP onto the
stack rather than PL_op. However, there is an optimisation of @array =
split which eliminates the assign and embeds the array's GV/padix directly
in the PUSHRE op. So split still needs access to that op. But the pushre
op will always be splitop->op_first anyway, so one possibility is to just
skip executing the pushre altogether, and make pp_split just directly
access op_first instead to get the regex and @array info.
But if we're doing that, then why not just go the full hog and make
OP_SPLIT into a PMOP, and eliminate the OP_PUSHRE op entirely: with the
data that was spread across the two ops now combined into just the one
split op.
That is exactly what this commit does.
For a simple compile-time pattern like split(/foo/, $s, 1), the optree
looks like:
before:
<@> split[t2] lK
</> pushre(/"foo"/) s/RTIME
<0> padsv[$s:1,2] s
<$> const(IV 1) s
after:
</> split(/"foo"/)[t2] lK/RTIME
<0> padsv[$s:1,2] s
<$> const[IV 1] s
while for a run-time expression like split(/$pat/, $s, 1),
before:
<@> split[t3] lK
</> pushre() sK/RTIME
<|> regcomp(other->8) sK
<0> padsv[$pat:2,3] s
<0> padsv[$s:1,3] s
<$> const(IV 1)s
after:
</> split()[t3] lK/RTIME
<|> regcomp(other->8) sK
<0> padsv[$pat:2,3] s
<0> padsv[$s:1,3] s
<$> const[IV 1] s
This makes the code faster and simpler.
At the same time, two new private flags have been added for OP_SPLIT -
OPpSPLIT_ASSIGN and OPpSPLIT_LEX - which make it explicit that the
assign op has been optimised away, and if so, whether the array is
lexical.
Also, deparsing of split has been improved, to the extent that
perl TEST -deparse op/split.t
now passes.
Also, a couple of panic messages in pp_split() have been replaced with
asserts().
|
|
|
|
| |
This has been deprecated since v5.22
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 24be310237a0f8f19cfdb71de1b068b4ce9572a0 I reworked how
we stored the close_paren info in the regexp match state
structure. Unfortunately I missed a subtle aspect of the
logic which meant that in certain cases we were relying
on close_paren being true to avoid comparing it against
a false ARG value for things like CURLYX, which meant that
sometimes we would exit an stack frame prematurely. This
patch fixes that logic and makes it more clear (via macros)
what is going on.
|
|
|
|
|
|
|
|
| |
In most cases the curlyx member is the first thing after the yes state
member, but eval was reversed. While debugging perl #127705 I switched
them to see what would happen, which changed the bug, and ultimately
revealed the cause of the problem. So I am going to leave them in the
"consistent" order.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way we tracked if pattern recursion was infinite did not work
properly. A pattern like
"a"=~/(.(?2))((?<=(?=(?1)).))/
would loop forever, slowly eat up all available ram as it added
pattern recursion stack frames.
This patch changes the rules for recursion so that recursively
entering a given pattern "subroutine" twice from the same position
fails the match. This means that where previously we might have
seen fatal exception we will now simply fail. This means that
"aaabbb"=~/a(?R)?b/
succeeds with $& equal to "aaabbb".
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GOSTART is a special case of GOSUB, we can remove a lot of offset twiddling,
and other special casing by unifying them, at pretty much no cost.
GOSUB has 2 arguments, ARG() and ARG2L(), which are interpreted as
a U32 and an I32 respectively. ARG() holds the "parno" we will recurse
into. ARG2L() holds a signed offset to the relevant start node for the
recursion.
Prior to this patch the argument to GOSUB would always be >=, and unlike
other parts of our logic we would not use 0 to represent "start/end" of
pattern, as GOSTART would be used for "recurse to beginning of pattern",
after this patch we use 0 to represent "start/end", and a lot of
complexity "goes away" along with GOSTART regops.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we do a GOSUB/GOSTART we need to know where the recursion
"returns" to the previous position in the pattern. Currently
this is done by comparing cur_eval->u.eval.close_paren to the
the argument of CLOSE parens, and within END blocks (for GOSTART).
However, there is a problem. The state machinery for GOSUB/GOSTART
is shared with EVAL ( implementing /(??{ ... })/ ), which also sets
cur_eval->u.eval.close_paren to 0. This for instance breaks /(?(R)YES|NO)/
by making it return true within a /(??{ ... })/ when arguably it
shouldn't. It also is confusing.
So we change the meaning of close_paren so that 0 means "from EVAL",
and that otherwise the value is "+1", so 1 means GOSTART (trigger for
END), and 2 means "CLOSE1", etc. In order to make this transparent
this patch adds the EVAL_CLOSE_PAREN_IS( cur_eval, EXPR ) macro which
does the right thing:
( cur_eval && cur_eval->u.eval.close_paren &&
( ( cur_eval->u.eval.close_paren - 1 ) == EXPR ) )
|
|
|
|
| |
Removes 'the' in front of parameter names in some instances.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These 2 macros were created for the Symbian port in commit
"Symbian port of Perl" to replace a direct "extern" token. I guess the
author was unaware of PERL_CALLCONV.
PERL_CALLCONV is the official macro to use. PERL_XS_EXPORT_C and
PERL_EXPORT_C have no usage on cpan grep except for modules with direct
copies of core headers. A defect of using PERL_EXPORT_C and
PERL_XS_EXPORT_C instead of PERL_CALLCONV is that win32/win32.h has no
knowledge of the 2 macros and doesn't set them, and os/os2ish.h doesn't
either. On Win32, since the unix defaults are used instead of Win32
specific "__declspec(dllimport)" token, XS modules use indirect function
stubs in each XS module placed by the CC to call into perl5**.dll instead
of directly calls the core C functions. I observed this in in XS-Typemap's
DLL. To simplify the API, and to decrease the amount of macros needing to
implemented to support each platform, just remove the 2 macros.
Since perl.h's fallback defaults for PERL_CALLCONV are very late in perl.h,
they need to be moved up before function declarations start in perlio.h
(perlio.h is included from iperlsys.h).
win32iop.h contains the "PerlIO" and SV" tokens, so perlio.h must be
included before win32iop.h is. Including perlio.h so early in win32.h,
causes PERL_CALLCONV not be defined since Win32 platform uses the
fallback in perl.h, since win32.h doesn't always define PERL_CALLCONV and
sometimes relies on the fallback. Since win32iop.h contains alot of
declarations, it belongs with other declarations such as those in proto.h
so move it from win32.h to perl.h.
the "free" token in struct regexp_engine conflicts with win32iop's
"#define free win32_free" so rename that member.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An empty cpan/.dir-locals.el stops Emacs using the core defaults for
code imported from CPAN.
Committer's work:
To keep t/porting/cmp_version.t and t/porting/utils.t happy, $VERSION needed
to be incremented in many files, including throughout dist/PathTools.
perldelta entry for module updates.
Add two Emacs control files to MANIFEST; re-sort MANIFEST.
For: RT #124119.
|
|
|
|
| |
This is another step in the process
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This flag will prevent () from capturing and filling in $1, $2, etc...
Named captures will still work though, and if used will cause $1, $2, etc...
to be filled in *only* within named groups.
The motivation behind this is to allow the common construct of:
/(?:b|c)a(?:t|n)/
To be rewritten more cleanly as:
/(b|c)a(t|n)/n
When you want grouping but no memory penalty on captures.
You can also use ?n inside of a () directly to avoid capturing, and
?-n inside of a () to negate its effects if you want to capture.
|
| |
|
|
|
|
| |
It is no longer needed as of 1067df30ae9.
|
|
|
|
|
|
|
|
| |
We get an integer overflow message when we left shift a 1 into the
highest bit of a word. This changes the 1's into 1U's to indicate
unsigned. This is done for all the flag bits in the affected word, as
they could get reorderd by someone in the future, unintentionally
reintroducing this problem again.
|
|
|
|
|
|
|
|
|
|
| |
It is planned for a future Perl release to have /xx mean something
different from just /x. To prepare for this, this commit raises a
deprecation warning if someone currently has this usage. A grep of CPAN
did not turn up any instances of this, but this is to be safe anyway.
The added code is more general than actually needed, in case we want to
do this for another flag.
|
|
|
|
|
|
| |
This doesn't actually use the flag yet.
We no longer have to make version-dependent changes to
ext/Devel-Peek/t/Peek.t, (it being in /ext) so this doesn't
|
| |
|
|
|
|
|
|
| |
This sets a #define to point in the middle of the free-space, so that
bits at either end can be added without having to adjust many other
defines.
|
|
|
|
|
|
|
| |
This #defined a symbol then did a compile time check that it was the
same as another symbol. This commit simply defines it as the other
symbol directly, and moves it to above the other definitions, which it
no longer is part of. This prepares for the next commit.
|
|
|
|
|
|
| |
We do not need a placeholder for unused flag bits. And removing them
makes the generated regnodes.h more accurate as to what bits are
available.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves three bits to create a block of unused bits at the beginning.
The first bit had to be moved to make space for other uses that are
coming in future commits. This breaks binary compatibility, so might as
well move the other two bits so that all the unused bits are
consolidated at the beginning.
This pool of unused bits is the boundary between the bits that are
common to op.h and regexp.h (and in op_reg_common.h) and those that are
separate. It's best to have all the unused bits there, so when we need
to use one, it can be taken from either side, as needed, without us
being trapped into having an available bit, but of the wrong kind.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- after return/croak/die/exit, return/break are pointless
(break is not a terminator/separator, it's a goto)
- after goto, another goto (!) is pointless
- in some cases (usually function ends) introduce explicit NOT_REACHED
to make the noreturn nature clearer (do not do this everywhere, though,
since that would mean adding NOT_REACHED after every croak)
- for the added NOT_REACHED also add /* NOTREACHED */ since
NOT_REACHED is for gcc (and VC), while the comment is for linters
- declaring variables in switch blocks is just too fragile:
it kind of works for narrowing the scope (which is nice),
but breaks the moment there are initializations for the variables
(the initializations will be skipped since the flow will bypass
the start of the block); in some easy cases simply hoist the declarations
out of the block and move them earlier
Note 1: Since after this patch the core is not yet -Wunreachable-code
clean, not enabling that via cflags.SH, one needs to -Accflags=... it.
Note 2: At least with the older gcc 4.4.7 there are far too many
"unreachable code" warnings, which seem to go away with gcc 4.8,
maybe better flow control analysis. Therefore, the warning should
eventually be enabled only for modernish gccs (what about clang and
Intel cc?)
|
|
|
|
|
|
|
| |
This reverts commit 8c2b19724d117cecfa186d044abdbf766372c679.
I don't understand - smoke-me came back happy with three
separate reports... oh well, some other time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- after croak/die/exit (or return), break (or return!) are pointless
(break is not a terminator/separator, it's a promise of a jump)
- after goto, another goto (!) is pointless
- in some cases (usually function ends) introduce explicit NOT_REACHED
to make the noreturn nature clearer (do not do this everywhere, though,
since that would mean adding NOT_REACHED after every croak)
- for the added NOT_REACHED also add /* NOTREACHED */ since
NOT_REACHED is for gcc (and VC), while the comment is for linters
- declaring variables in switch blocks is just too fragile:
it kind of works for narrowing the scope (which is nice),
but breaks the moment there are initializations for the variables
(they will be skipped!); in some easy cases simply hoist the declarations
out of the block and move them earlier
There are still a few places left.
|
|
|
|
| |
(For some odd reason assert() cannot be found and Jenkins becomes apoplectic.)
|
|
|
|
|
|
|
|
| |
The default case really is impossible because all the valid
enums values are already covered in the switch.
The NOT_REACHED; is for the compiler (from perl.h),
the /* NOTREACHED */ is for static analyzers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit v5.19.8-533-g63baef5 changed the handling of locale-dependent
regexes so that the pattern was considered tainted at compile-time, rather
than determining it each time at run-time whenever it executed a
locale-dependent node. Unfortunately due to the conflating of two flags,
RXf_TAINTED and RXf_TAINTED_SEEN, it had the side effect of permanently
marking a pattern as tainted once it had had a single tainted result.
E.g.
use re qw(taint);
use Scalar::Util qw(tainted);
for ($^X, "abc") {
/(.*)/ or die;
print "not " unless tainted("$1"); print "tainted\n";
};
which from 5.19.9 onwards output:
tainted
tainted
but with this commit (and with 5.19.8 and earlier), it now outputs:
tainted
not tainted
The RXf_TAINTED flag indicates that the pattern itself is tainted, e.g.
$r = qr/$tainted_value/
while the RXf_TAINTED_SEEN flag means that the results of the last match
are tainted, e.g.
use re 'tainted';
$tainted =~ /(.*)/;
# $1 is tainted
Pre 63baef5, the code used to look like:
at run-time:
turn off RXf_TAINTED_SEEN;
while (nodes to execute) {
switch(node) {
case
BOUNDL: /* and other locale-specific ops */
turn on RXf_TAINTED_SEEN;
...;
}
}
if (tainted || RXf_TAINTED)
turn on RXf_TAINTED_SEEN;
63baef5 changed it to:
at compile-time:
if (pattern has locale ops)
turn on RXf_TAINTED_SEEN;
at run-time:
while (nodes to execute) {
...
}
if (tainted || RXf_TAINTED)
turn on RXf_TAINTED_SEEN;
This commit changes it to:
at compile-time;
if (pattern has locale ops)
turn on RXf_TAINTED;
at run-time:
turn off RXf_TAINTED_SEEN;
while (nodes to execute) {
...
}
if (tainted || RXf_TAINTED)
turn on RXf_TAINTED_SEEN;
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently prog->substrs->data[] is a 3 element array of structures.
Elements 0 and 1 record the longest anchored and floating substrings,
while element 2 ('check'), is a copy of the longest of 0 and 1.
Record in a new field, prog->substrs->check_ix, the index of which element
was copied. (Eventually I intend to remove the copy altogether.)
Also for the anchored substr, set max_offset equal to min offset.
Previously it was left as zero and ignored, although if copied to check,
the check copy of max *was* set equal to min. Having this always set will
allow us to make the code simpler.
|
|
|
|
|
| |
In particular, specify that the various offset fields are char rather
than byte counts.
|
| |
|
|
|
|
|
|
|
|
|
| |
The flag tells us that a pattern may match an infinitely long string.
The new member in the regexp struct tells us how long the string might
be.
With these two items we can implement regexp based $/
|
|
|
|
|
|
|
|
|
|
| |
RXf_IS_ANCHORED as a replacement
The only requirement outside of the regex engine is to identify that there is
an anchor involved at all. So we move the 4 anchor flags to intflags and replace
it with a single aggregate flag RXf_IS_ANCHORED in extflags.
This frees up another 3 bits in extflags.
|
|
|
|
| |
So they stay stable as I move other flags from extflags to intflags
|
|
|
|
|
|
|
|
| |
This required removing the RXf_GPOS_CHECK mask as it uses one flag
that will stay in extflags for now (RXf_ANCH_GPOS), and one flag that
moves to intflags (RXf_GPOS_SEEN). This mask is strange however, as
you cant have RXf_ANCH_GPOS without having RXf_GPOS_SEEN so I dont
know why we test both. Further investigation required.
|
| |
|
|
|
|
|
| |
Includes some improvements to how we dump regexps so that when a regexp
is for the standard perl engine we also show the intflags for the engine
|
|
|
|
| |
plus some typo fixes. I probably changed some things in perlintern, too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of getting the regexp engine to handle long strings, this com-
mit changes any variables, parameters and struct members that hold
lengths of the string being matched against (or parts thereof) to use
SSize_t or STRLEN instead of [IU]32.
To avoid having to change any logic, I kept the signedness the same.
I did not change anything that affects the length of the regular
expression itself, so regexps are still practically limited to
I32_MAX. Changing that would involve changing the size of regnodes,
which would be a lot more involved.
These changes should fix bugs, but are very hard to test. In most
cases, I don’t know the regexp engine well enough to come up with test
cases that test the paths in question with long strings. In other
cases I don’t have a box with enough memory to test the fix.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using I32 for the fields that record information about the location of
a fixed string that must be found for a regular expression to match
can result in match failures, because I32 is not large enough to store
offsets >= 2**31.
SSize_t is appropriate, since it is 64 bits on 64-bit platforms and 32
bits on 32-bit platforms.
This commit changes enough instances of I32 to SSize_t to get the
added test passing and suppress compiler warnings. A later commit
will change many more.
|
| |
|
|
|
|
|
|
|
|
|
| |
Change the internal fields for storing positions so that //g in scalar
context can move past the 2**31 character threshold. Before this com-
mit, the numbers would wrap, resulting in assertion failures.
The changes in this commit are only enough to get the added test pass-
ing. Stay tuned for more.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The value of pos() is stored as a byte offset. If it is stored on a
tied variable or a reference (or glob), then the stringification could
change, resulting in pos() now pointing to a different character off-
set or pointing to the middle of a character:
$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x'
2
$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x'
Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
0
So pos() should be stored as a character offset.
The regular expression engine expects byte offsets always, so allow it
to store bytes when possible (a pure non-magical string) but use char-
acters otherwise.
This does result in more complexity than I should like, but the alter-
native (always storing a character offset) would slow down regular
expressions, which is a big no-no.
|
|
|
|
|
|
| |
In the API, rename the 'screamer' arg to be 'sv' instead;
update the description of the functions args;
improve the documentation of the REXEC_* flags for the 'flags' arg.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On something like:
$_ = "123456789";
pos = 6;
s/.(?=.\G)/X/g;
each iteration could in theory start with pos one character to the left
of the previous position, and with the substitution replacing bits that
it has already replaced. Since that way madness lies, ban any attempt by
s/// to substitute to the left of a previous position.
To implement this, add a new flag to regexec(), REXEC_FAIL_ON_UNDERFLOW.
This tells regexec() to return failure even if the match itself succeeded,
but where the start of $& is before the passed stringarg point.
This change caused one existing test to fail (which was added about a year
ago):
$_="abcdef";
s/bc|(.)\G(.)/$1 ? "[$1-$2]" : "XX"/ge;
print; # used to print "aXX[c-d][d-e][e-f]"; now prints "aXXdef"
I think that that test relies on ambiguous behaviour, and that my change
makes things saner.
Note that s/// with \G is generally very under-tested.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally a /g match starts its processing at the previous pos() (or at
char 0 if pos is not set); however in the case of something like /abc\G/
we actually need to start 3 characters before pos. This has been handled
by the *callers* of regexec() subtracting prog->gofs from the stringarg
arg before calling it, or by setting stringarg to strbeg for floating,
such as /\w+\G/.
This is clearly wrong: the callers of regexec() shouldn't need to worry
about the details of getting \G right: move this code into regexec()
itself.
(Note that although this commit passes all tests, it quite possibly isn't
logically correct. It will get fixed up further during the next few
commits)
|