| |
This rewrites one comment to include more explanation of the difference
between Perl_regnext() and REGNODE_AFTER().
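To make the distinction concrete, here is a minimal C sketch using a hypothetical, heavily simplified node layout. Every `toy_` name is invented for illustration and is not part of perl's API. Roughly, regnext() follows an offset stored in the node to its logical successor. REGNODE_AFTER() instead steps over the node's own storage to whatever is physically next. For a BRANCH node the two results differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a compiled program: one slot
   per node. These names are illustrative and are not perl's own. */
typedef struct {
    unsigned char  type;      /* opcode */
    unsigned short next_off;  /* slots to the logical successor; 0 = none */
} toy_regnode;

enum { TOY_END, TOY_BRANCH, TOY_EXACT };

/* Like regnext(): follow the offset stored in the node to its logical
   successor (for a BRANCH, the next alternative). */
static toy_regnode *toy_regnext(toy_regnode *p)
{
    return p->next_off ? p + p->next_off : NULL;
}

/* Like REGNODE_AFTER() for a one-slot node: whatever is physically stored
   next (for a BRANCH, the first node of its own alternative). */
static toy_regnode *toy_regnode_after(toy_regnode *p)
{
    return p + 1;
}
```

In this toy layout, /a|b/ is stored as BRANCH EXACT BRANCH EXACT END. Stepping "after" the first BRANCH lands on its EXACT operand. Following its "next" link lands on the second BRANCH.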
| |
This adds REGNODE_AFTER_varies(), which is used when the caller *knows*
that the current regnode is variable length. We then use it to handle
EXACTish style nodes as determined by PL_regnode_arg_len_varies.
As part of this patch, Perl_regnext(), Perl_regnode_after() and
Perl_check_regnode_after() are moved to reginline.h, which is loaded via
regcomp.c only when we are compiling the regex engine.
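Here is a minimal sketch of the idea behind REGNODE_AFTER_varies(). It assumes a hypothetical layout in which an EXACT-style node is one header slot followed by enough slots to hold its literal text. The `toy_` names and the `str_len` field are assumptions for illustration only; perl's real regnode structs and length macros differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-slot header; variable-length nodes store str_len bytes
   of literal text in the slots that follow it. */
typedef struct {
    unsigned char  type;
    unsigned char  str_len;   /* string length, for variable-length nodes */
    unsigned short next_off;
} toy_regnode;

/* Extra slots needed to hold len bytes of literal text after the header. */
static size_t toy_str_slots(size_t len)
{
    return (len + sizeof(toy_regnode) - 1) / sizeof(toy_regnode);
}

/* Like REGNODE_AFTER_varies(): only valid when the caller already knows p
   is a variable-length node, so its size is read from the node itself
   rather than from a per-opcode table. */
static toy_regnode *toy_regnode_after_varies(toy_regnode *p)
{
    return p + 1 + toy_str_slots(p->str_len);
}
```

Because the size comes from the node's own data, this helper must never be applied to a fixed-size node. That is why the caller has to *know* it is dealing with a variable-length node.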
| |
It is really easy to get confused about the difference between
NEXTOPER() and regnext() of a regnode. The two concepts are related and
similar, but importantly distinct. NEXTOPER() is also defined in a way
that is easy to abuse and misunderstand, and it encourages code that is
fragile under larger changes, effectively "baking in" assumptions that
are difficult to discover by searching. Changing the type and storage
requirements of a regnode may break things in subtle and hard-to-debug
ways.
An example of how NEXTOPER() is problematic: NEXTOPER(NEXTOPER(branch))
does not mean "find the second node after the branch node"; it means
"jump forward by a regnode which happens to be two regnodes large".
In other words, NEXTOPER is just a fancy way of writing "node+1".
This patch replaces NEXTOPER() with three new macros:
REGNODE_AFTER_dynamic(node)
REGNODE_AFTER_opcode(node,op)
REGNODE_AFTER_type(node,tregnode_OPNAME)
The first is the most generic case: it jumps forward by the size of the
node, and determines that size by consulting OP(node). The second is for
when you have already extracted OP(node), and the third is for when you
know the actual structure that you want to jump forward by. Every
regnode opcode has a corresponding struct type, which is known at compile
time, so using the third will produce the most efficient code. However, in
many cases the code operates on one of several types whose sizes may be
the same now but may change in the future, in which case one of the other
forms is preferred. The run-time logic in regexec.c should probably
only use the REGNODE_AFTER_type() interface.
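This rough sketch shows how the three forms relate, under stated assumptions. A hypothetical per-opcode table (`toy_op_extra`) records how many extra slots each opcode uses. The macro names mirror perl's REGNODE_AFTER_type(), REGNODE_AFTER_opcode() and REGNODE_AFTER_dynamic(), but they are not perl's macros.

```c
#include <assert.h>

typedef struct {
    unsigned char  type;
    unsigned char  flags;
    unsigned short next_off;
} toy_regnode;

typedef struct {
    unsigned char  type;
    unsigned char  flags;
    unsigned short next_off;
    unsigned int   arg1;
} toy_regnode_1;

enum { TOY_END, TOY_BRANCH, TOY_OPEN, TOY_OP_COUNT };

/* Hypothetical per-opcode table of extra slots beyond the first. */
static const unsigned char toy_op_extra[TOY_OP_COUNT] = {
    0, /* TOY_END    */
    0, /* TOY_BRANCH */
    1, /* TOY_OPEN: stored as a toy_regnode_1 */
};

#define TOY_SLOTS(T) (sizeof(T) / sizeof(toy_regnode))

/* Struct type known at compile time: the step is a constant. */
#define TOY_AFTER_type(p, T)     ((p) + TOY_SLOTS(T))
/* Opcode already extracted by the caller: one table lookup. */
#define TOY_AFTER_opcode(p, op)  ((p) + 1 + toy_op_extra[(op)])
/* Most generic: read the opcode out of the node first. */
#define TOY_AFTER_dynamic(p)     TOY_AFTER_opcode((p), (p)->type)
```

The type form compiles to a constant offset. The other two forms pay for a table lookup but keep working if a node's size changes, as long as the table is kept in sync.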
Note that there is also a REGNODE_BEFORE(), which replaces PREVOPER().
It is used in a specific piece of legacy logic but should not be
used otherwise. It is not safe to go backwards from an arbitrary node:
we simply have no way to know how large the previous node is and thus
where it starts.
This patch includes some logic that validates assumptions in DEBUG
mode, which should catch errors from resizing regnodes.
After this patch, changing the size of an existing regnode should be
relatively safe, and errors related to sizing should trigger assertion
failures.
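Here is one way such a DEBUG-time check could look, offered only as a sketch. Before stepping by a compile-time struct size, it asserts that the size agrees with the size implied by the node's opcode. The table and helper names are hypothetical. As with any assert(), the check disappears when NDEBUG is defined, much like a debug-only validation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned char type, flags; unsigned short next_off; } toy_regnode;
typedef struct { unsigned char type, flags; unsigned short next_off; unsigned int arg1; } toy_regnode_1;

enum { TOY_END, TOY_BRANCH, TOY_OPEN, TOY_OP_COUNT };

/* Hypothetical extra-slot table: END and BRANCH use one slot, OPEN two. */
static const unsigned char toy_op_extra[TOY_OP_COUNT] = { 0, 0, 1 };

#define TOY_SLOTS(T) (sizeof(T) / sizeof(toy_regnode))

/* Before stepping by a caller-supplied size, confirm that it matches the
   size implied by the node's own opcode. A resized struct or a stale
   table makes the assertion fail instead of silently misreading memory. */
static toy_regnode *toy_after_checked(toy_regnode *p, size_t slots)
{
    assert(p->type < TOY_OP_COUNT);
    assert((size_t)1 + toy_op_extra[p->type] == slots);
    return p + slots;
}

#define TOY_AFTER_type(p, T) toy_after_checked((p), TOY_SLOTS(T))
```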
This patch includes changes to perlreguts.pod to explain this stuff
better.
| |
In a follow-up patch we will use this data from regexec.c, which
currently cannot see the variable.
This changes a comment in regen/mk_invlists.pl, which necessitated
rebuilding several files related to Unicode. Only the hashes associated
with mk_invlists.pl were changed.
| |
For: https://github.com/Perl/perl5/issues/19885
| |
This changes the name_list_idx attribute from I32 to U32, as it will
never be negative, and as of a963d6d5acabdd8c7 a 0 can be safely used
to represent "no value" for items in the 'data' array.
I noticed this while cleaning up the offsets debug logic and updating
the perlreguts documentation, so I figured I might as well clean it up
at the same time.
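This small sketch shows why an unsigned index works here. It assumes, as the message implies, that slot 0 of the 'data' array never holds a real item. The struct and array below are hypothetical simplifications, not perl's struct regexp_internal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: slot 0 of the data array never holds a real item,
   so an unsigned index of 0 can mean "no value" and no negative sentinel
   (the usual reason for a signed I32) is needed. */
typedef struct {
    uint32_t name_list_idx;   /* 0 = no name list */
} toy_regexp_internal;

static const char *const toy_data[] = { NULL /* reserved */, "name list" };

static const char *toy_name_list(const toy_regexp_internal *ri)
{
    return ri->name_list_idx ? toy_data[ri->name_list_idx] : NULL;
}
```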
| |
This code was added by Mark Jason Dominus to aid a regex debugger
he wrote for ActiveState. The basic premise is that every opcode
in a regex can be attributed back to a contiguous sequence of characters
that make up the pattern. This assumption has not been true ever since
the "jump" TRIE optimizations were added to the engine.
I spoke to MJD many years ago about whether it was ok to remove this
from the regex engine and he said he had no objections.
An example of a pattern that cannot be handled correctly by this logic is
/(?: a x+ | b y+ | c z+ )/x
where the
(?:a ... | b ... | c ...)
parts will now be handled by the TRIE logic and not by the BRANCH/EXACT
opcodes as they would have been in the past. The offset debug output
cannot handle this type of transformation, and produces nonsense output
that mentions opcodes that have been optimized away from the final program.
The regex compiler is complicated enough without having to maintain this
logic. There are essentially no tests for it, and the few tests that do
cover it do so as a byproduct of testing other things. Although the offsets
logic is only used in debug mode, supporting it has a cost to non-debug
code, as various internal routines take parameters related to it that
are otherwise unused.
Note that this output is only usable or visible by enabling special flags
in re.pm; there is no formal API to access it short of parsing the
output of the debug mode of the regex engine, which has changed multiple
times over the past years.
| |
Various changes have been made to struct regexp_internal over
time which have not been documented. This updates the docs to
match the code as it is now, in preparation for changing the docs
in subsequent commits.
| |
Many of the files in perl are for one thing only, and hence their
embedded documentation will be for that one thing. By creating a hash
of them here, those files don't have to worry about which section that
documentation goes under, and so the section assignment can be completely
changed without affecting them.
| |
This makes changes, mainly in dealing with the removal of the sizing
pass in 5.30.
Patches welcome for other fixes.
| |
This commit finishes (at least for now) removing some of the overloading
of the term 'class'. A 'regnode_charclass_class' node contains space for
storing the POSIX classes it matches, which are never defined until the
moment of matching because they are subject to the current run-time
locale. This commit creates a synonym typedef, 'regnode_charclass_posixl',
that doesn't reuse the term 'class' for two different purposes.
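This sketch illustrates what "defined only at the moment of matching" means, using the C library's locale-sensitive <ctype.h> classifiers. The flag bits, the struct layout and `toy_posixl_matches()` are all hypothetical. The point is that the node records *which* POSIX classes to test, while the answer depends on the locale in effect at match time.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical bit assignments for a few POSIX classes. */
enum {
    TOY_POSIX_ALPHA = 1u << 0,
    TOY_POSIX_DIGIT = 1u << 1,
    TOY_POSIX_SPACE = 1u << 2
};

typedef struct {
    unsigned char type;
    unsigned int  classflags;   /* which POSIX classes this node accepts */
} toy_regnode_charclass_posixl;

/* Membership is decided only when matching, by asking the locale-aware
   classifiers; the node stores class names, not their contents. */
static int toy_posixl_matches(const toy_regnode_charclass_posixl *n, unsigned char c)
{
    if ((n->classflags & TOY_POSIX_ALPHA) && isalpha(c)) return 1;
    if ((n->classflags & TOY_POSIX_DIGIT) && isdigit(c)) return 1;
    if ((n->classflags & TOY_POSIX_SPACE) && isspace(c)) return 1;
    return 0;
}
```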
| |
Various changes have been made to regcomp.c that didn't make it into
perlreguts until now.
| |
It's been unused since commit e9105d30edfbaa7f in July 2009.
| |
I looked at all the instances of spaces around -- and in most cases
converted the sentences to use more appropriate punctuation. In
general, the -- in the perl docs seem to be there only to make
really complicated and really long sentences.
I didn't look at the closed em-dashes. They probably have the same
sentence-complexity problem.
I left some open em-dashes in place. Those are the ones used in
lists.
| |
These are all in the pod/ directory, and only the first is a code fix.
There was also a single lingering ISO 8859-1 encoding that missed the
UTF-8 upconvert. The rest are cleanups for typos, some of which seem
to have been around for a rather long time: spelling errors, incorrect
possessives, and extra, missing, or duplicated words.
If you actually read through, I bet you'll realize what sparked this. :)
--tom
Signed-off-by: Abigail <abigail@abigail.be>
| |
Commit c74340f9 added backreferences as well as the idea of a ->swap
regex pointer to keep track of the match offsets in case of backtracking.
The problem is that when Perl re-enters the regex engine to handle
utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture
from the utf8 (Perl) code could inadvertently modify the regex match
data that caused the utf8 swash to get built.
This change should close out RT #60508
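The save/clear/restore discipline that the message describes can be sketched as follows. The `toy_regexp` struct, its `swap` field and the callback type are hypothetical stand-ins for the engine's match state and for re-entering Perl code such as utf8::SWASHNEW. This is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int start, end; } toy_offs;

typedef struct {
    toy_offs *offs;
    toy_offs *swap;   /* backup of offsets kept for backtracking */
} toy_regexp;

/* Hypothetical stand-in for code that re-enters the engine. */
typedef void (*toy_reentry_fn)(toy_regexp *rx);

/* Hypothetical nested code that would clobber swap if it could see it. */
static void toy_clobbering_callback(toy_regexp *rx)
{
    if (rx->swap != NULL)
        rx->swap->start = -1;
}

/* Save the outer match's swap, hide it during re-entry, then restore it. */
static void toy_call_reentrant(toy_regexp *rx, toy_reentry_fn fn)
{
    toy_offs *saved_swap = rx->swap;  /* save */
    rx->swap = NULL;                  /* clear */
    fn(rx);
    rx->swap = saved_swap;            /* restore */
}
```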
| |
Message-ID: <47BFFDCB.60107@profvince.com>
p4raw-id: //depot/perl@33366
| |
(largely from perlreguts)
p4raw-id: //depot/perl@30922
| |
Message-ID: <51dd1af80704061441v4b972257ta4c95230bdbc47c5@mail.gmail.com>
p4raw-id: //depot/perl@30920
| |
Message-ID: <9b18b3110704031646p7ac8dbearf9e41397a5f884d8@mail.gmail.com>
p4raw-id: //depot/perl@30841
| |
Message-ID: <9b18b3110703191740m6bf21942p6521f3016ed8092f@mail.gmail.com>
p4raw-id: //depot/perl@30647
| |
Message-Id: <F6284B08-4B4E-467A-AFB2-8A71154FDD08@rectangular.com>
p4raw-id: //depot/perl@30630
| |
specific.
Message-ID: <9b18b3110611301306p5cad5deal4aa55559b8c8defd@mail.gmail.com>
p4raw-id: //depot/perl@29430
| |
Message-ID: <9b18b3110611150329l206e4552w887ae5f0a3f7ca80@mail.gmail.com>
p4raw-id: //depot/perl@29279
| |
Message-ID: <9b18b3110611121429g1fc9d6c1t4007dc711f9e8396@mail.gmail.com>
Plus a couple of tweaks to ext/re/re.pm and t/op/pat.t so that those
patches apply cleanly.
p4raw-id: //depot/perl@29252
| |
Message-ID: <9b18b3110610120545m3002e17cqace30f908b0e2277@mail.gmail.com>
p4raw-id: //depot/perl@28999
| |
plus fix a broken internal POD link
p4raw-id: //depot/perl@28428
| |
p4raw-id: //depot/perl@28425
| |
Message-ID: <9b18b3110606111401o143b2f57rd17bf117979853e7@mail.gmail.com>
p4raw-id: //depot/perl@28380
| |
p4raw-id: //depot/perl@28372