path: root/pod/perlreguts.pod
Commit message | Author | Age | Files | Lines
* regex engine - improved comments explaining REGNODE_AFTER() | Yves Orton | 2022-08-03 | 1 | -1/+1
    This rewrites one comment to include more explanation of the difference between Perl_regnext() and REGNODE_AFTER().
* regex engine - integrate regnode_after() support for EXACTish nodes | Yves Orton | 2022-08-03 | 1 | -6/+6
    This adds REGNODE_AFTER_varies() which is used when the caller *knows* that the current regnode is variable length. We then use it to handle EXACTish style nodes as determined by PL_regnode_arg_len_varies.

    As part of this patch Perl_regnext(), Perl_regnode_after() and Perl_check_regnode_after() are moved to reginline.h, which is loaded via regcomp.c only when we are compiling the regex engine.
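The idea of a variable-length node can be sketched in a toy model. Everything below (the `vnode` type, the `VN_*` opcodes and the `vn_*` helpers) is hypothetical and invented for illustration; it is not the layout or API used in regcomp.h. It assumes a node header that records a string length, with the string bytes stored in the slots that follow, so the size of the node can only be known by reading that length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy node: a 4-byte header slot. Not Perl's real regnode. */
typedef struct { uint8_t op; uint8_t str_len; uint16_t next_off; } vnode;

enum { VN_END, VN_EXACT };

/* Number of whole slots needed to hold len bytes of string data. */
static size_t vn_str_slots(size_t len) {
    return (len + sizeof(vnode) - 1) / sizeof(vnode);
}

/* Write an EXACT-like node at p and return the slot just past it.
 * The caller must provide enough room and a string of at most 255 bytes. */
static vnode *vn_emit_exact(vnode *p, const char *s) {
    size_t len = strlen(s);
    assert(len <= UINT8_MAX);
    p->op = VN_EXACT;
    p->str_len = (uint8_t)len;
    p->next_off = 0;
    memcpy(p + 1, s, len);          /* string bytes live in the following slots */
    return p + 1 + vn_str_slots(len);
}

/* Toy analogue of a "varies" step: only valid when the caller already
 * knows that p is a variable-length node. */
static vnode *vn_after_varies(const vnode *p) {
    assert(p->op == VN_EXACT);
    return (vnode *)p + 1 + vn_str_slots(p->str_len);
}
```

In this model a five-byte string occupies the header plus two string slots, while a four-byte string occupies the header plus one, so a fixed per-opcode size table cannot describe the node.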
* regex engine - Rename PL_regarglen to PL_regnode_arg_len | Yves Orton | 2022-08-03 | 1 | -1/+1
* regcomp.c - rename NEXTOPER to REGNODE_AFTER and related logic | Yves Orton | 2022-08-03 | 1 | -27/+100
    It is really easy to get confused about the difference between NEXTOPER() and regnext() of a regnode. The two concepts are related, similar, but importantly distinct. NEXTOPER() is also defined in such a way that it is easy to abuse and misunderstand and encourages producing code that is fragile to larger change, effectively "baking in" assumptions to the code that are difficult to discover by searching. Changing the type and storage requirements of a regnode may break things in subtle and hard to debug ways.

    An example of how NEXTOPER() is problematic is that this:

        NEXTOPER(NEXTOPER(branch))

    does not mean "find the second node after the branch node", it means "jump forward by a regnode which happens to be two regnodes large". In other words NEXTOPER is just a fancy way of writing "node+1".

    This patch replaces NEXTOPER() with three new macros:

        REGNODE_AFTER_dynamic(node)
        REGNODE_AFTER_opcode(node,op)
        REGNODE_AFTER_type(node,tregnode_OPNAME)

    The first is the most generic case, it jumps forward by the size of the node, and determines that size by consulting OP(node). The second is where you have already extracted OP(node), and the third is where you know the actual structure that you want to jump forward by. Every regnode type has a corresponding type, which is known at compile time, so using the third will produce the most efficient code. However in many cases the code operates on one of several types, whose size may be the same now, but may change in the future, in which case one of the other forms is preferred. The run time logic in regexec.c should probably only use the REGNODE_AFTER_type() interface.

    Note that there is also a REGNODE_BEFORE() which replaces PREVOPER(), which is used in a specific piece of legacy logic but should not be used otherwise. It is not safe to go backwards from an arbitrary node, we simply have no way to know how large the previous node is and thus where it starts.

    This patch includes some logic that validates assumptions during DEBUG mode which should catch errors from resizing regnodes. After this patch changing the size of an existing regnode should be relatively safe and errors related to sizing should trigger assertion fails.

    This patch includes changes to perlreguts.pod to explain this stuff better.
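The difference between "node+1" and "the node after this one" can be sketched in a toy model. The types, opcodes and `TOY_*` macros below are hypothetical and only mimic the shape of the three macro forms named in the commit message; they are not the definitions from regcomp.h. The sketch assumes a program stored as an array of fixed-size slots, where some opcodes own one extra argument slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types: a one-slot node and a two-slot node with an argument. */
typedef struct { uint8_t op; uint8_t flags; uint16_t next_off; } toy_regnode;
typedef struct { uint8_t op; uint8_t flags; uint16_t next_off; uint32_t arg1; } toy_regnode_1;

_Static_assert(sizeof(toy_regnode_1) == 2 * sizeof(toy_regnode),
               "toy_regnode_1 must occupy exactly two slots");

enum { TOY_END, TOY_NOTHING, TOY_BRANCHJ, TOY_OP_COUNT };

/* Extra argument slots owned by each opcode (toy analogue of a size table). */
static const uint8_t toy_arg_len[TOY_OP_COUNT] = { 0, 0, 1 };

/* Old style: always steps one slot, whatever the node really is. */
#define TOY_NEXTOPER(p)           ((p) + 1)
/* Size looked up from an opcode the caller has already extracted. */
#define TOY_AFTER_opcode(p, op)   ((p) + 1 + toy_arg_len[(op)])
/* Size looked up by reading the opcode from the node itself. */
#define TOY_AFTER_dynamic(p)      TOY_AFTER_opcode((p), (p)->op)
/* Size fixed at compile time from the node's structure type. */
#define TOY_AFTER_type(p, type)   ((p) + sizeof(type) / sizeof(toy_regnode))

/* Slot 0: BRANCHJ header, slot 1: its argument, slot 2: NOTHING, slot 3: END. */
static const toy_regnode toy_prog[4] = {
    { TOY_BRANCHJ, 0, 0 },
    { 0x7f, 0, 0 },        /* argument payload, not a node */
    { TOY_NOTHING, 0, 0 },
    { TOY_END, 0, 0 },
};

/* Index of a slot within toy_prog, for inspection. */
static size_t toy_slot(const toy_regnode *p) { return (size_t)(p - toy_prog); }
```

In this model `TOY_NEXTOPER(toy_prog)` lands on the argument slot, while the three size-aware forms all land on the NOTHING node. The backwards problem is visible too: standing at slot 2, nothing in slot 1 says whether it is a node of its own or the tail of a larger one.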
* regen/regcomp.pl - Make regarglen available as PL_regarglen in regexec.c | Yves Orton | 2022-08-03 | 1 | -1/+1
    In a follow up patch we will use this data from regexec.c which currently cannot see the variable.

    This changes a comment in regen/mk_invlists.pl which necessitated rebuilding several files related to unicode. Only the hashes associated with mk_invlists.pl were changed.
* Remove typo spotted by user poti1 | James E Keenan | 2022-06-23 | 1 | -2/+1
    For: https://github.com/Perl/perl5/issues/19885
* perlintern: regnode typedef is documented in perlreguts | Karl Williamson | 2022-05-07 | 1 | -0/+2
* regcomp.h: change regexp_internal attribute from I32 to U32 | Yves Orton | 2022-02-18 | 1 | -3/+3
    This changes the name_list_idx attribute from I32 to a U32 as it will never be negative, and as of a963d6d5acabdd8c7 a 0 can be safely used to represent "no value" for items in the 'data' array.

    I noticed this while cleaning up the offsets debug logic and updating the perlreguts documentation, so I figured I might as well clean it up at the same time.
* regcomp.c,re.pm: Remove "offsets" debugging code | Yves Orton | 2022-02-18 | 1 | -14/+5
    This code was added by Mark Jason Dominus to aid a regex debugger he wrote for ActiveState. The basic premise is that every opcode in a regex can be attributed back to a contiguous sequence of characters that make up the pattern. This assumption has not been true ever since the "jump" TRIE optimizations were added to the engine. I spoke to MJD many years ago about whether it was ok to remove this from the regex engine and he said he had no objections.

    An example of a pattern that cannot be handled correctly by this logic is

        /(?: a x+ | b y+ | c z+ )/x

    where the (?:a ... | b ... | c ...) parts will now be handled by the TRIE logic and not by the BRANCH/EXACT opcodes that it would have been in the past. The offset debug output cannot handle this type of transformation, and produces nonsense output that mentions opcodes that have been optimized away from the final program.

    The regex compiler is complicated enough without having to maintain this logic. There are essentially no tests for it, and the few tests that do cover it do so as a byproduct of testing other things. Despite the offsets logic only being used in debug, supporting it does have a cost to non-debug logic as various internal routines include parameters related to it that are otherwise unused.

    Note this output is only usable or visible by enabling special flags in re.pm; there is no formal API to access it short of parsing the output of the debug mode of the regex engine, which has changed multiple times over the past years.
* perlreguts.pod: synchronize regexp_internal docs with code | Yves Orton | 2022-02-18 | 1 | -15/+49
    Various changes have been made to struct regexp_internal over time which have not been documented. This updates the docs to match the code as it is now in preparation of changing the docs in subsequent commits.
* autodoc.pl: Specify scn for single-purpose files | Karl Williamson | 2020-11-06 | 1 | -1/+0
    Many of the files in perl are for one thing only, and hence their embedded documentation will be for that one thing. By creating a hash here of them, those files don't have to worry about what section that documentation goes under, and so it can be completely changed without affecting them.
* perlreguts: Note pregcomp/regexec are documented here | Karl Williamson | 2020-09-05 | 1 | -0/+4
* perlreguts: Update | Karl Williamson | 2020-02-15 | 1 | -45/+54
    This makes changes, mainly in dealing with the removal of the sizing pass in 5.30. Patches welcome for other fixes.
* Move more URLs from http:// to https:// | Max Maischein | 2019-10-11 | 1 | -2/+2
* pod/*: remove deprecated L<"section"> and L<section> syntax | Lukas Mai | 2016-06-11 | 1 | -1/+1
* regcomp.h: Create new typedef synonym for clarity | Karl Williamson | 2013-09-24 | 1 | -3/+3
    This commit finishes (at least for now) removing some of the overloading of the term class. A 'regnode_charclass_class' node contains space for storing the posix classes it matches that are never defined until the moment of matching because they are subject to the current run-time locale. This commit creates a typedef 'regnode_charclass_posixl' synonym that doesn't re-use the term 'class' for two different purposes.
* perlreguts: Bring up-to-date | Karl Williamson | 2013-09-24 | 1 | -36/+12
    Various changes have been made to regcomp.c that didn't make it into perlreguts until now.
* perlreguts.pod: Nits | Karl Williamson | 2013-09-24 | 1 | -5/+5
* typo fix for reguts pod | David Steinbrunner | 2013-05-25 | 1 | -1/+1
* Document the uses of NULL returns in the regex parsing code. | Nicholas Clark | 2013-03-19 | 1 | -0/+46
* Eliminate 'swap' from struct regexp_internal. | Nicholas Clark | 2013-02-20 | 1 | -10/+0
    It's been unused since commit e9105d30edfbaa7f in July 2009.
* perlreguts: Fit long verbatim lines to 79 cols | Karl Williamson | 2012-09-27 | 1 | -34/+40
* * Em dash cleanup in pod/ | brian d foy | 2010-01-13 | 1 | -1/+1
    I looked at all the instances of spaces around -- and in most cases converted the sentences to use more appropriate punctuation. In general, the -- in the perl docs seem to be there only to make really complicated and really long sentences.

    I didn't look at the closed em-dashes. They probably have the same sentence-complexity problem.

    I left some open em-dashes in place. Those are the ones used in lists.
* PATCH: minor typo cleanup of pod/ directory | Tom Christiansen | 2010-01-05 | 1 | -4/+4
    These are all in the pod/ directory, and only the first is a code fix. There was also a single lingering ISO 8859-1 encoding that missed the UTF-8 upconvert. The rest are cleanups for typos, some of which seem to have been around for a rather long time: spelling errors, incorrect possessives, and extra, missing, or duplicated words.

    If you actually read through, I bet you'll realize what sparked this. :)

    --tom

    Signed-off-by: Abigail <abigail@abigail.be>
* much better swap logic to support reentrancy and fix assert failure | George Greer | 2009-07-26 | 1 | -7/+6
    Commit c74340f9 added backreferences as well as the idea of a ->swap regex pointer to keep track of the match offsets in case of backtracking. The problem is that when Perl re-enters the regex engine to handle utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture from the utf8 (Perl) code could inadvertently modify the regex match data that caused the utf8 swash to get built.

    This change should close out RT #60508
* Re: [PATCH] POD fixes | Vincent Pit | 2008-02-25 | 1 | -1/+1
    Message-ID: <47BFFDCB.60107@profvince.com>
    p4raw-id: //depot/perl@33366
* Add the perlreapi man page, by Ævar Arnfjörð Bjarmason | Rafael Garcia-Suarez | 2007-04-12 | 1 | -289/+39
    (largely from perlreguts)
    p4raw-id: //depot/perl@30922
* Re: [PATCH] perlreguts.pod: use the unicode name for ß and show the codepoint | Ævar Arnfjörð Bjarmason | 2007-04-12 | 1 | -2/+2
    Message-ID: <51dd1af80704061441v4b972257ta4c95230bdbc47c5@mail.gmail.com>
    p4raw-id: //depot/perl@30920
* Re: pmdynflags and thread safety | Yves Orton | 2007-04-04 | 1 | -14/+20
    Message-ID: <9b18b3110704031646p7ac8dbearf9e41397a5f884d8@mail.gmail.com>
    p4raw-id: //depot/perl@30841
* feel the the baß (encoding problems in the regex engine) | Yves Orton | 2007-03-20 | 1 | -14/+25
    Message-ID: <9b18b3110703191740m6bf21942p6521f3016ed8092f@mail.gmail.com>
    p4raw-id: //depot/perl@30647
* Re: perlreguts: Copy-editing and wishlist | Marvin Humphrey | 2007-03-19 | 1 | -29/+29
    Message-Id: <F6284B08-4B4E-467A-AFB2-8A71154FDD08@rectangular.com>
    p4raw-id: //depot/perl@30630
* Continue split of perl internal regexp structures from ones that are engine specific. | Yves Orton | 2006-12-01 | 1 | -118/+213
    Message-ID: <9b18b3110611301306p5cad5deal4aa55559b8c8defd@mail.gmail.com>
    p4raw-id: //depot/perl@29430
* Re: [PATCH] Fix RT#19049 and add relative backreferences | Yves Orton | 2006-11-15 | 1 | -1/+10
    Message-ID: <9b18b3110611150329l206e4552w887ae5f0a3f7ca80@mail.gmail.com>
    p4raw-id: //depot/perl@29279
* Regex Utility Functions and Substituion Fix (XML::Twig core dump) | Yves Orton | 2006-11-13 | 1 | -7/+23
    Message-ID: <9b18b3110611121429g1fc9d6c1t4007dc711f9e8396@mail.gmail.com>

    Plus a couple of tweaks to ext/re/re.pm and t/op/pat.t for those patches to apply cleanly.
    p4raw-id: //depot/perl@29252
* More regexp documentation | Yves Orton | 2006-10-12 | 1 | -42/+207
    Message-ID: <9b18b3110610120545m3002e17cqace30f908b0e2277@mail.gmail.com>
    p4raw-id: //depot/perl@28999
* More perlreguts nits by Dominic Dunlop, | Rafael Garcia-Suarez | 2006-06-26 | 1 | -4/+5
    plus fix a broken internal POD link
    p4raw-id: //depot/perl@28428
* Nits to perlreguts.pod by Dominic Dunlop | Rafael Garcia-Suarez | 2006-06-25 | 1 | -163/+198
    p4raw-id: //depot/perl@28425
* Re: [PATCH] regexec/regcomp.c cleanups | Yves Orton | 2006-06-11 | 1 | -201/+278
    Message-ID: <9b18b3110606111401o143b2f57rd17bf117979853e7@mail.gmail.com>
    p4raw-id: //depot/perl@28380
* Add the perlreguts manpage, by Yves Orton | Rafael Garcia-Suarez | 2006-06-08 | 1 | -0/+722
    p4raw-id: //depot/perl@28372