| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces PL_regnode_arg_len, PL_regnode_arg_len_varies,
PL_regnode_off_by_arg and PL_regnode_kind with a single PL_regnode_info
array, which is an array of struct regnode_meta, which contains the same
data but as a struct. Since PL_regnode_name is only used in debugging
builds of the regex engine we keep it separate. If we add more debug
properties it might be good to create a PL_regnode_debug_info[] to hold
that data instead.
This means when we add new properties we do not need to modify any
secondary sources to add new properites, just the struct definition
and regen/regcomp.pl
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This is in preparation for a future patch, so we can access
PL_reg_off_by_arg() from an inline function in regexec.c
|
|
|
|
|
|
|
|
|
| |
In a follow up patch we will use this data from regexec.c which
currently cannot see the variable.
This changes a comment in regen/mk_invlists.pl which necessitated
rebuilding several files related to unicode. Only the hashes associated
with mk_invlists.pl were changed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A long standing bug in Perl that has gone undetected is that the array
is global that is created when changing locales and tells fc() and qr//i
matching what the folds are in the new locale.
What this means is that any program only has one set of fold definitions
that apply to all threads within it, even if we claim that the locales
are thread-safe on the given platform. One possibility for this going
undetected so long is that no one is using locales on multi-threaded
systems much. Another possibility is that modern UTF-8 locales have the
same set of folds as any other one.
It is a simple matter to make the fold array per-thread instead of
per-process, and that solves the problem transparently to other code.
I discovered this stress-testing locale handling under threads. That
test will be added in a future commit.
In order to keep from having a dTHX inside foldEQ_locale, it has to have
a pTHX_ parameter. This means that the other functions that function
pointer variables get assigned to point to have to have an identical
signature, which means adding pTHX_ to functions that don't require it.
The bodies of all these are known to the compiler, since they are all
in inline.h or in the same .c file as where they are called. Hence the
compiler can optimize out the unused parameter.
Two calls of STR_WITH_LEN also have to be changed because of C
preprocessor limitations; perhaps there is another way to do it that I'm
unfamiliar with.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will prove useful in future commits on platforms that have 64 bit
capability.
The deBruijn sequence used here, taken from the internet, differs
from the 32 bit one in how they treat a word with no set bits. But this
is considered undefined behavior, so that difference is immaterial.
Apparently figuring this out uses brute force methods, and so I decided
to live with this difference, rather than to expend the time needed to
bring them into sync.
|
|
|
|
|
|
| |
This moves the code from regcomp.c to inline.h that calculates the
position of the lone set bit in a U32. This is in preparation for use
by other call sites.
|
|
|
|
|
|
|
| |
These categorize the many types of EXACT nodes, so that code can refer
to a particular subset of such nodes without having to list all of them
out. This simplifies some 'if' statements, and makes updating things
easier.
|
|
|
|
|
|
|
|
| |
This was originally added for MinGW, which no longer needs it, and
only still used by Symbian, which is now removed.
This also leaves perlapi.[ch] empty, but we keep the header for CPAN
backwards compatibility.
|
|
|
|
|
| |
This hasn't actually been used since commit
8922e4382e9c1488fdbe46a0f52493860dc897a6.
|
|
|
|
| |
This is in preparation for it to be raised in other files
|
|
|
|
|
| |
and the associated commits, at least until a way to make
wrap_op_checker() work is available.
|
|
|
|
| |
Fixes issue #14816
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix the various Perl_PerlSock_dup2_cloexec() type functions so that
t/porting/liberl.a passes under -DPERL_GLOBAL_STRUCT_PRIVATE builds.
In these builds it is forbidden to have any static variables, but each
of these functions (via convoluted macros) has a static var called
'strategy' which records, for each function, whether a run-time probe
has been done to determine the best way of achieving close-exec
functionality, and the result.
Replace them all with 'global' vars: PL_strategy_dup2 etc.
NB these vars aren't thread-safe but it doesn't really matter, as the
worst that can happen is for a redundant probe or two to be done before
a suitable "don't probe any more" value is written to the var and seen
by all the threads.
|
|
|
|
|
|
|
|
|
| |
This fixes compiler warnings "performing pointer arithmetic on a null
pointer has undefined behavior"
There are several ways to fix this. This one was suggested by
Tomasz Konojacki++. Instead of trying to point to address 1 and 2, two
variables are created, and we point to them. const is cast away.
|
|
|
|
|
|
|
| |
They are defined in the Perl library but referenced in the Perl
executable, but the Perl executable can't see them unless they
are exported by the library, and some linkers only make a symbol
a public export if they've been told to explicitly.
|
|
|
|
|
|
|
|
|
|
| |
it's like PL_sv_no, except that its string value is "0" rather than "".
It can be used for example where pp function wants to push a zero return
value on the stack. The next commit will start to use it.
Also update the SvIMMORTAL() to be more efficient: it now checks whether
the SV's address is in a range rather than individually checking against
&PL_sv_undef, &PL_sv_no etc.
|
|
|
|
|
|
|
|
|
| |
I added the global string constant PL_isa_DOES recently. This caused
t/porting/libperl.t to fail under -DPERL_GLOBAL_STRUCT_PRIVATE builds.
This commit makes PL_isa_DOES be declared and defined in a similar
way to other such global constants. This is pure cargo-culting - I have no
real idea of the point of all the EXTCONST, INIT and globvar.sym stuff.
|
|
|
|
|
|
|
| |
A compiled pattern requires a byte for each non-default modifier, like
/i. Previously, the worst case was presumed in allocating the space
(every modifier being non-default). Now, only the actual needed space
is reserved.
|
|
|
|
|
| |
The global const PL_inf and PL_nan have dual nature:
the .nv has the NV, the .u8 has the bytes.
|
|
|
|
|
|
|
| |
By prepending 'PL_' to each line in globvar.sym, it
a) makes makedef.pl slightly simpler,
b) makes it easier to spot all usage of a particular var when you
do 'git grep PL_foo'
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new config file, regen/op_private, which contains all the
information about the flags and descriptions for the OP op_private field.
Previously, the flags themselves were defined in op.h, accompanied by
textual descriptions (sometimes inaccurate or incomplete).
For display purposes, there were short labels for each flag found in
Concise.pm, and another set of labels for Perl_do_op_dump() in dump.c.
These two sets of labels differed from each other in spelling (e.g.
REFC verses REFCOUNT), and differed in completeness and accuracy.
With this commit, all the data to generate the defines and the labels is
derived from a single source, and are generated automatically by 'make
regen'. It also contains complete data on which bits are used for what by
each op. So any attempt to add a new flag for a particular op where that
bit is already in use, will raise an error in make regen. This compares
to the previous practice of reading the descriptions in op.h and hoping
for the best.
It also makes use of data in regen/opcodes: for example, regen/op_private
specifies that all ops flagged as 'T' get the OPpTARGET_MY flag.
Since the set of labels used by Concise and Perl_do_op_dump() differed,
I've standardised on the Concise version. Thus this commit changes the
output produced by Concise only marginally, while Perl_do_op_dump() is
considerably different. As well as the change in labels (and missing
labels), Perl_do_op_dump() formerly had a bug whereby any unrecognised
bits would not be shown if there was at least one recognised bit.
So while Concise displayed (and still does) "LVINTRO,2", Perl_do_op_dump()
has changed:
- PRIVATE = (INTRO)
+ PRIVATE = (LVINTRO,0x2)
Concise has mainly changed in that a few op/bit combinations weren't being
shown symbolically, and now are. I've avoiding fixing the ones that would
break tests; they'll be fixed up in the next few commits.
A few new OPp* flags have been added:
OPpARG1_MASK
OPpARG2_MASK
OPpARG3_MASK
OPpARG4_MASK
OPpHINT_M_VMSISH_STATUS
OPpHINT_M_VMSISH_TIME
OPpHINT_STRICT_REFS
The last three are analogues for existing HINT_* flags. The former four
reflect that many ops some of the lower few bits of op_private to indicate
how many args the op expects. While (for now) this is still displayed as,
e.g. "LVINTRO,2", the definitions in regen/op_private now fully account
for which ops use which bits for the arg count.
There is a new module, B::Op_private, which allows this new data to be
accessed from Perl. For example,
use B::Op_private;
my $name = $B::Op_private::bits{aelem}{7}; # OPpLVAL_INTRO
my $value = $B::Op_private::defines{$name}; # 128
my $label = $B::Op_private::labels{$name}; # LVINTRO
There are several new constant PL_* tables. PL_op_private_valid[]
specifies for each op number, which bits are valid for that op. In a
couple of commits' time, op_free() will use this on debugging builds to
assert that no ops gained any private flags which we don't know about.
In fact it was by using such a temporary assert repeatedly against the
test suite, that I tracked down most of the inconsistencies and errors in
the current flag data.
The other PL_op_private_* tables contain a compact representation of all
the ops/bits/labels in a format suitable for Perl_do_op_dump() to decode
Op_private. Overall, the perl binary is about 500 bytes smaller on my
system.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This large (sorry, I couldn't figure out how to meaningfully split it
up) commit causes Perl to fully support LC_CTYPE operations (case
changing, character classification) in UTF-8 locales.
As a side effect it resolves [perl #56820].
The basics are easy, but there were a lot of details, and one
troublesome edge case discussed below.
What essentially happens is that when the locale is changed to a UTF-8
one, a global variable is set TRUE (FALSE when changed to a non-UTF-8
locale). Within the scope of 'use locale', this variable is checked,
and if TRUE, the code that Perl uses for non-locale behavior is used
instead of the code for locale behavior. Since Perl's internal
representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.
More work had to be done for regular expressions. There are three
cases.
1) The character classes \w, [[:punct:]] needed no extra work, as
the changes fall out from the base work.
2) Strings that are to be matched case-insensitively. These form
EXACTFL regops (nodes). Notice that if such a string contains only
characters above-Latin1 that match only themselves, that the node can be
downgraded to an EXACT-only node, which presents better optimization
possibilities, as we now have a fixed string known at compile time to be
required to be in the target string to match. Similarly if all
characters in the string match only other above-Latin1 characters
case-insensitively, the node can be downgraded to a regular EXACTFU node
(match, folding, using Unicode, not locale, rules). The code changes
for this could be done without accepting UTF-8 locales fully, but there
were edge cases which needed to be handled differently if I stopped
there, so I continued on.
In an EXACTFL node, all such characters are now folded at compile time
(just as before this commit), while the other characters whose folds are
locale-dependent are left unfolded. This means that they have to be
folded at execution time based on the locale in effect at the moment.
Again, this isn't a change from before. The difference is that now some
of the folds that need to be done at execution time (in regexec) are
potentially multi-char. Some of the code in regexec was trivial to
extend to account for this because of existing infrastructure, but the
part dealing with regex quantifiers, had to have more work.
Also the code that joins EXACTish nodes together had to be expanded to
account for the possibility of multi-character folds within locale
handling. This was fairly easy, because it already has infrastructure
to handle these under somewhat different circumstances.
3) In bracketed character classes, represented by ANYOF nodes, a new
inversion list was created giving the characters that should be matched
by this node when the runtime locale is UTF-8. The list is ignored
except under that circumstance. To do this, I created a new ANYOF type
which has an extra SV for the inversion list.
The edge case that caused the most difficulty is folding involving the
MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the
GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range
character that folds to outside that range. The issue is that it
doesn't naturally fall out that it will match the CAP MU. If we let the
CAP MU fold to the samll mu at compile time (which it can because both
are above-Latin1 and so the fold is the same no matter what locale is in
effect), it could appear that the regnode can be downgraded away from
EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case
insensitvely match the CAP MU. This could be special cased in regcomp
and regexec, but I wanted to avoid that. Instead the mktables tables
are set up to include the CAP MU as a character whose presence forbids
the downgrading, so the special casing is in mktables, and not in the C
code.
|
| |
|
| |
|
|
|
|
|
|
| |
regcomp.c folds the string in these two nodes except in one case.
Change that case to correspond with the predominant behavior. This
enables future optimizations
|
|
|
|
|
|
|
|
| |
global.sym was a file listing the exported symbols, generated by regen/embed.pl
from embed.fnc and regen/opcodes, which was only used by makedef.pl
Move the code that generates global.sym from regen/embed.pl to makedef.pl,
and thereby eliminate the need to ship a 907 line generated file.
|
|
|
|
|
|
|
| |
PL_sh_path needs some form of special case because it is conditionally
defined either in perlvar.h or perl.h, but globvar.sym mentions all symbols
unconditionally, and undef -DPERL_GLOBAL_STRUCT perlvar.h is parsed as an
unconditional skip list.
|
|
|
|
|
| |
f1fb874192252653 added these 6 new global variables, but omitted to add them
to the list of exported symbols.
|
| |
|
|
|
|
|
| |
They exist solely to ensure that Perl_runops_standard and Perl_runops_debug
are linked in - nothing assigns to either variable, and nothing reads them.
|
|
|
|
| |
Make them const U16 - they should have been const from the start.
|
|
|
|
|
| |
To get the initialisation to work, the location of #include patchlevel.h needs
to be moved.
|
|
|
|
|
|
|
|
|
|
|
| |
They were converted in perl.h from const char[] to #define in 31fb120917c4f65d,
then re-instated as const char[], but in perlvars.h, in 3fe35a814d0a98f4.
There's no need for compile-time constants to jump through the hoops of
perlvars.h, even for Symbian, as the various "EXTCONST" variables already in
perl.h demonstrate.
These were the only 3 users of the the PERLVARISC macro, so eliminate that, and
all related code.
|
|
|
|
|
|
|
|
|
|
|
| |
Use it to eliminate the large switch statement in Perl_sv_magic().
As the table needs to be keyed on magic type, which is expressed as C character
constants, the order depends on the compiler's character set. Frustratingly,
EBCDIC variants don't agree on the code points for '~' and ']', which we use
here. Instead of having (at least) 4 tables, get the local runtime to sort the
table for us. Hence the regen script writes out the (unsorted) mg_raw.h, which
generate_uudmap sorts to generate mg_data.h
|
|
|
|
|
|
| |
As it's a 1 to 1 mapping with the vtables in PL_magic_vtables[], refactor
Perl_do_magic_dump() to index into it directly to find the name for an
arbitrary mg_virtual, avoiding a long switch statement.
|
|
|
|
|
|
| |
Define each PL_vtbl_* name as a macro which expands to the correct array
element. Using a single array instead of multiple named variables will allow
the simplification of various pieces of code.
|
|
|
|
|
| |
Magic with a NULL vtable is equivalent to magic with a vtable of all 0s.
On CPAN, only Apache::Peek's code for 5.005 is referencing it.
|
|
|
|
|
|
|
|
|
| |
As PL_charclass is a constant, it doesn't need to be accessed via the global
struct. It should be exported via globvar.sym, not PERLVARA() in perlvars.h
[With a PERVARA() it all compiles perfectly, once C<dVAR>s are added where
now needed, but the build loops forever because the (real) charclass array is
never initialised]
|
|
|
|
| |
Win32 builds have been broken since de1ac46b without this.
|
|
|
|
| |
Win32+gcc build)
|
|
|
|
| |
This exposes the current top-level interpreter phase to perl space.
|
|
|
|
| |
global.sym is generated; is there a way to automate globvar.sym?
|
| |
|