A compiled pattern requires a byte for each non-default modifier, like
/i. Previously, the worst case was presumed in allocating the space
(every modifier being non-default). Now, only the actual needed space
is reserved.
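A minimal C sketch of the sizing change follows. It assumes the modifiers are tracked as bits in a flags word, where a set bit means "non-default". The MOD_* names and MODIFIER_COUNT are hypothetical, not regcomp.c's definitions.

```c
#include <stddef.h>

/* Hypothetical modifier bits: a set bit means that modifier is non-default. */
enum {
    MOD_I = 1u << 0, /* /i */
    MOD_M = 1u << 1, /* /m */
    MOD_S = 1u << 2, /* /s */
    MOD_X = 1u << 3  /* /x */
};
#define MODIFIER_COUNT 4

/* Before: reserve a byte for every modifier, as if all were non-default. */
static size_t modifier_bytes_worst_case(void)
{
    return MODIFIER_COUNT;
}

/* After: reserve one byte per modifier that is actually non-default. */
static size_t modifier_bytes_needed(unsigned flags)
{
    size_t n = 0;
    for (int i = 0; i < MODIFIER_COUNT; i++)
        if (flags & (1u << i))
            n++;
    return n;
}
```

For a pattern compiled with only /i, this reserves one byte instead of four.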
| |
The global const PL_inf and PL_nan have dual nature:
the .nv has the NV, the .u8 has the bytes.
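A sketch of such a dual-natured constant follows. It assumes an IEEE 754 binary64 NV and assumes that integers and doubles share the same byte order. The dual_nv type and the dual_from_bits() helper are hypothetical and are not Perl's actual definitions of PL_inf or PL_nan.

```c
#include <stdint.h>
#include <string.h>

typedef double NV; /* stand-in for Perl's NV; assumed IEEE 754 binary64 */

_Static_assert(sizeof(NV) == sizeof(uint64_t), "sketch assumes a 64-bit NV");

/* Two views of the same storage: .nv is the value, .u8 the raw bytes. */
typedef union {
    NV nv;
    unsigned char u8[sizeof(NV)];
} dual_nv;

/* Fill the bytes from a bit pattern held in a uint64_t. Copying the
   integer's bytes assumes integers and doubles share a byte order. */
static dual_nv dual_from_bits(uint64_t bits)
{
    dual_nv d;
    memcpy(d.u8, &bits, sizeof d.u8);
    return d;
}

/* Positive infinity and a quiet NaN in IEEE 754 binary64. */
#define INF_BITS UINT64_C(0x7FF0000000000000)
#define NAN_BITS UINT64_C(0x7FF8000000000000)
```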
| |
Prepending 'PL_' to each line in globvar.sym
a) makes makedef.pl slightly simpler, and
b) makes it easier to spot all uses of a particular variable when you
do 'git grep PL_foo'.
| |
Add a new config file, regen/op_private, which contains all the
information about the flags and descriptions for the OP op_private field.
Previously, the flags themselves were defined in op.h, accompanied by
textual descriptions (sometimes inaccurate or incomplete).
For display purposes, there were short labels for each flag found in
Concise.pm, and another set of labels for Perl_do_op_dump() in dump.c.
These two sets of labels differed from each other in spelling (e.g.
REFC versus REFCOUNT), and differed in completeness and accuracy.
With this commit, all the data used to generate the defines and the labels
is derived from a single source, and both are generated automatically by
'make regen'. The file also contains complete data on which bits each op
uses and for what, so any attempt to add a new flag to an op whose bit is
already in use will raise an error in 'make regen'. Previously, the only
safeguard was to read the descriptions in op.h and hope for the best.
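The per-op bookkeeping behind that error can be sketched in C as follows. The actual check is performed by the regen scripts when they process regen/op_private, not by C code. The table size, the op numbers and the claim_bits() helper here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_DEMO_OPS 4 /* hypothetical, tiny op table */

/* For each op, the op_private bits already claimed by some flag. */
static uint8_t used_bits[N_DEMO_OPS];

/* Claim `mask` for `op`. Returns false (the "make regen" error) if any
   bit in the mask is already in use for that op. */
static bool claim_bits(int op, uint8_t mask)
{
    if (used_bits[op] & mask)
        return false;
    used_bits[op] |= mask;
    return true;
}
```

Because the check is per op, the same bit can still be reused with a different meaning by an unrelated op.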
It also makes use of data in regen/opcodes: for example, regen/op_private
specifies that all ops flagged as 'T' get the OPpTARGET_MY flag.
Since the set of labels used by Concise and Perl_do_op_dump() differed,
I've standardised on the Concise version. Thus this commit changes the
output produced by Concise only marginally, while Perl_do_op_dump() is
considerably different. As well as the change in labels (and missing
labels), Perl_do_op_dump() formerly had a bug whereby any unrecognised
bits would not be shown if there was at least one recognised bit.
So while Concise displayed (and still does) "LVINTRO,2", Perl_do_op_dump()
has changed:
    - PRIVATE = (INTRO)
    + PRIVATE = (LVINTRO,0x2)
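The following C sketch shows the corrected formatting. Every recognised bit is printed by its label, and any leftover bits are then appended in hex, even when at least one bit was recognised. The flag_label table and format_private() are hypothetical. Only the 0x80/LVINTRO pairing comes from this commit, via the OPpLVAL_INTRO example.

```c
#include <stddef.h>
#include <stdio.h>

struct flag_label {
    unsigned    bit;
    const char *label;
};

/* Hypothetical label table for one op: only 0x80 (OPpLVAL_INTRO) is known. */
static const struct flag_label demo_labels[] = {
    { 0x80, "LVINTRO" },
};

/* Append text to buf, comma-separated; returns the new length, which is
   capped at size - 1 if the output had to be truncated. Requires size > 0. */
static size_t append_item(char *buf, size_t size, size_t len, const char *text)
{
    int n = snprintf(buf + len, size - len, "%s%s", len ? "," : "", text);
    if (n < 0)
        return len;
    len += (size_t)n;
    return len < size ? len : size - 1;
}

/* Label every recognised bit, then show any leftover bits in hex,
   even when at least one bit was recognised. */
static void format_private(unsigned priv, const struct flag_label *tab,
                           size_t ntab, char *buf, size_t size)
{
    size_t len = 0;
    unsigned rest = priv;

    buf[0] = '\0';
    for (size_t i = 0; i < ntab; i++) {
        if (priv & tab[i].bit) {
            len = append_item(buf, size, len, tab[i].label);
            rest &= ~tab[i].bit;
        }
    }
    if (rest != 0) {
        char hex[16];
        snprintf(hex, sizeof hex, "0x%x", rest);
        append_item(buf, size, len, hex);
    }
}
```

A caller would wrap the result as "PRIVATE = (%s)". For 0x82, this yields the "LVINTRO,0x2" form in the dump line of this commit message.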
Concise has mainly changed in that a few op/bit combinations weren't being
shown symbolically, and now are. I've avoided fixing the ones that would
break tests; they'll be fixed up in the next few commits.
A few new OPp* flags have been added:
    OPpARG1_MASK
    OPpARG2_MASK
    OPpARG3_MASK
    OPpARG4_MASK
    OPpHINT_M_VMSISH_STATUS
    OPpHINT_M_VMSISH_TIME
    OPpHINT_STRICT_REFS
The last three are analogues of existing HINT_* flags. The first four
reflect that many ops use some of the lower few bits of op_private to
indicate how many args the op expects. While (for now) this is still
displayed as, e.g., "LVINTRO,2", the definitions in regen/op_private now
fully account for which ops use which bits for the arg count.
There is a new module, B::Op_private, which allows this new data to be
accessed from Perl. For example,
    use B::Op_private;
    my $name  = $B::Op_private::bits{aelem}{7}; # OPpLVAL_INTRO
    my $value = $B::Op_private::defines{$name}; # 128
    my $label = $B::Op_private::labels{$name};  # LVINTRO
There are several new constant PL_* tables. PL_op_private_valid[]
specifies, for each op number, which bits are valid for that op. In a
couple of commits' time, op_free() will use this on debugging builds to
assert that no ops gained any private flags which we don't know about.
In fact it was by using such a temporary assert repeatedly against the
test suite that I tracked down most of the inconsistencies and errors in
the current flag data.
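The following sketch shows how a table like PL_op_private_valid[] can back such an assert. The op numbers, the masks and the helper names are hypothetical. Only the idea of a per-op bitmask of valid bits comes from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical op numbers; Perl's real table is indexed by op type. */
enum { OP_DEMO_NONE, OP_DEMO_A, N_DEMO_OPS };

/* For each op, the op_private bits that are allowed to be set. */
static const uint8_t op_private_valid[N_DEMO_OPS] = {
    [OP_DEMO_NONE] = 0x00, /* uses no private bits */
    [OP_DEMO_A]    = 0x83, /* e.g. 0x80 plus a two-bit arg count */
};

/* True if op_private sets no bits outside the op's valid mask. */
static bool private_flags_known(unsigned op_type, uint8_t op_private)
{
    return (op_private & (uint8_t)~op_private_valid[op_type]) == 0;
}

/* On debugging builds, the check becomes an assertion before freeing. */
static void check_op_before_free(unsigned op_type, uint8_t op_private)
{
    assert(private_flags_known(op_type, op_private));
}
```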
The other PL_op_private_* tables contain a compact representation of all
the ops/bits/labels in a format suitable for Perl_do_op_dump() to decode
op_private. Overall, the perl binary is about 500 bytes smaller on my
system.
| |
This large (sorry, I couldn't figure out how to meaningfully split it
up) commit causes Perl to fully support LC_CTYPE operations (case
changing, character classification) in UTF-8 locales.
As a side effect it resolves [perl #56820].
The basics are easy, but there were a lot of details, and one
troublesome edge case discussed below.
What essentially happens is that when the locale is changed to a UTF-8
one, a global variable is set TRUE (FALSE when changed to a non-UTF-8
locale). Within the scope of 'use locale', this variable is checked,
and if TRUE, the code that Perl uses for non-locale behavior is used
instead of the code for locale behavior. Since Perl's internal
representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.
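One way to maintain such a flag on a POSIX system is sketched below in C. It assumes that nl_langinfo(CODESET) reports the codeset of the current LC_CTYPE locale. The variable and function names are hypothetical, and this is not necessarily how Perl detects UTF-8 locales.

```c
#include <ctype.h>
#include <langinfo.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical global: TRUE while LC_CTYPE is a UTF-8 locale. */
static bool in_utf8_ctype_locale = false;

/* Accept spellings such as "UTF-8", "utf8" or "UTF_8". */
static bool codeset_is_utf8(const char *codeset)
{
    char norm[16];
    size_t j = 0;
    for (size_t i = 0; codeset[i] != '\0' && j < sizeof norm - 1; i++) {
        if (codeset[i] == '-' || codeset[i] == '_')
            continue;
        norm[j++] = (char)tolower((unsigned char)codeset[i]);
    }
    norm[j] = '\0';
    return strcmp(norm, "utf8") == 0;
}

/* Call after each change to LC_CTYPE, e.g. following setlocale(). */
static void note_ctype_change(void)
{
    in_utf8_ctype_locale = codeset_is_utf8(nl_langinfo(CODESET));
}

/* Under 'use locale', a UTF-8 locale gets the non-locale (Unicode) code path. */
static bool use_unicode_rules(bool in_locale_scope)
{
    return !in_locale_scope || in_utf8_ctype_locale;
}
```

Keeping the codeset test as a pure string function makes it checkable without depending on which locales a system has installed.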
More work had to be done for regular expressions. There are three
cases.
1) Character classes such as \w and [[:punct:]] needed no extra work, as
the changes fall out from the base work.
2) Strings that are to be matched case-insensitively. These form
EXACTFL regops (nodes). Notice that if such a string contains only
above-Latin1 characters that match only themselves, the node can be
downgraded to an EXACT-only node, which presents better optimization
possibilities, as we now have a fixed string known at compile time to be
required to be in the target string to match. Similarly, if all
characters in the string match only other above-Latin1 characters
case-insensitively, the node can be downgraded to a regular EXACTFU node
(match, folding, using Unicode rather than locale rules). The code changes
for this could be done without accepting UTF-8 locales fully, but there
were edge cases which needed to be handled differently if I stopped
there, so I continued on.
In an EXACTFL node, all such characters are now folded at compile time
(just as before this commit), while the other characters whose folds are
locale-dependent are left unfolded. This means that they have to be
folded at execution time based on the locale in effect at the moment.
Again, this isn't a change from before. The difference is that now some
of the folds that need to be done at execution time (in regexec) are
potentially multi-char. Some of the code in regexec was trivial to
extend to account for this because of existing infrastructure, but the
part dealing with regex quantifiers needed more work.
Also the code that joins EXACTish nodes together had to be expanded to
account for the possibility of multi-character folds within locale
handling. This was fairly easy, because it already has infrastructure
to handle these under somewhat different circumstances.
3) In bracketed character classes, represented by ANYOF nodes, a new
inversion list was created giving the characters that should be matched
by this node when the runtime locale is UTF-8. The list is ignored
except under that circumstance. To do this, I created a new ANYOF type
which has an extra SV for the inversion list.
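An inversion list stores a sorted array of the code points at which set membership flips. A code point is in the set if an odd number of list entries are less than or equal to it. The minimal lookup below illustrates that representation. It is a sketch of the data structure, not Perl's own inversion-list code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* list[0..len-1] is strictly increasing. Entries at even indices begin a
   range of code points IN the set; entries at odd indices begin a range
   NOT in the set. The last range extends to the end of the code space. */
static bool invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    /* Binary search for the number of entries <= cp. */
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* An odd count means the last boundary passed opened an "in" range. */
    return (lo & 1) == 1;
}
```

For example, the ASCII letters are {0x41, 0x5B, 0x61, 0x7B}: [A-Z] starts at 0x41 and stops before 0x5B, and [a-z] starts at 0x61 and stops before 0x7B.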
The edge case that caused the most difficulty is folding involving the
MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the
GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range
character that folds to outside that range. The issue is that it
doesn't naturally fall out that it will match the CAP MU. If we let the
CAP MU fold to the small mu at compile time (which it can, because both
are above-Latin1 and so the fold is the same no matter what locale is in
effect), it could appear that the regnode can be downgraded away from
EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN not to match
the CAP MU case-insensitively. This could be special-cased in regcomp
and regexec, but I wanted to avoid that. Instead the mktables tables
are set up to include the CAP MU as a character whose presence forbids
the downgrading, so the special casing is in mktables, and not in the C
code.
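The node-selection rules described above can be sketched in C. The fold data is a deliberately tiny, hypothetical stand-in for real Unicode tables, covering only a Greek letter range. The set of characters that forbid downgrading holds only the CAP MU named in this commit. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum exact_node { NODE_EXACTFL, NODE_EXACTFU, NODE_EXACT };

/* Demo-only fold data: pretend U+0391..U+03A9 and U+03B1..U+03C9 are the
   only above-Latin1 characters with case-fold partners. */
static bool has_fold_partner(uint32_t cp)
{
    return (cp >= 0x0391 && cp <= 0x03A9) || (cp >= 0x03B1 && cp <= 0x03C9);
}

/* Characters whose presence forbids leaving EXACTFL. Per the commit,
   GREEK CAPITAL LETTER MU is one: it must still match MICRO SIGN. */
static bool forbids_downgrade(uint32_t cp)
{
    return cp == 0x039C;
}

static enum exact_node choose_node(const uint32_t *cps, size_t n)
{
    bool all_match_only_self = true;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] <= 0xFF)
            return NODE_EXACTFL;  /* fold depends on the runtime locale */
        if (forbids_downgrade(cps[i]))
            return NODE_EXACTFL;  /* the MICRO SIGN edge case */
        if (has_fold_partner(cps[i]))
            all_match_only_self = false;
    }
    /* Every character is above Latin1: a fixed string if none folds,
       otherwise Unicode folding rules suffice. */
    return all_match_only_self ? NODE_EXACT : NODE_EXACTFU;
}
```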
| |
regcomp.c folds the string in these two nodes except in one case.
Change that case to correspond with the predominant behavior. This
enables future optimizations.
| |
global.sym was a file listing the exported symbols, generated by regen/embed.pl
from embed.fnc and regen/opcodes, and used only by makedef.pl.
Move the code that generates global.sym from regen/embed.pl to makedef.pl,
and thereby eliminate the need to ship a 907 line generated file.
| |
PL_sh_path needs some form of special case because it is conditionally
defined either in perlvars.h or perl.h, but globvar.sym mentions all symbols
unconditionally, and under -DPERL_GLOBAL_STRUCT perlvars.h is parsed as an
unconditional skip list.
| |
f1fb874192252653 added these 6 new global variables, but neglected to add them
to the list of exported symbols.
| |
They exist solely to ensure that Perl_runops_standard and Perl_runops_debug
are linked in - nothing assigns to either variable, and nothing reads them.
| |
Make them const U16 - they should have been const from the start.
| |
To get the initialisation to work, the location of #include patchlevel.h needs
to be moved.
| |
They were converted in perl.h from const char[] to #define in 31fb120917c4f65d,
then re-instated as const char[], but in perlvars.h, in 3fe35a814d0a98f4.
There's no need for compile-time constants to jump through the hoops of
perlvars.h, even for Symbian, as the various "EXTCONST" variables already in
perl.h demonstrate.
These were the only 3 users of the PERLVARISC macro, so eliminate that, and
all related code.
| |
Use it to eliminate the large switch statement in Perl_sv_magic().
As the table needs to be keyed on magic type, which is expressed as C character
constants, the order depends on the compiler's character set. Frustratingly,
EBCDIC variants don't agree on the code points for '~' and ']', which we use
here. Instead of having (at least) 4 tables, get the local runtime to sort the
table for us. Hence the regen script writes out the (unsorted) mg_raw.h, which
generate_uudmap sorts to generate mg_data.h.
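The following sketch lets the local C runtime order the table. qsort() compares the magic type characters as unsigned char values in the execution character set, so the result follows whatever code points '~' and ']' have on that platform. The entry type and the sample rows are hypothetical placeholders, not the contents of mg_raw.h.

```c
#include <stdlib.h>

/* One raw table entry: a magic type character and a description. */
struct mg_raw_entry {
    char        type;
    const char *desc;
};

/* Order entries by their type character's code point on this platform. */
static int cmp_by_type(const void *a, const void *b)
{
    unsigned char ta = (unsigned char)((const struct mg_raw_entry *)a)->type;
    unsigned char tb = (unsigned char)((const struct mg_raw_entry *)b)->type;
    return (ta > tb) - (ta < tb);
}

/* Sort in place, so ASCII and each EBCDIC variant get their own order
   without maintaining a separate hand-written table for each. */
static void sort_mg_table(struct mg_raw_entry *tab, size_t n)
{
    qsort(tab, n, sizeof tab[0], cmp_by_type);
}
```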
| |
As it's a 1 to 1 mapping with the vtables in PL_magic_vtables[], refactor
Perl_do_magic_dump() to index into it directly to find the name for an
arbitrary mg_virtual, avoiding a long switch statement.
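A sketch of the direct lookup follows. Because each vtable has a fixed slot in the array, subtracting the array base from a vtable pointer yields the index into a parallel table of names. The types, the enum and the name table are hypothetical stand-ins for PL_magic_vtables[] and its names.

```c
#include <stddef.h>

/* Hypothetical stand-ins for MGVTBL and PL_magic_vtables[]. */
struct vtbl_demo { int (*get)(void *); };

enum { VT_SV, VT_ENV, VT_ARYLEN, VT_COUNT };

static const struct vtbl_demo magic_vtables[VT_COUNT] = { { NULL } };
static const char *const magic_vtable_names[VT_COUNT] = { "sv", "env", "arylen" };

/* Precondition: vt is NULL or points into magic_vtables[]. Real code must
   also cope with vtables supplied by extensions, which this sketch omits. */
static const char *vtable_name(const struct vtbl_demo *vt)
{
    if (vt == NULL)
        return NULL;
    return magic_vtable_names[vt - magic_vtables];
}
```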
| |
Define each PL_vtbl_* name as a macro which expands to the correct array
element. Using a single array instead of multiple named variables will allow
the simplification of various pieces of code.
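The following sketch shows the macro scheme, using hypothetical DEMO_vtbl_* names in place of the real PL_vtbl_* ones. Each name expands to an element of one array. Existing expressions such as &DEMO_vtbl_env therefore still compile, while other code can index or iterate over the whole table.

```c
/* Hypothetical vtable type; Perl's real one is MGVTBL. */
struct vtbl_demo2 { int (*get)(void *); int (*set)(void *); };

/* One index per vtable. */
enum demo_vtbl_id { demo_vtbl_sv, demo_vtbl_env, demo_vtbl_isa, demo_vtbl_count };

static struct vtbl_demo2 demo_magic_vtables[demo_vtbl_count];

/* Each formerly separate variable name now names an array element. */
#define DEMO_vtbl_sv  (demo_magic_vtables[demo_vtbl_sv])
#define DEMO_vtbl_env (demo_magic_vtables[demo_vtbl_env])
#define DEMO_vtbl_isa (demo_magic_vtables[demo_vtbl_isa])
```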
| |
Magic with a NULL vtable is equivalent to magic with a vtable of all 0s.
On CPAN, only Apache::Peek's code for 5.005 is referencing it.
| |
As PL_charclass is a constant, it doesn't need to be accessed via the global
struct. It should be exported via globvar.sym, not PERLVARA() in perlvars.h.
[With a PERLVARA() it all compiles perfectly, once C<dVAR>s are added where
now needed, but the build loops forever because the (real) charclass array is
never initialised]
| |
Win32 builds have been broken since de1ac46b without this.
| |
Win32+gcc build)
| |
This exposes the current top-level interpreter phase to perl space.
| |
global.sym is generated; is there a way to automate globvar.sym?
| |
From: "Jan Dubois" <jand@activestate.com>
Message-ID: <02c001c895eb$9bc3e920$d34bbb60$@com>
(with one tweak--it should be PL_bincompat_options!)
p4raw-id: //depot/perl@33644
| |
see), so it can easily be a static variable inside gv.c. This allows
the implementation to be changed in future Perls within the 5.10.x
series.
p4raw-id: //depot/perl@32116
| |
Date: Fri, 29 Jun 2007 23:38:07 +0200
Message-ID: <20070629213807.GA14454@abigail.nl>
Subject: [PATCH pod/perlre.pod] Keeping up with the changes.
From: Abigail <abigail@abigail.be>
Date: Sat, 30 Jun 2007 01:24:36 +0200
Message-ID: <20070629232436.GA15326@abigail.nl>
Plus tweaks and debug enhancements.
p4raw-id: //depot/perl@31506
| |
PL_reg_name properly. Hopefully this will fix it, but I don't have
access to any platform where I can test this directly.
p4raw-id: //depot/perl@30461
| |
Instead remove &PL_vtbl_glob from globvar.sym.
p4raw-link: @27295 on //depot/perl: 3476b56103cbe13508b1fd6b46ae7b9cb6e0f7ed
p4raw-id: //depot/perl@27296
| |
renaming of the global variable.
p4raw-id: //depot/perl@24424
| |
Message-ID: <B356D8F434D20B40A8CEDAEC305A1F2453D653@esebe105.NOE.Nokia.com>
p4raw-id: //depot/perl@24271
| |
p4raw-id: //depot/perl@4602
| |
p4raw-id: //depot/perl@2746
| |
p4raw-id: //depot/perl@2241
| |
(objpp.h is gone, embed.pl now does some of that); objXSUB.h
should soon be automated also; the global variables that
escaped the PL_foo conversion are now reined in; renamed
MAGIC in regcomp.h to REG_MAGIC to avoid collision with the
type of the same name; duplicated lists of pp_things in various
places are now gone; result has only been tested on win32
p4raw-id: //depot/perl@2133