Indent in newly formed block
| |
    use locale;
    fc("\N{LATIN CAPITAL LETTER SHARP S}")
        eq 2 x fc("\N{LATIN SMALL LETTER LONG S}")
should return true, as the SHARP S folds to two 's's in a row, and the
LONG S is an antique variant of 's' and folds to 's'. Until this commit,
the expression was false.
Similarly, the following should match, but didn't until this commit:
    "\N{LATIN SMALL LETTER SHARP S}" =~ /\N{LATIN SMALL LETTER LONG S}{2}/iaa
The reason these didn't work properly is that in both cases the actual
fold to 's' is disallowed: in the first case because of locale, and in
the second because of /aa. And the code wasn't smart enough to realize
that these were legal.
The fix is to special-case these so that, under /aa, the fold of sharp s
(both capital and small) is two LONG S's, as is the fold of the capital
sharp s under locale. The latter is user-visible, and the documentation
of fc() now points that out. I believe this is such an edge case that it
need not be mentioned in perldelta.
| |
This will be used in future commits to pass more flags.
| |
UTF8_IS_ABOVE_LATIN1() is equivalent to
(! UTF8_IS_INVARIANT && !UTF8_IS_DOWNGRADEABLE_START)
so we can use it alone, for clearer code with fewer branches.
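A minimal sketch of why one test suffices, assuming an ASCII platform and looking only at the first byte of a character. The three predicates are simplified stand-ins written for this illustration, not the real utf8.h definitions (which also cover EBCDIC). The equivalence holds for start bytes, which is where these macros are applied; a continuation byte such as 0x80 passes the two-test form but not the single test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the utf8.h macros (ASCII only).
 * Each classifies the FIRST byte of a UTF-8 encoded character. */
static bool is_invariant(uint8_t c)           { return c < 0x80; }
/* 0xC2 and 0xC3 start the two-byte encodings of U+0080..U+00FF */
static bool is_downgradeable_start(uint8_t c) { return (c & 0xFE) == 0xC2; }
/* 0xC4 and above start every character above U+00FF */
static bool is_above_latin1(uint8_t c)        { return c >= 0xC4; }

/* Start bytes: ASCII, or 0xC2 and up (0x80..0xBF are continuation bytes,
 * 0xC0/0xC1 would only begin overlong sequences). */
static bool is_start_byte(uint8_t c) {
    return c < 0x80 || c >= 0xC2;
}

/* True if the single test agrees with the two-test form on every start byte. */
static bool equivalence_holds(void) {
    for (int b = 0; b < 256; b++) {
        uint8_t c = (uint8_t)b;
        if (!is_start_byte(c))
            continue;
        bool two_tests = !is_invariant(c) && !is_downgradeable_start(c);
        if (two_tests != is_above_latin1(c))
            return false;
    }
    return true;
}
```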
| |
This causes each deprecated function to have a prominent note to that
effect in its API documentation.
| |
The previous commit added macros to do some case changing. This
commit uses them in the core, where appropriate.
| |
The case-changing macros are now almost all documented. The exception
is toUPPER_LC, which may change in 5.19.
In addition, the functions in utf8.c that these macros call now refer to
them instead of having their own documentation. People should really be
using the macros instead of calling the functions directly. I'm not
deprecating the functions because I can't foresee the need to change
them, so code that uses them should continue to be OK.
| |
This variable is always set just below.
| |
Commit 8d919b0a35f2b57a changed the storage location of the string in
SVt_REGEXP. It updated most code to deal with this, but missed the use of
SvPVX_const() in Perl_sv_uni_display(). This breaks dumping regular
expressions which have the UTF-8 flag set.
| |
This follows the suggestion by Aristotle Pagaltzis.
| |
This also changes isIDCONT_utf8() to use the Perl definition, which
excludes any \W characters (the Unicode definition includes a few of
these). Tests are also added. These macros remain undocumented for
now.
| |
handy.h has character classification macros to determine if a UTF-8
encoded character is of a given type FOO, such as isALPHA_utf8(), etc.
Code that calls these should have first made sure that the parameter is
legal UTF-8. Prior to this patch, false was silently returned for all
illegal UTF-8. Now, in most instances, a deprecation warning is raised.
This is to catch bugs, and prepare for eventual elimination of this
check, which fails to catch read-off-end-of-buffer malformations anyway.
(One idea would be to leave the check in for DEBUGGING builds.)
The cases where no deprecation warning is raised as a result of this
commit are for the classes where the character does not have to be
converted to a code point for its inclusion to be determined. For
example, if malformed UTF-8 is checked to see if it is ASCII, we only
need to check that it is one of the 128 ASCII characters. If it isn't,
we don't bother to see if it is malformed or not. There are other
cases as well, such as with isSPACE(), where we check if the UTF-8 is
one of a small, finite set, without checking for malformedness.
With this commit, running the Perl test suite exposes a number of
apparent bugs. These do not cause actual test failures.
| |
This is so we can deprecate non-core use of the existing one in a future
commit. XS coders should be using the macros in handy.h instead of
calling such functions directly. A future commit will deprecate all of
them, but first the core uses of this one must change so they don't
generate deprecation messages. I won't have a chance to look into this
for some time, but I suspect that most uses of this function in the core
should be changed to use something else; in the meantime, the non-core
uses can be deprecated.
| |
These names are synonyms for specific array elements, and were used
temporarily until all uses of them were removed. This commit removes
the remaining uses, and the definitions.
| |
These functions were used internally as helpers for matching \X in
regular expressions. They are no longer used.
| |
This function uses table lookup to replace 9 more specific functions,
which can be deprecated. They should not have been exposed to the
public API in the first place.
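The table-lookup idea can be sketched as follows. The class names, the use of the `<ctype.h>` classifiers and the byte-sized input are assumptions for illustration only; the core's own function uses its own class numbers and tables.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical class numbers; the core's real numbering differs. */
typedef enum {
    CC_ALPHA, CC_DIGIT, CC_SPACE, CC_UPPER, CC_LOWER,
    CC_PUNCT, CC_XDIGIT, CC_CNTRL, CC_GRAPH,
    CC_COUNT
} char_class;

/* One table of classifiers replaces nine separately exported functions. */
static int (*const classifiers[CC_COUNT])(int) = {
    [CC_ALPHA]  = isalpha,  [CC_DIGIT] = isdigit, [CC_SPACE] = isspace,
    [CC_UPPER]  = isupper,  [CC_LOWER] = islower, [CC_PUNCT] = ispunct,
    [CC_XDIGIT] = isxdigit, [CC_CNTRL] = iscntrl, [CC_GRAPH] = isgraph,
};

/* Single entry point: does byte c belong to class cls? */
static bool is_class(char_class cls, unsigned char c) {
    return cls < CC_COUNT && classifiers[cls](c) != 0;
}
```

Adding a class then means adding a table entry rather than another exported function.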
| |
Perl has had an undocumented macro isALNUMC() for a long time. I want
to document it, but the name is very obscure. Neither Yves nor I are
sure what it stands for. My best guess is "C's alnum". It corresponds to
/[[:alnum:]]/, and so its best name would be isALNUM(). But that is the
name long given to what matches \w. A new synonym, isWORDCHAR(), has
been in place for several releases for that, but the old isALNUM()
should remain for backwards compatibility.
I don't think that the name isALNUMC() should be published, as it is too
close to isALNUM(). I finally came to the conclusion that
isALPHANUMERIC() is the best name; it describes its purpose clearly; the
disadvantage is its long length. I doubt that it will get much use, but
we need something, I think, that we can publish to accomplish this
functionality.
This commit also converts core uses of isALNUMC to isALPHANUMERIC. (I
intended to do that separately, but made a mistake in rebasing, and
combined the two patches; and it seemed like not a big enough problem to
separate them out again.)
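The difference between the two names comes down to the underscore. An ASCII-only sketch, with hypothetical function names standing in for the handy.h macros:

```c
#include <stdbool.h>

/* POSIX [[:alnum:]]: letters and digits only (what isALPHANUMERIC() means). */
static bool is_alphanumeric(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

/* \w: also matches the underscore (what the old isALNUM() has long meant). */
static bool is_wordchar(char c) {
    return is_alphanumeric(c) || c == '_';
}
```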
| |
The SV* returned by this function was inconsistent in its reference
count. In some cases it created a new SV, which has a reference count
of 1, and in others it returned an existing SV without incrementing
the reference count. If the caller thought it was getting its own copy,
and decremented the reference count, it could lead to a double free.
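A minimal sketch of the fix, using a hypothetical toy object in place of an SV: every path returns a reference the caller owns, so a single decrement by the caller is always correct.

```c
#include <stdlib.h>

/* Toy reference-counted object standing in for an SV. */
typedef struct { int refcnt; } obj;

static obj *obj_new(void) {             /* new objects start at refcnt 1 */
    obj *o = malloc(sizeof *o);
    if (o)
        o->refcnt = 1;
    return o;
}

static void obj_dec(obj *o) {           /* free on the last release */
    if (o && --o->refcnt == 0)
        free(o);
}

/* Hypothetical lookup: return the cached object if there is one, else make
 * a new one. Either way the caller owns exactly one reference. */
static obj *get_obj(obj *cached) {
    if (cached) {
        cached->refcnt++;               /* the increment that was missing */
        return cached;
    }
    return obj_new();                   /* refcnt 1 already belongs to caller */
}
```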
| |
These functions are not used by the Perl core. Code should be using
the equivalent macros in handy.h that may avoid a function call.
| |
These functions were marked with XXX comments as needing locale support.
It was a simple matter to add: we support locales for code points under
256, so just call the appropriate macro for those, and return the
Unicode interpretation for those over 255.
| |
The Perl definition of idfirst is slightly more restrictive than
Unicode's. We should use our definition consistently.
| |
We think this is meant to stand for C's alphanumeric, that is, what is
matched by POSIX [:alnum:]. There were no functions or dedicated swash
available for accessing it. Future commits will want to use these.
| |
There is a function that does both of these together, more efficiently.
| |
to a place where people are more likely to look for it.
| |
These two macros should have the same results for the same input code
points. Prior to this patch, the _uni() macro returned the official
Unicode ID_Start property, and the _utf8() macro returned Perl's
slightly restricted definition. Now both return Perl's.
| |
This function was added in 5.16, and has no callers in CPAN. It is
undocumented and marked as changeable. Its name has two underscores in
a row by mistake. This removes one of them.
| |
This was discussed in ticket #114820.
This new copy-on-write mechanism stores a reference count for the
PV inside the PV itself, at the very end. (I was using SvEND+1
at first, but parts of the regexp engine expect to be able to do
SvCUR_set(sv,0), which causes the wrong byte of the string to be used
as the reference count.) Only 256 SVs can share the same PV this way.
Also, only strings with allocated space after the trailing null can
be used for copy-on-write.
Much of the code is shared with PERL_OLD_COPY_ON_WRITE. The restriction
against doing copy-on-write with magical variables has hence been
inherited, though it is not necessary. A future commit will take
care of that.
I had to modify _core_swash_init to handle $@ differently. The existing
mechanism of copying $@ to a new scalar and back again was very
fragile. With copy-on-write, $@ =~ s/// can cause pp_subst’s string
pointers to become stale. So now we remove the scalar from *@ and
allow the utf8-table-loading code to autovivify a new one. Then we
restore the untouched $@ afterwards if all goes well.
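A toy model of the storage scheme described above, assuming a simplified string struct rather than perl's SV layout; all names are hypothetical. The count lives in the last allocated byte, so resetting the length to 0 cannot turn a string byte into the count, and a single byte limits how many values can share one buffer.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy string value (hypothetical, not perl's SV layout). */
typedef struct {
    char  *pv;   /* NUL-terminated string */
    size_t len;  /* current length, excluding the NUL */
    size_t cap;  /* bytes allocated; pv[cap - 1] is the count byte */
} toy_sv;

/* The byte counts sharers beyond the first: 0..255 allows 256 sharers. */
#define COW_MAX 255

static unsigned char *cow_count(const toy_sv *s) {
    return (unsigned char *)&s->pv[s->cap - 1];
}

static bool toy_sv_init(toy_sv *s, const char *str, size_t cap) {
    size_t len = strlen(str);
    if (cap < len + 1 || !(s->pv = malloc(cap)))
        return false;
    memcpy(s->pv, str, len + 1);
    s->len = len;
    s->cap = cap;
    s->pv[cap - 1] = 0;        /* count byte (or the NUL if no spare room) */
    return true;
}

/* Copy-on-write needs at least one spare byte after the trailing NUL. */
static bool can_cow(const toy_sv *s) {
    return s->cap >= s->len + 2;
}

/* Make dst share src's buffer; false means the caller must copy instead. */
static bool cow_share(toy_sv *dst, const toy_sv *src) {
    if (!can_cow(src) || *cow_count(src) == COW_MAX)
        return false;
    ++*cow_count(src);
    *dst = *src;               /* both now point at the same buffer */
    return true;
}

/* Before writing to s, give it a private buffer if the current one is shared. */
static bool cow_unshare(toy_sv *s) {
    if (*cow_count(s) == 0)
        return true;           /* sole owner: may write in place */
    char *copy = malloc(s->cap);
    if (!copy)
        return false;
    memcpy(copy, s->pv, s->len + 1);
    --*cow_count(s);           /* drop our share of the old buffer */
    s->pv = copy;
    copy[s->cap - 1] = 0;      /* the new buffer has a single owner */
    return true;
}
```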
| |
This finishes the removal of register declarations started by
eb578fdb5569b91c28466a4d1939e381ff6ceaf4. That commit neglected the ones
in function parameter declarations, and didn't include things in dist,
ext, and lib, which this commit does include.
| |
Remove a large amount of machine code (~4KB for me) from funcs that use
ERRSV, making Perl faster and smaller by preventing multiple evaluation.
ERRSV is a macro that contains GvSVn, which eventually conditionally calls
Perl_gv_add_by_type. If SvTRUE or any other macro that evaluates its
argument multiple times is used on ERRSV, the expansion will, in asm, have
dozens of calls to Perl_gv_add_by_type, one for each test/deref of the SV
in SvTRUE. A less severe problem exists when multiple funcs (sv_set*) are
called in a row, each with ERRSV as an arg. ERRSV is recalculated each
time, Perl_gv_add_by_type and all.
I think the ERRSV macro got the func call in commit f5fa9033b8 (Perl RT
#70862). Prior to that commit, I think it was pure derefs. Saving the SV*
is still better than looking into interp->gv->gp to get the SV* after each
func call.
I received no responses to
http://www.nntp.perl.org/group/perl.perl5.porters/2012/11/msg195724.html
explaining when the SV is replaced in PL_errgv, so I took a conservative
view and assumed callbacks (with Perl stack/ENTER/LEAVE/eval_*/call_*)
can change it. I also assume ERRSV will never return null; this allows a
more efficient version of SvTRUE to be used.
In Perl_newATTRSUB_flags, a wasteful copy of the string to the C stack
was removed, and croak_nocontext is used to remove push instructions to
the stack. I was not sure about the interaction between ERRSV and the
message SV, so I didn't change it to a more efficient (instruction-wise;
speed, I don't know) format string combining the not-safe string and
ERRSV in the croak call. If such an optimization is done, a compiler
could put the not-safe string on the stack first, unconditionally, then
check PL_in_eval, and then either jump to the croak call site, or
evaluate ERRSV, push the SV on the C stack, then push the format string
"%"SVf"%s". The C-stack-allocated const char array came from commit
e1ec3a884f.
In Perl_eval_pv, croak_on_error is checked first, so as not to evaluate
ERRSV unless necessary. I was not sure about the side effects of using
the more efficient croak_sv instead of Perl_croak (null chars, utf8,
etc.), so I left a comment. The nocontext variant is used to save a push
instruction on implicit-sys perls.
In S_doeval, I didn't open a new block, to avoid large whitespace
changes. The NULL assignment should optimize away unless a future code
change accidentally uses errsv. There might be a bug here from commit
ecad31f018: previously a char * was dereferenced to check for a null
char, but ERRSV will never be null, so the "Unknown error\n" branch will
never be taken.
In pp_sys.c, in pp_die a new block was opened so as not to evaluate
ERRSV if "well-formed exception supplied". The else-if/else-if/else
blocks all used ERRSV, so declaring "SV * errsv = NULL;" and evaluating
it in the conditional with the comma operator wouldn't work (maybe it
would; see the toke.c comments later in this message). For pp_warn, I
have no comments.
In S_compile_runtime_code, the same croak_sv question comes up as in
Perl_eval_pv.
In S_new_constant, ERRSV is evaluated inside the conditional to avoid
evaluating it if the PL_in_eval test short-circuits. The same is done in
Perl_yyerror_pvn.
For Perl__core_swash_init, I have no comments.
In the future, a SvEMPTYSTRING macro should be considered (not fully
thought out by me) to replace the SvTRUEs with something smaller and
faster when dealing with ERRSV. _nomg is another thing to think about.
In S_init_main_stash there is an opportunity to prevent an extra ERRSV
between "sv_grow(ERRSV, 240);" and "CLEAR_ERRSV();" that was too complicated
for me to optimize.
Section sizes of perl517.dll (VC 2003, x86, 32-bit):
           before     after
  .text    0xc2f77    0xc20d7
  .rdata   0x212dc    0x212dc
  .data    0x3948     0x3948
.text shrinks by 0xea0 (3744) bytes.
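The multiple-evaluation problem can be reproduced in miniature. In this sketch, `errsv_lookup()` is a hypothetical stand-in for the function call hidden inside ERRSV, and `TOY_TRUE()` mimics a macro like SvTRUE that mentions its argument several times; a counter records how often the lookup runs. (The toy keeps a NULL test, which the commit drops on the assumption that ERRSV is never null.)

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy scalar: just a string buffer and its length. */
typedef struct { const char *pv; size_t cur; } toy_sv;

static toy_sv errsv_storage = { "died", 4 };
static int lookups = 0;                 /* counts hidden "function calls" */

/* Stand-in for the gv_add_by_type-style call buried inside ERRSV. */
static toy_sv *errsv_lookup(void) {
    lookups++;
    return &errsv_storage;
}
#define TOY_ERRSV (errsv_lookup())

/* Like SvTRUE, this mentions its argument several times, so every mention
 * of TOY_ERRSV re-runs the lookup. */
#define TOY_TRUE(sv) \
    ((sv) != NULL && (sv)->cur > 0 && !((sv)->cur == 1 && (sv)->pv[0] == '0'))

static bool true_uncached(void) {
    return TOY_TRUE(TOY_ERRSV);         /* 3 lookups for "died" */
}

static bool true_cached(void) {
    toy_sv *errsv = TOY_ERRSV;          /* one lookup, then reuse the pointer */
    return TOY_TRUE(errsv);
}
```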
| |
This refactors the isSPACE_uni, is_SPACE_utf8, isPSXSPC_uni,
and is_PSXSPC_utf8 macros in handy.h, so that no function call need be
done to handle above Latin1 input. These macros are quite small, and
unlikely to grow over time, as Unicode has mostly finished adding white
space equivalents to the Standard. The functions that implement these
in utf8.c are also changed to use the macros instead of generating a
swash. This should speed things up slightly, with less memory used over
time as the swash fills.
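For reference, the above-Latin1 code points these macros have to recognize form a short, fixed list. The sketch tests decoded code points rather than UTF-8 bytes (the generated macros work on the bytes directly) and follows current Unicode; U+180E MONGOLIAN VOWEL SEPARATOR also had White_Space when this commit was written but lost it in Unicode 6.3. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Horizontal ([[:blank:]]) White_Space code points above U+00FF. */
static bool is_blank_above_latin1(uint32_t cp) {
    return cp == 0x1680                      /* OGHAM SPACE MARK */
        || (cp >= 0x2000 && cp <= 0x200A)    /* EN QUAD .. HAIR SPACE */
        || cp == 0x202F                      /* NARROW NO-BREAK SPACE */
        || cp == 0x205F                      /* MEDIUM MATHEMATICAL SPACE */
        || cp == 0x3000;                     /* IDEOGRAPHIC SPACE */
}

/* All White_Space above U+00FF: the blanks plus two vertical separators. */
static bool is_space_above_latin1(uint32_t cp) {
    return is_blank_above_latin1(cp)
        || cp == 0x2028                      /* LINE SEPARATOR */
        || cp == 0x2029;                     /* PARAGRAPH SEPARATOR */
}
```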
| |
This adds macros to regen/regcharclass.pl that are usable as part of the
is_XDIGIT_foo() macros in handy.h, so that no function call need be done
to handle above Latin1 input. These macros are quite small, and
unlikely to grow over time. The functions that implement these in
utf8.c are also changed to use the macros instead of generating a swash.
This should speed things up slightly, with less memory used over time as
the swash fills.
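Above Latin1, the only Unicode Hex_Digit characters are the fullwidth forms of 0-9, A-F and a-f (U+FF10..U+FF19, U+FF21..U+FF26, U+FF41..U+FF46), and their UTF-8 encodings all begin with 0xEF. A sketch of a byte-level test in the spirit of the generated macros; the function name and the return-the-matched-length convention are assumptions.

```c
#include <stddef.h>

/* Returns 3 if s starts with a fullwidth hex digit in UTF-8, else 0. */
static int is_xdigit_above_latin1_utf8(const unsigned char *s, size_t len) {
    if (len < 3 || s[0] != 0xEF)
        return 0;
    /* EF BC 90..99 = U+FF10..U+FF19, EF BC A1..A6 = U+FF21..U+FF26 */
    if (s[1] == 0xBC &&
        ((s[2] >= 0x90 && s[2] <= 0x99) || (s[2] >= 0xA1 && s[2] <= 0xA6)))
        return 3;
    /* EF BD 81..86 = U+FF41..U+FF46 */
    if (s[1] == 0xBD && s[2] >= 0x81 && s[2] <= 0x86)
        return 3;
    return 0;
}
```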
| |
This adds macros to regen/regcharclass.pl that are usable as part of the
is_BLANK_foo() macros in handy.h, so that no function call need be done
to handle above Latin1 input. These macros are quite small, and
unlikely to grow over time, as Unicode has mostly finished adding white
space equivalents to the Standard. The functions that implement these
in utf8.c are also changed to use the macros instead of generating a
swash. This should speed things up slightly, with less memory used over
time as the swash fills.
| |
All controls will always be in the Latin1 range by Unicode's stability
policy. This means that we don't have to call is_utf8_cntrl() when the
input to the is_CNTRL_utf8() macro is above Latin1; we can just fail.
And that means that Perl_is_utf8_cntrl() can just use the macro.
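Because the controls occupy only U+0000..U+001F and U+007F..U+009F, the whole test reduces to a couple of range checks. A sketch on code points, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_cntrl_cp(uint32_t cp) {
    if (cp > 0xFF)
        return false;              /* no controls above Latin1, ever */
    return cp < 0x20 || (cp >= 0x7F && cp <= 0x9F);
}
```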
| |
This could remove a layer of function-call overhead for this small
function (if the compiler doesn't already choose to inline it).
| |
Commit 87367d5f9dc9bbf7db1a6cf87820cea76571bf1a changed
core_invlist_init() to return not the swash, but the swash's inversion
list if it is small enough, allowing a binary search, which is faster
than a hash look-up on small lists. Calls to two functions that access
swashes were changed to make this transparent. However, two more such
functions were overlooked; they need to be upgraded to provide the same
transparency, should they ever be called on swashes that have been
converted. This commit fixes one of them, but leaves the other, with a
comment, as it's much harder to do and is unlikely ever to be called on
such a swash (it is for internal core use only).
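For context, a lookup in an inversion list is a binary search over sorted range boundaries. A minimal sketch, assuming the usual convention that even-indexed elements start ranges that are in the set and odd-indexed elements start ranges that are not; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* [list[0], list[1]) is in the set, [list[1], list[2]) is not, and so on.
 * A final even-indexed element means the set runs to the end of Unicode. */
static bool invlist_contains(const uint32_t *list, size_t n, uint32_t cp) {
    /* Binary search for the number of elements <= cp. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* The element governing cp is at index lo - 1; even means "in the set". */
    return lo > 0 && (lo - 1) % 2 == 0;
}
```

For example, the list {0x30, 0x3A, 0x41, 0x5B} describes the digits plus the ASCII capitals.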
| |
I suspect this leak also applies to any large character classes.
An HV created with newHV has a reference count of 1, so doing
newRV_inc on it will cause a leak.
| |
Under some circumstances it could cause a hash to point to a freed
element. But the hash itself was leaking, so it caused no problems,
as no attempt was made to free its element again.
The next commit will stop the hash from leaking.
| |
If we have just created an SV, it has a reference count of 1, so using
newRV_inc on it will create a leak. So we need to use newRV_noinc and
do SvREFCNT_inc in those cases where the SV is not new.
This has leaked since v5.17.3-117-g87367d5.
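A toy model of the rule, with hypothetical types: `obj` stands in for the SV or HV and `ref` for the RV. A new object already carries the one reference the RV needs, so it gets the no-increment constructor; an existing object gets an extra reference.

```c
#include <stdlib.h>

/* Toy stand-ins: obj plays an HV/SV, ref plays an RV pointing at it. */
typedef struct { int refcnt; } obj;
typedef struct { obj *target; } ref;

/* Like newHV(): a new object starts with a reference count of 1. */
static obj *obj_new(void) {
    obj *o = malloc(sizeof *o);
    if (o)
        o->refcnt = 1;
    return o;
}

/* Like newRV_noinc(): the RV takes over the caller's reference. */
static ref ref_noinc(obj *o) {
    ref r = { o };
    return r;
}

/* Like newRV_inc(): the RV adds a reference of its own. */
static ref ref_inc(obj *o) {
    if (o)
        o->refcnt++;
    ref r = { o };
    return r;
}

/* Hypothetical helper following the fix: no increment for a freshly made
 * object, an increment for one that already has other owners. */
static ref wrap(obj *existing) {
    return existing ? ref_inc(existing) : ref_noinc(obj_new());
}
```

Using `ref_inc(obj_new())` instead would leave the count at 2, so freeing the RV would strand the object: the leak described above.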
| |
memcpy(), which is what Copy() resolves to, is not required to handle
overlapping source and destination. In some cases in this code, the
source and destination pointers are identical. What should happen then
is a no-op, so just don't do the copy in that case. If the pointers
aren't identical, they won't overlap at all, so the Copy() is valid in
every other case.
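A minimal sketch of the guard, with a hypothetical helper name. It is only sufficient because of the invariant stated above (the buffers are either identical or disjoint); for arbitrary overlap, memmove() is the standard tool.

```c
#include <string.h>

/* memcpy() is undefined for overlapping buffers, and identical pointers
 * overlap completely; copying a buffer onto itself is a no-op anyway. */
static void copy_unless_same(void *dst, const void *src, size_t n) {
    if (dst != src)
        memcpy(dst, src, n);
}
```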
| |
The 2 lines removed in each function provide an early exit if the input
is malformed UTF-8. Other code executed later makes the same test.
But most inputs are going to be well-formed, so the test will almost
always fail and just slows things down.
| |
By defining NO_TAINT_SUPPORT, all the various checks that perl does for
tainting become no-ops. It's not an entirely complete change: it doesn't
attempt to remove the taint-related interpreter variables, but instead
virtually eliminates access to them.
Why, you ask? Because it appears to speed up perl's run-time
significantly by avoiding various "are we running under taint" checks
and the like.
This change is not in a state to go into blead yet. The actual way I
implemented it might raise some (valid) objections. Basically, I
replaced all uses of the global taint variables (but not PL_taint_warn!)
with an extra layer of get/set macros (TAINT_get/TAINTING_get).
Furthermore, the change is not complete:
- PL_taint_warn would likely deserve the same treatment.
- Obviously, tests fail. We have tests for -t/-T.
- Right now, I added a Perl warn() on startup when -t/-T are detected
but the perl was not compiled to support it. It might be argued that it
should be silently ignored! Needs some thinking.
- Code quality concerns - needs review.
- Configure support required.
- Needs thinking: How does this tie in with CPAN XS modules that use
PL_taint and friends? It's easy to backport the new macros via PPPort,
but that doesn't magically change all code out there. Might be
harmless, though, because whenever you're running under
NO_TAINT_SUPPORT, any check of PL_taint/etc is going to come up false.
Thus, the only CPAN code that SHOULD be adversely affected is code
that changes taint state.
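A miniature of the get/set macro layer, to show where the speed-up comes from. TAINTING_get follows the name in the message; the setter, the variable and `maybe_check_taint()` are hypothetical. With NO_TAINT_SUPPORT defined, the read is the constant `false`, so the compiler can drop every guarded branch.

```c
#include <stdbool.h>

#ifdef NO_TAINT_SUPPORT
/* Reads are a compile-time constant; writes are discarded. */
#  define TAINTING_get      (false)
#  define TAINTING_set(v)   ((void)(v))
#else
static bool toy_tainting = false;   /* stands in for the interpreter global */
#  define TAINTING_get      (toy_tainting)
#  define TAINTING_set(v)   (toy_tainting = (v))
#endif

static int checks_done = 0;

/* Stands in for any code path that asks "are we running under taint?" */
static void maybe_check_taint(void) {
    if (TAINTING_get)
        checks_done++;               /* placeholder for a real taint check */
}
```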
| |
If an %INC hook or $@ assignment dies, then a scalar is leaked. I
don’t know that it is possible to test this.