| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Previously it was left so that some systems, like Android, didn't get this,
which broke the build.
|
|
|
|
|
|
|
|
|
| |
This extracts the code for looking up POSIX classes in locales so that it
uses base macros common to all of them. It does this for the NeXT-only code
as well as the typical compilations.
This is in preparation for changing the behavior. Certain things look
weird because they are aligned, etc., as part of that preparation.
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that the definitions for isASCII_LC and is_BLANK_LC end up
being the same for all three possible #if platform states, so can just
have them once instead of three times.
It is unlikely that the
&& ! defined(USE_NEXT_CTYPE)
is necessary, because HAS_ISASCII likely won't be defined, but this
makes sure that this doesn't change the previous behavior.
|
| |
|
|
|
|
|
|
|
|
| |
handy.h contains a macro that reads a hex digit and returns its value,
with fewer branches than a naive implementation would use. This commit
just copies and modifies it to create two macros for
1) just converting the hex value, without advancing the input; and
2) doing the same for an octal value.
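As a rough illustration of the branch-light conversion described above, here is a minimal sketch that assumes an ASCII platform. The names xdigit_value and octal_value are hypothetical stand-ins, not the macros in handy.h, and the assert() precondition checks are an addition made for this sketch.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: value of a hex digit, without advancing any input
 * pointer. Assumes ASCII: '0'..'9' already carry their value in the low
 * nibble, while 'a'..'f' and 'A'..'F' have low nibbles 1..6, so adding 9
 * before masking yields 10..15. */
int xdigit_value(unsigned char c)
{
    assert(isxdigit(c));  /* the caller must pass a hex digit */
    return 0xf & (isdigit(c) ? c : c + 9);
}

/* Hypothetical helper: value of an octal digit. The low three bits of the
 * ASCII code are the value, so no test is needed at all. */
int octal_value(unsigned char c)
{
    assert(c >= '0' && c <= '7');  /* the caller must pass an octal digit */
    return 0x7 & c;
}
```

The only decision in the hex case is digit versus letter; upper and lower case letters share one arm because their low nibbles are identical in ASCII.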
|
|
|
|
|
| |
This macro requires the input to be a hex digit, without testing. It is
prudent to assert that under DEBUGGING.
|
|
|
|
| |
Future commits will want this available outside utf8.h.
|
|
|
|
| |
plus some typo fixes. I probably changed some things in perlintern, too.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were a few places that were doing
    unsigned_var = cond ? signed_val : unsigned_val;
or similar. Fixed by suitable casts etc.
The four in utf8.c were fixed by assigning to an intermediate
unsigned var; this has the happy side-effect of collapsing
a large macro expansion, where toUPPER_LC() etc evaluate their arg
multiple times.
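The pattern and one possible fix can be sketched as follows. This is an illustration with hypothetical names (TO_UPPER_DEMO, upper_or_passthrough), not the code that was changed in utf8.c; it assumes the default "C" locale.

```c
#include <ctype.h>

/* Hypothetical macro that, like some case-changing macros, mentions its
 * argument more than once, so a large argument expression gets expanded
 * several times. */
#define TO_UPPER_DEMO(c) (islower(c) ? toupper(c) : (c))

/* toupper() returns a signed int, so using it directly as one arm of ?:
 * next to an unsigned value mixes signedness. Copying the input into an
 * intermediate unsigned variable keeps the macro argument trivial, and the
 * explicit cast makes the conversion of the result intentional. */
unsigned long upper_or_passthrough(unsigned long cp)
{
    unsigned long result;

    if (cp < 256) {
        const unsigned char byte = (unsigned char) cp;  /* computed once */
        result = (unsigned long) TO_UPPER_DEMO(byte);
    }
    else {
        result = cp;  /* outside the single-byte range: left unchanged */
    }
    return result;
}
```

Because the macro now receives a plain variable, its repeated evaluation of the argument costs nothing extra in the expansion.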
|
|
|
|
|
|
|
|
|
| |
This adds comments as to how it works, factors out the mask to be
specified only once, and uses isDIGIT instead of isALPHA, as the former
is likely to be slightly more efficient (because isDIGIT doesn't have to
worry about there being non-ASCII digits, and isALPHA does have to worry
about non-ASCII alphas). The result makes it easier to understand what's
going on.
|
|
|
|
| |
Two adjacent lines were identical. Only one is needed.
|
|
|
|
|
|
| |
They are documented to return UV, but in one definition they return
tolower()/toupper(), which on Linux return a signed value. So
cast away the compiler warnings.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 14dd3ad8c9, three NULL assignments were converted to a Zero() for what
I guess was an optimization. This also caused the large je_buf to be
zeroed even though je_buf was uninitialized before. At that time, JMPENV had
2 extra members that don't exist anymore. The 2 extra members in JMPENV
were removed in commit 766f891612. The comment about je_throw was made
obsolete in commit 766f891612, so rework it.
A NULL assignment, which involves no function call, is faster than a memset()
call. je_buf is 0x40 bytes long on 32-bit VC2003 Win32 Perl. There is no need
to zero it since je_buf is never read unless je_prev is not NULL. There is also
no need to zero the last 2 members, je_ret and je_mustcatch, since they are
immediately assigned to. Move the PL_top_env assignment to near je_prev so that
the compiler tries to optimize better, since je_prev is the start of the struct
and the compiler will hopefully calculate the pointer once.
Also put some poisoning in case JMPENV gets new members in the future.
To conditionally poison in a macro, PERL_POISON_EXPR is being introduced
instead of 2 different definitions of JMPENV_BOOTSTRAP.
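The idea of assigning only the members that are actually read, with optional poisoning, might look roughly like this. The struct, the function, and the DEMO_POISON switch are all hypothetical; this is not Perl's JMPENV, JMPENV_BOOTSTRAP, or PERL_POISON_EXPR, and the initial values are chosen only for the sketch.

```c
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a jump environment. */
struct demo_jmpenv {
    struct demo_jmpenv *prev;      /* NULL marks the bottom of the chain */
    jmp_buf             buf;       /* only read when prev is not NULL */
    int                 ret;
    int                 mustcatch;
};

/* Bootstrap the bottom-most environment. Only the members that are read
 * later are assigned; buf is deliberately left alone instead of being
 * zeroed. With DEMO_POISON defined, the whole struct is first filled with
 * a recognizable byte pattern, so that a member added in the future and
 * left unassigned is easier to spot. */
void demo_jmpenv_bootstrap(struct demo_jmpenv *env)
{
#ifdef DEMO_POISON
    memset(env, 0xAB, sizeof *env);
#endif
    env->prev      = NULL;
    env->ret       = -1;
    env->mustcatch = 1;
}
```

In a non-poisoning build the function body is three plain stores, with no library call at all.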
|
|
|
|
|
|
|
|
|
| |
This allows compilers that do support real booleans (C++ or anything
with stdbool.h) to emulate those that don’t.
See ticket #120314.
This patch incorporates suggestions from Craig Berry.
|
|
|
|
|
|
|
|
|
|
|
| |
The special case has been there since 61bb59065bf1b12edab3, most
likely because the VMS C++ compiler, like a lot of other C++
compilers in the 1990s, implemented a bool as an int, and making
the type in C compatible seemed like a good idea. But no C++
compiler that's likely to build Perl on VMS has a bool type that
occupies more than one byte now, so remove the special case. We're
unlikely to even see this code since we've had stdbool.h since
DEC C 6.4, released in 2001.
|
|
|
|
|
|
|
| |
This simplifies some of the logic necessary for coping with its various
problems.
Suggested by Nicholas Clark.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a bunch of macros and moves things around to support
conditional compilation when Configure is called with
-DBOOTSTRAP_CHARSET. Doing so causes the usual macros that are
table-driven to not be used, since the table may not be valid when
bringing Perl up for the first time on a non-ASCII platform.
This allows it to compile using the platform's native C library ctype
functions, which should work enough to compile miniperl, and allow the
table to be changed to be valid. Then Configure can be re-run to not
bootstrap, and normal compilation can proceed.
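The switch between table-driven and native classification could be sketched like this. DEMO_BOOTSTRAP_CHARSET, demo_class_table, and demo_isDIGIT are hypothetical names chosen for illustration; the real macros and table in handy.h and l1_char_class_tab.h differ, and the table below is only valid for ASCII.

```c
#include <ctype.h>

#ifdef DEMO_BOOTSTRAP_CHARSET

/* The table may not be valid yet on this platform, so fall back on the
 * C library's ctype function. */
#  define demo_isDIGIT(c) (isdigit((unsigned char) (c)) != 0)

#else

/* Hypothetical classification table: one byte of class bits per
 * character. Only the digit bit is filled in here. */
#  define DEMO_DIGIT_BIT 0x01
static const unsigned char demo_class_table[256] = {
    ['0'] = DEMO_DIGIT_BIT, ['1'] = DEMO_DIGIT_BIT, ['2'] = DEMO_DIGIT_BIT,
    ['3'] = DEMO_DIGIT_BIT, ['4'] = DEMO_DIGIT_BIT, ['5'] = DEMO_DIGIT_BIT,
    ['6'] = DEMO_DIGIT_BIT, ['7'] = DEMO_DIGIT_BIT, ['8'] = DEMO_DIGIT_BIT,
    ['9'] = DEMO_DIGIT_BIT
};

/* Normal build: a single table lookup and mask. */
#  define demo_isDIGIT(c) \
      ((demo_class_table[(unsigned char) (c)] & DEMO_DIGIT_BIT) != 0)

#endif
```

Callers use demo_isDIGIT() either way, so only the definition changes between the bootstrap build and the normal one.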
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have not had a working modern Perl on EBCDIC for some years. When I
started out, comments and code led me to conclude erroneously that
natively it supported semantics for all 256 characters 0-255. It turns
out that I was wrong; it natively (at least on some platforms) has the
same rules (essentially none) for the characters which don't correspond
to ASCII ones, as the rules for these on ASCII platforms.
A previous commit for 5.18 changed the docs about this issue. This
current commit forces ASCII rules on EBCDIC platforms (even should there
be one that natively uses all 256). To get all 256, the same things as on
ASCII platforms, like 'use feature "unicode_strings"', must now be done.
|
|
|
|
|
|
|
|
|
| |
handy.h is included in files that don't include perl.h, and hence not
utf8.h. We can't rely therefore on the ASCII/EBCDIC conversion
macros being available to us. The best way to cope is to use the native
ctype functions. Most, but not all, of the macros in this commit
currently resolve to use those native ones, but a future commit will
change that.
|
|
|
|
|
| |
Now, only one of the macros relies on magic numbers (isPRINT), leading
to clearer definitions.
|
|
|
|
|
|
|
|
| |
These 4 macros can have the same RHS for their ASCII and EBCDIC
versions, so no need to duplicate their definitions.
This also enables the EBCDIC versions to not have undefined expansions
when compiling without perl.h
|
|
|
|
|
|
|
|
| |
Now that the Unicode tables are stored in native format, we shouldn't be
doing remapping.
Note that this assumes that the Latin1 casing tables are stored in
native order; not all of this has been done yet.
|
|
|
|
|
|
|
|
| |
The conversion from UTF-8 to code point should generally be to the
native code point. This adds a macro to do that, and converts the
core calls to the existing macro to use the new one instead. The old
macro is retained for possible backwards compatibility, though it
probably should be deprecated.
|
|
|
|
|
|
|
|
|
|
| |
The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
whole Unicode range. In the locations affected by this commit, it is
known that the domain is limited to a single byte, so the simpler ones
whose names contain LATIN1 may be used.
On ASCII platforms, all the macros are null, so there is no effective
change.
|
|
|
|
|
|
|
|
|
| |
This reverts commit 43387ee1abcd83c3c7586b7f7aa86e838d239aac.
Which reverted parts of f019c49e380f764c1ead36fe3602184804292711, but that
reversion may no longer be necessary.
See [perl #116989]
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code dealt rather inconsistently with uids and gids. Some
places assumed that they could be safely stored in UVs, others
in IVs, others in ints; all of them should've been using the
macros from config.h instead. Similarly, code that created
SVs or pushed values into the stack was also making incorrect
assumptions -- As a point of reference, only pp_stat did the
right thing:
    #if Uid_t_size > IVSIZE
        mPUSHn(PL_statcache.st_uid);
    #else
    #   if Uid_t_sign <= 0
        mPUSHi(PL_statcache.st_uid);
    #   else
        mPUSHu(PL_statcache.st_uid);
    #   endif
    #endif
The other places were potential bugs, and some were even causing
warnings in some unusual OSs, like haiku or qnx.
This commit amends the situation by introducing four new macros,
SvUID(), sv_setuid(), SvGID(), and sv_setgid(), and using them
where needed.
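The three-way choice in the pp_stat excerpt can be mimicked outside of Perl as follows. Everything here is hypothetical: demo_uid_t, the DEMO_* configuration values, and the tagged struct stand in for Uid_t, the config.h symbols, and an SV, and this is not how sv_setuid() is actually defined.

```c
#include <stdint.h>

/* Hypothetical configuration values; in Perl these come from config.h. */
typedef uint32_t demo_uid_t;
#define DEMO_UID_SIZE 4   /* size of demo_uid_t in bytes */
#define DEMO_UID_SIGN 1   /* > 0 means unsigned, <= 0 means signed */
#define DEMO_IVSIZE   8   /* size of the integer slots in bytes */

enum demo_kind { DEMO_NV, DEMO_IV, DEMO_UV };

/* Hypothetical stand-in for a scalar that records which slot is in use. */
struct demo_value {
    enum demo_kind kind;
    union { double nv; int64_t iv; uint64_t uv; } u;
};

/* Pick the slot with the same test as the excerpt: a floating-point slot
 * if the id type is wider than the integer slots, otherwise a signed or
 * unsigned integer slot according to the signedness of the id type. */
struct demo_value demo_set_uid(demo_uid_t uid)
{
    struct demo_value v;
#if DEMO_UID_SIZE > DEMO_IVSIZE
    v.kind = DEMO_NV;
    v.u.nv = (double) uid;
#elif DEMO_UID_SIGN <= 0
    v.kind = DEMO_IV;
    v.u.iv = (int64_t) uid;
#else
    v.kind = DEMO_UV;
    v.u.uv = (uint64_t) uid;
#endif
    return v;
}
```

Centralizing the test in one helper means callers no longer need to guess whether an id fits in a signed or an unsigned slot.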
|
|
|
|
|
|
|
|
|
|
| |
These two undocumented macros returned the REPLACEMENT CHARACTER if the
input was outside the Latin1 range. This was contrary to all other
similar macros, which return their input if it is invalid. It caused
warnings in some (dumber than average) compilers.
These macros are undocumented; this changes the behavior only of illegal
inputs to them.
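The convention of handing invalid input back unchanged can be sketched like this. demo_to_lower_latin1 is a hypothetical function written for illustration on an ASCII platform; it is not one of the two macros this commit changed.

```c
/* Hypothetical lowercase mapping for the Latin1 range (0..255).
 * Input above 0xFF is outside the domain and is returned unchanged,
 * rather than being replaced by U+FFFD REPLACEMENT CHARACTER. */
unsigned long demo_to_lower_latin1(unsigned long cp)
{
    if (cp > 0xFF)
        return cp;                    /* invalid input: pass it through */

    if (cp >= 'A' && cp <= 'Z')
        return cp + ('a' - 'A');      /* ASCII upper case letters */

    /* Latin1 upper case letters occupy 0xC0..0xDE, except 0xD7, which is
     * the multiplication sign; their lower case forms are 0x20 higher. */
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
        return cp + 0x20;

    return cp;                        /* no case mapping applies */
}
```

Returning the input keeps the function's result type free of a value that can never fit in a byte, which is the kind of thing that prompted the compiler warnings mentioned above.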
|
|
|
|
|
|
|
|
| |
These macros fill in all the missing case changing operations. They
were omitted before because they are identical in their input domains to
other operations. But by adding them here, that detail no longer need be
known by the callers. toFOLD_LC is not documented, as it is subject to
change.
|
|
|
|
|
|
|
|
|
|
|
| |
The case changing macros are now almost all documented. The exception
is toUPPER_LC, which may change in 5.19.
In addition the functions in utf8.c that these macros call now refer to
them instead of having their own documentation. People should really be
using the macros instead of calling the functions directly. I'm not
deprecating the functions because I can't foresee the need to change
them, so code that uses them should continue to be ok.
|
|
|
|
| |
This corresponds to the other case changing macros.
|
|
|
|
| |
Other macros have these suffixes, so for uniformity add these.
|
| |
|
|
|
|
| |
The language was confusing, and this also fixes a typo.
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 3c3ecf18c35ad7832c6e454d304b30b2c0fef127, I mistakenly added
documentation for a non-existent macro. It turns out that only the
variants listed for that macro exist, and not the base macro. Since we
are in code freeze, the solution has to be not to change code by adding
the base macro, but to delete the documentation, or change it to refer
to just the existing versions. In order to not cause an entry that is
anomalous to the others, for this release, I'm just getting rid of the
documentation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This appears to resolve these three related tickets:
[perl #116989] S_croak_memory_wrap breaks gcc warning flags detection
[perl #117319] Can't include perl.h without linking to libperl
[perl #117331] Time::HiRes::clock_gettime not implemented on Linux (regression?)
This patch changes S_croak_memory_wrap from a static (but not inline)
function into an ordinary exported function Perl_croak_memory_wrap.
This has the advantage of allowing programs (particularly probes, such
as in cflags.SH and Time::HiRes) to include perl.h without linking
against libperl. Since it is not a static function defined within each
compilation unit, the optimizer can no longer remove it when it's not
needed or inline it as needed. This likely negates some of the savings
that motivated the original commit 380f764c1ead36fe3602184804292711.
However, calling the simpler function Perl_croak_memory_wrap() still
does take less set-up than the previous version, so it may still be a
slight win. Specific cross-platform measurements are welcome.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have not had a working modern Perl on EBCDIC for some years. When I
started out, comments and code led me to conclude erroneously that
natively it supported semantics for all 256 characters 0-255. It turns
out that I was wrong; it natively (at least on some platforms) has the
same rules (essentially none) for the characters which don't correspond
to ASCII ones, as the rules for these on ASCII platforms.
This commit is documentation only, mostly just removing the special
mentions of EBCDIC.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are three pairs of characters that Perl recognizes as
metacharacters in regular expression patterns: {}, [], and (). These
can be used as well to delimit patterns, as in:
    m{foo}
    s(foo)(bar)
Since they are metacharacters, they have special meaning to regular
expression patterns, and it turns out that you can't turn off that
special meaning by the normal means of preceding them with a backslash,
if you use them, paired, within a pattern delimited by them. For
example, in
    m{foo\{1,3\}}
the backslashes do not change the behavior, and this matches "f", "o"
followed by one to three more occurrences of "o".
Usages like this, where they are interpreted as metacharacters, are
exceedingly rare; we think there are none, for example, in all of CPAN.
Hence, this deprecation should affect very little code. It does give
notice, however, that any such code needs to change, which will in turn
allow us to change the behavior in future Perl versions so that the
backslashes do have an effect, and without fear that we are silently
breaking any existing code.
=head1 Performance Enhancements
|
|
|
|
|
| |
It was handling above-Latin1 code points as IDstarts instead of
continues.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
/[[:upper:]]/i and /[[:lower:]]/i should match the Unicode property
\p{Cased}. This commit introduces a pseudo-Posix class, internally named
'cased', to represent this. This class isn't specifiable by the user,
except through using either /[[:upper:]]/i or /[[:lower:]]/i. Debug
output will say ':cased:'.
The regex parsing either of :lower: or :upper: will change them into
:cased:, where already existing logic can handle this, just like any
other class.
This commit fixes the regression introduced in
3018b823898645e44b8c37c70ac5c6302b031381, and the fact that these have never
worked under 'use locale'. The next commit will un-TODO the tests for
these things.
|
|
|
|
|
|
|
|
| |
Until recently, these needed to be (or it made sense for them to be) in
numerical order of what the rhs of each #define evaluates to. But now,
they are all initialized to something else, and the numerical value is
not even apparent. Alphabetical order gives a logical ordering to help
a reader find things.
|
| |
|
|
|
|
|
|
|
| |
This also changes isIDCONT_utf8() to use the Perl definition, which
excludes any \W characters (the Unicode definition includes a few of
these). Tests are also added. These macros remain undocumented for
now.
|
|
|
|
|
|
| |
These names are synonyms for specific array elements, and were used
temporarily until all uses of them were removed. This commit removes
the remaining uses, and the definitions.
|
| |
|