| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These 2 macros were created for the Symbian port in commit
"Symbian port of Perl" to replace a direct "extern" token. I guess the
author was unaware of PERL_CALLCONV.
PERL_CALLCONV is the official macro to use. PERL_XS_EXPORT_C and
PERL_EXPORT_C have no usage on cpan grep except for modules with direct
copies of core headers. A defect of using PERL_EXPORT_C and
PERL_XS_EXPORT_C instead of PERL_CALLCONV is that win32/win32.h has no
knowledge of the 2 macros and doesn't set them, and os/os2ish.h doesn't
either. On Win32, since the unix defaults are used instead of the Win32
specific "__declspec(dllimport)" token, XS modules use indirect function
stubs in each XS module placed by the CC to call into perl5**.dll instead
of directly calling the core C functions. I observed this in XS-Typemap's
DLL. To simplify the API, and to decrease the number of macros needing to
be implemented to support each platform, just remove the 2 macros.
Since perl.h's fallback defaults for PERL_CALLCONV are very late in perl.h,
they need to be moved up before function declarations start in perlio.h
(perlio.h is included from iperlsys.h).
win32iop.h contains the "PerlIO" and "SV" tokens, so perlio.h must be
included before win32iop.h is. Including perlio.h so early in win32.h
causes PERL_CALLCONV to not be defined, since the Win32 platform uses the
fallback in perl.h: win32.h doesn't always define PERL_CALLCONV and
sometimes relies on the fallback. Since win32iop.h contains a lot of
declarations, it belongs with other declarations such as those in proto.h,
so move it from win32.h to perl.h.
The "free" token in struct regexp_engine conflicts with win32iop's
"#define free win32_free" so rename that member.
|
|
|
|
|
| |
These are mentioned in some other pods. It's best to bring them into
perlapi, and refer to them from the other pods.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An empty cpan/.dir-locals.el stops Emacs using the core defaults for
code imported from CPAN.
Committer's work:
To keep t/porting/cmp_version.t and t/porting/utils.t happy, $VERSION needed
to be incremented in many files, including throughout dist/PathTools.
perldelta entry for module updates.
Add two Emacs control files to MANIFEST; re-sort MANIFEST.
For: RT #124119.
|
|
|
|
|
| |
On my previous commit, I failed to save the edited handy.h file where
I had edited the comments, so they were missed from the commit.
|
|
|
|
|
|
| |
My recent mods to make it often constant-fold at compile-time were
generating Coverity warnings when the expression happened to equal
(cond ? 1 : 1).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MEM_WRAP_CHECK(n,t) checks whether n * sizeof(t) exceeds the
memory size, and so is likely to wrap.
When the type of n is small (e.g. a U8), you used to get compiler warnings
about a comparison always being true. This was avoided by adding 0.0. Now
Coverity complains that you're doing a floating-point comparison with the
results of an integer division.
Instead of adding 0.0, add some more compile-time checks
that will cause the runtime check to be skipped when the maximum value
of n (as determined by sizeof(n)) is a lot less than memory size.
On my 64-bit system this also pleasingly makes the executable 8384 bytes
smaller, implying that in many cases, the run-time check is now being
skipped.
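A minimal sketch of the idea, not the actual handy.h macro. The names WRAP_IMPOSSIBLE and MEM_WOULD_WRAP are hypothetical, and the sketch assumes that n is an unsigned count and that size_t stands in for the "memory size". When the type of n is narrow, the first macro is a compile-time constant, so the compiler can drop the run-time comparison entirely:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name. True at compile time when the type of n is too narrow
 * for n * sizeof(t) to exceed SIZE_MAX. Assumes n is an unsigned count.
 * The "% sizeof(size_t)" only keeps the shift count in range when n is as
 * wide as (or wider than) size_t; the first test already fails in that case. */
#define WRAP_IMPOSSIBLE(n, t)                                             \
    (sizeof(n) < sizeof(size_t)                                           \
     && sizeof(t) <= (SIZE_MAX >> (CHAR_BIT * (sizeof(n) % sizeof(size_t)))))

/* Hypothetical name. Non-zero when n * sizeof(t) would wrap a size_t.
 * The run-time comparison is reached only when the compile-time test
 * cannot rule out a wrap. */
#define MEM_WOULD_WRAP(n, t) \
    (!WRAP_IMPOSSIBLE(n, t) && (size_t)(n) > SIZE_MAX / sizeof(t))
```

With a one-byte n, MEM_WOULD_WRAP(n, double) folds to 0 without any comparison on n, which also avoids the "comparison always true" style of warning.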
|
|
|
|
|
|
|
| |
This was experimentally introduced in 5.18, and no issues were raised,
except that it got us to thinking and spurred us to stop allowing $^X,
where 'X' is a non-printable control character, and that change caused
some issues.
|
|
|
|
| |
See thread http://nntp.perl.org/group/perl.perl5.porters/224999
|
|
|
|
|
|
| |
It occurred to me that these macros could have an xor applied to a
signed value if the argument is signed, whereas the xor is expecting
unsigned.
|
|
|
|
|
| |
Modify apidoc.pl to warn about duplicate apidoc entries, and
remove duplicates for av_tindex and toLOWER_LC
|
|
|
|
|
|
|
| |
Perl's toLOWER_LC() etc macros are specified as having U8 arg and return,
while the underlying macro may call the OS's tolower() function which is
int. Stop the compiler warning about mismatched sign in conditional by
casting the result of the OS function.
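A hedged illustration of the kind of cast described, assuming the macro has a conditional whose other arm is an unsigned argument. TO_LOWER_LC_SKETCH is a hypothetical name and is not the handy.h definition:

```c
#include <ctype.h>

typedef unsigned char U8;

/* Hypothetical sketch. tolower() returns int; when c is unsigned, leaving
 * the call uncast gives the two arms of the conditional an int and an
 * unsigned type. Casting the libc result to U8 keeps that arm
 * non-negative and matches the documented U8 return. */
#define TO_LOWER_LC_SKETCH(c) \
    (((c) < 256) ? (U8) tolower((int) (c)) : (c))
```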
|
|
|
|
|
|
| |
These being missing caused 3d3a881c1b0eb9c855d257a2eea1f72666e30fbc
to have to be reverted. It only shows up on platforms that don't have
an isblank() libc function.
|
|
|
|
|
| |
In EBCDIC-only macros, an argument previously failed to be
dereferenced, and there was an extra ==. A few comment changes as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to this commit isASCII on EBCDIC platforms was defined as the
isascii() libc function. It turns out that this doesn't work properly.
It needed to be this way back when EBCDIC was bootstrapped onto the
target machine, but now, various header files are furnished with the
requisite definitions, so this is no longer necessary.
The problem with isascii() is that it is locale-dependent, unlike on
ASCII platforms. This means that instead of getting a standard ASCII
definition, it returns whatever the underlying locale says, even if
there is no 'use locale' anywhere in the program. Starting with this
commit, the isASCII definition now comes from the l1_char_class_tab.h
file which we know is accurate and not locale-dependent.
This header can be used in compilations of utility programs where perl.h
is not available. For these, there are alternate, more complicated
definitions, should they be needed in those utility programs. Several
of those definitions prior to this commit also used locale-sensitive
isfoo() functions. The bulk of this commit refactors those definitions
to not use these functions as much as possible. As noted in the
added comments in the code, the one remaining use of such a function is
only for the lesser-used control characters. Likely these aren't used
in the utility programs.
|
|
|
|
|
|
|
| |
This section of code is normally not compiled, but when circumstances
call for it to be compiled, it may be missing the macro defined in this
commit, which is trivial on ASCII platforms, so just define it if
missing.
|
|
|
|
|
|
|
| |
This section of code is compiled only when perl.h is not available, i.e.
for utility programs. I periodically test that it still works, and this
time a macro was added to the other branch of the #if, but not this one.
This commit adds a trivial one to the missing area.
|
|
|
|
|
| |
Removes obsolete comment, and adds text to make it easier to find
matching #else and #endif of a #if
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The recently introduced macro isMNEMONIC_CNTRL has a look-up and several
tests in it, which occupy time and space. Since it was only used for
debugging, that did not matter much, but future commits will use it in
more mainline code. This commit changes it to be a single look-up,
using up one of the spare bits available for that purpose in
PL_charclass. There are enough available bits that we aren't likely to
run out, really ever. (We can always add a 2nd word of bits if
necessary.)
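A rough sketch of a single-look-up classification using one bit of a per-character table. The table name, the bit names, and the particular set of characters flagged here are illustrative assumptions, not the contents of PL_charclass:

```c
#include <stdint.h>

/* Hypothetical bit assignments; each character class owns one bit. */
enum {
    CC_SPACE_BIT          = 1 << 0,
    CC_MNEMONIC_CNTRL_BIT = 1 << 1
};

/* Hypothetical table indexed by character code; unlisted entries are 0.
 * The flagged controls are only an illustrative selection. */
static const uint32_t charclass_sketch[256] = {
    ['\a'] = CC_MNEMONIC_CNTRL_BIT,
    ['\b'] = CC_MNEMONIC_CNTRL_BIT,
    ['\t'] = CC_SPACE_BIT | CC_MNEMONIC_CNTRL_BIT,
    ['\n'] = CC_SPACE_BIT | CC_MNEMONIC_CNTRL_BIT,
    ['\f'] = CC_SPACE_BIT | CC_MNEMONIC_CNTRL_BIT,
    ['\r'] = CC_SPACE_BIT | CC_MNEMONIC_CNTRL_BIT,
    [' ']  = CC_SPACE_BIT
};

/* One table look-up and one mask, instead of a look-up plus several tests. */
#define IS_MNEMONIC_CNTRL_SKETCH(c) \
    ((charclass_sketch[(unsigned char) (c)] & CC_MNEMONIC_CNTRL_BIT) != 0)
```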
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds to handy.h isALPHA_FOLD_EQ(c1,c2) which efficiently tests if
c1 and c2 are the same character, case-insensitively. For example
isALPHA_FOLD_EQ(c, 's') returns true if and only if <c> is 's' or 'S'.
isALPHA_FOLD_NE() is also added by this commit.
At least one of c1 and c2 must be known to be in [A-Za-z] or this macro
doesn't work properly. (There is an assert for this in the macro in
DEBUGGING builds). That is why the name includes "ALPHA", so you won't
forget when using it.
This functionality has been in regcomp.c for a while, under a different
name. I had thought that the only reason to make it more generally
available was potential speed gain, but recent gcc versions optimize to
the same code, so I thought there wasn't any point to doing so.
But I now think that using this makes things easier to read (and
certainly shorter to type in). Once you grok what this macro does, it
simplifies what you have to keep in your mind when reading logical
expressions with multiple operands. That something can be either upper
or lower case can be a distraction to understanding the larger point of
the expression.
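An ASCII-only sketch of how such a test can work, written as functions with hypothetical names rather than as the handy.h macros. It relies on upper and lower case ASCII letters differing in exactly one bit:

```c
#include <assert.h>

/* The single bit that distinguishes 'A' from 'a' (0x20 on ASCII). */
#define CASE_BIT_SKETCH ('A' ^ 'a')

static int is_ascii_alpha_sketch(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Hypothetical stand-in for isALPHA_FOLD_EQ(c1, c2). */
static int alpha_fold_eq_sketch(int c1, int c2)
{
    /* Precondition from the commit message: at least one operand is a letter. */
    assert(is_ascii_alpha_sketch(c1) || is_ascii_alpha_sketch(c2));
    return (c1 & ~CASE_BIT_SKETCH) == (c2 & ~CASE_BIT_SKETCH);
}

/* Hypothetical stand-in for isALPHA_FOLD_NE(c1, c2). */
static int alpha_fold_ne_sketch(int c1, int c2)
{
    return !alpha_fold_eq_sketch(c1, c2);
}
```

The precondition matters because non-letters can also differ only in that bit: on ASCII, '[' is 0x5B and '{' is 0x7B, so masking the bit would make them compare equal.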
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit also causes escaped (by a backslash) "(", "[", and "{" to be
considered literally. In the previous 2 Perl versions, the escaping was
ignored, and a (default-on) deprecation warning was raised. Now that we
have warned for 2 release cycles, we can change the meaning of escaping
to actually do something.
Warning when a literal left brace is not escaped by a backslash will
allow us to eventually use this character in more contexts as being
meta, allowing us to extend the language. For example, the lower limit
of a quantifier could be omitted, and better error checking instituted,
or things like \w could be followed by a {...} indicating some special
word character, like \w{Greek} to restrict to just Greek word
characters.
We tried to do this in v5.16, and many CPAN modules changed to backslash
their left braces at that time. However we had to back out that change
before 5.16 shipped because it turned out that escaping a left brace in
some contexts didn't work, namely when the brace would normally be a
metacharacter (for example surrounding a quantifier), and the pattern
delimiters were { }. Instead we raised the useless backslash warning
mentioned above, which has now been there for the requisite 2 cycles.
This patch partially reverts 2 patches. The first,
e62d0b1335a7959680be5f7e56910067d6f33c1f, partially reverted
the deprecation of unescaped literal left brace. The other,
4d68ffa0f7f345bc1ae6751744518ba4bc3859bd, instituted the deprecation of
the useless left-characters.
Note that, as in the original attempt to deprecate, we don't raise a
warning if the left brace is the first character in the pattern. This
is because in that position it can't be a metacharacter, so we don't
require any disambiguation, and we found that if we did raise an error,
there were quite a few places where this occurred.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is not very user friendly to list functions as
"Functions found in file FOO". Better is to group them by purpose, as
many were already. I went through and placed the ones that weren't
already so grouped into groups. Patches welcome if you have a better
classification.
I changed the headings of some so that the important distinction was
the first word so that they are placed in the file more appropriately.
And for a couple that I had created myself, I came up with a name
that I think is better than the original.
|
|
|
|
|
|
|
| |
Windows doesn't follow the Posix standard for their functions like
isalnum(), isdigit(), etc. This forces compliance by changing the
macros that are the interfaces to those functions to be smarter than
just calling the raw functions.
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a few characters in the Latin1 range that can be folded to by
above-Latin1 characters. Some of these are folded to as part of a
single character fold, like KELVIN SIGN folds to 'k'. More are folded
to as part of a multi-character fold. Until this commit, there wasn't a
quick way to distinguish between the two classes. A couple of places
only want the single-character ones. It is more efficient to look for
just those than to include the multi-char ones which end up not doing
anything. This uses a bit in l1_char_class_tab.h to indicate those
characters that are in the desired class.
|
|
|
|
|
|
| |
The definition was incorrect. When going from control to printable
name, we need to go from Latin1 -> Native, so that e.g., a 65 gets
turned into the native 'A'.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This effectively reverts commit 3ded5eb052cdc3f861ec0c0ff85348086d653be0.
That commit created a scheme to bootstrap Perl onto a non-ASCII
platform, by allowing a Configure option that caused the
compiled code to bypass a number of normal macro definitions and use
slower, generic ones, sufficient to get miniperl to compile on the
target architecture. One would then use miniperl to run a few scripts
that would re-order certain header files. Using this, one could then
recompile all of perl, and once that was done, use it to recompile to
use the normal fast macros.
This worked, but was a cumbersome process. We now have the
infrastructure, since commit 6ff677df5d6fe0f52ca0b6736f8b5a46ac402943,
to cross compile on an ASCII platform to EBCDIC, the likely only
non-ASCII character set to ever be used. So the new infrastructure will
be used in future commits.
|
|
|
|
| |
The comments say it all
|
|
|
|
|
|
| |
Some functions that take a string/length pair can have embedded NULs and
don't have to be NUL terminated; others are the opposite. This adds
text to clarify the issue.
|
| |
|
|
|
|
|
|
|
| |
It turns out that the EBCDIC definitions can be made the same as the
ASCII ones, so this moves the ASCII definitions to the spot where other
ones common to the 2 platforms reside, and removes the EBCDIC ones. In
other words it combines separate definitions into common ones.
|
|
|
|
| |
See hints/darwin.sh for details.
|
|
|
|
|
| |
Prevent the failure for 32-bit builds on C89 compilers introduced in
f4e3fd268af3.
|
|
|
|
|
|
|
|
| |
(1) Prefer the native int/long over long long (not in C89!) or __int64.
(2) Define them only if necessary, they might be defined in <stdint.h> by C99
(3) However, note the C99. They might not be available in strict C89.
(4) In OS X they are defined with ULL/LL, which will not be
to the liking of C89 pedantic gcc.
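A hedged sketch of the ordering in points (1) and (2), showing only INT64_C. The tests on INT_MAX and LONG_MAX are an assumed way of detecting the native widths, not necessarily what the commit does:

```c
#include <limits.h>

/* Define the macro only if nothing else (for example a C99 <stdint.h>)
 * has already provided it. */
#ifndef INT64_C
#  if INT_MAX > 2147483647
#    define INT64_C(c)  c          /* int is wider than 32 bits (assumed 64) */
#  elif LONG_MAX > 2147483647L
#    define INT64_C(c)  c ## L     /* native long is wider than 32 bits */
#  else
#    define INT64_C(c)  c ## LL    /* last resort: long long is not in C89 */
#  endif
#endif
```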
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use the C_ARRAY_LENGTH instead of sizeof(c_array)/sizeof(c_array[0])
or sizeof(c_array)/sizeof(type_of_element_in_c_array), and C_ARRAY_END
for c_array + C_ARRAY_LENGTH(c_array).
While doing this found potential off-by-one error in sv.c:Perl_sv_magic:
how > C_ARRAY_LENGTH(PL_magic_data)
should probably have been
how >= C_ARRAY_LENGTH(PL_magic_data)
No tests fail, but this seems to be more of an internal sanity check.
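For illustration, the two macros can be written as below; treat these definitions as a sketch consistent with the description above. The table magic_names_sketch and the helper are hypothetical and only demonstrate why the bounds test wants >= rather than >:

```c
#include <stddef.h>

#define C_ARRAY_LENGTH(a)  (sizeof(a) / sizeof((a)[0]))
#define C_ARRAY_END(a)     ((a) + C_ARRAY_LENGTH(a))

/* Hypothetical three-element table standing in for PL_magic_data. */
static const char *const magic_names_sketch[] = { "a", "b", "c" };

/* Valid indexes are 0 .. length-1, so an index equal to the length must be
 * rejected; a test of `how > length` would let it through. */
static int index_out_of_range_sketch(size_t how)
{
    return how >= C_ARRAY_LENGTH(magic_names_sketch);
}
```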
|
|
|
|
|
|
| |
In certain places in the documentation, "5.20" is no longer applicable.
Also, a message referred to in perldiag got reworded, but our checks did
not catch that perldiag should have been updated.
|
|
|
|
|
|
|
|
| |
I've gone through pp_hot.c and scope.c and added LIKELY() or UNLIKELY()
to all conditionals where I understand the code well enough to know that
a particular branch is or isn't likely to be taken very often.
I also processed some of the .h files which contain commonly used macros.
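A minimal sketch of what such annotations usually look like, assuming a GCC-compatible compiler that provides __builtin_expect. The macro bodies and the helper function are illustrative and are not copied from the core headers:

```c
#if defined(__GNUC__)
#  define LIKELY(cond)    __builtin_expect(!!(cond), 1)
#  define UNLIKELY(cond)  __builtin_expect(!!(cond), 0)
#else
   /* Without the builtin the hints are simply no-ops. */
#  define LIKELY(cond)    (!!(cond))
#  define UNLIKELY(cond)  (!!(cond))
#endif

/* Hypothetical helper: the error path is marked as the rare branch. */
static int checked_div_sketch(int num, int den, int *out)
{
    if (UNLIKELY(den == 0))
        return -1;
    *out = num / den;
    return 0;
}
```

The hint changes only how the compiler lays out the branches; the result of the condition is the same with or without it.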
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no change for ASCII platforms. For EBCDIC ones, toCTRL('?')
and its inverse are special cased to map to/from the APC control
character, which is the outlier control on these platforms. The reason
to special case this is that otherwise toCTRL('?') would map to a
graphic character, not a control. By outlier, I mean it is the one
control not in the single block where all the other controls are placed.
Further, it corresponds on two of the platforms with 0xFF, which
would be an EBCDIC rub-out character corresponding to an ASCII rub-out
(or DEL) 0x7F, which is what toCTRL('?') maps to on ASCII. This is an
outlier control on ASCII, not being a member of the C0 nor C1 controls.
Hence this makes '?' mean the outlier control on both platforms.
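The ASCII side of this can be sketched as an upper-casing followed by an xor with 64. TO_CTRL_ASCII_SKETCH is a hypothetical name, and the sketch assumes an ASCII execution character set and the default "C" locale; the EBCDIC special case is not shown:

```c
#include <ctype.h>

/* Hypothetical ASCII-only sketch: flipping bit 0x40 maps a letter to its
 * control character and a control character back to its letter. */
#define TO_CTRL_ASCII_SKETCH(c) (toupper((unsigned char) (c)) ^ 64)
```

Here '?' is 0x3F, so the xor yields 0x7F (DEL), and applying the macro to 0x7F gives '?' back.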
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
isblank() is a C99 construct. Perl tries to handle the use of this on
C89 platforms by using the standard hard-coded definition. However,
this code was not updated to account for UTF-8 locales when handling for
those was recently added (31f05a37c), since in a UTF-8 locale the
no-break space is also considered to be a blank.
This commit fixes that. Previously regcomp.c generated the hard-coded
definitions when there was no isblank(), using #ifdef'd code. That
special handling was removed, and [:blank:] is always treated just like
any other POSIX class. The specialness of it is hidden entirely in
handy.h. This simplifies the regcomp.c code slightly. I considered
removing the special handling for isascii(), also a C99 construct, in
the name of simplicity over the slight speed that would be lost. But
the special handling is only a single line in two places, so I left it
in.
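A hedged sketch of a hard-coded fallback that also accounts for UTF-8 locales, assuming an ASCII/Latin-1 platform. The flag in_utf8_locale_sketch and the function name are hypothetical:

```c
/* Hypothetical flag: non-zero while the current LC_CTYPE locale is UTF-8. */
static int in_utf8_locale_sketch = 0;

/* NO-BREAK SPACE on ASCII/Latin-1 platforms. */
#define NBSP_LATIN1_SKETCH 0xA0

/* Fallback for platforms without isblank(): space and tab are always
 * blank; the no-break space counts only in a UTF-8 locale. */
static int is_blank_fallback_sketch(int c)
{
    return c == ' ' || c == '\t'
        || (in_utf8_locale_sketch && c == NBSP_LATIN1_SKETCH);
}
```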
|
|
|
|
|
|
|
| |
isascii() is used as a fallback for isASCII(). This would be on an
unusual platform or under unusual circumstances. isascii() may return
values besides 0 and 1 which can cause things that are expecting a bool
to fail.
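A small illustration of the problem and one way to normalize the result. raw_classifier_sketch is a made-up stand-in for a libc function that returns some non-zero value rather than exactly 1, and the wrapper macro name is hypothetical:

```c
/* Made-up stand-in for a libc classifier: "true" is reported as 0x40. */
static int raw_classifier_sketch(int c)
{
    return (c >= 0 && c < 128) ? 0x40 : 0;
}

/* Force the result to exactly 0 or 1 so that callers expecting a bool,
 * or comparing against 1, behave as intended. */
#define IS_ASCII_BOOL_SKETCH(c) (raw_classifier_sketch(c) != 0)
```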
|
|
|
|
|
|
|
| |
This was due to a logic error in toFOLD_LC() introduced in
31f05a37c4e9c37a7263491f2fc0237d836e1a80. It affected only the code
point at 0xB5 and shows up only in locales in which the character at that
code point is an uppercase letter.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This large (sorry, I couldn't figure out how to meaningfully split it
up) commit causes Perl to fully support LC_CTYPE operations (case
changing, character classification) in UTF-8 locales.
As a side effect it resolves [perl #56820].
The basics are easy, but there were a lot of details, and one
troublesome edge case discussed below.
What essentially happens is that when the locale is changed to a UTF-8
one, a global variable is set TRUE (FALSE when changed to a non-UTF-8
locale). Within the scope of 'use locale', this variable is checked,
and if TRUE, the code that Perl uses for non-locale behavior is used
instead of the code for locale behavior. Since Perl's internal
representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.
More work had to be done for regular expressions. There are three
cases.
1) The character classes \w, [[:punct:]] needed no extra work, as
the changes fall out from the base work.
2) Strings that are to be matched case-insensitively. These form
EXACTFL regops (nodes). Notice that if such a string contains only
characters above-Latin1 that match only themselves, that the node can be
downgraded to an EXACT-only node, which presents better optimization
possibilities, as we now have a fixed string known at compile time to be
required to be in the target string to match. Similarly if all
characters in the string match only other above-Latin1 characters
case-insensitively, the node can be downgraded to a regular EXACTFU node
(match, folding, using Unicode, not locale, rules). The code changes
for this could be done without accepting UTF-8 locales fully, but there
were edge cases which needed to be handled differently if I stopped
there, so I continued on.
In an EXACTFL node, all such characters are now folded at compile time
(just as before this commit), while the other characters whose folds are
locale-dependent are left unfolded. This means that they have to be
folded at execution time based on the locale in effect at the moment.
Again, this isn't a change from before. The difference is that now some
of the folds that need to be done at execution time (in regexec) are
potentially multi-char. Some of the code in regexec was trivial to
extend to account for this because of existing infrastructure, but the
part dealing with regex quantifiers, had to have more work.
Also the code that joins EXACTish nodes together had to be expanded to
account for the possibility of multi-character folds within locale
handling. This was fairly easy, because it already has infrastructure
to handle these under somewhat different circumstances.
3) In bracketed character classes, represented by ANYOF nodes, a new
inversion list was created giving the characters that should be matched
by this node when the runtime locale is UTF-8. The list is ignored
except under that circumstance. To do this, I created a new ANYOF type
which has an extra SV for the inversion list.
The edge case that caused the most difficulty is folding involving the
MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the
GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range
character that folds to outside that range. The issue is that it
doesn't naturally fall out that it will match the CAP MU. If we let the
CAP MU fold to the small mu at compile time (which it can because both
are above-Latin1 and so the fold is the same no matter what locale is in
effect), it could appear that the regnode can be downgraded away from
EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case
insensitively match the CAP MU. This could be special cased in regcomp
and regexec, but I wanted to avoid that. Instead the mktables tables
are set up to include the CAP MU as a character whose presence forbids
the downgrading, so the special casing is in mktables, and not in the C
code.
|
|
|
|
|
| |
Previously it was left so that some systems, like Android, didn't get this,
which broke the build.
|
|
|
|
|
|
|
|
|
| |
This extracts out the code of looking up POSIX classes in locales to use
base macros common to all of them. It does this for the NeXT only code
as well as the typical compilations.
This is in preparation for changing the behavior. Certain things look
weird as they are aligned, etc as part of that preparation.
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that the definitions for isASCII_LC and is_BLANK_LC end up
being the same for all three possible #if platform states, so can just
have them once instead of three times.
It is unlikely that the
&& ! defined(USE_NEXT_CTYPE)
is necessary, because HAS_ISASCII likely won't be defined, but this
makes sure that this doesn't change the previous behavior.
|
| |
|
|
|
|
|
|
|
|
| |
handy.h contains a macro that reads a hex digit and returns its value,
with fewer branches than a naive implementation would use. This commit
just copies and modifies it to create two macros for
1) just converting the hex value, without advancing the input; and
2) doing the same for an octal value.
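An ASCII-only sketch of how such low-branch conversions can be written. All names carry a _SKETCH suffix to mark them as hypothetical, and the expressions are one possible formulation rather than the exact handy.h text. Like the macro described above, they assume the input is already known to be a valid digit:

```c
#define IS_DIGIT_SKETCH(c)  ((c) >= '0' && (c) <= '9')

/* Value of a hex digit without advancing any pointer. On ASCII the low
 * nibble of '0'..'9' is already the value; for 'A'..'F' and 'a'..'f'
 * adding 9 moves the low nibble to 10..15. */
#define XDIGIT_VALUE_SKETCH(c) \
    (0xf & (IS_DIGIT_SKETCH(c) ? (c) : (c) + 9))

/* Value of an octal digit: the low three bits of '0'..'7'. */
#define OCTAL_VALUE_SKETCH(c)  (7 & (c))

/* Read one hex digit and advance the pointer. The pointer is bumped first
 * so that the multiply-evaluated argument has no side effects. */
#define READ_XDIGIT_SKETCH(s)  ((s)++, XDIGIT_VALUE_SKETCH(*((s) - 1)))
```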
|
|
|
|
|
| |
This macro requires the input to be a hex digit, without testing. It is
prudent to assert that under DEBUGGING.
|
|
|
|
| |
Future commits will want this available outside utf8.h
|