| Commit message | Author | Age | Files | Lines |
|
|
|
|
| |
This encapsulates a common paradigm, making sure that it is done
correctly for the platform's size.
|
|
|
|
|
| |
Mostly in comments and docs, but some in diagnostic messages and one
case of 'or die die'.
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the (almost) duplicate code in this function to display
mnemonics for control characters that have them. The reason the two
pieces of code aren't precisely the same is that the other function also
uses \b as a mnemonic for backspace. Using all possible mnemonics is
desirable, so a flag is added to pv_uni_display to enable \b as well.
This is enabled by default for double-quoted strings, but not for regex
patterns (as \b there means something quite different, except in
character classes).
B.pm is changed to expect \b.
|
| |
|
|
|
|
|
|
|
| |
These macros were missed in dd1a3ba7882ca70c1e85b0fd6c03d07856672075
and 059703b088f44d5665f67fba0b9d80cad89085fd.
Using them would cause things to fail to compile.
|
|
|
|
| |
These are needed only to allow some modules to stay updated with blead.
|
|
|
|
| |
It makes things a little clearer.
|
| |
|
|
|
|
|
| |
Higher has been reserved for core use, and a future commit will want to
finally do this.
|
|
|
|
|
|
|
| |
At the moment, _ASSERT_() is the macro that has been showing large
expansions. Change it so that it does nothing if PERL_SMALL_MACRO_BUFFER
is defined. That means various other macros that check
PERL_SMALL_MACRO_BUFFER themselves can be simplified to no longer do so.
|
| |
|
|
|
|
| |
I forgot an arg in a macro it calls.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a safer version of UTF8SKIP for use when the input could be
possibly malformed. It uses strnlen() to not read past a NUL in the
input. Since Perl adds NULs to the end of SV's, this will likely
prevent reading beyond the end of a buffer.
A still safer version could be written that stops not just before a NUL,
but before any unexpected byte. I suspect that is overkill, and since
strnlen() can be very fast, I went with this approach instead. Nothing
precludes adding another version that does the full checking.
|
| |
|
|
|
|
| |
This can be derived from other values, removing an EBCDIC dependency
|
|
|
|
| |
This can be derived from other values, removing an EBCDIC dependency
|
|
|
|
| |
This can be derived from other values, removing an EBCDIC dependency
|
|
|
|
| |
This can be derived from other values, removing an EBCDIC dependency
|
|
|
|
| |
This can be derived from other values, removing an EBCDIC dependency
|
|
|
|
| |
This can be derived from other values, removing an EBCDIC dependency
|
|
|
|
| |
This can be derived from other values, removing an EBCDIC dependency
|
|
|
|
| |
This can be derived from other values, removing an EBCDIC dependency
|
|
|
|
|
| |
This variable can be defined from the same base in both UTF-8 and
UTF-EBCDIC, and doing so eliminates an EBCDIC dependency.
|
| |
|
|
|
|
| |
The called macro does the cast already
|
|
|
|
|
| |
By doing an '| 0' with a parameter in a macro expansion, a C syntax
error will be generated. This is free protection.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using various macros that deal with UTF-8 inputs but don't have a
parameter indicating the maximum length beyond which we should not look
has been deprecated since 5.26. This commit changes all such macros,
as threatened in existing documentation and warning messages, to take an
extra parameter giving the length.
This was originally scheduled to happen in 5.30, but was delayed because
it broke some CPAN modules, and there wasn't really a good way around
that. But now that Devel::PPPort 3.54 is out, ppport.h has new
facilities that let modules adopt these changes while continuing to work
with older Perl releases.
|
|
|
|
|
| |
A function name with a leading underscore is not legal in C. Instead
add a suffix to differentiate this name from an otherwise identical one.
|
|
|
|
| |
Leading underscore names are reserved for the C implementation.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
For well-formed input, there is no change. But for malformed input it
wasn't returning the documented length when warnings were enabled, and
not always the documented value when they were disabled.
This is implemented as an inline function, called from both the macro
and the Perl_ form.
Devel::PPPort has sufficient tests for this.
|
|
|
|
|
|
|
| |
This was due to UTF8_SAFE_SKIP(s, e) not allowing s to be as large as e,
and there are legitimate cases where it can be. This commit hardens the
macro so that it never reads above e-1, returning 0 if it otherwise
would be required to. The assertion is changed to 's <= e'.
|
|
|
|
| |
This version of UTF8SKIP refuses to advance beyond the end pointer
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the most obvious and easy things that are no longer needed
since regexes no longer use swashes at all.
tr/// continues, for the time being, to use swashes, so not all swash
handling is removable now. But tr/// doesn't use inversion lists, and
so a bunch of code is ripped out here. Other code could have been, but
I did only the relatively easy stuff. The rest can be ripped out all at
once when tr/// stops using swashes.
|
|
|
|
|
|
|
|
|
| |
MAX_LEGAL_CP can end up as an int depending on the ranges of the types
involved, causing a type mismatch with the format in cp_above_legal_max.
Adding the cast to the macro definition not only prevents the type
mismatch with the format, but may also allow a static analysis tool to
detect comparisons against signed types, which are likely errors.
|
| |
|
|
|
|
| |
This hides an internal detail
|
|
|
|
|
| |
This replaces a complicated trie with a dfa. This should cut down the
number of conditionals encountered in parsing many code points.
|
|
|
|
|
|
|
| |
It was a macro that used a trie. This changes to use the dfa
constructed in previous commits. I didn't bother with taking
measurements. A dfa should have fewer conditionals for many code
points.
|
|
|
|
|
|
|
| |
It was a macro that used a trie. This changes to use the dfa
constructed in previous commits. I didn't bother with taking
measurements. A dfa should require fewer conditionals to be executed
for many code points.
|
| |
|
|
|
|
|
| |
The Perl_utf8n_to_uvchr_buf() version of this function has an assert;
this adds it as well to the macro that bypasses the function.
|
|
|
|
|
|
|
| |
This symbol somehow got deleted, and it really shouldn't have been.
This should not go in perldelta, as we don't want people who aren't
already using this ancient symbol to start doing so.
|
|
|
|
|
| |
This is prompted by Encode's needs. When called with the proper
parameter, it returns any warnings instead of displaying them directly.
|
|
|
|
|
|
| |
This UTF-8 to code point translator variant is to meet the needs of
Encode, and provides XS authors with more general capability than
the other decoders.
|
|
|
|
|
| |
An earlier commit had split some comments up; this one also adds
clarifying details.
|
|
|
| |
It somehow dawned on me that the code is incorrect for
warning/disallowing very high code points. What is really wanted in the
API is to catch UTF-8 that is not necessarily portable. There are
several classes of this, but I'm referring here to just the code points
that are above the Unicode-defined maximum of 0x10FFFF. These can be
considered non-portable, and there is a mechanism in the API to
warn/disallow these.
However an earlier standard defined UTF-8 to handle code points up to
2**31-1. Anything above that is using an extension to UTF-8 that has
never been officially recognized. Perl does use such an extension, and
the API is supposed to have a different mechanism to warn/disallow on
this.
Thus there are two classes of warning/disallowing for above-Unicode code
points. One for things that have some non-Unicode official recognition,
and the other for things that have never had official recognition.
UTF-EBCDIC differs somewhat in this, and since Perl 5.24, we have had a
Perl extension that allows it to handle any code point that fits in a
64-bit word. This kicks in at code points above 2**30-1, a different
number than where extended UTF-8 kicks in on ASCII platforms.
Things are also complicated by the fact that the API has provisions for
accepting the overlong UTF-8 malformation. It is possible to use
extended UTF-8 to represent code points smaller than 31-bit ones.
Until this commit, the extended warning/disallowing was based on the
resultant code point, and only when that code point did not fit into 31
bits.
But what is really wanted is if extended UTF-8 was used to represent a
code point, no matter how large the resultant code point is. This
differs from the previous definition, but only for EBCDIC platforms, or
when the overlong malformation was also present. So it does not affect
very many real-world cases.
This commit fixes that. It turns out that it is easy to tell whether
something is using extended UTF-8: one just looks at the first byte of a
sequence.
The trailing part of the warning message that gets raised is slightly
changed to be clearer. It's not significant enough to affect perldiag.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The next commit will fix the detection of using Perl's extended UTF-8 to
be more accurate. The current name for various flags in the API is
somewhat misleading. What we really want to know is whether extended
UTF-8 was used, not what the resultant code point's value is.
This commit basically does
s/ABOVE_31_BIT/PERL_EXTENDED/g
It also similarly changes the name of a hash key in APItest/t/utf8.t.
This intermediary step makes the next commit easier to read.
|