| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This effectively reverts 3ece276e6c0.
It turns out it was a bad idea to make U mean the non-native official
Unicode code points. It may seem to make sense to do so, but it broke
multiple CPAN modules which were using U the previous way.
This commit has no effect on ASCII-platform functioning.
|
|
|
|
| |
This code is identical to a few lines above it
|
|
|
|
|
|
|
|
| |
This fixes GH #19091
This is from a rebasing error. The two variable assignments were
supposed to have been superseded by the first one in the function and
then removed, but they didn't get removed until now.
|
| |
|
|
|
|
|
| |
This commit allows this function to be called with NULL parameters when
the result of these is not needed.
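
A minimal sketch of the pattern described here, using a hypothetical
function (`sketch_divide` and its parameters are illustrative, not the
function this commit touches): each output pointer is written only when
the caller passes a non-NULL address.

```c
#include <stddef.h>

/* Hypothetical example, not from the Perl source.
 * Returns 0 on success, -1 on division by zero.
 * Either output pointer may be NULL when that result isn't needed. */
int sketch_divide(int num, int den, int *quot, int *rem)
{
    if (den == 0)
        return -1;
    if (quot != NULL)
        *quot = num / den;
    if (rem != NULL)
        *rem = num % den;
    return 0;
}
```

A caller that only wants the status can then write
`sketch_divide(n, d, NULL, NULL)` instead of declaring throwaway
variables.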
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code this commit removes made sense when we were using swashes, and
we had to go out to files on disk to find the answers. It used
knowledge of the Unicode character database to skip swaths of scripts
which are caseless.
But now, all that information is stored in C arrays that will be paged
in when accessed, which is done by a binary search. The information
about those swaths is in those arrays. The conditionals removed here
are better spent in executing iterations of the search in L1 cache.
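
To illustrate why the special-casing became unnecessary, here is a
rough sketch of a case lookup done by binary search over a sorted C
array. The table layout, its contents, and the name `sketch_to_lower`
are assumptions for illustration only; the real tables are not shown in
this log. A code point in a caseless swath simply falls between entries
and is returned unchanged after a handful of iterations.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout: sorted, non-overlapping inclusive ranges, each
 * with a delta to add to get the lowercase code point. */
typedef struct {
    uint32_t start;
    uint32_t end;
    int32_t  delta;
} sketch_case_range;

static const sketch_case_range sketch_lower_table[] = {
    { 0x0041, 0x005A, 32 },   /* A-Z */
    { 0x00C0, 0x00D6, 32 },   /* Latin-1 capitals before U+00D7 */
    { 0x00D8, 0x00DE, 32 },   /* Latin-1 capitals after U+00D7 */
    { 0x0391, 0x03A1, 32 },   /* Greek Alpha through Rho */
};

uint32_t sketch_to_lower(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof(sketch_lower_table) / sizeof(sketch_lower_table[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < sketch_lower_table[mid].start)
            hi = mid;
        else if (cp > sketch_lower_table[mid].end)
            lo = mid + 1;
        else
            return (uint32_t)((int32_t)cp + sketch_lower_table[mid].delta);
    }
    return cp;   /* not in any range: caseless, or already lowercase */
}
```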
|
|
|
|
|
|
|
|
|
| |
This adds a new function for changing the case of an input code point.
The difference between this and the existing function is that the new
one returns an array of UVs instead of a combination of the first code
point and UTF-8 of the whole thing, a somewhat awkward API that made
more sense when we used swashes. That function is retained for now, at
least, but most of the work is done in the new function.
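
A hedged sketch of what an array-returning interface can look like. The
name `sketch_to_upper`, the buffer size, and the tiny mapping are
assumptions for illustration; the new function's actual signature is
not given in this message.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CASE_LEN 3   /* assumed upper bound for this sketch */

/* Writes the uppercase form of cp into out[] as code points and
 * returns how many were written. Only a few mappings are handled. */
size_t sketch_to_upper(uint32_t cp, uint32_t out[SKETCH_MAX_CASE_LEN])
{
    if (cp == 0x00DF) {             /* U+00DF uppercases to "SS" */
        out[0] = 0x53;
        out[1] = 0x53;
        return 2;
    }
    if (cp >= 0x61 && cp <= 0x7A) { /* ASCII a-z */
        out[0] = cp - 32;
        return 1;
    }
    out[0] = cp;                    /* everything else: unchanged */
    return 1;
}
```

The caller gets every resulting code point directly and can encode
them to UTF-8 only if it actually needs the bytes.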
|
|
|
|
|
|
|
| |
The fancy wrapper macro that does extra things was being used, when all
that is needed is the bare libc function. This is because the code
calling it already wraps it, so avoids calling it unless the bare
version is what is appropriate.
|
| |
|
|
|
|
|
|
| |
Instead of destroying the input by first swapping the bytes, this calls
a base function with the order to use. The non-reverse function is
changed to call the base function with the non-reversed order.
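
The idea can be sketched as follows, with hypothetical names: a base
routine takes the positions of the high and low bytes within each
16-bit unit, and the two public variants just pass a different order.
The input buffer is never modified.

```c
#include <stdint.h>

/* Hypothetical base routine: 'high' and 'low' are the offsets (0 or 1)
 * of the most and least significant byte within the 2-byte unit. */
static uint16_t sketch_read_u16_base(const unsigned char *p,
                                     int high, int low)
{
    return (uint16_t)((p[high] << 8) | p[low]);
}

/* Non-reversed (big-endian) order. */
uint16_t sketch_read_u16(const unsigned char *p)
{
    return sketch_read_u16_base(p, 0, 1);
}

/* Reversed (little-endian) order, without swapping the input bytes. */
uint16_t sketch_read_u16_reversed(const unsigned char *p)
{
    return sketch_read_u16_base(p, 1, 0);
}
```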
|
|
|
|
|
|
| |
A previous commit has simplified uvoffuni_to_utf8_flags() so that it is
hardly more than the code in this function. So strip out the code and
replace it by a call to uvoffuni_to_utf8_flags().
|
|
|
|
|
| |
This is unnecessary in a .c file, and the code it referred to has been
moved away.
|
| |
|
|
|
|
|
|
|
|
|
| |
Now that the only callers of this function use the DFA, which
eliminates the need to check for, e.g., wrong continuation bytes, this
function can be refactored to use a switch statement, which makes it
clearer, shorter, and faster.
The name is changed to indicate its private nature.
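
As a rough illustration of the approach (not the actual function, whose
body is not shown here), a decoder that may assume already-validated
standard UTF-8 on an ASCII platform can switch on the sequence length
and skip all error checks. `sketch_decode_valid_utf8` is a hypothetical
name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one well-formed UTF-8 sequence of 'len' bytes (1..4).
 * The caller is assumed to have validated the bytes already. */
uint32_t sketch_decode_valid_utf8(const unsigned char *s, size_t len)
{
    uint32_t cp;
    size_t i;

    assert(len >= 1 && len <= 4);

    switch (len) {
    case 1:  return s[0];              /* ASCII */
    case 2:  cp = s[0] & 0x1F; break;  /* 110xxxxx */
    case 3:  cp = s[0] & 0x0F; break;  /* 1110xxxx */
    default: cp = s[0] & 0x07; break;  /* 11110xxx */
    }

    /* Each continuation byte contributes its low 6 bits. */
    for (i = 1; i < len; i++)
        cp = (cp << 6) | (s[i] & 0x3F);

    return cp;
}
```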
|
|
|
|
| |
This makes it use the fast DFA for this functionality.
|
|
|
|
| |
There are new macros that suffice to make the determination here.
|
|
|
|
| |
This is now generated by regcharclass.pl
|
|
|
|
| |
The new name is more mnemonic
|
|
|
|
|
| |
These macros don't need to be macros, as they each are only called from
one place, and that isn't likely to change.
|
| |
|
|
|
|
|
|
|
| |
The previous commit for EBCDIC paved the way for moving some checks for
a code point being for Perl extended UTF-8 out of places where they
cannot succeed. The resultant simplifications more than compensate for
the two extra case statements added by this commit.
|
|
|
|
|
|
| |
Simply by adjusting the case statement labels, and adding an extra case,
the code can avoid checking for a problem on EBCDIC boxes when it would
be impossible for the problem to exist.
|
|
|
|
|
|
|
|
|
|
|
| |
Having a fast UVOFFUNISKIP() allows this function to be refactored to
simplify it.
This commit continues to shortchange large code points and EBCDIC by a
little. For example, it checks if a 4-byte character is above Unicode,
but no 4-byte characters fit that description in UTF-EBCDIC. This will
be fixed in the next commit, which will prepare for further
enhancements.
|
|
|
|
| |
This will make more sense of the next commit
|
|
|
|
|
|
|
|
|
| |
This specialized functionality is used to check the validity of Perl's
extended-length UTF-8, which has some idiosyncratic characteristics that
differ from the shorter sequences. This means this function doesn't have to
consider those differences. It will be used in the next commit to avoid
some work, and to eventually enable is_utf8_char_helper() to be
simplified.
|
|
|
|
|
|
|
|
| |
One of these functions is now only called from the other, and there is
significant overlap in their logic.
This commit refactors them into one resulting function, which is half
the code, and more straightforward.
|
|
|
|
|
| |
The sequences here aren't UTF-8, but UTF, since they are I8 in
UTF-EBCDIC terms
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code has hard-coded into it the UTF-8 for the highest representable
code point for various platforms and word sizes. The algorithm is to
compare the input sequence to verify it is <= the highest. But the tail
of each of them has some number of the highest possible continuation
byte. We need not look at the tail, as the input cannot be above the
highest possible. This commit shortens the highest string constants and
exits the loop when we get to where the tail used to be.
This change allows for the complete removal of the code that is #ifdef'd
out that would be used when we allow core to use code points up to
UV_MAX.
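
A sketch of the shortened comparison, under loudly stated assumptions:
the maximum is taken to be the 7-byte sequence FE 81 BF BF BF BF BF
(0x7FFFFFFF in the classic extended UTF-8 layout), so only the two-byte
non-tail prefix is stored. The actual constants and function are not
shown in this log, and `sketch_overflows` is a hypothetical name.

```c
#include <stddef.h>

/* Assumed maximum: FE 81 BF BF BF BF BF. The trailing BF bytes are the
 * highest possible continuation bytes, so they are not stored. */
static const unsigned char sketch_highest_prefix[] = { 0xFE, 0x81 };

/* Returns 1 if the sequence is above the assumed maximum, else 0.
 * Assumes the input is otherwise well formed, so every continuation
 * byte is at most 0xBF. */
int sketch_overflows(const unsigned char *s, size_t len)
{
    size_t n = sizeof(sketch_highest_prefix);
    size_t i;

    for (i = 0; i < n && i < len; i++) {
        if (s[i] < sketch_highest_prefix[i])
            return 0;   /* strictly below the maximum */
        if (s[i] > sketch_highest_prefix[i])
            return 1;   /* strictly above the maximum */
    }
    /* Equal through the prefix: the tail cannot exceed BF...BF. */
    return 0;
}
```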
|
|
|
|
| |
This makes the code easier to read.
|
|
|
|
| |
This macro is preferred to sizeof()
|
|
|
|
|
|
|
|
|
| |
I've always been uncomfortable with the input constraints this function
had. Now that it has been refactored into using a switch(), new cases
for full generality can be added without affecting performance, and
some conditionals removed before calling it.
The function is renamed to reflect its greater generality.
|
|
|
|
|
| |
The insight in the previous commit allows this function to become much
more compact.
|
|
|
|
|
| |
I hadn't previously noticed the underlying symmetry between the
platforms.
|
|
|
|
| |
UTF_MIN_CONTINUATION_BYTE is clearer for use in some contexts
|
|
|
|
|
| |
This changes only portions of the capitalization, and the new version is
more in keeping with other function names.
|
|
|
|
|
|
|
|
| |
The previous commit added a convenient place to create a symbol to
indicate that the UTF-8 on this platform includes Perl's nearly-double
length extension. The platforms this isn't needed on are 32-bit ASCII
ones. This symbol allows removing one place where EBCDIC need
be considered, and future commits will use it as well.
|
|
|
|
|
|
| |
This macro has a corresponding, older, name for the non-UTF-8 case. It
makes sense to use the same paradigm, and move the definitions together
so that the comments for one don't have to be repeated for the other.
|
|
|
|
|
|
|
|
|
| |
This commit makes is_HANGUL_ED_utf8_safe() return 0 unconditionally on
EBCDIC platforms. This means its callers don't have to care what
platform is running. Change the two callers to take advantage of this.
The commit also changes the description of the macro to be slightly more
accurate.
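
The shape of such a definition might look like the sketch below. The
real macro's body is not shown in this log; the function name and the
byte ranges here are assumptions, derived from the fact that in
standard UTF-8 the code points U+D000..U+D7FF are the three-byte
sequences ED 80..9F 80..BF.

```c
#include <stddef.h>

#ifdef EBCDIC
/* Assumed behavior per the commit message: never matches on EBCDIC. */
static int sketch_is_hangul_ed(const unsigned char *s,
                               const unsigned char *e)
{
    (void)s;
    (void)e;
    return 0;
}
#else
/* 's' points at the sequence, 'e' one past the end of the buffer. */
static int sketch_is_hangul_ed(const unsigned char *s,
                               const unsigned char *e)
{
    return (e - s) >= 3
        && s[0] == 0xED
        && s[1] >= 0x80 && s[1] <= 0x9F
        && s[2] >= 0x80 && s[2] <= 0xBF;
}
#endif
```

Callers can then use the check unconditionally, with no platform
`#ifdef` of their own.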
|
|
|
|
| |
The cast does this without an extra instruction
|
|
|
|
|
|
|
|
|
| |
In C the comparison of two pointers is only legal if both point to
within the same object, or to a virtual element one above the high edge
of the object.
The previous code was doing an addition potentially outside that range,
and so the results would be undefined.
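
A minimal sketch of the general fix, with a hypothetical helper name:
rather than forming `p + need` (which may point beyond one past the end
of the buffer, and is undefined even before the comparison), subtract
the two in-range pointers and compare lengths.

```c
#include <stddef.h>

/* Returns 1 if at least 'need' bytes are available in [p, end).
 * Assumes p and end point into the same object (or one past its end)
 * and that p <= end, so the subtraction is well defined. */
int sketch_has_room(const char *p, const char *end, size_t need)
{
    /* Undefined if p + need overshoots:  return p + need <= end; */
    return (size_t)(end - p) >= need;
}
```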
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This just detabifies to get rid of the mixed tab/space indentation.
Applying consistent indentation and dealing with other tabs are another issue.
Done with `expand -i`.
* vutil.* left alone, it's part of version.
* Left regen managed files alone for now.
|
|
|
|
|
|
| |
The names were intended to force people to not use them outside their
intended scopes. But by restricting those scopes in the first place, we
don't need such unwieldy names
|
| |
|
|
|
|
|
|
|
|
| |
Many of the files in perl are for one thing only, and hence their
embedded documentation will be for that one thing. By creating a hash
here of them, those files don't have to worry about what section that
documentation goes under, and so it can be completely changed without
affecting them.
|
| |
|
|
|
|
|
|
|
|
|
| |
For: https://github.com/Perl/perl5/pull/18201
Committer: Samanta Navarro is now a Perl author.
To keep 'make test_porting' happy: Increment $VERSION in several files.
Regenerate uconfig.h via './perl -Ilib regen/uconfig_h.pl'.
|
|
|
|
|
| |
I didn't mean to apply 8505db87404436956f86e54885cacd73840801c0 just
yet, but having done so, there is a change required in a link
|
|
|
|
| |
And add it to perlintern, and fix grammar in its pod
|