| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
This will be used in future commits
|
|
|
|
| |
this makes sure it is never negative
|
|
|
|
| |
Added an SSize_t cast to the sizeof() call.
|
|
|
|
| |
To make sure at compile time that its argument is a pointer.
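One common way to get such a compile-time check is to apply sizeof to the dereferenced argument. The sketch below only illustrates that idea; the macro name MY_ASSERT_IS_PTR and the helper function are hypothetical and are not taken from the commit.

```c
#include <stddef.h>

/* Hypothetical name, for illustration only.
 * sizeof(*(x)) compiles only when x can be dereferenced, so passing a
 * plain integer is rejected by the compiler. sizeof does not evaluate
 * its operand, so nothing is dereferenced at run time; the comma
 * operator then yields x itself. */
#define MY_ASSERT_IS_PTR(x) ((void) sizeof(*(x)), (x))

/* Counts the bytes before the terminating NUL. */
static size_t count_until_nul(const char *s)
{
    const char *p = MY_ASSERT_IS_PTR(s);

    while (*p) {
        p++;
    }
    return (size_t) (p - s);
}
```

With this definition, MY_ASSERT_IS_PTR(5) fails to compile, while arrays are still accepted because they decay to pointers.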
|
|
|
|
|
|
|
| |
This macro relied on another macro that was removed in 2019, in commit
216dc346ceeeb9b6ba0fdd470ccfe4f8b2a286c4. Fix it and add tests.
This bug was caught by Devel::PPPort.
|
|
|
|
|
|
|
| |
These macros asserted both that the passed in parameter occupied no more
than a byte, and that it wasn't a pointer. But pointers occupy more
than a byte, so if it passes the first check, meaning it occupies only a
byte, it will necessarily pass the second, making that check unnecessary.
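The reasoning above can be checked with a small sketch. The macro name MY_IS_ONE_BYTE is hypothetical, and the static assertion encodes the assumption, stated in the commit message, that pointers occupy more than one byte.

```c
/* Hypothetical name: true when the argument occupies exactly one byte. */
#define MY_IS_ONE_BYTE(x) (sizeof(x) == 1)

/* Assumption from the commit message: pointers are wider than a byte.
 * If this holds, anything that passes the size check cannot be a
 * pointer, so a separate "not a pointer" check adds nothing. */
_Static_assert(sizeof(void *) > 1, "pointers are wider than one byte");

static int byte_check_rejects_pointers(void)
{
    unsigned char c = 'A';
    const char *p = "A";

    /* c passes the size check; p fails it without any pointer test. */
    return MY_IS_ONE_BYTE(c) && ! MY_IS_ONE_BYTE(p);
}
```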
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These macros use
(x) | 0
to get a compiler error if x is a pointer rather than a value. This was
instituted because there was confusion in them as to what they were
called with.
But the purpose of the paradigm wasn't obvious to even some experts; it
was documented in every file in which it was used, but not at every
occurrence. And, it turns out, not every compiler can cope with it.
Making the paradigm into a macro, which this commit does, makes the uses
self-documenting, albeit at the expense of cluttering up the macro
definition somewhat; and allows the mechanism to be turned off if
necessary for some compilers. Since it will be enabled for the majority
of compilers, the potential bugs will be caught anyway.
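A minimal sketch of wrapping the paradigm in a macro follows. The names MY_ASSERT_NOT_PTR, MY_NO_PTR_CHECK and MY_IS_ASCII are hypothetical; the commit's actual names and the way it disables the check may differ.

```c
/* Hypothetical names, for illustration only.
 * Bitwise OR is not defined for pointer operands, so (x) | 0 fails to
 * compile when x is a pointer and leaves an integer value unchanged.
 * Defining MY_NO_PTR_CHECK turns the mechanism off for compilers that
 * cannot cope with it. */
#ifndef MY_NO_PTR_CHECK
#  define MY_ASSERT_NOT_PTR(x) ((x) | 0)
#else
#  define MY_ASSERT_NOT_PTR(x) (x)
#endif

/* The wrapper name documents the intent at each use. */
#define MY_IS_ASCII(c) (MY_ASSERT_NOT_PTR(c) < 128)
```

Calling MY_IS_ASCII('a') compiles and is true; calling it with a char * argument is rejected by the compiler when the check is enabled.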
|
| |
|
|
|
|
|
|
| |
Instead of destroying the input by first swapping the bytes, this calls
a base function with the order to use. The non-reverse function is
changed to call the base function with the non-reversed order.
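The approach can be sketched as a base function that takes the byte order as parameters, with two thin wrappers. All function names below are hypothetical, and the sketch only decodes 16-bit units; it does not reproduce the real conversion function.

```c
#include <stddef.h>
#include <stdint.h>

/* Base function: high_idx and low_idx say which byte of each pair is
 * the most and least significant. The input buffer is never modified. */
static size_t decode_u16_base(const unsigned char *in, size_t nbytes,
                              uint16_t *out, int high_idx, int low_idx)
{
    size_t n = 0;

    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        out[n++] = (uint16_t) ((in[i + high_idx] << 8) | in[i + low_idx]);
    }
    return n;   /* number of 16-bit units written */
}

/* Non-reversed order: first byte of each pair is the high byte. */
static size_t decode_u16(const unsigned char *in, size_t nbytes,
                         uint16_t *out)
{
    return decode_u16_base(in, nbytes, out, 0, 1);
}

/* Reversed order: same base function, indices swapped. */
static size_t decode_u16_reversed(const unsigned char *in, size_t nbytes,
                                  uint16_t *out)
{
    return decode_u16_base(in, nbytes, out, 1, 0);
}
```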
|
|
|
|
| |
This makes it use the fast DFA for this functionality.
|
| |
|
| |
|
|
|
|
|
|
| |
UTF8_IS_NONCHAR_GIVEN_THAT_NON_SUPER_AND_GE_PROBLEMATIC() is defined just
for backward compatibility (though I don't think anyone uses it).
Swap which macro is the base level that the other is defined in terms of.
|
|
|
|
|
|
| |
This uses macros recently introduced to remove an EBCDIC dependency and
make the definition simpler. It now uses the DFA, which should speed up
the non-edge case uses.
|
|
|
|
|
| |
The reorganization in the previous commit revealed some undocumented
public macros
|
|
|
|
|
|
| |
This moves the defines for things like surrogates, non-character code
points, etc. to a more logical order, with like adjacent to like, and
before they are otherwise used in the file.
|
|
|
|
| |
By generalizing a macro, we can make it serve both ASCII and EBCDIC
|
|
|
|
|
| |
These two bytes are useful to know in some situations. This commit
changes a couple such places to use the first macro.
|
|
|
|
| |
This is just so that things are clearer to the reader
|
|
|
|
| |
UTF_MIN_CONTINUATION_BYTE is clearer for use in some contexts
|
|
|
|
| |
For convenient code reading
|
|
|
|
|
|
|
|
| |
The previous commit added a convenient place to create a symbol to
indicate that the UTF-8 on this platform includes Perl's nearly-double
length extension. The platforms this isn't needed on are 32-bit ASCII
ones. This symbol allows removing one place where EBCDIC need
be considered, and future commits will use it as well.
|
|
|
|
|
|
| |
The previous commit removed a macro that the comments for this refer to
in explaining its derivation. So use an alternative, that is actually
clearer.
|
|
|
|
|
|
|
| |
Now that previous commits have made it fast to find the position of the
first set bit in a word, we can use a formula to find how many bytes the
UTF-8 representation of that word will occupy. This allows for
simplification of this macro, removing several conditionals.
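A sketch of such a formula on an ASCII platform is shown below. It is not the macro from the commit: the function names are hypothetical, a portable loop stands in for the fast bit-position primitive, and the sketch is only meant for code points up to 0x7FFFFFFF (the original 1 to 6 byte UTF-8 layout). It relies on the fact that an n-byte sequence with n >= 2 carries 5n + 1 payload bits.

```c
#include <stdint.h>

/* Stand-in for a fast "position of highest set bit" primitive.
 * Returns the 0-based bit position; v must be non-zero. */
static unsigned msb_pos(uint32_t v)
{
    unsigned pos = 0;

    while (v >>= 1) {
        pos++;
    }
    return pos;
}

/* Number of bytes needed to encode cp, for cp <= 0x7FFFFFFF.
 * Sequences of n >= 2 bytes hold 5n + 1 bits, so the length is
 * ceil((bits - 1) / 5), computed here as (bits + 3) / 5. */
static unsigned utf8_len_from_msb(uint32_t cp)
{
    if (cp < 0x80) {
        return 1;               /* single byte is the one special case */
    }
    return (msb_pos(cp) + 1 + 3) / 5;
}
```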
|
|
|
|
|
|
|
|
|
|
| |
This macro will calculate at compile time, if passed a compile-time
constant, how many UTF-8 bytes are required to represent the parameter.
The macro is a helper which works fine except for edge cases, which a
wrapper is needed to handle.
The commit changes one instance to use this new macro
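To illustrate what compile-time calculation buys, here is a simplified sketch using chained conditionals. It is an assumption-laden stand-in, not the commit's macro: the name MY_UTF8_LEN_CONST is hypothetical, it covers only code points up to U+10FFFF on ASCII platforms, and it does not model the edge cases that the commit's wrapper handles.

```c
/* Hypothetical macro: when cp is a compile-time constant, the whole
 * expression is an integer constant expression. */
#define MY_UTF8_LEN_CONST(cp)       \
    (  (cp) < 0x80    ? 1           \
     : (cp) < 0x800   ? 2           \
     : (cp) < 0x10000 ? 3           \
     :                  4)

/* Usable where C requires a constant: enumerators and array sizes. */
enum { EURO_SIGN_LEN = MY_UTF8_LEN_CONST(0x20AC) };

static char euro_buf[MY_UTF8_LEN_CONST(0x20AC) + 1];   /* +1 for NUL */
```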
|
|
|
|
|
|
|
| |
This moves a #define into the common code for ASCII and EBCDIC machines.
It adds a bunch of comments about the value that I wish I hadn't had to
figure out for myself.
|
|
|
|
|
|
| |
This macro has a corresponding, older, name for the non-UTF-8 case. It
makes sense to use the same paradigm, and move the definitions together
so that the comments for one don't have to be repeated for the other.
|
|
|
|
|
| |
A symbol introduced in a previous commit allows this internal macro to
only need a single version, suitable for either EBCDIC or ASCII.
|
|
|
|
| |
This is then used in regcomp.c to avoid an #ifdef EBCDIC
|
|
|
|
|
| |
This info is needed in one other place; doing it here means only
specifying it once.
|
|
|
|
|
| |
This is more clearly named for various uses in this file. It has an
unwieldy length, but is unlikely to be used outside it.
|
|
|
|
|
|
|
|
| |
A slight change to this very low level macro (hence called a lot)
removes the need for a conditional, and causes it to work on single-byte
UTF-8 characters on ASCII platforms.
The definition is also moved to a more logical place in the file
|
|
|
|
| |
This is now defined before first use
|
|
|
|
|
|
| |
Future commits would otherwise make the expansion of this macro too
complicated for some C compilers. Use a less general internal helper
function to avoid that.
|
|
|
|
|
| |
This allows the removal of a conditional in a very low level (called a
lot) macro
|
|
|
|
| |
Reorder the clauses to check first before dereferencing
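The point of the reordering can be shown with a small sketch; the function name is hypothetical and the code is not from the commit.

```c
/* The bounds test comes first. Because && short-circuits, *s is only
 * read when s < e, so an empty range never causes a read past the end. */
static int starts_with_byte(const unsigned char *s,
                            const unsigned char *e,
                            unsigned char wanted)
{
    return s < e && *s == wanted;
}
```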
|
|
|
|
|
| |
This uses inRANGE() with mnemonics to make it clearer with no increase
in the number of conditionals
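The sketch below shows how a range test with named bounds can cost a single comparison. MY_IN_RANGE is a hypothetical stand-in that assumes an unsigned-subtraction implementation; the real inRANGE() may be implemented differently. The mnemonic names are also illustrative.

```c
/* Hypothetical stand-in for a range macro. With unsigned arithmetic,
 * values below lo wrap around to large numbers, so one comparison
 * covers both ends of the range. */
#define MY_IN_RANGE(c, lo, hi) \
    ((unsigned) (c) - (unsigned) (lo) <= (unsigned) (hi) - (unsigned) (lo))

/* Mnemonics instead of bare hex constants at the point of use. */
#define MY_FIRST_SURROGATE 0xD800u
#define MY_LAST_SURROGATE  0xDFFFu

static int is_surrogate(unsigned cp)
{
    return MY_IN_RANGE(cp, MY_FIRST_SURROGATE, MY_LAST_SURROGATE);
}
```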
|
|
|
|
| |
This leads to a single conditional instead of two.
|
|
|
|
|
|
|
|
|
|
| |
This adds branch prediction and reorders the tests so that the one
unlikely to succeed is done before the one likely to succeed, so that
the latter usually doesn't need to be executed. Since both conditions
must succeed for the entire expression to succeed, this doesn't change
what the whole expression matches.
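A sketch of the same idea, assuming a GCC-compatible compiler for the prediction hint: MY_UNLIKELY and the function are hypothetical and chosen only to show the ordering; they are not the code changed by the commit.

```c
#include <stdint.h>

/* Hypothetical hint macro; falls back to a plain test elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define MY_UNLIKELY(cond) (cond)
#endif

/* True for the non-characters U+nFFFE and U+nFFFF within Unicode.
 * The rare test (low 16 bits are FFFE or FFFF) comes first, so the
 * common test (cp is within Unicode) is usually skipped. Both must
 * hold, so the order does not change the result. */
static int is_plane_end_nonchar(uint32_t cp)
{
    return MY_UNLIKELY((cp & 0xFFFE) == 0xFFFE) && cp <= 0x10FFFF;
}
```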
|
|
|
|
|
|
|
|
|
|
|
| |
This just detabifies to get rid of the mixed tab/space indentation.
Applying consistent indentation and dealing with other tabs are separate
issues. Done with `expand -i`.
* vutil.* left alone, it's part of version.
* Left regen managed files alone for now.
|
| |
|
|
|
|
|
|
|
|
| |
Many of the files in perl are for one thing only, and hence their
embedded documentation will be for that one thing. By creating a hash
here of them, those files don't have to worry about what section that
documentation goes under, and so it can be completely changed without
affecting them.
|
|
|
|
| |
C<L</foo>> renders better in places than L</C<foo>>
|
| |
|
| |
|
|
|
|
|
|
| |
It turns out that this macro would have failed to compile since
commit 538b546eb0f252250a30c08e6af47d0ea7433fa1, in October 2013. So it
is clear no one is using it.
|
|
|
|
|
|
|
|
| |
nBIT_MAX was used instead of nBIT_UMAX in the changes from
d223e1ea9ae864c0e563187f1e76.
Note: at first glance it seems that nBIT_UMAX is an alias for
nBIT_MASK.
|
|
|
|
|
| |
It is likely that the data will be well-formed Unicode, and not one of
its special characters, like surrogates or non-characters, nor NUL.
|
|
|
|
|
| |
This encapsulates a common paradigm, making sure that it is done
correctly for the platform's size.
|
|
|
|
|
| |
Mostly in comments and docs, but some in diagnostic messages and one
case of 'or die die'.
|