| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
These character constants were used only for a special edge case in trie
construction that has been removed -- except for one instance in
regexec.c which could just as well be some other character.
|
|
|
|
| |
These will be used in a future commit
|
| |
|
|
|
|
|
|
|
|
| |
This commit changes the code generated by the macros so that they work
right out-of-the-box on non-ASCII platforms for non-UTF-8 inputs. THEY
ARE WRONG for UTF-8, but this is good enough to get perl bootstrapped
onto the target platform, and regcharclass.pl can be run there,
generating macros with correct UTF-8.
|
|
|
|
| |
These will be used in future commits
|
|
|
|
|
| |
These messages say the output number is Unicode, but it is really
native, so change them to say the number is 0xXXXX.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Check for the nul char in pathnames and string arguments to
syscalls, return undef and set errno to ENOENT.
A new warning for this is added to the io warnings category, as syscalls.
Previously, embedded \0 chars in such strings were ignored by the syscall but
kept in perl. The hidden payloads in these invalid string args may cause
unnoticed security problems, as they are hard to detect, ignored by
the syscalls but kept around in perl PVs.
Allow an ending \0 though, as several modules add a \0 to
such strings without adjusting the length.
This is based on a change originally by Reini Urban, but pretty much
all of the code has been replaced.
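A minimal sketch of the check described above, as standalone C. The helper
name pathname_is_safe() and its signature are hypothetical, chosen only for
illustration; this is not the code in the commit. It accepts a string with no
\0 or with a single \0 as its very last byte, and otherwise sets errno to
ENOENT and reports failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: buf/len describe a counted string (like a PV).
 * Returns true if the string can be handed to a syscall as a C string
 * without silently dropping anything after an embedded NUL. */
bool pathname_is_safe(const char *buf, size_t len)
{
    /* Find the first NUL inside the counted length, if any. */
    const char *nul = memchr(buf, '\0', len);

    if (nul == NULL)
        return true;                      /* no NUL at all */

    if ((size_t)(nul - buf) == len - 1)
        return true;                      /* lone trailing NUL is tolerated */

    errno = ENOENT;                       /* embedded NUL: refuse the string */
    return false;
}
```

A caller would return undef to the Perl level when this returns false,
instead of passing the truncated string on to the system call.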
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
It's possible to programmatically determine almost all the files and
directories which will be created in lib/ by building the extensions.
Hence add a new script regen/lib_cleanup.pl to do this.
This saves having to manually update lib/.gitignore to reflect changes in
the build products of extensions, which has become a small but recurring
instance of scut-work.
|
|
|
|
|
| |
We have to stop using File::Compare's compare(), as it doesn't return
diagnostics about what went wrong.
|
|
|
|
|
|
|
| |
The first commit of this topic branch added a dummy 0 element to the end
of certain inversion lists to work around an off-by-one error. This
commit makes the necessary changes to stop that error, and to remove
the dummy element. SvCUR() and invlist_len() now are kept in sync.
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 18505f093a44607b687ae5fe644872f835f66313, which
reverted 241136e0ed70738cccd6c4b20ce12b26231f30e5, thus reinstating the
latter commit. It turns out that the error being chased down was not
due to this commit.
Its original message was:
The inversion lists that are compiled into a C header are now const.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 67434bafe4f2406e7c92e69013aecd446c896a9a, which
reverted 4fdeca7844470c929f35857f49078db1fd124dbc, thus reinstating the
latter commit. It turns out that the error being chased down was not
due to this commit.
Its original message was:
This commit continues the process of separating the header area of
inversion lists from the body. 2 more fields are moved out of the
header portion of the inversion list, and into the header portion of the
SV that contains it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
inversion lists" "
This reverts commit de353015643cf10b437d714d3483c1209e079916 which
reverted 533c4e2f08b42d977e5004e823d4849f7473d2d0, thus reinstating it,
plus this commit adds a fix to get it to pass under Address Sanitizer.
The root cause of the problem is that there are two measures of the
length of an inversion list. One is SvCUR(), and the other is
invlist_len(). The original commit caused these to get off-by-one in
some cases. The ultimate solution is to only store one value, and
return the other one based off that. Rather than redo the whole branch,
I've taken an easier way out, which is to add a dummy element at the end
of some inversion lists, so that they aren't off-by-one. Then the other
patches from the original branch will be applied. Each will be
tested with Address Sanitizer. Then the work to fix the underlying
problem will be done.
The original commit's message was:
This commit is the first step to separating the header from the body of
inversion lists. Doing so will allow the compiled-in inversion lists to
be fully read-only.
To invert an inversion list, one simply unshifts a 0 to the front of it
if one is not there, and shifts off the 0 if it does have one.
The current data structure reserves an element at the beginning of each
inversion list that is either 0 or 1. If 0, it means the inversion list
begins there; if 1, it means the inversion list starts at the next
element. Inverting involves flipping this bit.
This commit changes the structure so that there is an additional element
just after the element that flips. This new element is always 0, and
the flipping element now says whether the inversion list begins at the
constant 0 element, or the one after that.
Doing this allows the flipping element to be separated in later commits
from the body of the inversion list, which will always begin with the
constant 0 element. That means that the body of the inversion list can
be const.
|
|
|
|
|
| |
Without this, regen/miniperlmain.pl could end up finding versions which are
out of date, and silently generate an incorrect miniperlmain.c.
|
|
|
|
|
|
|
| |
As miniperlmain.c is now generated by ExtUtils::Miniperl (and not the other
way round), there's no reason to have an editor block in the generated file,
as it's not intended to be edited. Instead, add the "generated from" and
read-only headers to miniperlmain.c
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now ExtUtils::Miniperl has the master version of {mini,}perlmain.c and is
checked into the repository. miniperlmain.c is now generated by a script
in regen/ which uses ExtUtils::Miniperl.
Tweak ExtUtils::Miniperl::writemain() to take an optional first argument,
a reference to a file handle. This permits the regen script to use the
regen_lib.pl functions for file opening/closing/renaming and TAP generation.
For now check in ExtUtils::Miniperl minimally modified from the version
generated by the former minimod.pl. The next commit will tidy it up.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default the code in regen_lib compares the newly written file it has just
closed with the (assumed) existing file, and only overwrites the existing
file if the new file differs. This is a useful behaviour for regeneration
scripts. However, it's not ideal for build scripts called from the Makefile,
as make assumes that targets will be regenerated (and the timestamp touched).
So add an "always update" parameter for the use of Makefile-invoked scripts,
such as autodoc.pl. If set, delete any existing file early (so that fatal
errors during the generation don't confuse the build by leaving an existing
stale file around), skip the comparison and skip the diagnostic output
listing the changed files.
Change autodoc.pl to set this parameter.
Correct a typo in an error message in regen_lib's open_new().
|
|
|
|
|
|
| |
Provide a local subroutine wrap(). Pass columns as its first parameter and
set $Text::Wrap::columns, as all uses of Text::Wrap::wrap() were setting
this variable.
|
|
|
|
|
| |
Use hash slices to avoid repeated typeglob dereferences on $fh.
In read_only_top() use a lexical to avoid repeated $args{lang} lookups.
|
|
|
|
|
|
|
| |
This reverts commit 533c4e2f08b42d977e5004e823d4849f7473d2d0.
This continues the backing out of this topic branch. A bisect shows
that the first commit exhibiting an error is the first one in the
branch.
|
|
|
|
|
|
|
| |
This reverts commit 4fdeca7844470c929f35857f49078db1fd124dbc.
This continues the backing out of this topic branch. A bisect shows
that the first commit exhibiting an error is the first one in the
branch.
|
|
|
|
|
|
|
| |
This reverts commit 241136e0ed70738cccd6c4b20ce12b26231f30e5.
This continues the backing out of this topic branch. A bisect shows
that the first commit exhibiting an error is the first one in the
branch.
|
|
|
|
| |
The inversion lists that are compiled into a C header are now const.
|
|
|
|
|
|
|
| |
This commit continues the process of separating the header area of
inversion lists from the body. 2 more fields are moved out of the
header portion of the inversion list, and into the header portion of the
SV that contains it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit is the first step to separating the header from the body of
inversion lists. Doing so will allow the compiled-in inversion lists to
be fully read-only.
To invert an inversion list, one simply unshifts a 0 to the front of it
if one is not there, and shifts off the 0 if it does have one.
The current data structure reserves an element at the beginning of each
inversion list that is either 0 or 1. If 0, it means the inversion list
begins there; if 1, it means the inversion list starts at the next
element. Inverting involves flipping this bit.
This commit changes the structure so that there is an additional element
just after the element that flips. This new element is always 0, and
the flipping element now says whether the inversion list begins at the
constant 0 element, or the one after that.
Doing this allows the flipping element to be separated in later commits
from the body of the inversion list, which will always begin with the
constant 0 element. That means that the body of the inversion list can
be const.
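The layout described above can be sketched in standalone C as follows. The
type invlist_view and the functions invlist_contains() and invlist_invert()
are hypothetical names for illustration and do not reflect perl's actual
structures. The point of the sketch is that the body, which always begins
with the constant 0, is never written to: inverting only flips a flag that
lives outside it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of an inversion list whose body can be const. */
typedef struct {
    const unsigned long *body; /* sorted; element 0 is always the constant 0 */
    size_t body_len;           /* number of elements in body, including that 0 */
    bool skip_zero;            /* the "flipping" element: if true, the list
                                  starts at the element after the constant 0 */
} invlist_view;

/* A code point is in the set when an odd number of list elements are
 * <= cp, i.e. the last boundary at or below cp opens a range. */
bool invlist_contains(const invlist_view *v, unsigned long cp)
{
    const unsigned long *list = v->body + (v->skip_zero ? 1 : 0);
    size_t len = v->body_len - (v->skip_zero ? 1 : 0);
    size_t lo = 0, hi = len;

    /* Binary search for the count of elements <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo % 2) == 1;
}

/* Inverting is the "unshift or shift a 0" operation, done by flipping
 * the flag rather than by touching the body. */
void invlist_invert(invlist_view *v)
{
    v->skip_zero = !v->skip_zero;
}
```

With a body of {0, 65, 91} and skip_zero set, the list is {65, 91} and
matches code points 65 through 90; after invlist_invert() the list is
{0, 65, 91} and matches everything else.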
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
‘perl regen/warnings.pl tree’ would already generate the tree, but it
had to be run separately and then copied and pasted into perllexwarn.
Now regen/warnings.pl modifies perllexwarn in place as part of its
regeneration. The ‘tree’ command line argument will still cause the
tree to be output to STDOUT.
This causes the three missing experimental categories to be listed in
perllexwarn, resolving ticket #118369.
|
|
|
|
| |
Pod needs a commenting style distinct from C and Perl (i.e. the empty string).
|
| |
|
|
|
|
|
|
|
| |
PERL_PACK_CAN_SHRIEKSIGN has been unconditionally defined for versions 5.9.x
and greater, and undefined for 5.8.x. As we are never going to need to
port changes back to maint-5.8 any more, eliminate all the 5.8.x related code
and the macro that supports it.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
use locale;
fc("\N{LATIN CAPITAL LETTER SHARP S}")
eq fc("\N{LATIN SMALL LETTER LONG S}") x 2
should return true, as the SHARP S folds to two 's's in a row, and the
LONG S is an antique variant of 's', and folds to s. Until this commit,
the expression was false.
Similarly, the following should match, but didn't until this commit:
"\N{LATIN SMALL LETTER SHARP S}" =~ /\N{LATIN SMALL LETTER LONG S}{2}/iaa
The reason these didn't work properly is that in both cases the actual
fold to 's' is disallowed. In the first case because of locale; and in
the second because of /aa. And the code wasn't smart enough to realize
that these were legal.
The fix is to special case these so that the fold of sharp s (both
capital and small) is two LONG S's under /aa; as is the fold of the
capital sharp s under locale. The latter is user-visible, and the
documentation of fc() now points that out. I believe this is such an
edge case that no mention of it need be done in perldelta.
|
|
|
|
| |
These will be used in future commits
|
|
|
|
|
|
| |
I think it's clearer to use Copy. When I wrote this custom macro, we
didn't have the infrastructure to generate a UTF-8 encoded string at
compile time.
|
| |
|
|
|
|
| |
In preparation for future changes.
|
| |
|
|
|
|
|
| |
This was added in the 5.17 series so there's no code relying on its
current name. I think that the abbreviation is clearer.
|
|
|
|
|
|
| |
This now uses the U+ notation to indicate code points, which is
unambiguous no matter what the platform's character set is. (charnames
accepts the U+ notation.)
|
|
|
|
|
| |
This was added in the 5.17 series, so can't yet be in the field; and
isn't needed.
|
|
|
|
|
| |
The data can now have comments, which are converted to C and passed
through.
|
|
|
|
|
|
| |
Unicode character names can have dashes in them. These aren't accepted
in C macro names. Change so both blanks and the hyphen-minus are
converted to underscores.
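A small sketch of that conversion, written in C purely for illustration. The
function name_to_macro() is hypothetical and is not taken from the commit; it
copies a character name into a caller-supplied buffer, replacing each blank
and each hyphen-minus with an underscore.

```c
#include <stddef.h>

/* Hypothetical helper: write a C-macro-safe form of `name` into `out`,
 * truncating if needed and always NUL-terminating when out_size > 0. */
void name_to_macro(const char *name, char *out, size_t out_size)
{
    size_t i = 0;

    if (out_size == 0)
        return;

    for (; name[i] != '\0' && i + 1 < out_size; i++) {
        /* Blanks and hyphen-minus are not valid in macro names. */
        out[i] = (name[i] == ' ' || name[i] == '-') ? '_' : name[i];
    }
    out[i] = '\0';
}
```

For example, the name "HYPHEN-MINUS" becomes HYPHEN_MINUS, and
"LATIN SMALL LETTER SHARP S" becomes LATIN_SMALL_LETTER_SHARP_S.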
|
| |
|
|
|
|
|
|
|
| |
Deleting a hash slice compiles 5 fewer ops, and executes 21 fewer than
looping over the keys to delete each in turn. Whilst this is arguably a
micro-optimisation, it does not increase obfuscation and is in code loaded
by nearly every Perl program, so feels worthwhile.
|
| |
|
|
|
|
| |
These should be mutually exclusive
|
|
|
|
| |
This will be used to deprecate uses of non-ASCII Pattern White Space
|