| Commit message (Collapse) | Author | Age | Files | Lines |
|
| |
The recent series of commits on handy.h causes x2p to not compile.
These commits had some differences from what I submitted, in that they
moved the new table to a new header file instead of the submitted
perl.h. Unfortunately, this bypasses the code in perl.h that sorts out
duplicate definitions and externs, and so fails on programs
that include handy.h but not perl.h.
This patch changes things so that the table lookup is not used unless
perl.h is included. This is essentially my original patch, but adding
an #include of the new header file.
|
| |
This patch adds *_L1() macros for character class lookup, using table
lookup for O(1) performance. These force a Latin-1 interpretation on
ASCII platforms.
There were a couple existing macros that had the suffix U for Unicode
semantics. I thought that those names might be confusing, so settled on
L1 as the least bad name. The older names are kept as synonyms for
backward compatibility. The problem with those names is that these are
actually macros, not functions, and hence can be called with any int,
including any Unicode code point. The U suffix might be mistaken for
indicating they are more general purpose, whereas they are really only
valid for the latin1 subset of Unicode (including the EBCDIC isomorphs).
When called with something outside the latin1 range, they will return
false.
This patch necessitated rearranging a few things in the file. I added
documentation for several more macros, and intend to document the rest.
(This commit was modified from its original form by Steffen.)
|
|
|
|
|
| |
This macro is clearer in intent than isALNUM, and isn't confusable
with isALNUMC. So document it as the primary name.
|
| |
|
|
|
|
|
| |
There are a number of macros missing from the documentation. This helps
me figure out which ones.
|
| |
This patch changes the macros whose names end in _A to use table lookup
except for the one (isASCII) which always has only one comparison.
The table is in l1_char_class_tab.h.
The advantage of this is speed. It replaces some fairly complicated
expressions with an O(1) look-up and a mask.
It uses the FITS_IN_8_BITS() macro to guarantee that the table bounds
are not exceeded. For legal inputs that are byte size, the optimizer
should get rid of this macro leaving only the lookup and mask.
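
The lookup-and-mask idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of l1_char_class_tab.h: the table `demo_class_tab`, the bit names, `demo_init_tab()` and `DEMO_FITS_IN_8_BITS()` are all hypothetical, and the table is filled at run time for an ASCII platform only.

```c
#include <assert.h>

/* Hypothetical class bits; the real l1_char_class_tab.h layout may differ. */
#define DEMO_DIGIT_A 0x01u
#define DEMO_ALPHA_A 0x02u

/* Hypothetical bounds check standing in for FITS_IN_8_BITS(). */
#define DEMO_FITS_IN_8_BITS(c) (((unsigned long)(c) & ~0xFFul) == 0)

/* One entry per possible byte value; filled once by demo_init_tab(). */
unsigned char demo_class_tab[256];

/* Assumes an ASCII platform, where 'a'..'z' and 'A'..'Z' are contiguous. */
void demo_init_tab(void)
{
    int c;
    for (c = '0'; c <= '9'; c++) demo_class_tab[c] |= DEMO_DIGIT_A;
    for (c = 'a'; c <= 'z'; c++) demo_class_tab[c] |= DEMO_ALPHA_A;
    for (c = 'A'; c <= 'Z'; c++) demo_class_tab[c] |= DEMO_ALPHA_A;
}

/* Bounds check first, then a single indexed load and a mask. */
#define demo_isDIGIT_A(c) \
    (DEMO_FITS_IN_8_BITS(c) && \
     (demo_class_tab[(unsigned char)(c)] & DEMO_DIGIT_A) != 0)
#define demo_isALPHA_A(c) \
    (DEMO_FITS_IN_8_BITS(c) && \
     (demo_class_tab[(unsigned char)(c)] & DEMO_ALPHA_A) != 0)

/* Self-check: out-of-range inputs never index the table. */
void demo_tab_selfcheck(void)
{
    demo_init_tab();
    assert(demo_isDIGIT_A('7'));
    assert(demo_isALPHA_A('x'));
    assert(!demo_isALPHA_A(0x100 + 'x'));
    assert(!demo_isDIGIT_A(-1));
}
```

When the argument is already a byte, the bounds check is a compile-time constant, so only the load and the mask should remain after optimization.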
(This commit was changed from its original form by Steffen.)
|
| |
|
|
|
|
| |
These macros return true only if the parameter is an ASCII character.
|
|
|
|
| |
as is better optimized and suitable for the purpose.
|
| |
|
| |
The name isALNUM() is problematic, as it is very close to isALNUMC(),
and doesn't mean exactly what most people might think. I presume the C
in isALNUMC stands for C language or libc, but am not sure. Others
don't know either. But in any event, isALNUM is different from the C
isalnum(), in that it matches the Perl concept of \w, which differs from
the C definition in exactly one place. Perl includes the underscore
character, '_'.
So, I'm adding an isWORDCHAR() macro for future code to use to be more
clear. I thought also about isWORD(), but I think confusion can arise
from thinking that means a whole word. isWORDCHAR_L1() matches in the
Latin1 range, to be equivalent to isALNUMU(). The motivation for using
L1 instead of U will be explained in a commit message for the other L1
macros that are to be added.
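
The one-character difference from C's isalnum() can be shown with a small sketch. `demo_isWORDCHAR_A()` is a hypothetical ASCII-range stand-in written for this illustration; it is not the definition used in handy.h.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical ASCII-range sketch of the \w idea: C's alphanumerics plus '_'. */
#define demo_isWORDCHAR_A(c) \
    ((unsigned)(c) < 128 && ((c) == '_' || isalnum((int)(c))))

/* Self-check: the underscore is the one place the two definitions differ. */
void demo_wordchar_selfcheck(void)
{
    assert(!isalnum('_'));
    assert(demo_isWORDCHAR_A('_'));
    assert(demo_isWORDCHAR_A('a') && demo_isWORDCHAR_A('9'));
    assert(!demo_isWORDCHAR_A('-'));
}
```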
|
| |
|
| |
|
|
|
|
|
| |
The only change here is that I sorted these #defines within their
groups, to make it much easier to follow what's going on.
|
|
|
|
| |
It didn't include the Latin1 space components.
|
|
|
|
| |
It doesn't include NBSP
|
|
|
|
|
|
| |
The macro was using the ASCII definition, which includes neither NEL nor
NBSP. But libc contains the correct definition, which is usable on
EBCDIC since we don't worry about locales there.
|
|
|
|
|
|
| |
Commit 4125141464884619e852c7b0986a51eba8fe1636 improperly got rid of
EBCDIC handling, as it combined the ASCII and EBCDIC versions, but left
the result in the ASCII-only branch. Just move to the common code.
|
| |
|
|
|
|
| |
This is a synonym for isALNUMU
|
| |
|
| |
Prior to this patch, if isASCII() was called with something like 256,
it would return true.
For some reason unknown to me, U64 is defined only inside the perl core.
However, the equivalent U64TYPE is known everywhere, so in the macro
that can be called outside of core, use that instead.
The commit log doesn't give a reason for not defining U64 outside of
core, and no tests in the suite fail when it is defined outside core.
But out of caution, I'm just doing this workaround instead of exposing
U64.
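
One plausible way such a false positive arises is a cast that truncates before the comparison. The sketch below contrasts a narrowing cast with a widening one; both macro names are hypothetical, `uint64_t` stands in for U64TYPE, and the narrow form is only an assumption about the shape of the bug, not a quotation of the old definition.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating variant: 256 becomes 0 after the cast, so it looks like ASCII. */
#define demo_isASCII_narrow(c) ((uint8_t)(c) <= 127)

/* Widening variant: the full value takes part in the comparison. */
#define demo_isASCII_wide(c)   ((uint64_t)(c) <= 127)

void demo_isascii_selfcheck(void)
{
    assert(demo_isASCII_narrow(256));   /* wrong answer: 256 is not ASCII */
    assert(!demo_isASCII_wide(256));
    assert(demo_isASCII_wide('A'));
    assert(!demo_isASCII_wide(-1));     /* -1 becomes a huge unsigned value */
}
```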
|
|
|
|
|
| |
EBCDIC platforms use isascii(), but it is not in all libcs, so it is
better to use our own.
|
|
|
|
|
| |
Previous documentation was wrong for EBCDIC platforms. This fixes that
and adds some more explanation.
|
|
|
|
|
|
| |
toUPPER() and toLOWER() were grouped with the character class functions
(in perlapi), to which they are related, but aren't the same. Create a
new heading for these.
|
| |
8 and 9 are not treated as alphas in parsing as opposed to illegal
octals.
This also adds tests to verify that 1-3 digits work in char classes.
I created an isOCTAL macro in case that lookup gets moved to a bit
field, as I plan to do later, for speed.
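
A range-comparison form of such a macro, together with a scanner that accepts one to three octal digits, might look like the sketch below. `demo_isOCTAL()` and `demo_scan_octal()` are hypothetical names for this illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical comparison-based form; a bit-field lookup could replace it. */
#define demo_isOCTAL(c) ((c) >= '0' && (c) <= '7')

/* Reads at most three octal digits from s; stores how many were consumed. */
unsigned demo_scan_octal(const char *s, size_t *consumed)
{
    unsigned value = 0;
    size_t i = 0;

    while (i < 3 && demo_isOCTAL(s[i])) {
        value = value * 8 + (unsigned)(s[i] - '0');
        i++;
    }
    *consumed = i;
    return value;
}

void demo_octal_selfcheck(void)
{
    size_t n;
    assert(demo_scan_octal("101", &n) == 65 && n == 3);
    assert(demo_scan_octal("18", &n) == 1 && n == 1);   /* 8 ends the number */
    assert(demo_scan_octal("9", &n) == 0 && n == 0);    /* 9 is not octal */
}
```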
|
|
|
|
|
|
|
| |
This makes sure that the index into the arrays used to change between
lower and upper case will fit into their bounds; returning an error
character if not. The check is likely to be optimized out if the index
is stored in 8 bits.
|
|
|
|
|
|
|
| |
This macro is designed to be optimized out if the argument is
byte-length, but otherwise to be a bomb-proof way of making sure that
the argument occupies only 8 bits or fewer in whatever storage class it
is in.
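
A sketch of how such a check can be written so that it folds away for byte-sized arguments: `sizeof` is evaluated at compile time, so for a one-byte operand the whole expression is a constant. `DEMO_FITS_IN_8_BITS()` and the two helper functions are hypothetical names, and `uint32_t` is an assumed stand-in for the core's own integer type.

```c
#include <assert.h>
#include <stdint.h>

/* True if the operand is one byte wide, or if its value survives masking. */
#define DEMO_FITS_IN_8_BITS(c) \
    ((sizeof(c) == 1) || (((uint32_t)(c) & 0xFF) == (uint32_t)(c)))

/* Byte-sized parameter: the check is constant and can be optimized out. */
int demo_fits_char(unsigned char c) { return DEMO_FITS_IN_8_BITS(c); }

/* Wider parameter: the mask comparison does the real work. */
int demo_fits_int(int c) { return DEMO_FITS_IN_8_BITS(c); }

void demo_fits_selfcheck(void)
{
    assert(demo_fits_char(200));
    assert(demo_fits_int(255));
    assert(!demo_fits_int(256));
    assert(!demo_fits_int(-1));   /* negative values do not fit */
}
```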
|
|
|
|
| |
New macro lex_stuff_pvs(), wrapping lex_stuff_pvn() for literal strings.
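
The general pattern of a literal-string wrapper around a pointer-plus-length function can be sketched like this. `demo_stuff_pvn()`, `demo_stuff_pvs()` and the buffer are hypothetical stand-ins; the real lex_stuff_pvn() has a different signature and operates on the lexer buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

char demo_buf[64];
size_t demo_len = 0;

/* Hypothetical stand-in for a pointer-plus-length "stuff" function. */
void demo_stuff_pvn(const char *pv, size_t len)
{
    size_t room = sizeof(demo_buf) - 1 - demo_len;

    if (len > room)
        len = room;                 /* truncate rather than overflow */
    memcpy(demo_buf + demo_len, pv, len);
    demo_len += len;
    demo_buf[demo_len] = '\0';
}

/* The "" s "" concatenation only compiles when s is a string literal;
 * sizeof(s) - 1 is its length without the trailing NUL. */
#define demo_stuff_pvs(s) demo_stuff_pvn("" s "", sizeof(s) - 1)

void demo_stuff_selfcheck(void)
{
    demo_len = 0;
    demo_stuff_pvs("my ");
    demo_stuff_pvs("$x;");
    assert(strcmp(demo_buf, "my $x;") == 0);
    assert(demo_len == 6);
}
```

The wrapper saves callers from counting characters by hand, and the length is computed at compile time.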
|
|
|
|
|
|
| |
If a bug is found in the handy.h macros, it may be necessary to fix the
duplicates in the cpan module. This may require filing a bug report
there.
|
|
|
|
| |
Refactor the macro append_flags() in dump.c to use it.
|
| |
The function perl_ebcdic_control() is unnecessary, as the toCTRL macro
that calls it can be changed to just map EBCDIC to ASCII first, and then
do the normal procedure.
This means that EBCDIC and ASCII will no longer diverge. Currently,
EBCDIC gives a syntax error for inputs outside its domain, whereas the
ASCII version accepts some of them.
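
The "map first, then do the normal procedure" shape can be sketched as below, assuming the normal ASCII procedure is to uppercase the character and flip bit 6. The two mapping macros are hypothetical hooks and are the identity here, since this sketch assumes an ASCII build; an EBCDIC build would substitute real translation tables.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical mapping hooks; identity because this sketch assumes ASCII. */
#define demo_native_to_ascii(c) (c)
#define demo_ascii_to_native(c) (c)

/* Map to ASCII, uppercase, flip bit 6 (XOR 64), then map back. */
#define demo_toCTRL(c) \
    demo_ascii_to_native(toupper(demo_native_to_ascii(c)) ^ 64)

void demo_toctrl_selfcheck(void)
{
    assert(demo_toCTRL('A') == 1);     /* Ctrl-A */
    assert(demo_toCTRL('a') == 1);     /* case-insensitive */
    assert(demo_toCTRL('[') == 27);    /* ESC */
    assert(demo_toCTRL('?') == 127);   /* DEL */
}
```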
|
|
|
|
|
|
|
|
| |
Prior to this patch, there was a potential bug in these two macros: if
they were called with a signed character outside the ASCII range, the
value would be negative, and they always returned true for negative
values. Casting the parameter to unsigned should fix that, since the
value is then interpreted as a number above the ASCII range.
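
The effect of the cast can be seen in a small sketch. The macro and function names are hypothetical, and the unsigned form is only an illustration of the technique described, not the patched definition itself.

```c
#include <assert.h>

/* Buggy shape: a negative signed char trivially satisfies "<= 127". */
#define demo_isASCII_signed(c)   ((c) <= 127)

/* Fixed shape: a negative value converts to a large unsigned number. */
#define demo_isASCII_unsigned(c) ((unsigned)(c) <= 127)

int demo_check_signed(signed char c)   { return demo_isASCII_signed(c); }
int demo_check_unsigned(signed char c) { return demo_isASCII_unsigned(c); }

void demo_signed_selfcheck(void)
{
    /* -23 is what byte 0xE9 typically becomes when stored in a signed char. */
    assert(demo_check_signed(-23));      /* wrong answer */
    assert(!demo_check_unsigned(-23));   /* correct answer */
    assert(demo_check_unsigned('A'));
}
```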
|
| |
|
| |
Ever since perl 4.000 we've only set the POSIX process name via
argv[0]. Unfortunately on Linux the POSIX name isn't used by utilities
like top(1), ps(1) and killall(1).
Now when we set C<$0 = "hello"> both C<qx[ps h $$]> (POSIX) and
C<qx[ps hc $$]> (legacy) will say "hello", instead of the latter being
"perl" as was previously the case.
See also the March 9 2010 thread "Why doesn't assignment to $0 on
Linux also call prctl()?" on perl5-porters.
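
For readers unfamiliar with the Linux call, here is a minimal sketch of setting and reading back the short task name with prctl(). The helper names are hypothetical and this is not the code added to mg.c; on non-Linux systems the helpers simply report failure. Note that the kernel limits this name to 16 bytes including the terminating NUL.

```c
#include <assert.h>
#include <string.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical helper: set the kernel's short task name; 0 on success. */
int demo_set_task_name(const char *name)
{
#ifdef __linux__
    return prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL);
#else
    (void)name;
    return -1;                      /* not supported in this sketch */
#endif
}

/* Hypothetical helper: read the name back; buf must hold 16 bytes. */
int demo_get_task_name(char buf[16])
{
#ifdef __linux__
    return prctl(PR_GET_NAME, (unsigned long)buf, 0UL, 0UL, 0UL);
#else
    buf[0] = '\0';
    return -1;
#endif
}

void demo_task_name_selfcheck(void)
{
#ifdef __linux__
    char buf[16];
    assert(demo_set_task_name("hello") == 0);
    assert(demo_get_task_name(buf) == 0);
    assert(strcmp(buf, "hello") == 0);
#endif
}
```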
|
| |
    bool b = (bool)some_int
doesn't necessarily do what you think. In some builds, bool is defined as
char, and that cast's behaviour is thus undefined. So this line in mg.c:
    const bool was_temp = (bool)SvTEMP(sv);
was actually setting was_temp to false even when the SVs_TEMP flag was set.
Fix this by replacing all the (bool) casts with a new cBOOL() cast macro
that (hopefully) does the right thing.
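
The failure mode can be reproduced with a char-sized boolean type. In this sketch `demo_bool`, `DEMO_FLAG_TEMP` and `demo_cBOOL()` are hypothetical; the flag value is an arbitrary bit above the low byte, and `unsigned char` is used so the truncation is well defined for the purposes of the demonstration.

```c
#include <assert.h>

/* Mimic a build where bool is a char-sized integer type. */
typedef unsigned char demo_bool;

/* Hypothetical flag that lives above the low 8 bits. */
#define DEMO_FLAG_TEMP 0x00080000u

/* Test for non-zero first, then produce exactly 1 or 0. */
#define demo_cBOOL(x) ((x) ? (demo_bool)1 : (demo_bool)0)

/* Plain cast: the set bit is truncated away, leaving 0. */
demo_bool demo_plain_cast(unsigned flags)
{
    return (demo_bool)(flags & DEMO_FLAG_TEMP);
}

/* cBOOL-style conversion: any non-zero value becomes 1. */
demo_bool demo_safe_cast(unsigned flags)
{
    return demo_cBOOL(flags & DEMO_FLAG_TEMP);
}

void demo_cbool_selfcheck(void)
{
    assert(demo_plain_cast(DEMO_FLAG_TEMP) == 0);   /* flag lost */
    assert(demo_safe_cast(DEMO_FLAG_TEMP) == 1);    /* flag preserved */
    assert(demo_safe_cast(0) == 0);
}
```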
|
| |
|
| |
Prior to now just about anything has been legal for a character name in
\N{...}. This means that legal code was broken by having \N{3,4} for
example mean [^\n]{3,4}. Such code doesn't come from standard
charnames, but from legal custom translators.
This patch deprecates "unreasonable" names. handy.h is changed by the
addition of macros that taken together define the names we deem
reasonable, namely those that begin with an alphabetic character and
continue with alphanumerics and some punctuation characters.
toke.c is changed to parse each name and to raise a warning if any
problematic characters are found.
Some tests and diagnostic documentation are also included.
|
| |
|
| |
Authors: John Peacock, David Golden and Zefram
The goal of this mega-patch is to enforce strict rules for version
numbers provided to 'package NAME VERSION' while formalizing the prior,
lax rules used for version object creation. Parsing for use() is
unchanged.
version.pm adds two globals, $STRICT and $LAX, containing regular
expressions that define the rules. There are two additional functions
-- version::is_strict and version::is_lax -- that test an argument
against these rules.
However, parsing of strings that might contain version numbers is done
in core via the Perl_scan_version function, which may be called during
compilation or may be called later when version objects are created by
Perl_new_version or Perl_upg_version.
A new helper function, Perl_prescan_version, has been added to validate
a string under either strict or lax rules. This is used in toke.c for
'package NAME VERSION' in strict mode and by Perl_scan_version in lax
mode. It matches the behavior of the version.pm regular expressions,
but does not use them directly.
A new test file, comp/packagev.t, validates strict and lax behaviors of
'package NAME VERSION' and 'version->new(VERSION)' respectively and
verifies their behavior against the $STRICT and $LAX regular
expressions, as well. Validating these two implementations should help
ensure they each work as intended.
Other files and tests have been modified as necessary to support these
changes.
There is remaining work to be done in a few areas:
* documenting all changes in behavior and new functions
* determining proper treatment of "," as decimal separators in
various locales
* updating diagnostics for new error messages
* porting changes back to the version.pm distribution on CPAN,
including pure-Perl versions
|
|
|
|
|
|
| |
Regenerated after backporting 88a6f4fc380d30c40
Please *do* remember to notify the metaconfig folk when directly patching Configure
Bring back Missing parts
|
| |
|
| |
Hi,
Using the attached patch to the blead source (as of a few hours ago), I can
build perl with the following OS/compiler/make combos.
On 32-bit XP:
MSVC++ 7.0 / dmake (uses win32/makefile.mk)
MSVC++ 7.0 / nmake (uses win32/Makefile)
Borland C++ 5.5.1 / dmake
mingw.org's gcc-4.3.0 / dmake
mingw.org's gcc-3.4.5 / dmake
mingw-w64.sf's 32-bit gcc-4.4.3 / dmake
(There's a bug with that last compiler on XP.
The perl it builds on XP hangs on XP, but runs ok if copied across to Vista.
I think this is unrelated to the patches - probably even unrelated to perl.
Without these patches perl will not even build using that last compiler.)
On 64-bit Vista:
32-bit MSVC++ 7.0 / nmake (uses win32/Makefile)
32-bit MSVC++ 7.0 / dmake (uses win32/makefile.mk)
32-bit Borland C++ 5.5.1 / dmake
mingw.org's 32-bit gcc-4.4.0 / dmake
mingw.org's 32-bit gcc-3.4.5 / dmake
mingw-w64.sf's 32-bit gcc-4.4.3 / dmake
mingw-w64.sf's 64-bit gcc-4.4.3 / dmake
mingw-w64.sf's 64-bit x86_64-w64-mingw32-gcc-4.4.3 / dmake
64-bit Microsoft Platform SDK for Windows Server 2003 R2 / dmake (uses win32/makefile.mk)
64-bit Microsoft Platform SDK for Windows Server 2003 R2 / nmake (uses win32/Makefile)
Not all of those builds pass all tests - but where the removal of the
patches still permits perl to build, the same tests still fail. That is,
*nothing* is lost by including these patches - but there are significant
gains.
Each of the above builds was done according to the normal win32
configuration parameters - ie multi-threaded, non debug. No unusual config
settings were applied. (I did build one debug perl on Vista using
mingw-w64.sf's 32-bit gcc-4.4.3 and it built fine.)
Please feel free to apply these patches (with or without modification) -
and, yes, you're more than welcome to blame me if they cause any breakages
;-)
Of course, some of those compilers (Borland, Microsoft, and the compilers
from mingw.org) already build perl *without* having to apply any patches.
It's just the other compilers that need the patches. The purpose of testing
with Borland, Microsoft, and the mingw.org compilers is just to check that
these patches don't break them.
As a final check, I've done a build on my aging linux (mandrake-9.1) box,
gcc-3.2.2. I built with '-des -Duselongdouble -Duse64bitint -Dusedevel'. No
problem with that, either.
If there are additional testing requirements, please let me know, and I'll try
to oblige.
I believe the patch applied successfully for me - see below my sig for the
output.
Cheers,
Rob
Rob@desktop2 ~/GIT/blead
$ patch -p0 <blead_diff.diff
patching file dist/threads/threads.xs
patching file handy.h
patching file cpan/ExtUtils-MakeMaker/lib/ExtUtils/MM_Win32.pm
patching file op.c
Hunk #1 succeeded at 5774 (offset 47 lines).
patching file pp_pack.c
patching file util.c
Hunk #1 succeeded at 5366 (offset -28 lines).
patching file win32/makefile.mk
patching file win32/perlhost.h
patching file win32/win32.c
patching file win32/win32.h
patching file README.win32
patching file XSUB.h
|
| |
|
|
|
|
|
|
| |
Perl_deprecate was not part of the public API, and did not have a deprecate()
shortcut macro defined without -DPERL_CORE. Neither codesearch.google.com nor
CPAN::Unpack show any users outside the core.
|
| |
|
|
|
|
|
|
| |
perl.c has the last mentions of PERL_MEM_LOG_ENV*. drop them too.
(rgs: plus some in handy.h's comments too)
|
|
|
|
|
|
|
| |
Most users who want PERL_MEM_LOG want the default implementation,
so give it to them. Users providing their own implementation can
obtain the current behavior by adding -DPERL_MEM_LOG_NOIMPL.
Frankly, the average user probably wants _ENV by default too.
|
| |
|
|
|
|
|
|
|
|
| |
Other OS parts will follow
From: Steve Peters <steve@fisharerojo.org>
Date: Wed, 25 Mar 2009 10:54:51 -0500
Message-ID: <fd7a59d30903250854q53311f48o6744df7cbfa1d03d@mail.gmail.com>
|