path: root/lib
Commit message [Author, Age, Files, Lines]
* lib/locale.t: Remove tests that need UTF-8 locale [Karl Williamson, 2014-02-19, 1 file, -9/+0]
|   These tests should not be here because they will only match under a UTF-8 locale, which happens to be the case on the machine I developed them on, but not necessarily always true, and so they are failing. Given the deadline is already past, I'm just removing them for now, and will re-add them later in another place in the file where we know we are using a UTF-8 locale.
* Make taint checking regex compile time instead of runtime [Karl Williamson, 2014-02-19, 1 file, -0/+77]
|   See discussion at https://rt.perl.org/Ticket/Display.html?id=120675
|
|   There are several unresolved items in this discussion, but we did agree that tainting should be dependent only on the regex pattern, and not the particular input string being matched against:
|
|       "The bottom line is we are moving to the policy that tainting is based on the operation being in locale, without regard to the particular operand's contents passed this time to the operation. This means simpler core code and more consistent tainting results. And it lessens the likelihood that there are paths in the core that should taint but don't"
|
|   This commit does the minimal work to change regex pattern matching to determine tainting at pattern compilation time. Simply put, if a pattern contains a regnode whose match/not match depends on the run-time locale, any attempt to match against that pattern will taint, regardless of the actual target string or runtime locale in effect.
|
|   Given this change, there are optimizations that can be made to avoid runtime work, but these are deferred until later.
|
|   Note that just because a regular expression is compiled under locale doesn't mean that the generated pattern will be tainted. It depends on the actual pattern. For example, the pattern /(.)/ doesn't taint because it will match exactly one character of the input, regardless of locale settings.
* lib/locale.t: Add some test names [Karl Williamson, 2014-02-19, 1 file, -92/+92]
|
* lib/locale.t: Untaint before checking if next thing taints [Karl Williamson, 2014-02-19, 1 file, -0/+11]
|   The tests weren't testing what they purported to: we need to start with untainted values to see whether the operation taints.
* Correct number of tests in plan. [James E Keenan, 2014-02-19, 1 file, -1/+1]
|
* restore $PERL_OLD_VERSION to English.pm [David Golden, 2014-02-18, 2 files, -2/+4]
|   In the dark ages, when $^V replaced $] for $PERL_VERSION, $PERL_OLD_VERSION was added as a comment in the list of deprecated variables. Since $] is *not* deprecated, this commit restores it.
* [perl #121081] work around different output on VMS [Tony Cook, 2014-02-19, 1 file, -5/+6]
|   VMS is a special snowflake; deal with the slightly different debugger output it produces.
* Skip locale test on OpenBSD, MirBSD and Bitrig too [Chris 'BinGOs' Williams, 2014-02-17, 1 file, -1/+1]
|   From the original ticket #115808, the following should produce "Use of uninitialized value in print at -e line 1.":
|
|       $ perl -wle 'use POSIX; print length setlocale POSIX::LC_ALL, "mtfnpy"'
|       16
|
|   It prints 16 instead, because setlocale() returns a defined string for the bogus locale name. So skip this test on OpenBSD, MirBSD and Bitrig.
* Expand tabs in diagnostics.pm [Father Chrysostomos, 2014-02-08, 1 file, -0/+2]
|   Otherwise pod like this:
|
|       The second situation is caused by an eval accessing a lexical subroutine that has gone out of scope, for example, sub f { my sub a {...} sub { eval '\&a' } } f()->();
|
|   is turned into this:
|
|       The second situation is caused by an eval accessing a variable that has gone out of scope, for example, sub f { my $a; sub { eval '$a' } } f()->();
|
|   instead of this:
|
|       The second situation is caused by an eval accessing a variable that has gone out of scope, for example, sub f { my $a; sub { eval '$a' } } f()->();
|
|   I don’t know how to test this without literally copying and pasting parts of diagnostics.pm into diagnostics.t. But I have tested it manually and it works.
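The commit does not show its two-line fix. The following illustrates only the underlying idea and is not necessarily what diagnostics.pm does. The core Text::Tabs module's expand() replaces each tab with enough spaces to reach the next tab stop, which keeps indented pod verbatim lines aligned:

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand(), unexpand() and $tabstop (default 8)

# A pod verbatim line indented with a tab, as a stand-in for perldiag text.
my $line = "\tsub f {";

# expand() turns each tab into spaces up to the next multiple of $tabstop.
my ($expanded) = expand($line);

print "[$expanded]\n";          # "[        sub f {]" (8 spaces)
print "tab stop is $tabstop\n";
```
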
* diagnostics.pm: Eliminate $WHOAMI [Father Chrysostomos, 2014-02-08, 1 file, -5/+6]
|   This variable only held the package name. __PACKAGE__ is faster, as it allows constant folding. diagnostics.pm just happens to be older than __PACKAGE__, which was introduced as recently as 1997 (68dc074516).
* Increase $diagnostics::VERSION to 1.34 [Father Chrysostomos, 2014-02-08, 1 file, -1/+1]
|
* merge basic zefram/purple_signatures into blead [Zefram, 2014-02-06, 2 files, -13/+37]
|\
| * Merge blead into zefram/purple_signatures [Zefram, 2014-02-01, 1 file, -8/+18]
| |\
| * | subroutine signatures [Zefram, 2014-02-01, 2 files, -13/+37]
| | |   Declarative syntax to unwrap argument list into lexical variables. "sub foo ($a,$b) {...}" checks number of arguments and puts the arguments into lexical variables. Signatures are not equivalent to the existing idiom of "sub foo { my($a,$b) = @_; ... }". Signatures are only available by enabling a non-default feature, and generate warnings about being experimental. The syntactic clash with prototypes is managed by disabling the short prototype syntax when signatures are enabled.
* | | Don't test locales that are invalid for needed categories [Karl Williamson, 2014-02-04, 1 file, -4/+3]
| |/
|/|
| |   When looking for locales to test, skip ones which aren't defined in every locale category we care about. This was motivated by a NetBSD machine which has a Pig Latin locale, but it is defined only for LC_MESSAGES.
| |
| |   This necessitated adding parameters to pass the desired locale(s), and renaming a test function to indicate the current category it is valid for.
* | lib/locale.t: Better debug information [Karl Williamson, 2014-01-29, 1 file, -8/+18]
|/
|   This adds a couple of lines of information, and sorts some other output.
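The following is a minimal sketch of the signature syntax from the "subroutine signatures" commit on the merged branch above. It assumes perl 5.20 or later, the release that shipped the feature. The sub names add_sig and add_old are made up for the example. It contrasts a signature with the my(...) = @_ idiom that the commit says it is not equivalent to:

```perl
use strict;
use warnings;
use feature 'signatures';                 # needs perl 5.20 or later
no warnings 'experimental::signatures';   # silence the "experimental" warning

# Hypothetical example: the signature unpacks the arguments into
# lexicals and dies unless exactly two are passed.
sub add_sig ($x, $y) {
    return $x + $y;
}

# The older idiom copies @_ but never checks the argument count.
sub add_old {
    my ($x, $y) = @_;
    return $x + $y;
}

print add_sig(2, 3), "\n";          # 5
print add_old(2, 3, 100), "\n";     # 5: the extra argument is silently ignored

my $ok = eval { add_sig(2, 3, 100); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

From perl 5.36 onward, signatures are no longer experimental.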
* mktables: Refer to an actual commit number [Karl Williamson, 2014-01-28, 1 file, -2/+3]
|
* Work properly under UTF-8 LC_CTYPE locales [Karl Williamson, 2014-01-27, 3 files, -28/+73]
|   This large (sorry, I couldn't figure out how to meaningfully split it up) commit causes Perl to fully support LC_CTYPE operations (case changing, character classification) in UTF-8 locales. As a side effect it resolves [perl #56820].
|
|   The basics are easy, but there were a lot of details, and one troublesome edge case discussed below.
|
|   What essentially happens is that when the locale is changed to a UTF-8 one, a global variable is set TRUE (FALSE when changed to a non-UTF-8 locale). Within the scope of 'use locale', this variable is checked, and if TRUE, the code that Perl uses for non-locale behavior is used instead of the code for locale behavior. Since Perl's internal representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.
|
|   More work had to be done for regular expressions. There are three cases.
|
|   1) Character classes such as \w and [[:punct:]] needed no extra work, as the changes fall out from the base work.
|
|   2) Strings that are to be matched case-insensitively. These form EXACTFL regops (nodes). Notice that if such a string contains only above-Latin1 characters that match only themselves, the node can be downgraded to an EXACT-only node. That presents better optimization possibilities, as we now have a fixed string, known at compile time, that must be in the target string for a match. Similarly, if all characters in the string match only other above-Latin1 characters case-insensitively, the node can be downgraded to a regular EXACTFU node (match, folding, using Unicode, not locale, rules). The code changes for this could be done without accepting UTF-8 locales fully, but there were edge cases which needed to be handled differently if I stopped there, so I continued on.
|
|   In an EXACTFL node, all such characters are now folded at compile time (just as before this commit), while the other characters whose folds are locale-dependent are left unfolded. This means that they have to be folded at execution time based on the locale in effect at the moment. Again, this isn't a change from before. The difference is that now some of the folds that need to be done at execution time (in regexec) are potentially multi-char. Some of the code in regexec was trivial to extend to account for this because of existing infrastructure, but the part dealing with regex quantifiers needed more work.
|
|   Also the code that joins EXACTish nodes together had to be expanded to account for the possibility of multi-character folds within locale handling. This was fairly easy, because it already has infrastructure to handle these under somewhat different circumstances.
|
|   3) In bracketed character classes, represented by ANYOF nodes, a new inversion list was created giving the characters that should be matched by this node when the runtime locale is UTF-8. The list is ignored except under that circumstance. To do this, I created a new ANYOF type which has an extra SV for the inversion list.
|
|   The edge case that caused the most difficulty is folding involving the MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range character that folds to outside that range. The issue is that it doesn't naturally fall out that it will match the CAP MU. If we let the CAP MU fold to the small mu at compile time (which it can because both are above-Latin1 and so the fold is the same no matter what locale is in effect), it could appear that the regnode can be downgraded away from EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case-insensitively match the CAP MU. This could be special cased in regcomp and regexec, but I wanted to avoid that. Instead the mktables tables are set up to include the CAP MU as a character whose presence forbids the downgrading, so the special casing is in mktables, and not in the C code.
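The MICRO SIGN edge case can be observed from ordinary Perl, using Unicode (not locale) folding rules. This sketch assumes perl 5.16 or later for fc(). It checks that both characters fold to U+03BC. It then scans 0-255 to confirm that U+00B5 is the only character in that range whose fold leaves the range:

```perl
use v5.16;        # enables strict plus the 'fc', 'say' and 'unicode_strings' features
use warnings;

my $micro  = "\x{B5}";     # MICRO SIGN
my $cap_mu = "\x{39C}";    # GREEK CAPITAL LETTER MU

# Both case-fold to GREEK SMALL LETTER MU (U+03BC).
printf "fc(MICRO SIGN) = U+%04X\n", ord fc $micro;
printf "fc(CAPITAL MU) = U+%04X\n", ord fc $cap_mu;

# Hence, under Unicode (/u) rules, they match each other case-insensitively.
say $micro =~ /\Q$cap_mu\E/iu ? "match" : "no match";

# Collect every code point in 0-255 whose fold contains a character above 255.
my @escapes;
for my $cp (0 .. 0xFF) {
    my $folded = fc chr $cp;
    push @escapes, $cp if grep { ord($_) > 0xFF } split //, $folded;
}
say join ", ", map { sprintf "U+%04X", $_ } @escapes;    # U+00B5
```
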
* Increase $B::Deparse::VERSION to 1.25 [Father Chrysostomos, 2014-01-27, 1 file, -1/+1]
|
* [perl #121050] Teach B::Deparse about prototype whitespace [Father Chrysostomos, 2014-01-27, 2 files, -2/+12]
|   It has been hanging or unnecessarily using & since commit d16269d835 caused spaces to be preserved in the prototype and stripped when applied during sub call compilation. That commit did not update B::Deparse accordingly.
* Taint more operands with case changes [Karl Williamson, 2014-01-27, 1 file, -10/+12]
|   The documentation says that Perl taints certain operations when subject to locale rules, such as lc() and ucfirst(). Prior to this commit there were exceptions when the operand to these functions contained no characters whose case change actually varied depending on the locale, for example the empty string or above-Latin1 code points. Changing to conform to the documentation simplifies the core code, and yields more consistent results.
* Move an inversion list generation to mktables [Karl Williamson, 2014-01-27, 1 file, -4/+18]
|   Prior to this patch, this was in regen/mk_invlists.pl, but future commits will want it to also be used by the header generated by regen/regcharclass.pl, so use a common source so the logic doesn't have to be duplicated.
* Move more locale code to t/loc_tools.pl [Karl Williamson, 2014-01-27, 1 file, -3/+1]
|   This trivial code to determine if a locale is a utf8 one or not is currently used in just one place, but future commits will use it in others, and will make it non-trivial, and non-obvious.
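The commit does not show the "trivial code" itself. Below is a minimal sketch of one plausible name-based test, using a hypothetical helper name. It assumes the codeset appears in the locale name, as in en_US.UTF-8 or de_DE.utf8. As the commit warns, a robust version is less obvious:

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: guess from its name whether a locale uses UTF-8.
# Accepts spellings such as "en_US.UTF-8", "de_DE.utf8" and "C.UTF-8".
sub looks_like_utf8_locale {
    my ($name) = @_;
    return 0 unless defined $name;
    return $name =~ /\butf-?8\b/i ? 1 : 0;
}

my $current = setlocale(LC_CTYPE);    # query the current setting without changing it
printf "%s: %s\n", $current // '(undef)',
    looks_like_utf8_locale($current) ? "UTF-8" : "not UTF-8 (judging by name)";
```

On POSIX systems, the core I18N::Langinfo module's langinfo(CODESET) reports the codeset of the current locale directly. That avoids guessing from the name.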
* lib/locale.pm: Pod corrections [Karl Williamson, 2014-01-23, 1 file, -5/+6]
|
* Allow 'use locale' on systems without locales. [Karl Williamson, 2014-01-23, 1 file, -13/+3]
|   Instead of throwing an error, just go ahead and do the import. This will tell Perl internally to use the current underlying locale, which should be the C locale. Attempts to change the locale will fail.
|
|   This differs slightly from Brian Fraser's patch, in that his didn't touch $^H, thus 'use locale' was a no-op. He has told me to apply this one, which does affect $^H. The advantage here is that now programs that are run on platforms with and without locales will behave similarly, and should run identically if the locale is not switched from the default.
* lib/ExtUtils/t/Embed.t: Skip tests if cross-compiling and $Config{cc} is not available [Brian Fraser, 2014-01-22, 2 files, -5/+7]
* Move more common locale finding code into t/loc_tools.pl [Karl Williamson, 2014-01-22, 1 file, -26/+11]
|   As a result of some code meant to do the same thing being in two different places, one got updated, and one didn't. So t/run/locale.t was being skipped for Win32, even though the bug there it was avoiding has been fixed in XP.
* t/loc_tools.pl: Extract out finding locales from locale.t [Karl Williamson, 2014-01-22, 1 file, -197/+5]
|   Several different test files need to find the locales on the system, and each currently has rolled its own methods to do that. This commit creates a new file t/loc_tools.pl which is designed to be a common place for these tools.
|
|   lib/locale.t did the most thorough job of finding locales, so t/loc_tools.pl is built upon what it had, which is now deleted from locale.t.
|
|   The code in t/loc_tools.pl was copied from lib/locale.t with white space changes and changes to make this be a subroutine, and helper functions renamed to begin with an underscore, and changing the hard-coded list to be in a DATA section so it doesn't have to be actually used unless necessary.
* lib/locale.t: Remove no longer needed SKIPS [Karl Williamson, 2014-01-22, 1 file, -19/+0]
|   locale.t has changed so that if tests in some locales fail, it still passes, provided that most locales work. Thus this code, whose effect was to know about some broken locales and SKIP them, is no longer needed.
* [perl #120977] bump $warnings::VERSION [Tony Cook, 2014-01-22, 1 file, -1/+1]
|
* assume "all" in "use warnings 'FATAL';" and related [Hauke D, 2014-01-22, 1 file, -1/+5]
|   Until now, the behavior of the statements
|
|       use warnings "FATAL";
|       use warnings "NONFATAL";
|       no warnings "FATAL";
|
|   was unspecified and inconsistent. This change causes them to be handled with an implied "all" at the end of the import list.
|
|   Tony Cook: fix AUTHORS formatting
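Here is a short demonstration of the new behavior. It assumes perl 5.20 or later, the release cycle this change landed in. A lone "FATAL" now means FATAL => 'all', so an ordinary uninitialized-value warning becomes a die that eval can catch:

```perl
use strict;
use warnings 'FATAL';    # after this change: same as use warnings FATAL => 'all';

my $undef;
my $ok = eval {
    my $s = "value: " . $undef;   # "Use of uninitialized value" warning, now fatal
    1;
};
print $ok ? "survived\n" : "died: $@";
```
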
* Fix perl5db.t test 41 on VMS. [Craig A. Berry, 2014-01-18, 1 file, -3/+3]
|   We're getting newlines in between items, and the easiest way to deal with it is make them explicit so we expect what we're getting and it's done the same everywhere.
* [perl #118817] avoid using 2 handles to write to the debug output [Tony Cook, 2014-01-15, 1 file, -3/+12]
|   Previously the tests were run with the following config:
|
|       NonStop=0 TTY=db.out LineInfo=db.out
|
|   This meant that the debugger would write the prologue, command prompts and their results and the epilogue to one handle, and any line trace information to the second handle. Since those handles didn't share a file position, the line trace info would overwrite the prologue, and the epilogue would overwrite part of the line trace info.
|
|   When TTY=vt100 on Redhat systems this made the epilogue just long enough to overwrite the line trace data that a test matched against, causing the test to fail.
|
|   To fix this, I avoided setting LineInfo:
|
|       NonStop=0 TTY=db.out
|
|   and since LineInfo defaults to using the TTY handle, both types of content are written to db.out *without* overwriting each other.
|
|   Unfortunately this broke some other tests, since the command prompts which were overwritten by line trace information are now mixed in with the line traces. I've modified the tests that failed to account for the included command lines.
* rename aggref warnings to autoderef [Ricardo Signes, 2014-01-14, 4 files, -6/+6]
|
* avoid a keys-on-scalar warning in a test [Ricardo Signes, 2014-01-14, 1 file, -1/+1]
|
* More test tweaks [Father Chrysostomos, 2014-01-14, 3 files, -1/+4]
|
* Increase $warnings::VERSION to 1.21 [Father Chrysostomos, 2014-01-14, 1 file, -1/+1]
|
* Make key/push $scalar experimental [Father Chrysostomos, 2014-01-14, 2 files, -14/+18]
|   We need a better name for the experimental category, but I have not thought of one, even after sleeping on it.
* lib/B/Deparse.t: TODO test for [perl #120950] [Karl Williamson, 2014-01-09, 1 file, -3/+7]
|   This moves a test to earlier in the file where it now fails, and makes it TODO. It also creates a copy just after the failure, this time without the TODO, to show that it is order dependent.
|
|   This is in preparation for some commits that exposed this bug.
* Crash in tab completion with Term::ReadLine::Gnu. [Shlomi Fish, 2014-01-06, 1 file, -2/+2]
|   Perhaps it also affects Term::ReadLine::Perl / Term::ReadLine::Perl5. I still need to test with PadWalker installed.
|
|   No tests were added, but it passes all existing tests.
* Unicode::UCD::prop_aliases(): Don't generate spurious warnings [Karl Williamson, 2014-01-01, 2 files, -5/+24]
|   Certain inputs to prop_aliases caused spurious warnings.
* White-space only [Karl Williamson, 2013-12-31, 2 files, -185/+192]
|   This indents various blocks newly formed by the previous commit in these three files, and reflows lines to fit into 79 columns.
* Change format of mktables output binary property tables [Karl Williamson, 2013-12-31, 3 files, -33/+131]
|   mktables now outputs the tables for binary properties as inversion lists, with a size as the first element. This means simpler handling of these tables in the core, including removal of an entire pass over them (it was done just to get the size). These tables are marked as for internal use by the Perl core only, so their format is changeable at will.
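The commit says these tables are internal and their format may change. So the following is only a sketch of the general inversion-list idea, not a reader for the actual files mktables writes. An inversion list is a sorted list of code points in which each element starts a range. The ranges alternate between "in the set" and "not in the set". Storing the size first, as the commit describes, lets a consumer skip a counting pass. The sample table and the helper name are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical binary-property table for [A-Za-z]: the first element is
# the number of entries that follow; the rest is the inversion list.
my @table = (4, 0x41, 0x5B, 0x61, 0x7B);
my ($size, @invlist) = @table;
die "size mismatch" unless $size == @invlist;

# True if $cp is in the set. Binary-search for the last element <= $cp;
# an even index starts an "in" range, an odd index an "out" range.
sub invlist_contains {
    my ($list, $cp) = @_;
    my ($lo, $hi) = (0, scalar @$list);    # search the half-open range [lo, hi)
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($list->[$mid] <= $cp) { $lo = $mid + 1 } else { $hi = $mid }
    }
    my $i = $lo - 1;                       # -1 means below the first range
    return $i >= 0 && $i % 2 == 0 ? 1 : 0;
}

for my $ch ('@', 'A', 'Z', '[', 'a', 'z', '{') {
    printf "%s => %d\n", $ch, invlist_contains(\@invlist, ord $ch);
}
```
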
* Change \p{} matching for above-Unicode code points [Karl Williamson, 2013-12-31, 3 files, -170/+157]
|   http://markmail.org/message/eod7ukhbbh5tnll4 is the beginning of the thread that led to this commit.
|
|   This commit revises the handling of \p{} and \P{} to treat above-Unicode code points as typical Unicode unassigned ones, and only output a warning during matching when the answer is arguable under strict Unicode rules (that is "matched" for \p{}, and "didn't match" for \P{}). The exception is if the warning category has been made fatal, then it tries hard to always output the warning.
|
|   The definition of \p{All} is changed to be qr/./s, and no warning is issued at all for matching it against above-Unicode code points.
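The following is a minimal sketch of the resulting distinction. It assumes perl 5.20 or later, since \p{Unicode} comes from the "mktables: Add \p{Unicode}" commit in this same series. \p{All} matches any code point, while \p{Any} and its synonym \p{Unicode} stop at U+10FFFF. The non_unicode warnings are switched off only to keep the demo quiet:

```perl
use strict;
use warnings;
no warnings 'non_unicode';    # this demo deliberately uses a non-Unicode code point

my $above = chr 0x110000;     # one past the last Unicode code point, U+10FFFF
my $last  = chr 0x10FFFF;

# \p{All} is defined as qr/./s, so it matches every code point.
printf "All:     %s\n", $above =~ /\p{All}/     ? "match" : "no match";
# \p{Any} (and its synonym \p{Unicode}) covers only 0 .. 0x10FFFF.
printf "Any:     %s\n", $above =~ /\p{Any}/     ? "match" : "no match";
printf "Unicode: %s\n", $last  =~ /\p{Unicode}/ ? "match" : "no match";
```
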
* mktables: Split off some functionality [Karl Williamson, 2013-12-31, 1 file, -4/+10]
|   This adds a new function that formats a count of code points. Currently it calls the current function that formats a generic number. A future commit will change it so that the outputs of the two functions differ. The reason for this commit is to make that later commit's difference listing smaller and easier to understand.
* mktables: Add \p{Unicode} [Karl Williamson, 2013-12-31, 1 file, -0/+1]
|   This is a clearer synonym for \p{Any}.
* mktables: Separate out defns of \p{Any} and \p{All} [Karl Williamson, 2013-12-31, 1 file, -13/+17]
|   This is in preparation for making them mean different things in a future commit.
* mktables: Better comment some variables [Karl Williamson, 2013-12-31, 1 file, -3/+6]
|
* mktables: Calculate debugging information placement [Karl Williamson, 2013-12-31, 1 file, -12/+87]
|   When outputting debugging information under the -annotate option, it's nice to line up the columns. This commit does a pass through the tables where the final real data column is variable width, to figure out where to put the debugging info so that almost all of the columns can be lined up, and not have to be right-shifted because of overlong real data.
|
|   Certain tables prior to this commit had been manually eyeballed and column information hard-coded in. This is no longer necessary. This means that one parameter to the write() function is no longer used, and is removed here.
* mktables: White-space only [Karl Williamson, 2013-12-31, 1 file, -10/+12]
|   Outdent a just-removed block, and better align several other statements.