path: root/intrpvar.h
Commit message | Author | Age | Files | Lines
* Add API function Perl_langinfo() | Karl Williamson | 2017-09-09 | 1 | -0/+3
  This is designed to generally replace nl_langinfo() in XS code. It is thread-safer, hides the quirks of perl's LC_NUMERIC handling, and can be used on systems lacking nl_langinfo.
* Make immortal SVs contiguous | David Mitchell | 2017-07-27 | 1 | -1/+7
  Ensure that PL_sv_yes, PL_sv_undef, PL_sv_no and PL_sv_zero are allocated adjacently in memory. This allows the SvIMMORTAL() test to be more efficient, and will (in the next commit) allow SvTRUE() to be more efficient.

  In MULTIPLICITY builds the constraint is already met by virtue of them being adjacent items in the interpreter struct. For non-MULTIPLICITY builds, they were just 4 global vars with no guarantees of where they would be allocated. For this case, PL_sv_undef etc. are deleted as global vars and replaced with a new global var PL_sv_immortals[4], with #define PL_sv_yes (PL_sv_immortals[0]) etc. in their place.
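  The layout described above can be illustrated with a minimal C sketch. It is not perl's actual code: the `sv_t` type, the lower-case names, and the slot order after index 0 are assumptions made for illustration, and the range test uses integer arithmetic on addresses rather than whatever form the real SvIMMORTAL() macro takes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perl's SV struct; the real one is far richer. */
typedef struct { int flags; } sv_t;

/* A single array guarantees the four immortals are adjacent in memory. */
static sv_t sv_immortals[4];

/* Slot 0 follows the commit message; the other indices are assumed. */
#define sv_yes   (sv_immortals[0])
#define sv_undef (sv_immortals[1])
#define sv_no    (sv_immortals[2])
#define sv_zero  (sv_immortals[3])

/* An ordinary, non-immortal SV for comparison. */
static sv_t ordinary_sv;

/* One range check instead of four separate pointer comparisons.
 * Unsigned wrap-around makes addresses below the array fail the test too. */
static bool sv_is_immortal(const sv_t *sv)
{
    assert(sv != NULL);
    uintptr_t base = (uintptr_t)&sv_immortals[0];
    uintptr_t addr = (uintptr_t)sv;
    return addr - base < sizeof sv_immortals;
}
```

  With this layout, `sv_is_immortal(&sv_no)` holds while `sv_is_immortal(&ordinary_sv)` does not, regardless of where the linker places the array.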
* add PL_sv_zero | David Mitchell | 2017-07-27 | 1 | -0/+8
  It's like PL_sv_no, except that its string value is "0" rather than "". It can be used, for example, where a pp function wants to push a zero return value on the stack. The next commit will start to use it.

  Also update SvIMMORTAL() to be more efficient: it now checks whether the SV's address is in a range rather than individually checking against &PL_sv_undef, &PL_sv_no etc.
* Eliminate remaining uses of PL_statbuf | Dagfinn Ilmari Mannsåker | 2017-06-01 | 1 | -1/+0
  Give Perl_nextargv its own statbuf and pass a pointer to it into Perl_do_open_raw and thence S_openn_cleanup when needed.

  Also reduce the scope of the existing statbuf in Perl_nextargv to make it clear it's distinct from the one populated by do_open_raw.

  Fix perldelta entry for PL_statbuf removal
* Improve perlintern.pod docs for PL_dowarn | Aaron Crane | 2017-01-07 | 1 | -2/+4
* Create inversion list for Assigned code points | Karl Williamson | 2016-12-23 | 1 | -0/+1
  This will be used in a future commit.
* Deprecate isFOO_utf8() macros | Karl Williamson | 2016-12-23 | 1 | -0/+1
  These macros are being replaced by a safe version; they now generate a deprecation message at each call site upon the first use there in each program run.
* Change name of PL_ variable | Karl Williamson | 2016-11-28 | 1 | -2/+1
  This variable really means the character that replaces any embedded NULs when doing collation. Change the name accordingly. (Embedded NULs must be replaced because the libc function strxfrm is used, and it operates on C strings which have no embedded NULs.)
* PATCH: [perl #129953] lib/locale.t failures on FREEBSD | Karl Williamson | 2016-11-28 | 1 | -1/+2
  I thought this bug was in FREEBSD, but when I went to gather the info needed to report it to the vendor, it turned out to be a mistake I had made.

  The problem is basically doubly encoding into UTF-8. In order to save CPU time, in a UTF-8 locale, I had stored a string as UTF-8 encoded. This string is to be inserted into a larger string. What I neglected to consider in this situation is that not all strings in such locales need be in UTF-8. The UTF-8 encoded insert could get added to a non-UTF-8 string, and the result later was switched to UTF-8, so the inserted string's bytes were individually converted to UTF-8, effectively a second time. This is a problem only if the inserted string is different when encoded in UTF-8 than not, and for this particular usage, on most platforms it was UTF-8 invariant, so did not show up, except on those platforms where it was variant.

  The solution is to store the replacement as a code point, and encode it as UTF-8 only if necessary, once. This actually simplifies the code.
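  The double-encoding failure is easy to reproduce outside perl. The sketch below is an illustration only: `utf8_encode`, `upgrade_bytes`, `buggy_insert` and `fixed_insert` are hypothetical helpers, not perl functions, and the "upgrade" step assumes the non-UTF-8 string is Latin-1-like, so each byte is treated as a code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8; returns the number of bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* "Upgrade" a byte string to UTF-8 by treating every byte as a code point. */
static size_t upgrade_bytes(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += utf8_encode(in[i], out + len);
    return len;
}

/* Buggy scheme: the replacement is cached already encoded, lands in a
 * non-UTF-8 string, and is encoded again when that string is upgraded. */
static size_t buggy_insert(uint32_t cp, unsigned char *out)
{
    unsigned char cached[4];
    size_t n = utf8_encode(cp, cached);
    return upgrade_bytes(cached, n, out);
}

/* Fixed scheme: keep the code point and encode it exactly once. */
static size_t fixed_insert(uint32_t cp, unsigned char *out)
{
    return utf8_encode(cp, out);
}
```

  For a UTF-8-invariant code point such as 0x01 both functions produce the same single byte, which is why the bug stayed hidden on most platforms; for a variant one such as 0xB4 the buggy path yields four bytes instead of two.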
* rework perl #129903 - inf recursion from use of empty pattern in regex codeblock | Yves Orton | 2016-11-01 | 1 | -0/+1
  FC didn't like my previous patch for this issue, so here is the one he likes better. With tests and etc. :-)

  The basic problem is that code like this:

      /(?{ s!!! })/

  can trigger infinite recursion on the C stack (not the normal perl stack) when the last successful pattern in scope is itself. Since the C stack overflows this manifests as an untrappable error/segfault, which then kills perl.

  We avoid the segfault by simply forbidding the use of the empty pattern when it would resolve to the currently executing pattern.

  I imagine with a bit of effort someone can trigger the original SEGV, unlike my original fix which forbade use of the empty pattern in a regex code block. So if someone actually reports such a bug we might have to revert to the older approach of prohibiting this.
* make PL_ pad vars be of type PADOFFSET | David Mitchell | 2016-09-26 | 1 | -7/+7
  Now that PADOFFSET is signed, make

      PL_comppad_name_fill
      PL_comppad_name_floor
      PL_padix
      PL_constpadix
      PL_padix_floor
      PL_min_intro_pending
      PL_max_intro_pending

  be of type PADOFFSET rather than I32, to match the rest of the pad interface.

  At the same time, change various I32 local vars in pad.c functions to be PADOFFSET.
* Re-order intrp struct | Father Chrysostomos | 2016-08-14 | 1 | -3/+1
* Remove PL_maxo | Father Chrysostomos | 2016-08-14 | 1 | -1/+1
  We have an interpreter variable using memory, PL_maxo, which is defined to be the same as MAXO, a #defined constant. As far as I can tell, it is never used in lvalue context, in core or on CPAN, except for the initialisation in intrpvar.h.

  It can simply be removed and replaced with a macro defined as equivalent to MAXO.

  It was added in this commit:

      commit 84ea024ac9cdf20f21223e686dddea82d5eceb4f
      Author: Perl 5 Porters <perl5-porters.nicoh.com>
      Date: Tue Jan 2 23:21:55 1996 +0000

          perl 5.002beta1h patch: perl.h

          5.002beta1 attempted some memory optimizations, but unfortunately they can result in a memory leak problem. This can be avoided by #define STRANGE_MALLOC. I do that here until consensus is reached on a better strategy for handling the memory optimizations.

          Include maxo for the maximum number of operations (needed for the Safe extension).

  But apparently it is not needed for the Safe extension (tests pass without it).
* Remove PL_(lex_)encoding and all dependent code | Father Chrysostomos | 2016-07-13 | 1 | -3/+0
* Do better locale collation in UTF-8 locales | Karl Williamson | 2016-05-24 | 1 | -0/+1
  On some platforms, the libc strxfrm() works reasonably well on UTF-8 locales, giving a default collation ordering. It will assume that every string passed to it is in UTF-8. This commit changes Perl to make sure that strxfrm's expectations are met.

  Likewise under a non-UTF-8 locale, strxfrm is expecting a non-UTF-8 string. And this commit makes sure of that as well.

  So, simply meeting strxfrm's expectations allows Perl to start supporting default collation in UTF-8 locales, and fixes it to work on single-byte locales with UTF-8 input. (Unicode::Collate provides tailorable functionality and is portable to platforms where strxfrm isn't as intelligent, but is a much more heavy-weight solution that may not be needed for particular applications.)

  There is a problem in non-UTF-8 locales if the passed string contains code points representable only in UTF-8. This commit causes them to be changed, before being passed to strxfrm, into the highest collating character in the locale that doesn't require UTF-8. They then will sort the same as that character, which means after all other characters in the locale but that one. In strings that don't have that character, this will generally provide exactly correct operation.

  There still is a problem, if that character, in the given locale, combines with adjacent characters to form a specially weighted sequence. Then, the change of these above-255 code points into that character can skew the results. See the commit message for 6696cfa7cc3a0e1e0eab29a11ac131e6f5a3469e for more on this. But it is really an illegal situation to have above-255 code points in a single-byte locale, so this behavior is a reasonable degradation when given illegal input.

  If two transformed strings compare exactly equal, Perl already uses the un-transformed versions to break ties, and there, these faked-up strings will collate so the above-255 code points sort after everything else, and in code point order amongst themselves.
* locale.c: Change algorithm for strxfrm() trials | Karl Williamson | 2016-05-24 | 1 | -0/+2
  It's kind of guesswork deciding how big a buffer to give to strxfrm(). If you give it too small a one, it will fail. Prior to this commit, the buffer size was doubled and then strxfrm() was called again, looping until it worked, or we used too much memory.

  Each time a new locale is made, we try to minimize the necessity of doing this by calculating numbers 'm' and 'b' that can be plugged into the equation

      mx + b

  where 'x' is the size of the string passed to strxfrm(). strxfrm() is roughly linear with respect to its input's length, so this generally works without us having to do many loops to get a large enough size.

  But on many systems, strxfrm(), in failing, returns how much space you should have given it. On such systems, we can just use that number on the 2nd try and not have to keep guessing. This commit changes to do that.

  But on other systems this doesn't work. So the original method is retained if we determine that there are problems with strxfrm(), either from previous experience, or because using the size returned from the first trial didn't work.
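  The two-try strategy can be sketched with the standard C strxfrm(), which returns the length it needs (excluding the terminating NUL) when the buffer is too small. The helper name `xfrm_dup` and its `slope`/`offset` parameters (the 'm' and 'b' above) are hypothetical; this is a sketch of the idea, not the code in locale.c, and it simply gives up where the commit says perl falls back to the older doubling loop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Transform src for collation. The first buffer size is the linear
 * estimate slope * strlen(src) + offset. Returns a malloc'ed string
 * that the caller must free, or NULL on failure. */
static char *xfrm_dup(const char *src, size_t slope, size_t offset)
{
    assert(src != NULL);
    size_t cap = slope * strlen(src) + offset;
    if (cap == 0)
        cap = 1;

    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    size_t need = strxfrm(buf, src, cap);
    if (need >= cap) {
        /* Too small: retry once with exactly the size that was reported. */
        cap = need + 1;
        char *bigger = realloc(buf, cap);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        if (strxfrm(buf, src, cap) >= cap) {
            /* The reported size was not trustworthy on this platform;
             * a fuller implementation would fall back to doubling here. */
            free(buf);
            return NULL;
        }
    }
    return buf;
}
```

  Calling `xfrm_dup("abc", 0, 1)` deliberately starts with a one-byte buffer, so the retry path runs; a well-chosen slope and offset make the first call succeed in most cases.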
* Change mem_collxfrm() algorithm for embedded NULs | Karl Williamson | 2016-05-24 | 1 | -0/+1
  One of the problems in implementing Perl is that the C library routines forbid embedded NUL characters, which Perl accepts. This is true for the case of strxfrm() which handles collation under locale.

  The best solution as far as functionality goes, would be for Perl to write its own strxfrm replacement which would handle the specific needs of Perl. But that is not going to happen because of the huge complexity in handling it across many platforms. We would have to know the location and format of the locale definition files for every such platform. Some might follow POSIX guidelines, some might not.

  strxfrm creates a transformation of its input into a new string consisting of weight bytes. In the typical but general case, a 3 character NUL-terminated input string 'A B C 00' (spaces added for readability) gets transformed into something like:

      A¹ B¹ C¹ 01 A² B² C² 01 A³ B³ C³ 00

  where the superscripted characters are weights for the corresponding input characters. Superscript 1 represents (essentially) the primary sorting key; 2, the secondary, etc, for as many levels as the locale definition gives. The 01 byte is likely to be the separator between levels, but not necessarily, and there could be some other mechanisms used on various platforms.

  To handle embedded NULs, the simplest thing would be to just remove them before passing in to strxfrm(). Then they would be entirely ignored, which might not be what you want. You might want them to have some weight at the tertiary level, for example. It also causes problems because strxfrm is very context sensitive. The locale definition can define weights for specific sequences of any length (and the weights can be multi-byte), and by removing a NUL, two characters now become adjacent that weren't in the input, and they could now form one of those special sequences and thus throw things off.

  Another way to handle NULs, that seemingly ignores them, but actually doesn't, is the mechanism in use prior to this commit. The input string is split at the NULs, and the substrings are independently passed to strxfrm, and the results concatenated together. This doesn't work either. In our example 'A B C 00', suppose B is a NUL, and should have some weight at the tertiary level. What we want is:

      A¹ C¹ 01 A² C² 01 A³ B³ C³ 00

  But that's not at all what you get. Instead it is:

      A¹ 01 A² 01 A³ C¹ 01 C² 01 C³ 00

  The primary weight of C comes immediately after the tertiary weight of A, but more importantly, a NUL, instead of being ignored at the primary levels, is significant at all levels, so that "a\0c" would sort before "ab".

  Still another possibility is to replace the NUL with some other character before passing it to strxfrm. That was my original plan, to replace each NUL with the character that this code determines has the lowest collation order for the current locale. On strings that don't contain that character, the results would be as good as it gets for that locale. That character is likely to be ignored at higher weight levels, but have some small non-ignored weight at the lowest ones. And hopefully the character would rarely be encountered in practice. When it does happen, it and NUL would sort identically; hardly the end of the world. If the entire strings sorted identically, the NUL-containing one would come out before the other one, since the original Perl strings are used as a tie breaker.

  However, testing showed a problem with this. If that other character is part of a sequence that has special weighting, the results won't be correct. With gcc, U+00B4 ACUTE ACCENT is the lowest collating character in many UTF-8 locales. It combines in Romanian and Vietnamese with some other characters to change weights, and hence changing NULs into U+B4 screws things up.

  What I have finally come to is a modification of this final approach, where the possible NUL replacements are limited to just characters that are controls in the locale. NULs are replaced by the lowest collating control. It would really be a defective locale if this control combined with some other character to form a special sequence. Often the character will be a 01, START OF HEADING. In the very unlikely case that there are absolutely no controls in the locale, 01 is used, because we have to replace it with something.

  The code added by this commit is mostly utf8-ready. A few commits from now will make Perl properly work with UTF-8 (if the platform supports it). But until that time, this isn't a full implementation; it only looks for the lowest-sorting control that is invariant, where the UTF8ness doesn't matter. The added tests are marked as TODO until then.
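  The final approach (replace each NUL with the lowest-collating control, falling back to 01) can be sketched in portable C. The function names are hypothetical and the sketch assumes a single-byte locale; it does not reproduce mem_collxfrm() itself, only the selection and substitution steps the message describes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Find the control character (other than NUL) that collates lowest in
 * the current locale. Falls back to 0x01 if the locale has no controls. */
static unsigned char lowest_collating_control(void)
{
    unsigned char best = 0;
    for (int c = 1; c < 256; c++) {
        if (!iscntrl(c))
            continue;
        if (best == 0) {
            best = (unsigned char)c;
            continue;
        }
        char cand[2] = { (char)c, '\0' };
        char cur[2]  = { (char)best, '\0' };
        if (strcoll(cand, cur) < 0)
            best = (unsigned char)c;
    }
    return best != 0 ? best : 0x01;
}

/* Copy len bytes of src into dst, replacing every embedded NUL with the
 * chosen control so the result is a plain C string that strxfrm() can
 * take. dst must have room for len + 1 bytes. */
static void replace_nuls(const char *src, size_t len, char *dst)
{
    assert(src != NULL && dst != NULL);
    char repl = (char)lowest_collating_control();
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i] != '\0' ? src[i] : repl;
    dst[len] = '\0';
}
```

  Because no bytes are removed or split off, neighbouring characters stay exactly as far apart as they were in the input, which avoids both failure modes described above.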
* Keep track of if collation locale is UTF-8 or not | Karl Williamson | 2016-05-24 | 1 | -0/+1
  This will be used in future commits.
* Add environment variable for -Dr: PERL_DUMP_RE_MAX_LEN | Karl Williamson | 2016-02-19 | 1 | -0/+2
  The regex engine when displaying debugging info, say under -Dr, will elide data in order to keep the output from getting too long. For example, the number of code points in all of Unicode matched by \w is quite large, and so when displaying a pattern that matches this, only the first some number of them are printed, and the rest are truncated, represented by "...".

  Sometimes, one wants to see more than what the compiled-into-the-engine-max shows. This commit creates code to read this environment variable to override the default max lengths. This changes the lengths for everything to the input number, even if they have different compiled maximums in the absence of this variable.

  I'm not currently documenting this variable, as I don't think it works properly under threads, and we may want to alter the behavior in various ways as a result of gaining experience with using it.
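  As a rough illustration of an environment override of this kind, the following C sketch reads a decimal limit and falls back to a compiled-in default. Only the variable name comes from the commit; the helper names, the parsing rules (plain decimal digits, anything else ignored), and the fallback behaviour are assumptions, not a description of how perl parses the value.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse an override string; return fallback if it is missing or malformed. */
static size_t parse_max_len(const char *value, size_t fallback)
{
    assert(fallback > 0);   /* a zero limit would hide everything */
    if (value == NULL || !isdigit((unsigned char)value[0]))
        return fallback;

    char *end;
    errno = 0;
    unsigned long n = strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0')
        return fallback;
    return (size_t)n;
}

/* Look the override up in the environment. */
static size_t dump_re_max_len(size_t compiled_default)
{
    return parse_max_len(getenv("PERL_DUMP_RE_MAX_LEN"), compiled_default);
}
```

  A single parsed value replacing every compiled maximum mirrors the statement above that the lengths for everything change to the input number.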
* intrvar.h: document PL_tmps_max | David Mitchell | 2016-02-03 | 1 | -1/+1
  Its name implies that it's the top allocated element; in fact it's top+1.
* Add qr/\b{lb}/ | Karl Williamson | 2016-01-19 | 1 | -0/+1
  This adds the final Unicode boundary type previously missing from core Perl: the LineBreak one. This feature is already available in the Unicode::LineBreak module, but I've been told that there are portability and some other issues with that module. What's added here is a light-weight version that is lacking the customizable features of the module.

  This implements the default Line Breaking algorithm, but with the customizations that Unicode is expecting everybody to add, as their test file tests for them. In other words, this passes Unicode's fairly extensive furnished tests, but wouldn't if it didn't include certain customizations specified by Unicode beyond the basic algorithm.

  The implementation uses a look-up table of the characters surrounding a boundary to see if it is a suitable place to break a line. In a few cases, context needs to be taken into account, so there is code in addition to the lookup table to handle those.

  This should meet the needs for line breaking of many applications, without having to load the module.

  The algorithm is somewhat independent of the Unicode version, just like the other boundary types. Only if new rules are added, or existing ones modified is there need to go in and change this code. Otherwise, running regen/mk_invlists.pl should be sufficient when a new Unicode release is done to keep it up-to-date, again like the other Unicode boundary types.
* remove deprecated PL_timesbuf | Daniel Dragan | 2016-01-17 | 1 | -5/+0
  Saves memory in interp struct.
* Fix broken fix for RT #127212 | Aaron Crane | 2016-01-17 | 1 | -3/+5
  As ilmari++ points out, the fix didn't work on builds without PERL_IMPLICIT_CONTEXT (including non-threaded, non-multiplicity) or PERL_DEBUG_READONLY_COW.
* Fix version numbers in intrpvar.h comments | Aaron Crane | 2016-01-17 | 1 | -2/+2
  There are two version numbers in intrpvar.h that have been repeatedly but confusingly bumped by an older version of Porting/bump-perl-version. Now that the porting script ignores intrpvar.h, it's better to restore the version numbers to the way they were when they were originally added.

  The comment for PERL_LAST_5_18_0_INTERP_MEMBER was introduced in commit d399cf59bde32e412ae99791ae46a871c7337b42, and the comment for PL_timesbuf was introduced in 25983af42cdcf2dc1fea6717dac7aac441b6301d.
* RT #127212: retain binary compatibility across plain/DEBUGGING | Aaron Crane | 2016-01-17 | 1 | -3/+4
  Niko Tyni of Debian points out that the size of the interpreter structure differs between plain and -DDEBUGGING builds, and that this breaks binary compatibility of XS modules between such builds.

  Making the definition of PL_memory_debug_header unconditional on PERL_TRACK_MEMPOOL (which itself is defined only on debug builds) eliminates this needless incompatibility.

  There is some confusion about whether plain and debug builds are expected to be compatible. Commit 1e8125c621275d18c74bc8dae3bfc3c03929fe1e (July 2010) refers in passing to "binary incompatible perls with the same API version (i.e. the same perl version configured with and without DEBUGGING)". But f2b88940d815760ad254d35a0ee1eb2ed8ce7762 (November 2009) says explicitly that "-DDEBUGGING and not need to be binary compatible with each other", and I think this explicit statement is a better example to follow. Further, this compatibility is clearly useful for our downstream packagers (as reported by Niko), and for any users who'd like to be able to use a debug build for tracking down problems (including those encountered while using modules with XS parts).
* Bump the perl version in various places for 5.23.7 | David Golden | 2015-12-21 | 1 | -2/+2
* Bump the perl version in various places for 5.23.6 | Abigail | 2015-11-20 | 1 | -2/+2
* avoid (TAINTING_get && TAINT_get) | David Mitchell | 2015-11-10 | 1 | -1/+1
  In various places we test for both (PL_tainting && PL_tainted). Since if tainting isn't enabled PL_tainted should never get set, it's more efficient to just test for (TAINT_get).

  We ensure that PL_tainted doesn't actually get set when !PL_tainting by changing some "setting" macros from PL_tainted = TRUE to PL_tainted = PL_tainting.
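  The invariant behind this change fits in a few lines of C. The lower-case names below are hypothetical stand-ins for the PL_ variables and taint macros, so this shows the idea rather than perl's real macro definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PL_tainting and PL_tainted. */
static bool pl_tainting;
static bool pl_tainted;

/* The "setting" operation: previously pl_tainted = true; now it copies
 * pl_tainting, so the flag can never be set while tainting is disabled. */
static void mark_tainted(void)
{
    pl_tainted = pl_tainting;
    assert(!pl_tainted || pl_tainting);   /* the invariant being relied on */
}

/* With the invariant in place, a single test replaces the two-part
 * check (pl_tainting && pl_tainted). */
static bool taint_get(void)
{
    return pl_tainted;
}
```

  The cost moves from every read (two loads and a branch) to every write (one extra load), which pays off when the flag is tested far more often than it is set.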
* Bump version to 5.23.5 | Steve Hay | 2015-10-20 | 1 | -2/+2
* optimise save/restore of PL_delaymagic. | David Mitchell | 2015-10-18 | 1 | -0/+15
  A few places (pp_push, pp_unshift, pp_aassign) have to set PL_delaymagic on entry, and restore it on exit. These are hot pieces of code. Rather than using ENTER/SAVEI16(PL_delaymagic)/LEAVE, add an extra field to the jumpenv struct, and make the JUMPENV_PUSH / POP macros automatically save and restore this var.

  This means that pp_push etc only need to do a local save:

      U16 old_delaymagic = PL_delaymagic;
      PL_delaymagic = DM_DELAY;
      ....
      PL_delaymagic = old_delaymagic;

  and in case of an exception being raised, PL_delaymagic still gets restored.

  This transfers the cost of saving PL_delaymagic from each call to pp_aassign etc to each time a new run level is invoked. The latter should be much less frequent.

  Note that prior to this commit, pp_aassign wasn't actually saving and restoring PL_delaymagic; it was just setting it to 0 at the end. So this commit also makes pp_aassign safe against PL_delaymagic re-entrancy like pp_push and pp_unshift already were.
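  The mechanism can be modelled with setjmp/longjmp in standalone C. Everything here is a simplified assumption: `jmpenv_t`, `run_level`, `die_now` and the two op functions are hypothetical, and the `DM_DELAY` value is a placeholder rather than perl's actual constant. The point is only that the run-level frame saves the variable once, so the hot path needs nothing more than a local copy.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

#define DM_DELAY 0x100   /* placeholder value for illustration */

/* Simplified jump environment with the extra saved field. */
typedef struct jmpenv {
    jmp_buf        buf;
    struct jmpenv *prev;
    uint16_t       saved_delaymagic;
} jmpenv_t;

static uint16_t  delaymagic;
static jmpenv_t *top_env;

/* Raise an "exception": unwind to the innermost run level. */
static void die_now(void)
{
    assert(top_env != NULL);
    longjmp(top_env->buf, 1);
}

/* Enter a run level, run body, and restore delaymagic however it exits.
 * Returns 1 if body died, 0 otherwise. */
static int run_level(void (*body)(void))
{
    jmpenv_t env;
    volatile int threw = 0;

    env.prev = top_env;
    env.saved_delaymagic = delaymagic;   /* saved once per run level */
    top_env = &env;

    if (setjmp(env.buf) == 0)
        body();
    else
        threw = 1;

    delaymagic = env.saved_delaymagic;   /* restored on both exit paths */
    top_env = env.prev;
    return threw;
}

/* Hot path: only a cheap local save and restore. */
static void op_push_ok(void)
{
    uint16_t old = delaymagic;
    delaymagic = DM_DELAY;
    /* ... work ... */
    delaymagic = old;
}

/* Same op, but it dies before reaching its own restore. */
static void op_push_dies(void)
{
    uint16_t old = delaymagic;
    delaymagic = DM_DELAY;
    die_now();
    delaymagic = old;   /* never reached */
}
```

  After `run_level(op_push_dies)` returns, `delaymagic` is back to its value from before the call even though the op never reached its own restore.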
* Bump the perl version in various places for 5.23.4. | Peter Martini | 2015-09-21 | 1 | -2/+2
* Various pods: Add C<> around many typed-as-is things | Karl Williamson | 2015-09-03 | 1 | -1/+1
  Removes 'the' in front of parameter names in some instances.
* perlapi, perlintern: Add L<> links to pod | Karl Williamson | 2015-09-03 | 1 | -5/+5
* Bump the perl version in various places for 5.23.3. | Matthew Horsfall | 2015-08-20 | 1 | -2/+2
* document what PL_generation is for | David Mitchell | 2015-08-17 | 1 | -1/+2
* Eliminate PL_sawalias, GPf_ALIASED_SV | David Mitchell | 2015-08-17 | 1 | -3/+0
  These two commits:

      v5.21.3-759-gff2a62e "Skip no-common-vars optimisation for aliases"
      v5.21.4-210-gc997e36 "Make list assignment respect foreach aliasing"

  added a run-time mechanism to detect aliased package variables, by either "*pkg = ...," or "for $pkg (...)", and used that information to enable the OPpASSIGN_COMMON mechanism at runtime for detecting common elements in a list assign, e.g.

      for $alias ($a, ...) {
          ($a,$b) = (1,$alias);
      }

  The previous commit but one changed the OPpASSIGN_COMMON mechanism such that it no longer uses PL_sawalias. So this var and the mechanism for setting it can now be removed.

  This commit removes:

      * the PL_sawalias variable
      * the GPf_ALIASED_SV GP flag
      * the SAVEt_GP_ALIASED_SV and save_aliased_sv() save type.
* Bump version to 5.23.2 | Matthew Horsfall | 2015-07-20 | 1 | -2/+2
* patchlevel: we are now perl v5.23.1 | Ricardo Signes | 2015-06-20 | 1 | -2/+2
* bump version to v5.23.0 | Ricardo Signes | 2015-06-01 | 1 | -2/+2
* bump version to v5.22.0 with Porting/bump-perl-version | Ricardo Signes | 2015-05-08 | 1 | -2/+2
* Bump version for 5.21.12 (although it's unlikely to happen) | Steve Hay | 2015-04-21 | 1 | -2/+2
* Bump version for 5.21.11 (if that happens) | Steve Hay | 2015-03-20 | 1 | -2/+2
* Skip PL_warn_locale use unless compiled in | Karl Williamson | 2015-03-07 | 1 | -1/+3
  The use of this variable was inconsistent. It was not dup'ed on thread cloning unless LC_CTYPE is being used, but elsewhere it was. This led to segfaults on threaded builds. Now it isn't touched anywhere unless LC_CTYPE is used.
* added link to announcement | Sawyer X | 2015-02-21 | 1 | -2/+2
* Add \b{sb} | Karl Williamson | 2015-02-19 | 1 | -0/+1
* Add qr/\b{wb}/ | Karl Williamson | 2015-02-19 | 1 | -0/+1
* Remove obsolete macros/tables for \X | Karl Williamson | 2015-02-19 | 1 | -2/+0
  A previous commit changed how \X is implemented, and now we don't need these anymore.
* Add qr/\b{gcb}/ | Karl Williamson | 2015-02-19 | 1 | -0/+1
  A function implements seeing if the space between any two characters is a grapheme cluster break. After I wrote this, I realized that an array lookup might be a better implementation, but the deadline for v5.22 was too close to change it. I did see that my gcc optimized it down to an array lookup.

  This makes the implementation of \X go from being complicated to trivial.
* More bumping of version number to 5.21.9. Missed this yesterday. | Matthew Horsfall | 2015-01-21 | 1 | -2/+2
* Don't raise 'poorly supported' locale warning unnecessarily | Karl Williamson | 2014-12-29 | 1 | -0/+1
  Commit 8c6180a91de91a1194f427fc639694f43a903a78 added a warning message for when Perl determines that the program's underlying locale just switched into is poorly supported. At the time it was thought that this would be an extremely rare occurrence. However, a bug in HP-UX - B.11.00/64 causes this message to be raised for the "C" locale. A workaround was done that silenced those. However, before it got fixed, this message would occur gobs of times executing the test suite. It was raised even if the script is not locale-aware, so that the underlying locale was completely irrelevant. There is a good prospect that someone using an older Asian locale as their default would get this message inappropriately, even if they don't use locales, or switch to a supported one before using them.

  This commit causes the message to be raised only if it actually is relevant. When not in the scope of 'use locale', the message is stored, not raised. Upon the first locale-dependent operation within a bad locale, the saved message is raised, and the storage cleared. I was able to do this without adding extra branching to the main-line non-locale execution code. This was done by adding regnodes which get jumped to by switch statements, and refactoring some existing C tests so they exclude non-locale right off the bat.

  These changes would have been necessary for another locale warning that I previously agreed to implement, and which is coming a few commits from now.

  I do not know of any way to add tests in the test suite for this. It is in fact rare for modern locales to have these issues. The way I tested this was to temporarily change the C code so that all locales are viewed as defective, and manually note that the warnings came out where expected, and only where expected.

  I chose not to try to output this warning on any POSIX functions called. I believe that all that are affected are deprecated or scheduled to be deprecated anyway. And POSIX is closer to the hardware of the machine. For convenience, I also don't output the message for some zero-length pattern matches. If something is going to be matched, the message will likely very soon be raised anyway.
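  The store-then-raise behaviour described above amounts to a small deferred-warning pattern, sketched here in C. The buffer, the counter and the function names are hypothetical, and the sketch says nothing about the regnode changes that keep the check off the non-locale fast path in perl itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Pending warning text; an empty string means nothing is stored. */
static char pending_warning[128];
static int  warnings_raised;

static void raise_warning(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    warnings_raised++;
}

/* Called when the underlying locale turns out to be poorly supported.
 * Outside the scope of 'use locale' the message is only stored. */
static void on_poorly_supported_locale(const char *msg, bool in_use_locale_scope)
{
    assert(msg != NULL);
    if (in_use_locale_scope) {
        raise_warning(msg);
        return;
    }
    snprintf(pending_warning, sizeof pending_warning, "%s", msg);
}

/* Called at each locale-dependent operation: raise the stored message
 * the first time, then clear the storage so it is not repeated. */
static void before_locale_dependent_op(void)
{
    if (pending_warning[0] != '\0') {
        raise_warning(pending_warning);
        pending_warning[0] = '\0';
    }
}
```

  A script that never performs a locale-dependent operation never calls `before_locale_dependent_op()`, so it never sees the message.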