path: root/embed.fnc
Commit message | Author | Age | Files | Lines
* toke.c, S_scan_ident(): Don't take an "end of buffer" argument, use PL_bufend | Brian Fraser | 2013-09-18 | 1 | -1/+1
  All but one of scan_ident()'s callers already passed PL_bufend as the removed argument. The one deviant was intuit_more(), which was setting the "end of buffer" argument to the next close-bracket. This commit modifies intuit_more() to temporarily set PL_bufend and then restore it.

  This was done as groundwork for the following commit, which will add more uses of PEEKSPACE() to scan_ident() in order to fix some whitespace and line number bugs. PEEKSPACE() modifies PL_bufend directly if it encounters a newline at the end of the buffer; that is why changing intuit_more() to modify-and-restore PL_bufend is safe, since the end of the buffer will always be a ']'.
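The modify-and-restore approach described above can be sketched in isolation. This is only an illustration under stated assumptions: `buf_end`, `scan_ident_len()` and `scan_within_brackets()` are hypothetical stand-ins invented for this example, not the real PL_bufend, scan_ident() or intuit_more().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the lexer's global "end of buffer" pointer. */
static const char *buf_end;

/* Hypothetical scanner: counts identifier characters, never reading past
 * buf_end. It takes no "end" argument and relies on the global instead. */
size_t scan_ident_len(const char *s)
{
    size_t n = 0;
    while (s + n < buf_end && (s[n] == '_' || isalnum((unsigned char)s[n])))
        n++;
    return n;
}

/* Scan only up to the next ']' by narrowing buf_end, then restore it. */
size_t scan_within_brackets(const char *s)
{
    const char *saved_end = buf_end;
    const char *close = memchr(s, ']', (size_t)(buf_end - s));
    size_t n;

    if (close != NULL)
        buf_end = close;      /* temporarily shorten the visible buffer */
    n = scan_ident_len(s);
    buf_end = saved_end;      /* always put the real end back */
    return n;
}
```

The point of the design is that the scanner's signature stays small, while the one caller with a narrower view pays for the save and restore itself.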
* [perl #115928] a consistent (public) rand() implementation | Tony Cook | 2013-09-13 | 1 | -0/+2
  Based on Yves's random branch work.

  This version makes the new random number generator visible to external modules, for example, List::Util's XS shuffle() implementation.

  I've also added a 64-bit implementation when HAS_QUAD is true; this should be significantly faster, even on 32-bit CPUs. This is intended to produce exactly the same sequence as the original implementation.

  The original version of this commit retained the "freebsd" name from Yves's original work for the function and data structure names. I've removed "freebsd" from most function names so the name isn't an issue if we choose to replace the implementation,
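As a rough illustration of why a 64-bit integer makes such a generator simple, here is a sketch of a drand48-style 48-bit linear congruential generator held in one 64-bit word. It assumes the classic drand48 constants and srand48-style seeding; the names `rand48_state`, `rand48_seed()` and `rand48_next()` are hypothetical and are not the functions added by the commit.

```c
#include <stdint.h>

/* Constants of the classic 48-bit drand48 linear congruential generator. */
#define RAND48_MULT UINT64_C(0x5DEECE66D)
#define RAND48_ADD  UINT64_C(0xB)
#define RAND48_MASK UINT64_C(0xFFFFFFFFFFFF)   /* low 48 bits */

typedef uint64_t rand48_state;  /* hypothetical: 48 bits of state in a 64-bit word */

/* srand48-style seeding: seed in the high 32 bits, 0x330E in the low 16. */
void rand48_seed(rand48_state *st, uint32_t seed)
{
    *st = ((uint64_t)seed << 16) | UINT64_C(0x330E);
}

/* Advance the state and return a double in [0, 1). */
double rand48_next(rand48_state *st)
{
    /* The multiply may wrap modulo 2**64, which is harmless because only
     * the low 48 bits are kept. */
    *st = (*st * RAND48_MULT + RAND48_ADD) & RAND48_MASK;
    return (double)*st / 281474976710656.0;     /* divide by 2**48 */
}
```

Without a 64-bit type the same step has to be carried out on three 16-bit limbs with manual carries, which is where the speed difference would come from.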
* perlapi: Add doc for my_strlcpy, my_strlcat | Karl Williamson | 2013-09-10 | 1 | -2/+2
* gv.c: Split part of find_default_stash into gv_is_in_main. | Brian Fraser | 2013-09-11 | 1 | -0/+2
  gv_is_in_main() checks if an unqualified identifier is in the main:: stash.
* gv.c: Rename magicalize_gv into gv_magicalize, make it more specific. | Brian Fraser | 2013-09-11 | 1 | -1/+1
  Namely, gv_magicalize no longer stores the GV into the stash, which is gv_fetchpvn_flags' job.
* gv.c, gv_fetchpvn_flags: Split another chunk of magic-checking code. | Brian Fraser | 2013-09-11 | 1 | -0/+1
  This bit is called when a GV already exists, but its name is length-one and it's in the main:: stash, so it might have multiple kinds of magic, like $! and %!, or @+ and %+.
* gv.c: Move the code that magicalizes new globs into magicalize_gv(). | Brian Fraser | 2013-09-11 | 1 | -0/+3
* gv.c: Begin splitting gv_fetchpvn_flags into smaller helper functions. | Brian Fraser | 2013-09-11 | 1 | -0/+7
  This commit takes a chunk of code out of gv_fetchpvn_flags and turns it into two functions: parse_gv_stash_name and find_default_stash.
* regcomp.c: Make all warnings and error messages UTF-8 clean | Brian Fraser | 2013-09-10 | 1 | -1/+1
* [perl #117265] correctly handle overloaded strings | Tony Cook | 2013-09-09 | 1 | -1/+1
* Revert "Let av_push accept NULL values" | Father Chrysostomos | 2013-09-07 | 1 | -1/+1
  This reverts commit 7b6e8075e45ebc684565efbe3ce7b70435f20c79.

  It turns out to be problematic, because it causes NULLs on the stack, which XSUBs may trip on.

  My main reason for it was actually to try to resolve some CPAN failures, but it turns out that other fixes have removed the need for that.
* Fix PerlIO_get_cnt and friends | Leon Timmermans | 2013-09-07 | 1 | -4/+4
  These functions worked with ints instead of SSize_t,
* Let av_push accept NULL values | Father Chrysostomos | 2013-09-06 | 1 | -1/+1
  Now that NULL is used for a nonexistent element, it is easy for XS code to pass it to av_push(). av_store already accepts NULL, and av_push already works with it on non-debugging builds, so there is really no need for this restriction.
* Put AV defelem creation code in one place | Father Chrysostomos | 2013-09-06 | 1 | -0/+1
* [perl #115768] improve (caller)[2] line numbers | Father Chrysostomos | 2013-09-01 | 1 | -1/+2
  warn and die have special code (closest_cop) to find a nulled nextstate op closest to the warn or die op, to get the line number from it.

  This commit extends that capability to caller, so that

      if (1) {
          foo();
      }
      sub foo { warn +(caller)[2] }

  shows the right line number.
* utf8.c: Remove wrapper functions. | Karl Williamson | 2013-08-29 | 1 | -13/+3
  Now that the Unicode data is stored in native character set order, it is rare to need to work with the Unicode order. Traditionally, the real work was done in functions that worked with the Unicode order, and wrapper functions (or macros) were used to translate to/from native. There are two groups of functions: one that translates from code point to UTF-8, and the other group goes the opposite direction.

  This commit changes the base function that translates from UTF-8 to code point to output native instead of Unicode. Those extremely rare instances where Unicode output is needed instead will have to hand-wrap calls to this function with a translation macro, as now described in the API pod. Prior to this, it was the other way: the native was wrapped, and the rare, strict Unicode wasn't. This eliminates a layer of function call overhead for a common case.

  The base function that translates from code point to UTF-8 retains its Unicode input, as that is more natural to process. However, it is de-emphasized in the pod, with the functionality description moved to the pod for a native input wrapper function. And those wrappers are now macros in all cases; previously there was function call overhead sometimes. (Equivalent exported functions are retained, however, for XS code that uses the Perl_foo() form.)

  I had hoped to rebase this commit, squashing it with an earlier commit in this series, eliminating the use of a temporary function name change, but the work involved turns out to be large, with no real payoff.
* utf8.c: Stop using two functions | Karl Williamson | 2013-08-29 | 1 | -2/+4
  This is in preparation for deprecating these functions, to force any code that has been using these functions to change. Since the Unicode tables are now stored in native order, these functions should only rarely be needed.

  However, the functionality of these is needed, and in actuality, on ASCII platforms, the native functions are #defined to these. So what this commit does is rename the functions to something else, and create wrappers with the old names, so that anyone using them will get the deprecation when it actually goes into effect: we are waiting for CPAN files distributed with the core to change before doing the deprecation. According to grep.cpan.me, this should affect fewer than 10 additional CPAN distributions.
* Convert uvuni_to_utf8() to function | Karl Williamson | 2013-08-29 | 1 | -1/+1
  Code should almost never be dealing with non-native code points.

  This is in preparation for later deprecation when our CPAN modules have been converted away from using it.
* Deprecate utf8_to_uni_buf() | Karl Williamson | 2013-08-29 | 1 | -1/+1
  Now that the tables are stored in native order, there is almost no need for code to be dealing in Unicode order. According to grep.cpan.me, there are no uses of this function in CPAN.
* Deprecate valid_utf8_to_uvuni() | Karl Williamson | 2013-08-29 | 1 | -1/+1
  Now that all the tables are stored in native format, there is very little reason to use this function; and those who do need this kind of functionality should be using the bottom level routine, so as to make it clear they are doing nonstandard stuff.

  According to grep.cpan.me, there are no uses of this function in CPAN.
* utf8.c: Swap which fcn wraps the other | Karl Williamson | 2013-08-29 | 1 | -2/+1
  This is in preparation for the current wrapee becoming deprecated.
* Deprecate NATIVE_TO_NEED and ASCII_TO_NEED | Karl Williamson | 2013-08-29 | 1 | -0/+2
  These macros are no longer called in the Perl core. This commit turns them into functions so that they can use gcc's deprecation facility.

  I believe these were defective right from the beginning, and I have struggled to understand what's going on. From the name, it appears NATIVE_TO_NEED takes a native byte and turns it into UTF-8 if the appropriate parameter indicates that. But that is impossible to do correctly from that API, as for variant characters, it needs to return two bytes. It could only work correctly if ch is an I8 byte, which isn't native, and hence the name would be wrong.

  Similar arguments apply to ASCII_TO_NEED.

  The function S_append_utf8_from_native_byte(const U8 byte, U8** dest) does what I think NATIVE_TO_NEED intended.
* Extract common code to an inline function | Karl Williamson | 2013-08-29 | 1 | -0/+2
  This fairly short paradigm is repeated in several places; a later commit will improve it.
* Only predeclare S_sv_or_pv_pos_u2b for -DPERL_CORE or -DPERL_EXT | Nicholas Clark | 2013-08-28 | 1 | -0/+2
  Otherwise when compiling XS code, there is a declaration for a function which is never used, which can cause some compilers to issue a warning.
* [perl #117265] safesyscalls: check embedded nul in syscall args | Tony Cook | 2013-08-26 | 1 | -2/+4
  Check for the nul char in pathnames and string arguments to syscalls, return undef and set errno to ENOENT. Added to the io warnings category syscalls.

  Strings with embedded \0 chars were previously ignored in the syscall but kept in perl. The hidden payloads in these invalid string args may cause unnoticed security problems, as they are hard to detect, ignored by the syscalls but kept around in perl PVs.

  Allow an ending \0 though, as several modules add a \0 to such strings without adjusting the length.

  This is based on a change originally by Reini Urban, but pretty much all of the code has been replaced.
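The rule above (reject an embedded nul, tolerate a single trailing one, fail with ENOENT) can be sketched as follows. This is a minimal illustration; `arg_is_nul_safe()` and `check_syscall_arg()` are hypothetical helper names, not the functions the commit adds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when the len bytes at pv contain no nul, except possibly as the very
 * last byte (some callers append a nul without adjusting the length). */
bool arg_is_nul_safe(const char *pv, size_t len)
{
    if (len > 1 && memchr(pv, '\0', len - 1) != NULL)
        return false;
    return true;
}

/* Returns 0 if the argument may be passed on to the system call; otherwise
 * sets errno to ENOENT and returns -1, mirroring the failure mode above. */
int check_syscall_arg(const char *pv, size_t len)
{
    if (!arg_is_nul_safe(pv, len)) {
        errno = ENOENT;
        return -1;
    }
    return 0;
}
```

The check has to work on a pointer plus an explicit length, because a C-string view of the same buffer would stop at the first nul and never see the hidden tail.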
* Use SSize_t/STRLEN in more places in regexp code | Father Chrysostomos | 2013-08-25 | 1 | -10/+11
  As part of getting the regexp engine to handle long strings, this commit changes any variables, parameters and struct members that hold lengths of the string being matched against (or parts thereof) to use SSize_t or STRLEN instead of [IU]32.

  To avoid having to change any logic, I kept the signedness the same.

  I did not change anything that affects the length of the regular expression itself, so regexps are still practically limited to I32_MAX. Changing that would involve changing the size of regnodes, which would be a lot more involved.

  These changes should fix bugs, but are very hard to test. In most cases, I don’t know the regexp engine well enough to come up with test cases that test the paths in question with long strings. In other cases I don’t have a box with enough memory to test the fix.
* Stop substr re optimisation from rejecting long strs | Father Chrysostomos | 2013-08-25 | 1 | -2/+2
  Using I32 for the fields that record information about the location of a fixed string that must be found for a regular expression to match can result in match failures, because I32 is not large enough to store offsets >= 2**31.

  SSize_t is appropriate, since it is 64 bits on 64-bit platforms and 32 bits on 32-bit platforms.

  This commit changes enough instances of I32 to SSize_t to get the added test passing and suppress compiler warnings. A later commit will change many more.
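The 2**31 limit mentioned above is just the range of a signed 32-bit integer. A small sketch, using the fixed-width `int32_t` and `int64_t` as assumed stand-ins for I32 and a 64-bit SSize_t (the helper name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a 64-bit string offset can be stored in a signed 32-bit field
 * without loss. The first offset that does not fit is 2**31. */
bool offset_fits_i32(int64_t offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}
```

Any offset for which this returns false would be truncated or wrapped when stored in a 32-bit field, which is the kind of failure the wider type avoids on 64-bit platforms.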
* [perl #116907] Allow //g matching past 2**31 threshold | Father Chrysostomos | 2013-08-25 | 1 | -1/+1
  Change the internal fields for storing positions so that //g in scalar context can move past the 2**31 character threshold. Before this commit, the numbers would wrap, resulting in assertion failures.

  The changes in this commit are only enough to get the added test passing. Stay tuned for more.
* Stop pos() from being confused by changing utf8ness | Father Chrysostomos | 2013-08-25 | 1 | -0/+5
  The value of pos() is stored as a byte offset. If it is stored on a tied variable or a reference (or glob), then the stringification could change, resulting in pos() now pointing to a different character offset or pointing to the middle of a character:

      $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x'
      2
      $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x'
      Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
      0

  So pos() should be stored as a character offset.

  The regular expression engine expects byte offsets always, so allow it to store bytes when possible (a pure non-magical string) but use characters otherwise.

  This does result in more complexity than I should like, but the alternative (always storing a character offset) would slow down regular expressions, which is a big no-no.
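To make the byte-versus-character distinction concrete, here is a sketch of converting between the two kinds of offset in a well-formed UTF-8 buffer. The function names are hypothetical and the code does no validation; it is not how the core converts offsets, only an illustration of why the two numbers differ and why each conversion costs a scan.

```c
#include <stddef.h>

/* Number of characters that start within the first byte_off bytes.
 * Continuation bytes have the bit pattern 10xxxxxx and start nothing. */
size_t utf8_byte_to_char_offset(const unsigned char *s, size_t byte_off)
{
    size_t chars = 0;
    for (size_t i = 0; i < byte_off; i++)
        if ((s[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}

/* Byte offset at which character number char_off begins (or len if the
 * buffer holds fewer characters than that). */
size_t utf8_char_to_byte_offset(const unsigned char *s, size_t len,
                                size_t char_off)
{
    size_t i = 0;
    while (i < len && char_off > 0) {
        i++;                                        /* step over the lead byte */
        while (i < len && (s[i] & 0xC0) == 0x80)    /* and its continuations */
            i++;
        char_off--;
    }
    return i;
}
```

Both directions are linear in the length of the string, which is consistent with the commit's reluctance to store character offsets when a plain byte offset is safe.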
* Use SSize_t for arrays | Father Chrysostomos | 2013-08-25 | 1 | -13/+14
  Make the array interface 64-bit safe by using SSize_t instead of I32 for array indices.

  This is based on a patch by Chip Salzenberg.

  This completes what the previous commit began when it changed av_extend.
* Use SSize_t when extending the stack | Father Chrysostomos | 2013-08-25 | 1 | -3/+4
  (I am referring to what is usually known simply as The Stack.)

  This partially fixes #119161.

  By casting the argument to int, we can end up truncating/wrapping it on 64-bit systems, so EXTEND(SP, 2147483648) translates into EXTEND(SP, -1), which does not extend the stack at all. Then writing to the stack in code like ()=1..1000000000000 goes past the end of allocated memory and crashes.

  I can’t really write a test for this, since instead of crashing it will use more memory than I have available (and then I’ll start forgetting things).
* Use SSize_t for tmps stack offsets | Father Chrysostomos | 2013-08-25 | 1 | -1/+2
  This is a partial fix for #119161.

  On 64-bit platforms, I32 is too small to hold offsets into a stack that can grow larger than I32_MAX. What happens is the offsets can wrap so we end up referencing and modifying elements with negative indices, corrupting memory, and causing crashes.

  With this commit, ()=1..1000000000000 stops crashing immediately. Instead, it gobbles up all your memory first, and then, if your computer still survives, crashes. The second crash happens because of a similar bug with the argument stack, which the next commit will take care of.
* PATCH: [perl #119443] Blead won't compile on wince | Karl Williamson | 2013-08-23 | 1 | -3/+1
  This commit adds #if's to cause locale handling code to compile on platforms that don't have full-featured locale handling. The commits mentioned in the ticket did not adequately cover these situations.
* [perl #3330] warn on increment of a non-number/non-magically incable value | Tony Cook | 2013-08-12 | 1 | -0/+2
* add adjust_size_and_find_bucket to embed.fnc | Lukas Mai | 2013-08-11 | 1 | -0/+3
* Revert "[perl #117855] Store CopFILEGV in a pad under ithreads" | Father Chrysostomos | 2013-08-09 | 1 | -5/+0
  This reverts commit c82ecf346.

  It turns out to be faulty, because a location shared between threads (the cop) was holding a reference count on a pad entry in a particular thread. So when you free the cop, how do you know where to do SvREFCNT_dec?

  In reverting c82ecf346, this commit still preserves the bug fix from 1311cfc0a7b, but shifts it around.
* Stop ‘used once’ warnings from crashing on circularities | Father Chrysostomos | 2013-08-05 | 1 | -1/+1
  gv_check was only checking for stashes nested directly inside themselves (*foo:: = *foo::foo) and the main stash. Other stash circularities would cause infinite recursion, blowing the C stack and crashing.
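One common way to make a recursive walk safe against arbitrary cycles is to mark each node while it is on the recursion path. The sketch below shows that general technique on a hypothetical `struct stash`; it is not taken from gv_check, and the commit may have solved the problem differently.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a symbol table that can contain other tables. */
struct stash {
    struct stash *child[MAX_CHILDREN];
    int visiting;   /* non-zero while this node is on the recursion path */
};

/* Walks everything reachable from st and returns how many nodes were
 * entered. A node that is already being walked is skipped, so a cycle of
 * any length terminates instead of recursing forever. */
int walk_stashes(struct stash *st)
{
    int count = 1;

    if (st == NULL || st->visiting)
        return 0;
    st->visiting = 1;
    for (size_t i = 0; i < MAX_CHILDREN; i++)
        count += walk_stashes(st->child[i]);
    st->visiting = 0;   /* clear the mark on the way out */
    return count;
}
```

Checking only for a node that directly contains itself would miss a cycle such as a to b to c and back to a, which the per-node mark catches.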
* [perl #117855] Store CopFILEGV in a pad under ithreads | Father Chrysostomos | 2013-08-05 | 1 | -0/+5
  This saves having to allocate a separate string buffer for every cop (control op; every statement has one).

  Under non-threaded builds, every cop has a pointer to the GV for that source file, namely *{"_<filename"}.

  Under threaded builds, the name of the GV used to be stored instead.

  Now we store an offset into the per-interpreter PL_filegvpad, which points to the GV.

  This makes no significant speed difference, but it reduces memory usage.
* Extend sv_dump() to dump SVt_INVLIST | Karl Williamson | 2013-08-01 | 1 | -1/+5
  This changes the previously unused _invlist_dump() function to be called from sv_dump() to dump inversion list scalars. The format for regular SVt_PVs doesn't give human-friendly output for these.

  Since these lists are currently not visible outside the Perl core, the format is documented only in comments in the function itself.
* regcomp.c: Extract duplicated code into single fcn | Karl Williamson | 2013-07-30 | 1 | -0/+1
  This code appears twice in nearly identical form.
* make Perl_reg_set_capture_string static | David Mitchell | 2013-07-28 | 1 | -8/+0
  This function was introduced a few commits ago. Since it's now only called from within regexec.c, make it static.
* add Perl_reg_set_capture_string() function | David Mitchell | 2013-07-28 | 1 | -0/+7
  Cut and paste into a separate function the block of code in regexec_flags() that is responsible (on successful match) for setting RX_SAVED_COPY, RX_SUBBEG etc., ready for use by capture vars like $1, $&.

  Although this function is currently only called from one place, we will shortly use it elsewhere too.

  This should contain no functional changes.
* [perl #79908] Stop sub inlining from breaking closures | Father Chrysostomos | 2013-07-25 | 1 | -1/+1
  When a closure closes over a variable, it references the variable itself, as opposed to taking a snapshot of its value.

  This was broken by the constant optimisation added for constant.pm’s sake:

      {
          my $x;
          sub () { $x };  # takes a snapshot of the current value of $x
      }

  constant.pm no longer uses that mechanism, except on older perls, so we can remove this hack, causing code like this to start working again:

      BEGIN{
          my $x = 5;
          *foo = sub(){$x};
          $x = 6
      }
      print foo;  # now prints 6, not 5
* Inline list constants | Father Chrysostomos | 2013-07-25 | 1 | -0/+1
  These are inlined the same way as 1..5. We have two ops:

      rv2av
        |
        `-- const

  The const op returns an AV, which is stored in the op tree, and then rv2av flattens it.
* Revert "Remove the non-inline function S_croak_memory_wrap from inline.h." | Tony Cook | 2013-07-24 | 1 | -1/+1
  This reverts commit 43387ee1abcd83c3c7586b7f7aa86e838d239aac, which reverted parts of f019c49e380f764c1ead36fe3602184804292711, but that reversion may no longer be necessary.

  See [perl #116989].
* [perl #72766] Allow huge pos() settings | Father Chrysostomos | 2013-07-23 | 1 | -1/+1
  This is part of #116907, too. It also fixes #72924 as a side effect; the next commit will explain.

  The value of pos($foo) was being stored as an I32, not allowing values above I32_MAX. Change it to SSize_t (the signed equivalent of size_t, representing the maximum string length the OS/compiler supports).

  This is accomplished by changing the size of the entry in the magic struct, which is the simplest fix. Other parts of the code base can benefit from this, too.

  We actually cast the pos value to STRLEN (size_t) when reading it, to allow *very* long strings. Only the value -1 is special, meaning there is no pos. So the maximum supported offset is 2**(8*sizeof(size_t))-2.

  The regexp engine itself still cannot handle large strings, so being able to set pos to large values is useless right now. This is but one piece in a larger puzzle.

  Changing the size of mg->mg_len also requires that Perl_hv_placeholders_p change its type. This function should in fact not be in the API, since it exists solely to implement the HvPLACEHOLDERS macro. See <https://rt.perl.org/rt3/Ticket/Display.html?id=116907#txn-1237043>.
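The storage scheme described above (a signed field, read back as unsigned, with -1 reserved to mean "no pos") can be sketched like this. `ptrdiff_t` is used as an assumed stand-in for SSize_t, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef ptrdiff_t pos_store_t;   /* hypothetical stand-in for SSize_t */

/* Largest offset that can be represented: every unsigned value except the
 * one whose bit pattern is the -1 sentinel. */
#define POS_MAX_OFFSET (SIZE_MAX - 1)

/* Only -1 is special; it means that no position has been set. */
bool pos_is_set(pos_store_t stored)
{
    return stored != -1;
}

/* Read the field as an unsigned offset, so that values which look negative
 * in the signed field are interpreted as very large offsets. */
size_t pos_value(pos_store_t stored)
{
    return (size_t)stored;
}
```

Reserving a single sentinel rather than the whole negative range is what roughly doubles the usable offset range compared with treating every negative value as "unset".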
* Add sv_pos_b2u_flags | Father Chrysostomos | 2013-07-23 | 1 | -0/+2
  This, similar to sv_pos_u2b_flags, is a more friendly variant of sv_pos_b2u that works with 2GB strings and actually returns a value instead of modifying a passed-in value in place through a pointer.

  The next commit will use this.
* Allow => to quote built-in keywords across lines | Father Chrysostomos | 2013-07-16 | 1 | -1/+1
  This is the second try. 5969c5766a5d3 had a bug in it under non-MAD builds.

  If I have a sub I can use its name as a bareword as long as I suffix it with =>, even if the => is on the next line:

      $ ./perl -Ilib -e 'sub tim; warn tim' -e '=>'
      tim at -e line 1.

  If I want to use a built-in keyword’s name as a bareword, I can put => after it:

      $ ./perl -Ilib -e 'warn time =>'
      time at -e line 1.

  But if I combine the two (keyword + newline), it does not work:

      $ ./perl -Ilib -e 'warn time' -e ' =>'
      1373611283 at -e line 1.

  unless I override the keyword:

      $ ./perl -Ilib -Msubs=time -e 'warn time' -e ' =>'
      time at -e line 1.

  => after a bareword is checked for in two places in toke.c. The first comes before a comment saying ‘NO SKIPSPACE BEFORE HERE!’; it only skips spaces and finds a => on the same line. The second comes later; it skips vertical space and comments, too.

  But the second check is in a code path that is not reached by keywords that are not overridden (as is the ‘NO SKIPSPACE’ comment).

  This commit adds an extra check for built-in keywords after we have determined that the keyword is not overridden. In that case, there is no reason we cannot use skipspace, as we no longer have to worry about what PL_oldbufptr etc. point to.

  This commit leaves __DATA__ and __END__ alone, since they are special, problematic and controversial. (See, e.g., <https://rt.perl.org/rt3/Ticket/Display.html?id=78348#txn-1234355>.)

  Allowing whitespace to be scanned across line boundaries without increasing the line number (something this commit has to do to make this work) can cause the way PL_linestr is handled to change. PL_linestr usually holds just the current line when reading from a handle. Now it can hold the current line plus the next line or several lines, depending on how much whitespace is to be found there.

  When '\n' or '#' was encountered, the lexer would modify the buffer in place and add a null, setting PL_bufend to point to that null. That would make it look as though the end of the line had been reached, and avoided having to scan to find the end of a comment.

  In string eval and quote-like operators, the end of the comment does have to be scanned for. We can’t just fake EOL and read the next line of input.

  Under MAD builds, the end of the comment was being scanned for anyway, even when reading from a handle. So everything worked under MAD, which was what I tested 5969c5766a5d3 under.

  This commit changes the '\n' and '#' handling to match the MAD code (scan for the end of the comment instead of faking a buffer truncation), which 5969c5766a5d3 failed to do.
* Remove redundant field from inversion lists | Karl Williamson | 2013-07-16 | 1 | -2/+1
  The number of elements in an inversion list is a simple calculation based on SvCUR(). Prior to this patch there was a field that contained that number directly, and the two values diverged, causing a bug. A single value can't get out-of-sync with itself.
* Add parameter to internal function | Karl Williamson | 2013-07-16 | 1 | -1/+1
  The function invlist_set_len() has to be called after the offset header field in an inversion list has been set. To make sure that future maintainers don't forget to do this, add the parameter for the 'offset' to its call, so it can't be called without having this value.