path: root/doop.c
Commit message  Author  Age  Files  Lines
* rmv/de-dup static const char array "strings"Daniel Dragan2018-03-071-3/+3
    MSVC, due to a bug, doesn't merge identical copies between .o files or discard these vars and their contents.

    MEM_WRAP_CHECK_2 has never been used outside of core according to cpan grep. MEM_WRAP_CHECK_2 was removed on the "have PERL_MALLOC_WRAP" branch in commit fabdb6c0879 "pre-likely cleanup" without explanation, probably because it was unused. But MEM_WRAP_CHECK_2 was still left on the "no PERL_MALLOC_WRAP" branch, so remove it from the "no" side for tidiness, since it was a mistake to leave it there if it was removed from the "yes" side of the #ifdef.

    Add MEM_WRAP_CHECK_s API; the letter "s" means the argument is a string or static. This lets us get rid of the "%s" argument passed to Perl_croak_nocontext at a couple of call sites, since we fully control the next and only argument and it's guaranteed to be a string literal. This allows the linker to merge the 2 "Out of memory during array extend" C strings.

    Also change the 2 op.h messages into macros which become string literals at their call sites, instead of the "read char * from a global char **" which was going on before.

    VC 2003 32b perl527.dll section sizes (virtual size):

                .text    .rdata
        before  DE503    4B621
        after   DE503    4B5D1
* S_do_trans_complex(): outdent a block of codeDavid Mitchell2018-02-201-33/+33
| | | | whitespace-only change left over from my recent tr///c fix work
* PATCH: [perl #132750] Silence uninit warningKarl Williamson2018-01-211-1/+1
    I inspected the code, and there is no problem here; it's a compiler mistake. Nevertheless, simply initializing the variable silences it.
* doop.c: White-space onlyKarl Williamson2018-01-191-10/+10
| | | | Indent to correspond with the new block placed by the previous commit.
* Deprecate above \xFF in bitwise string opsKarl Williamson2018-01-191-0/+5
| | | | | | | | | | | | This is already a fatal error for operations whose outcome depends on them, but in things like "abc" & "def\x{100}" the wide character doesn't actually need to participate in the AND, and so perl doesn't. As a result of the discussion in the thread beginning with http://nntp.perl.org/group/perl.perl5.porters/244884, it was decided to deprecate these ones too.
* doop.c: Use MIN()Karl Williamson2018-01-191-1/+1
| | | | This is slightly cleaner than hand rolling the min.
* tr///: eliminate I32 from the do_trans*() fnsDavid Mitchell2018-01-191-15/+15
| | | | Replace each with a more appropriate type
* tr///: return Size_t count rather than I32David Mitchell2018-01-191-13/+13
    Change the signature of all the internal do_trans*() functions to return Size_t rather than I32, so that the count returned by tr/// can cope with strings longer than 2GB.
* tr///; simplify $utf8 =~ tr/nonutf8/nonutf8/David Mitchell2018-01-191-94/+20
    The run-time code to handle a non-utf8 tr/// against a utf8 string is complex, with many variants of similar code repeated depending on the presence of the /s and /c flags.

    Simplify them all into a single code block by changing how the translation table is stored. Formerly, the tr struct contained possibly two tables: the basic 0-255 slot one, plus in the presence of /c, a second one to map the implicit search range (\x{100}...) against any residual replacement chars not consumed by the first table.

    This commit merges the two tables into a single unified whole. For example

        tr/\x00-\xfe/abcd/c

    is equivalent to

        tr/\xff-\x{7fffffff}/abcd/

    which generates a 259-entry translation table consisting of:

        0x00  => -1
        0x01  => -1
        ...
        0xfe  => -1
        0xff  => a
        0x100 => b
        0x101 => c
        0x102 => d

    In addition we store:

    1) the size of the translation table (0x103 in the example above);
    2) an extra 'wildcard' entry stored 1 slot beyond the main table, which specifies the action for any codepoints outside the range of the table (i.e. chars 0x103..0x7fffffff). This can be either:
        a) a character, when the last replacement char is repeated;
        b) -1 when /c isn't in effect;
        c) -2 when /d is in effect;
        d) -3 identity: when the replacement list is empty but not /d.

    In the example above, this would be

        0x103 => d

    The addition of -3 as a valid slot value is new.

    This makes the main runtime code for the utf8 string with non-utf8 tr// case look like, at its core:

        size = tbl->size;
        mapped_ch = tbl->map[ch >= size ? size : ch];

    which then processes mapped_ch based on whether it is >= 0, or -1/-2/-3.

    This is a lot simpler than the old scheme, and should generally be faster too.
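The unified table lookup described in this message can be sketched in standalone C. This is an illustration under stated assumptions, not the code in doop.c: the `TransTable` type, the `MAP_*` names, and the helper functions are hypothetical, and the table is sized only for the tr/\x00-\xfe/abcd/c example. The sketch also ignores the match count, so -1 and -3 are both treated as "leave the character alone".

```c
#include <stddef.h>

/* Special slot values, named here for illustration only. */
enum { MAP_UNMAPPED = -1, MAP_DELETE = -2, MAP_IDENTITY = -3 };

/* Hypothetical layout: `size` ordinary entries, then one wildcard
 * entry stored at map[size]. */
typedef struct {
    size_t size;
    long   map[0x104];   /* 0x103 entries plus the wildcard slot */
} TransTable;

/* Build the table for the tr/\x00-\xfe/abcd/c example. */
static void build_example(TransTable *tbl)
{
    size_t i;
    tbl->size = 0x103;
    for (i = 0; i <= 0xfe; i++)
        tbl->map[i] = MAP_UNMAPPED;
    tbl->map[0xff]  = 'a';
    tbl->map[0x100] = 'b';
    tbl->map[0x101] = 'c';
    tbl->map[0x102] = 'd';
    tbl->map[0x103] = 'd';   /* wildcard: repeat the last replacement char */
}

/* Core lookup: any code point past the table falls onto the wildcard. */
static long lookup(const TransTable *tbl, unsigned long ch)
{
    size_t size = tbl->size;
    return tbl->map[ch >= size ? size : ch];
}

/* Translate one code point. Returns 1 and stores the result in *out
 * when the character is kept, or 0 when it is deleted. */
static int translate_one(const TransTable *tbl, unsigned long ch,
                         unsigned long *out)
{
    long m = lookup(tbl, ch);
    if (m >= 0) {
        *out = (unsigned long)m;
        return 1;
    }
    if (m == MAP_DELETE)
        return 0;
    *out = ch;   /* MAP_UNMAPPED or MAP_IDENTITY: character unchanged */
    return 1;
}
```

With this table, `lookup(&t, 0x5000)` lands on the wildcard slot and yields 'd', while any code point below 0xff yields -1 and passes through unchanged.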
* tr///c: handle len(replacement charlist) > 32767David Mitchell2018-01-191-1/+1
| | | | | | | | | | | | | | | | | | | | RT #132608 In the non-utf8 case, the /c (complement) flag to tr adds an implied \x{100}-\x{7fffffff} range to the search charlist. If the replacement list contains more chars than are paired with the 0-255 part of the search list, then the excess chars are stored in an extended part of the table. The excess char count was being stored as a short, which caused problems if the replacement list contained more than 32767 excess chars: either substituting the wrong char, or substituting for a char located up to 0xffff bytes in memory before the real translation table. So change it to SSize_t. Note that this is only a problem when the search and replacement charlists are non-utf8, the replacement list contains around 0x8000+ entries, and where the string being translated is utf8 with at least one codepoint >= U+8000.
* add two structs for OP_TRANSDavid Mitchell2018-01-191-23/+28
| | | | | | | | | | | | | | | | | Originally, the op_pv of an OP_TRANS op pointed to a 256-slot array of shorts, which contained the translations. However, in the presence of tr///c, extra information needs to be stored to handle utf8 strings. The 256 slot array was extended, with slot 0x100 holding a length, and slots 0x101 holding some extra chars. This has made things a bit messy, so this commit adds two structs, one being an array of 256 shorts, and the other being the same but with some extra fields. So for example tbl->[0x100] has been replaced with tbl->excess_len. This commit should make no functional difference, but will allow us shortly to fix a bug by changing the type of the excess_len field from short to something bigger, for example.
* S_do_trans_complex(): re-indentDavid Mitchell2018-01-191-6/+6
| | | | outdent a code block following previous commit.
* fix "\x{100}..." =~ tr/.../.../cdDavid Mitchell2018-01-191-25/+39
    In transliterations where the search and replacement charlists are non-utf8, but where the string being modified contains codepoints >= 0x100, then tr/.../.../cd would always delete all such codepoints, rather than potentially mapping some of them.

    In more detail: in the presence of /c (complement), an implicit 0x100..0x7fffffff is added to a non-utf8 search charlist. If the replacement list is longer than the < 0x100 part of the search list, then the last few replacement chars should in principle be paired off against the first few of (\x{100}, \x{101}, ...). However, this wasn't happening. For example,

        tr/\x00-\xfd/ABCD/cd

    should be equivalent to

        tr/\xfe-\x{7fffffff}/ABCD/d

    which should map:

        \xfe => A, \xff => B, \x{100} => C, \x{101} => D,

    and delete \x{102} onwards. But instead, it behaved like

        tr/\xfe-\x{7fffffff}/AB/d

    and deleted all codepoints >= 0x100.

    This commit fixes that by using the extended mapping table format for all /c variants (formerly it excluded /cd).

    I also changed a variable holding the mapped char from being I32 to UV, principally to avoid a casting mess in the fixed code. This may (or may not), as a side-effect, have fixed possible issues with very large codepoints.
* OP_TRANS: change extended table formatDavid Mitchell2018-01-191-9/+15
    For non-utf8, OP_TRANS(R) ops have a translation table consisting of an array of 256 shorts attached. For tr///c, this table is extended to hold information about chars in the replacement list which aren't paired with chars in the search list. For example,

        tr/\x00-AE-\xff/bcdefg/c

    is equivalent to

        tr/BCD\x{100}-\x{7fffffff}/bcdefg/

    which is equivalent to

        tr/BCD\x{100}-\x{7fffffff}/bcdefggggggggg..../

    Only the BCD => bcd mappings can be stored in the basic 256-slot table, so potentially the following extra information needs recording in an extended table to handle codepoints > 0xff in the string being modified:

    1) the extra replacement chars ("efg");
    2) the number of extra replacement chars (3);
    3) the "repeat" char ('g').

    Currently 2) and 3) are combined: the repeat char is found as the last extra char, and if there are no extra chars, the repeat char is treated as an extra char list of length 1. Similarly, an 'extra chars' length value of 1 can imply either one extra char, or no extra chars with the repeat char being faked as an extra char. An 'extra chars' length of 0 implies an empty replacement list, i.e. tr/....//c.

    This commit changes it so that the repeat char is *always* stored (in slot 0x101), with the extra chars stored beginning at slot 0x102. The 'extra chars' length value (located at slot 0x100) has changed its meaning slightly: now

        -1  implies tr/....//c
         0  implies no more replacement chars than search chars
        1+  the number of excess replacement chars.

    This (should) make no functional difference, but the extra information stored will make it easier to fix some bugs shortly.
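As a rough illustration of the slot layout this message describes (length at 0x100, repeat char at 0x101, extra chars from 0x102), the following C sketch maps a code point at or above 0x100. The function name, the `SLOT_*` constants, and the choice to return the code point itself for the empty-replacement case are assumptions made for the example, not the actual doop.c code.

```c
/* Slot numbers taken from the commit message; the names are hypothetical. */
enum { SLOT_EXCESS_LEN = 0x100, SLOT_REPEAT = 0x101, SLOT_EXTRA = 0x102 };

/* Map a code point >= 0x100 using the extended part of the table.
 * The caller must guarantee cp >= 0x100. */
static long map_above_ff(const short *tbl, unsigned long cp)
{
    long excess_len   = tbl[SLOT_EXCESS_LEN];
    unsigned long idx = cp - 0x100;

    if (excess_len < 0)                   /* -1: tr/....//c, empty replacement */
        return (long)cp;                  /* assumed here: char maps to itself */
    if (idx < (unsigned long)excess_len)  /* paired with an excess char */
        return tbl[SLOT_EXTRA + idx];
    return tbl[SLOT_REPEAT];              /* everything else: the repeat char */
}
```

For the tr/\x00-AE-\xff/bcdefg/c example, the extended slots would hold 3, 'g', 'e', 'f', 'g', so 0x100 maps to 'e', 0x101 to 'f', and 0x102 onwards to 'g'.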
* remove fossil debugging statement from do_trans()David Mitchell2018-01-191-2/+0
| | | | | | | | | This: DEBUG_t( Perl_deb(aTHX_ "2.TBL\n")); has been around in one form or another since perl1, but it makes no sense since perl5,000, where -Dt now shows the name of the op being executed.
* tr/// functions: add some basic code commentsDavid Mitchell2018-01-191-0/+63
| | | | | | | | | | | | | For the various C functions which implement the compile-time and run-time aspects of OP_TRANS, add some basic code comments at the top of each function explaining what its purpose is. Also add lots of code comments to the body of S_pmtrans() (which compiles a tr///). Also comment what the OPpTRANS_ private flag bits mean. No functional changes.
* doop.c: Change to use is_utf8_invariant_string()Karl Williamson2017-11-231-27/+9
| | | | | | | This commit changes 3 occurrences of byte-at-a-time looking to see if a string is invariant under UTF-8, to using the inlined is_utf8_invariant_string() which now does much faster word-at-a-time looking.
* hv_pushkv(): handle keys() and values() tooDavid Mitchell2017-07-271-18/+3
    The newish function hv_pushkv() currently just pushes all key/value pairs on the stack, i.e. it does the equivalent of the perl code '() = %h'. Extend it so that it can handle 'keys %h' and 'values %h' too.

    This is basically moving the remaining list-context functionality out of do_kv() and into hv_pushkv().

    The rationale for this is that hv_pushkv() is a pure HV-related function, while do_kv() is a pp function for several ops including OP_KEYS/VALUES, and expects PL_op->op_flags/op_private to be valid.
* create Perl_hv_pushkv() functionDavid Mitchell2017-07-271-12/+8
| | | | | | | | | | | | | | | | ...and make pp_padhv(), pp_rv2hv() use it rather than using Perl_do_kv() Both pp_padhv() and pp_rv2hv() (via S_padhv_rv2hv_common()), outsource to Perl_do_kv(), the list-context pushing/flattening of a hash onto the stack. Perl_do_kv() is a big function that handles all the actions of keys, values etc. Instead, create a new function which does just the pushing of a hash onto the stack. At the same time, split it out into two loops, one for tied, one for normal: the untied one can skip extending the stack on each iteration, and use a cheaper HeVAL() instead of calling hv_iterval().
* OP_VALUES: reserve OPpMAYBE_LVSUB bitDavid Mitchell2017-07-271-0/+3
| | | | | | This op doesn't use that bit, but it calls the function Perl_do_kv(), which is called by several different ops which *do* use that bit. So ensure no-one in future thinks that bit is spare in OP_VALUES.
* use OPpAVHVSWITCH_MASKDavid Mitchell2017-07-271-2/+4
| | | | Use this symbolic constant rather than the literal constant '3'.
* Perl_do_kv(): add asserts and more code commentsDavid Mitchell2017-07-271-9/+29
    This function can be called directly or indirectly by several ops. Update its code comments to explain this in detail, and assert which ops can call it.

    Also remove a redundant comment about OP_RKEYS/OP_RVALUES; these ops have been removed.

    Also, reformat the 'dokv = ' expressions.

    Finally, add some code comments to pp_avhvswitch explaining what it's for.

    Apart from the op_type asserts, there should be no functional changes.
* Allow bitwise & ^ | to accept trailing UTF-8Karl Williamson2017-06-141-13/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 08b6664b858b8fd4b5c0c27542763337b6d78e46 breaks things like $foo = "" & "\x{100}"; We have deprecated using above-FF code points in bitwise operations, and made them illegal in 5.27. However, the case where the illegal code points don't play a part in the operation never raised deprecation warnings. The example above is one such, because the \x{100} comes after the operation stops since the other operand has length 0. We can't make something illegal without warning people about it for 2 releases. Rather than revert that commit, and reinstate a bunch of slow code that is far more general than now needed, this commit adds some extra code to deal with these situations, but the basic operations still take place in tight loops, which 08b6664b858b8fd4b5c0c27542763337b6d78e46 caused to happen. In the case of "&", the illegal code points get truncated away. In the case of ^ and |, they get catenated as-is. This preserves earlier behavior. It has not been decided if these should at least warn, or the usage should be deprecated. A commit can easily be done to change to whatever the final decision is, but this commit doesn't raise any warnings, hence preserves existing behavior. The breaking commit looks like it might create some havoc with CPAN, and fixing it now will save the CPAN testers effort, as they won't have to deal with a bunch of broken distributions.
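The length rules this message relies on (the result of & is cut to the shorter operand, while | keeps the tail of the longer operand as-is) can be sketched on raw byte buffers. This is a simplified illustration with hypothetical function names: it works on plain bytes and does not model Perl's UTF-8 flag or the downgrade step.

```c
#include <stddef.h>
#include <string.h>

/* AND: the result is as long as the shorter operand, so any trailing
 * bytes of the longer operand are simply dropped. */
static size_t bytes_and(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen,
                        unsigned char *out)
{
    size_t n = alen < blen ? alen : blen;
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = a[i] & b[i];
    return n;
}

/* OR: the common prefix is combined, then the tail of the longer
 * operand is appended unchanged. `out` needs room for max(alen, blen). */
static size_t bytes_or(const unsigned char *a, size_t alen,
                       const unsigned char *b, size_t blen,
                       unsigned char *out)
{
    size_t n = alen < blen ? alen : blen;
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = a[i] | b[i];
    if (alen > n)
        memcpy(out + n, a + n, alen - n);
    else if (blen > n)
        memcpy(out + n, b + n, blen - n);
    return alen > blen ? alen : blen;
}
```

In the "" & "\x{100}" case from the message, the common length is zero, so the wide character never takes part in the operation and the result is empty.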
* doop.c: White-space onlyKarl Williamson2017-06-071-30/+30
| | | | Outdent, since the previous commit removed an enclosing block
* Use simple-minded approach to bitwise UTF-8 operationsKarl Williamson2017-06-071-116/+32
| | | | | | | | | | | | | | | | | | | Commit 5d09ee1cb7b68f5e6fd15233bfe5048612e8f949 fatalized bitwise operations of operands with wide characters in them. It retained the regular UTF-8 handling, but throws an error when a wide character is encountered. But this code is complicated because of its original intended generality. It can essentially be ripped out, replaced by code that just downgrades the operand to non-UTF-8. Then we use the regular code to do the operation. In the complement case, that's all that need be done to mimic earlier behavior, as the result has not been in UTF-8. For the other operations, the result is simply upgraded to UTF-8. This removes quite a few lines of code, and now the UTF-8 handling uses the same tight loops as the non-UTF-8. Downgrading and upgrading had to be done specially before, but now they are done in tight loops, before the operation, and after the operation
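The "downgrade, operate, upgrade" approach can be sketched as two small byte-level helpers. This is an assumption-laden illustration, not Perl's own downgrade/upgrade routines: the function names are hypothetical, and the encoding arithmetic assumes an ASCII platform, where code points 0x80..0xFF are the two-byte sequences starting with 0xC2 or 0xC3.

```c
#include <stddef.h>

/* Convert UTF-8 to one byte per character. Returns the new length,
 * or -1 if the input contains a code point above 0xFF (or is malformed). */
static long downgrade(const unsigned char *s, size_t len, unsigned char *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        if (s[i] < 0x80) {
            out[n++] = s[i++];
        } else if ((s[i] == 0xC2 || s[i] == 0xC3) && i + 1 < len
                   && (s[i + 1] & 0xC0) == 0x80) {
            out[n++] = (unsigned char)(((s[i] & 0x03) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;   /* wide character: the bitwise op cannot proceed */
        }
    }
    return (long)n;
}

/* Convert one-byte characters back to UTF-8. `out` needs room for
 * 2 * len bytes. Returns the new length. */
static size_t upgrade(const unsigned char *s, size_t len, unsigned char *out)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            out[n++] = s[i];
        } else {
            out[n++] = (unsigned char)(0xC0 | (s[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    return n;
}
```

Both helpers are single tight loops, which is the property the message is after: the bitwise operation itself then runs on plain bytes between the two conversions.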
* sv_vcatpvfn() family: make svmax arg Size_tDavid Mitchell2017-06-071-2/+3
    It was formerly I32. It should be unsigned since you can't have a negative number of args. And although you're unlikely to call sprintf with more than 0x7fffffff args, it makes it more consistent with other APIs which we've been gradually expanding to 64-bit/ptrsize.

    It also makes the code internal to Perl_sv_vcatpvfn_flags more consistent, when dealing with explicit arg index formats like "%10$s". This function still has a mix of STRLEN (for string lengths) and Size_t (for arg indexes) but they are aliases for each other.

    I made Perl_do_sprintf()'s len arg SSize_t rather than Size_t, since it typically gets called with ptr diff arithmetic. Not sure if this is being overly cautious.
* Fatalize the use of code points above 0xFF for bitwise operators.Abigail2017-06-071-14/+6
| | | | | | This commit removes quite a number of tests, mostly from t/op/bop.t, which test the behaviour of such code points in combination of bitwise operators. Since it's now fatal, the tests are no longer useful.
* remove -DH (DEBUG_H) misfeatureDavid Mitchell2017-06-051-6/+2
| | | | | | | RT# 129300 This hash-dumping debugging flag corrupted hash values and has probably not been used by anyone in 20 years.
* Define and use symbolic constants for LvFLAGSDagfinn Ilmari Mannsåker2017-06-021-2/+2
|
* Deprecate vec() with above-FF code points.Karl Williamson2017-06-011-3/+10
| | | | This will make this consistent with the bitwise operators.
* vec(): defer lvalue out-of-range croakingDavid Mitchell2017-03-311-0/+10
    RT #131083

    Recent commits v5.25.10-81-gd69c430 and v5.25.10-82-g67dd6f3 added out-of-range/overflow checks for the offset arg of vec(). However in lvalue context, these croaks now happen before the SVt_PVLV was created, rather than when its set magic was called. This means that something like

        sub f { $x = $_[0] }
        f(vec($s, -1, 8))

    now croaks even though the out-of-range value never ended up getting used in lvalue context.

    This commit fixes things by, in pp_vec(), rather than croaking, just setting flag bits in LvFLAGS() to indicate that the offset is negative / out-of-range.

    Then in Perl_magic_getvec(), return 0 if these flags are set, and in Perl_magic_setvec() croak with a suitable error.
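The "record the problem now, fail only on assignment" idea can be sketched outside of Perl's SV machinery. Everything named below (`VecRef`, `LVF_NEG_OFFSET`, the three functions) is hypothetical and chosen for illustration; the sketch handles only 8-bit elements in a fixed-size buffer and returns -1 where the real code would croak.

```c
#include <stddef.h>

enum { LVF_NEG_OFFSET = 1 };   /* hypothetical flag bit */

typedef struct {
    unsigned char *bytes;
    size_t         len;
    size_t         offset;
    unsigned       flags;
} VecRef;

/* Create the reference. A bad offset is recorded, not rejected. */
static VecRef vec_ref(unsigned char *bytes, size_t len, long offset)
{
    VecRef r;
    r.bytes  = bytes;
    r.len    = len;
    r.offset = 0;
    r.flags  = 0;
    if (offset < 0)
        r.flags |= LVF_NEG_OFFSET;
    else
        r.offset = (size_t)offset;
    return r;
}

/* Reading through a flagged reference quietly yields 0. */
static unsigned vec_get(const VecRef *r)
{
    if (r->flags || r->offset >= r->len)
        return 0;
    return r->bytes[r->offset];
}

/* Writing is where the deferred error finally surfaces. */
static int vec_set(VecRef *r, unsigned char value)
{
    if (r->flags)
        return -1;
    if (r->offset >= r->len)   /* sketch only: no buffer growth */
        return -1;
    r->bytes[r->offset] = value;
    return 0;
}
```

A caller that only reads, as in the `f(vec($s, -1, 8))` example, never reaches the failing path.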
* fix integer overflows in Perl_do_vecget()/setDavid Mitchell2017-03-171-30/+44
    RT #130915

    In something like

        vec($str, $bignum, 16)

    (i.e. where $str is treated as a series of 16-bit words), Perl_do_vecget() and Perl_do_vecset() end up doing calculations equivalent to:

        $start = $bignum*2;
        $end = $start + 2;

    Currently both these calculations can wrap if $bignum is near the maximum value of a STRLEN (the previous commit already fixed cases for $bignum > max(STRLEN)).

    So this commit makes them check for potential overflow before doing such calculations.

    It also takes account of the fact that the previous commit changed the type of offset from signed to unsigned.

    Finally, it also adds some tests to t/op/vec.t for where the 'word' overlaps the end of the string, for example

        $x = vec("ab", 0, 64)

    should behave the same as:

        $x = vec("ab\0\0\0\0\0\0", 0, 64)

    This uses a separate code path, and I couldn't see any tests for it.

    This commit is based on an earlier proposed fix by Aaron Crane.
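The overflow checks described here follow a standard pattern: test against the maximum before multiplying or adding, rather than inspecting the result afterwards. A minimal sketch, using `size_t` in place of STRLEN and a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the byte range [start, end) of word number `offset`, where
 * each word is `word_bytes` long. Returns 1 on success, or 0 if either
 * calculation would wrap around. */
static int word_range(size_t offset, size_t word_bytes,
                      size_t *start, size_t *end)
{
    if (word_bytes == 0 || offset > SIZE_MAX / word_bytes)
        return 0;                      /* offset * word_bytes would wrap */
    *start = offset * word_bytes;
    if (*start > SIZE_MAX - word_bytes)
        return 0;                      /* start + word_bytes would wrap */
    *end = *start + word_bytes;
    return 1;
}
```

Both checks are needed: an offset of `SIZE_MAX / 2` with 2-byte words survives the multiplication but fails the addition.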
* Perl_do_vecget(): change offset arg to STRLEN typeDavid Mitchell2017-03-171-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | ... and fix up its caller, pp_vec(). This is part of a fix for RT #130915. pp_vec() is responsible for extracting out the offset and size from SVs on the stack, and then calling do_vecget() with those values. (Sometimes the call is done indirectly by storing the offset in the LvTARGOFF() field of a SVt_PVLV, then later Perl_magic_getvec() passes the LvTARGOFF() value to do_vecget().) Now SvCUR, SvLEN and LvTARGOFF are all of type STRLEN (a.k.a Size_t), while the offset arg of do_vecget() is of type SSize_t (i.e. there's a signed/unsigned mismatch). It makes more sense to make the arg of type STRLEN. So that is what this commit does. At the same time this commit fixes up pp_vec() to handle all the possibilities where the offset value can't fit into a STRLEN, returning 0 or croaking accordingly, so that do_vecget() is never called with a truncated or wrapped offset. The next commit will fix up the internals of do_vecget() and do_vecset(), which have to worry about offset*(2^n) wrapping or being > SvCUR(). This commit is based on an earlier proposed fix by Aaron Crane.
* Moving variables to their innermost scope.Andy Lester2017-02-181-2/+6
| | | | | | Some vars have been tagged as const because they do not change in their new scopes. In pp_reverse in pp.c, I32 tmp is only used to hold a char, so is changed to char.
* Perl_do_vop(): enhance "avoid sv_catpvn"David Mitchell2016-11-091-12/+11
| | | | | | | | | | | | | | | | | TonyC's recent commit v5.25.6-172-gdc529e6 updated do_vop() to avoid doing a sv_catpvn() when the left and destination SVs are the same. As well as being more efficient, it is needed, as a recent change to sv_catpvn() made it more likely to grow and realloc the buffer, meaning the copy()'s src buffer had been freed. This commit represents my parallel attempt to fix the same issue; I'm replacing Tony's version with mine as it is logically more comprehensive: it copes with the dest being the same as the right arg as well as the left, and checks for string pointers being equal rather than sv's being equal. Neither of these make any difference currently, but they could in theory (although unlikely) catch some future change in usage. RT #129995
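The pointer comparison this message describes can be illustrated with a plain growable buffer. This is a sketch under assumptions, not the do_vop() code: `Buf` and `append_tail` are hypothetical names, and the aliasing case assumes the source is exactly the destination's own storage, already holding `src_len` valid bytes.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *buf;
    size_t len;
    size_t cap;
} Buf;

/* Append bytes [done, src_len) of `src` to `dst`. If `src` is the
 * destination's own storage, the tail is already in place, so only the
 * length is adjusted; nothing is reallocated and no freed memory is read.
 * Returns 1 on success, 0 on allocation failure. */
static int append_tail(Buf *dst, const char *src, size_t done, size_t src_len)
{
    size_t tail;

    if (src_len <= done)
        return 1;                      /* nothing left to append */
    if (src == dst->buf) {             /* compare string pointers */
        dst->len = src_len;
        return 1;
    }
    tail = src_len - done;
    if (dst->len + tail > dst->cap) {
        char *p = realloc(dst->buf, dst->len + tail);
        if (!p)
            return 0;
        dst->buf = p;
        dst->cap = dst->len + tail;
    }
    memcpy(dst->buf + dst->len, src + done, tail);
    dst->len += tail;
    return 1;
}
```

Comparing the string pointers rather than the owning objects is the more general test mentioned in the message, since two distinct objects could in principle share one buffer.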
* (perl #129995) avoid sv_catpvn() in do_vop() when unneededTony Cook2016-11-071-2/+11
| | | | | | | This could call sv_catpvn() with the source string being within the destination SV, which caused a freed memory access if do_vop() and sv_catpvn_flags() had different ideas about the ideal size of the target SV's buffer.
* doop.c: use new SvPVCLEAR and constant string friendly macrosYves Orton2016-10-191-2/+2
|
* Change sv_setpvn(…, "…", …) to sv_setpvs(…, "…")Dagfinn Ilmari Mannsåker2016-09-211-1/+1
| | | | | The dual-life dists affected use Devel::PPPort, so can safely use sv_setpvs() even though it wasn't added until Perl v5.10.0.
* doop.c: use sv_setpvn() instead of sv_setpvs()Yves Orton2016-09-191-1/+1
|
* [perl #129287] Make UTF8 & append nullFather Chrysostomos2016-09-181-0/+1
| | | | | | | | The & and &. operators were not appending a null byte to the string in utf8 mode. (The internal function that they use is the same. I used &. in the test just because its intent is clearer.)
* Allow assignment to &CORE::keys()Father Chrysostomos2016-05-201-2/+2
|
* Allow &CORE::foo() with hash functionsFather Chrysostomos2016-05-201-2/+6
| | | | | &CORE::keys does not yet work as an lvalue. (I’m not sure how to make that work.)
* [perl #128187] Forbid sub :lvalue{keys} in aassignFather Chrysostomos2016-05-201-0/+7
| | | | | | | | | | | | This commit makes perl die when keys(%hash) is returned from an lvalue sub and the lvalue sub call is assigned to in list assignment: sub foo : lvalue { keys(%INC) } (foo) = 3; # death This prevents an assignment that is completely useless and probably a mistake, and it makes the lvalue-sub use of keys behave the same way as (keys(%INC)) = 3.
* doop.c: fix typo in header commentDavid Mitchell2016-02-151-1/+1
|
* make gimme consistently U8David Mitchell2016-02-031-1/+1
| | | | | | | | | | | | | The value of gimme stored in the context stack is U8. Make all other uses in the main core consistent with this. My primary motivation on this was that the new function cx_pushblock(), which I gave a 'U8 gimme' parameter, was generating warnings where callers were passing I32 gimme vars to it. Rather than play whack-a-mole, it seemed simpler to just uniformly use U8 everywhere. Porting/bench.pl shows a consistent reduction of about 2 instructions on the loop and sub benchmarks, so this change isn't harming performance.
* Deprecate wide chars in logical string opsKarl Williamson2015-12-161-0/+17
| | | | | | | See thread starting at http://nntp.perl.org/group/perl.perl5.porters/227698 Ricardo Signes provided the perldelta and perldiag text.
* doop.c: Fix typo in commentKarl Williamson2015-12-161-1/+1
|
* fix up EXTEND() callersDavid Mitchell2015-10-021-1/+5
    The previous commit made it clear that the N argument to EXTEND() is supposed to be signed, in particular SSize_t, and now typically triggers compiler warnings where this isn't the case.

    This commit fixes the various places in core that passed the wrong sort of N to EXTEND(). The fixes are in three broad categories.

    First, where sensible, I've changed the relevant var to be SSize_t.

    Second, where it's expected that N could never be large enough to wrap, I've just added an assert and a cast.

    Finally, I've added extra code to detect whether the cast could wrap/truncate, and if so set N to -1, which will trigger a panic in stack_grow().

    This also fixes [perl #125937] 'x' operator on list causes segfault with possible stack corruption
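The third category of fix (detect that the cast would wrap, and pass -1 instead) can be sketched as a small helper. The name `checked_extend_count` is hypothetical, and `ptrdiff_t` stands in for SSize_t; the real call sites differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an unsigned element count to the signed type the callee
 * expects. A count too large to represent becomes -1, which the callee
 * is assumed to treat as a fatal "cannot grow" request. */
static ptrdiff_t checked_extend_count(size_t n)
{
    return n > (size_t)PTRDIFF_MAX ? (ptrdiff_t)-1 : (ptrdiff_t)n;
}
```

The point of mapping to -1 rather than clamping is that a silently truncated count would under-allocate, whereas a negative count fails loudly.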
* Merge declaration and initialisation of local variableDagfinn Ilmari Mannsåker2015-07-221-2/+1
| | | | | | | Commit 2b32fed8 removed the PUTBACK/SPAGAIN around hv_iterval and Perl_sv_setpvf, but didn't take the opportunity to merge the initialisation with the declaration now that there's no code between them.
* Delete experimental autoderef featureAaron Crane2015-07-131-2/+2
|