path: root/toke.c
Commit message  [Author, Age, Files, Lines]
...
* Patch from Inaba Hiroto:  [Jarkko Hietaniemi, 2001-01-28, 1 file, -6/+19]
|   - canonical UTF-8 hash keys: if a key string for a hash is UTF8-on, try
|     downgrade the string and use it if unicode::distinct is not in effect.
|     For the task, I added a function bytes_from_utf8() to utf8.c. It might
|     resemble utf8_to_bytes() but it is not convenient to the task. Made a
|     test for it and added to t/op/each.t
|   - Changed do_print in doio.c to apply sv_utf8_(downgrade|upgrade) to the
|     mortal copy of the argument SV. And changed t/io/utf8.t test 18 which
|     expects print() to upgrade its argument.
|   - re-implement sv_eq with bytes_from_utf8()
|   - some bug fixes
|     - tr/// does not handle UTF8 range (\x{}-\x{})
|     - \ before raw UTF8 character produced "Malformed UTF-8 character" warning.
|     - "\x{100}\N{CENT SIGN}" is Malformed.
|     Added tests for these 3.
|   - and one silly bug (by me) with qu operator.
|   p4raw-id: //depot/perl@8583
* Threadedness patch for #8562 from Doug MacEachern.  [Jarkko Hietaniemi, 2001-01-27, 1 file, -2/+2]
|   p4raw-id: //depot/perl@8564
* Re: Announce : Tokener reporting patch  [Simon Cozens, 2001-01-27, 1 file, -20/+49]
|   Message-ID: <20010122021722.A9334@pembro26.pmb.ox.ac.uk>
|   p4raw-id: //depot/perl@8562
* More UTF-8 patches from Inaba Hiroto.  [Jarkko Hietaniemi, 2001-01-15, 1 file, -13/+17]
|   - The substr lval was still not okay.
|   - Now pp_stringify and sv_setsv copies source's UTF8 flag even if IN_BYTE.
|     pp_stringify is called from fold_constants at optimization phase and
|     "\x{100}" was made SvUTF8_off under use bytes (the bytes pragma is for
|     "byte semantics" and not for "do not produce UTF8 data")
|   - New `qu' operator to generate UTF8 string explicitly. Though I agree with
|     the policy "0x00-0xff always produce bytes", sometimes want to such a
|     string to be coded in UTF8. I can use pack"U0a*" but it requires more
|     typing and has runtime overhead.
|   - Fix pp_regcomp bug uncovered by "0x00-0xff always produce bytes" change,
|     the bug appears if a pm has PMdf_UTF8 flag but interpolated string is not
|     UTF8_on and has char 0x80-0xff.
|   TODO: document and test qu.
|   p4raw-id: //depot/perl@8439
* -Wformat  [Robin Barker, 2001-01-12, 1 file, -2/+3]
|   Message-Id: <200101122003.UAA29599@tempest.npl.co.uk>
|   p4raw-id: //depot/perl@8425
* Consolidated lvalue sub changes  [Stephen McCamant, 2001-01-12, 1 file, -3/+15]
|   Message-ID: <14941.16925.736415.785818@soda.csua.berkeley.edu>
|   p4raw-id: //depot/perl@8417
* [Patch perl@8375] pragma/subs.t ......FAILED tests 1-2 using Bison's parser  [Roca, Ignasi, 2001-01-12, 1 file, -1/+2]
|   Message-ID: <5930DC161690D2119667009027157547038123E1@madt009a.siemens.es>
|   p4raw-id: //depot/perl@8413
* One more patch for UTF8  [Inaba Hiroto, 2001-01-09, 1 file, -8/+5]
|   Message-ID: <3A59E510.52BAB5B9@st.rim.or.jp>
|   UTF-8 fixes for 'x' and tr///.
|   p4raw-id: //depot/perl@8378
* IRIX compiler noticed that the bof initialization might be  [Jarkko Hietaniemi, 2001-01-06, 1 file, -1/+2]
|   bypassed by control flow.
|   p4raw-id: //depot/perl@8343
* Add a note about EBCDIC versus UTF-8 to a potential problem spot.  [Jarkko Hietaniemi, 2001-01-05, 1 file, -0/+6]
|   p4raw-id: //depot/perl@8341
* "\x{FF}\xFF" was broken, the \xFF was appended in its  [Jarkko Hietaniemi, 2001-01-05, 1 file, -7/+15]
|   raw 8-bit form to the UTF-8 string.
|   p4raw-id: //depot/perl@8330
* strings with \x{..} in the middle are corrupted  [Roca, Ignasi, 2001-01-05, 1 file, -3/+6]
|   Message-ID: <5930DC161690D211966700902715754703738F96@madt009a.siemens.es>
|   UTF-8 parsing fix that seems to be needed for EBCDIC,
|   in ASCII no effect. (changed the strncpy() to Copy())
|   p4raw-id: //depot/perl@8329
* UTF-8 cleanup.  [Jarkko Hietaniemi, 2001-01-05, 1 file, -12/+12]
|   p4raw-id: //depot/perl@8328
* Unify UTF-8 malformedness handling.  [Jarkko Hietaniemi, 2001-01-05, 1 file, -1/+1]
|   p4raw-id: //depot/perl@8323
* Corrections for Perl_yylex_r (used by a reentrant parser as Bison)  [Roca, Ignasi, 2001-01-04, 1 file, -7/+2]
|   Message-ID: <5930DC161690D21196670090271575470370111A@madt009a.siemens.es>
|   The toke.c part only, patching embed.h and proto.h is futile.
|   p4raw-id: //depot/perl@8306
* scanning two hex-constants fails on EBCDIC environment (script length.t)  [Roca, Ignasi, 2001-01-04, 1 file, -23/+23]
|   Message-ID: <5930DC161690D211966700902715754703738AA6@madt009a.siemens.es>
|   p4raw-id: //depot/perl@8305
* Bump up Larry's copyright.  [Jarkko Hietaniemi, 2001-01-01, 1 file, -1/+1]
|   p4raw-id: //depot/perl@8289
* Tweak for MULTIPLICITY/USE_PERLIO  [Nick Ing-Simmons, 2000-12-30, 1 file, -8/+30]
|   p4raw-id: //depot/perlio@8272
* Re: [ID 19991001.003] sort(sub(arg)) misparsed as sort sub args  [Simon Cozens, 2000-12-28, 1 file, -3/+3]
|   Message-ID: <20001227141244.A13344@deep-dark-truthful-mirror.perlhacker.org>
|   p4raw-id: //depot/perl@8239
* Revert the -f ambiguousity patch, seems to cause  [Jarkko Hietaniemi, 2000-12-11, 1 file, -4/+3]
|   too much hassle (the interpret -Q as a function where
|   Q is not a known filetest part is left in).
|   p4raw-id: //depot/perl@8084
* Did not get that has_utf8/this_utf8 fix right last time, another spot  [Nick Ing-Simmons, 2000-12-09, 1 file, -4/+5]
|   was only testing this_utf8.
|   p4raw-id: //depot/perlio@8053
* Typo/thinko in S_scan_const() - seeing high bit sets has_utf8 not this_utf8  [Nick Ing-Simmons, 2000-12-09, 1 file, -72/+72]
|   i.e. the output string has one, but don't mess with source assumption.
|   p4raw-id: //depot/perlio@8052
* dTHR is a nop in 5.6.0 onwards. Ergo, it can go.  [Jarkko Hietaniemi, 2000-12-05, 1 file, -27/+0]
|   p4raw-id: //depot/perl@7984
* Integrate perlio:  [Jarkko Hietaniemi, 2000-12-04, 1 file, -1/+1]
|\
| |   [ 7971]
| |   Quieten some noise in Win32 builds:
| |   - win32.h is included after <sys/socket.h>, so need to set
| |     Win32SCK_IS_STDSCK earlier to avoid re-defined noise in XSUB.h
| |   - GCC (& MSVC?) have execv(...,const char *const *) so need a cast
| |     from char **.
| |   [ 7970]
| |   PERL_IMPLICIT_SYS compiles but does not work.
| |   p4raw-link: @7971 on //depot/perlio: b4748376b6239962bd75b743e5a7b14788a2970c
| |   p4raw-link: @7970 on //depot/perlio: adb71456d0ff53391c88789f315f1e66b14373d5
| |   p4raw-id: //depot/perl@7972
| * Quieten some noise in Win32 builds:  [Nick Ing-Simmons, 2000-12-04, 1 file, -1/+1]
| |   - win32.h is included after <sys/socket.h>, so need to set
| |     Win32SCK_IS_STDSCK earlier to avoid re-defined noise in XSUB.h
| |   - GCC (& MSVC?) have execv(...,const char *const *) so need a cast
| |     from char **.
| |   p4raw-id: //depot/perlio@7971
* | Make uv_to_utf8() to zero-terminate its output buffer,  [Jarkko Hietaniemi, 2000-12-03, 1 file, -1/+1]
|/
|   always use (at least) UTF8_MAXLEN + 1 U8s deep buffer.
|   p4raw-id: //depot/perl@7967
* Re: [ID 20001130.011] expression parsing bug ?  [Mike Guy, 2000-12-02, 1 file, -1/+1]
|   Message-Id: <E142GRN-0003go-00@libra.cus.cam.ac.uk>
|   An extraneous argument.
|   p4raw-id: //depot/perl@7958
* Some help for 20001130.011. Now one gets warnings like  [Jarkko Hietaniemi, 2000-12-01, 1 file, -33/+52]
|   "Ambiguous -f() resolved as a file test ..."
|   p4raw-id: //depot/perl@7944
* Retract #7941. Forbidding subs m/s/etc is too cruel because  [Jarkko Hietaniemi, 2000-12-01, 1 file, -25/+0]
|   that also breaks using them as methods.
|   p4raw-id: //depot/perl@7943
* (Retracted by #7943.)  [Jarkko Hietaniemi, 2000-12-01, 1 file, -0/+25]
|   Reserve the short named string operator names.
|   p4raw-id: //depot/perl@7941
* toke.c perlio.c -Wformat nits  [Robin Barker, 2000-11-30, 1 file, -2/+2]
|   Message-Id: <200011301427.OAA00030@tempest.npl.co.uk>
|   p4raw-id: //depot/perl@7935
* Tokeniser debugging  [Simon Cozens, 2000-11-29, 1 file, -1/+33]
|   Message-ID: <20001129141545.A30864@pembro33.pmb.ox.ac.uk>
|   p4raw-id: //depot/perl@7916
* Go ahead and #include <unistd.h> in perl.h.  [Jarkko Hietaniemi, 2000-11-22, 1 file, -6/+0]
|   p4raw-id: //depot/perl@7816
* [ID 20001113.003] utf8_to_uv on malformed utf returns wrong values  [Yitzchak Scott-Thoennes, 2000-11-14, 1 file, -1/+1]
|   Message-Id: <200011132249.eADMnek09679@garcia.efn.org>
|   p4raw-id: //depot/perl@7677
* Overrideable keys, each, pop, push, shift, splice, unshift.  [Casey R. Tweten, 2000-11-08, 1 file, -7/+7]
|   Subject: [PATCH] prototyped functions that should be overrideable
|   Message-ID: <Pine.OSF.4.21.0011031100470.17471-100000@home.kiski.net>
|   p4raw-id: //depot/perl@7600
* glibc5 detection by __GNU_LIBRARY__.  [Jarkko Hietaniemi, 2000-11-06, 1 file, -2/+8]
|   p4raw-id: //depot/perl@7582
* Fix for  [David Dyck, 2000-11-06, 1 file, -2/+20]
|   Subject: [ID 20000728.005] perl -P broken
|   Message-Id: <200007290019.RAA08484@dd.tc.fluke.com>
|   (hopefully). The fix is also not complete, it seems to
|   break BOM swallowing for libc5 systems, but until someone
|   figures out a way to do this without ftell(), this will do.
|   p4raw-id: //depot/perl@7570
* A fix of sorts for 20000329.026, a better error message  [Jarkko Hietaniemi, 2000-11-05, 1 file, -4/+13]
|   for a missing "use charnames" when using the \N{...}.
|   p4raw-id: //depot/perl@7557
* Make \x{...} consistently produce UTF-8.  [Simon Cozens, 2000-10-29, 1 file, -19/+20]
|   Subject: Re: \x{...} is confused
|   Message-ID: <20001029193648.A6287@pembro4.pmb.ox.ac.uk>
|   p4raw-id: //depot/perl@7485
* The reëntrant version shouldn't be needed unless USE_PURE_BISON.  [Jarkko Hietaniemi, 2000-10-28, 1 file, -17/+19]
|   p4raw-id: //depot/perl@7465
* Continue the internal UTF-8 API tweaking.  [Jarkko Hietaniemi, 2000-10-25, 1 file, -2/+2]
|   Rename utf8_to_uv_chk() back to utf8_to_uv() because it's used
|   much more than the simpler API, now called utf8_to_uv_simple().
|   Still not quite happy with API, too much partial duplication
|   of functionality.
|   p4raw-id: //depot/perl@7439
* Allow poking holes at the UTF-8 decoding strictness.  [Jarkko Hietaniemi, 2000-10-25, 1 file, -2/+2]
|   p4raw-id: //depot/perl@7438
* Make the UTF-8 decoding stricter and more verbose when  [Jarkko Hietaniemi, 2000-10-24, 1 file, -28/+38]
|   malformation happens. This involved adding an argument
|   to utf8_to_uv_chk(), which involved changing its prototype,
|   and prefer STRLEN over I32 for the UTF-8 length, which as
|   a domino effect necessitated changing the prototypes of
|   scan_bin(), scan_oct(), scan_hex(), and reg_uni().
|   The stricter UTF-8 decoding checking uses Markus Kuhn's
|   UTF-8 Decode Stress Tester from
|   http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
|   p4raw-id: //depot/perl@7416
* Make scan_num() reëntrant, as suggested in  [Roca, Ignasi, 2000-10-20, 1 file, -16/+11]
|   Subject: [PATCH perl@7229] Rentrant parser and yylex()
|   Message-ID: <5930DC161690D211966700902715754702DA09CD@madt009a.siemens.es>
|   p4raw-id: //depot/perl@7382
* Reëntrancy fix.  [Roca, Ignasi, 2000-10-20, 1 file, -12/+39]
|   Subject: [PATCH perl@7229] Rentrant parser and yylex()
|   Message-ID: <5930DC161690D211966700902715754702DA09CD@madt009a.siemens.es>
|   p4raw-id: //depot/perl@7381
* Allow @+ and @- to be doublequoted, from Simon Cozens.  [Jarkko Hietaniemi, 2000-10-13, 1 file, -2/+4]
|   p4raw-id: //depot/perl@7224
* Re: Trapping by opmask sets strange parser state [PATCH]  [Gisle Aas, 2000-09-30, 1 file, -1/+1]
|   Message-Id: <m3aed9ybrm.fsf@eik.g.aas.no>
|   p4raw-id: //depot/perl@7098
* Re-instate Perl_utf8_to_uv without checking parameter - added in change 7075.  [Nick Ing-Simmons, 2000-09-30, 1 file, -2/+2]
|   i.e. rename Simon's function to Perl_utf8_to_uv_chk, change all calls
|   to it to use new name and add Perl_utf8_to_uv() as a wrapper which calls it
|   passing 0 to checking to get the warning.
|   p4raw-id: //depot/perl@7096
* Fix for the charnames.t failures from Spider Boardman.  [Jarkko Hietaniemi, 2000-09-15, 1 file, -0/+1]
|   p4raw-id: //depot/perl@7093
* Re: perl@7078  [Spider Boardman, 2000-09-14, 1 file, -0/+2]
|   Message-Id: <200009142306.TAA20082@leggy.zk3.dec.com>
|   p4raw-id: //depot/perl@7090
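
Note on the 2001-01-28 entry (canonical UTF-8 hash keys): the idea of "try to downgrade the string and use it" can be illustrated with a small standalone sketch. This is not Perl's bytes_from_utf8() and does not reproduce its signature or behaviour; the function name downgrade_utf8 is hypothetical, and the code assumes an ASCII (non-EBCDIC) platform and the standard UTF-8 encoding in which U+0080..U+00FF are two-byte sequences led by 0xC2 or 0xC3.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (an illustration, not Perl's bytes_from_utf8()).
 * Tries to convert a UTF-8 buffer to one byte per character.
 * On success returns a malloc'd, NUL-terminated buffer and stores its
 * length in *out_len. Returns NULL if allocation fails, if the input is
 * malformed, or if any character is above 0xFF and so cannot be
 * represented as a single byte. */
static unsigned char *downgrade_utf8(const unsigned char *s, size_t len,
                                     size_t *out_len)
{
    /* The downgraded form is never longer than the UTF-8 form. */
    unsigned char *out = malloc(len + 1);
    size_t i = 0, n = 0;

    if (out == NULL)
        return NULL;

    while (i < len) {
        unsigned char c = s[i];

        if (c < 0x80) {
            /* ASCII: identical in both representations. */
            out[n++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (s[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence encoding U+0080..U+00FF. */
            out[n++] = (unsigned char)(((c & 0x03) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Above 0xFF, overlong, truncated, or otherwise malformed. */
            free(out);
            return NULL;
        }
    }

    out[n] = '\0';
    *out_len = n;
    return out;
}
```

A caller in the spirit of that entry would use the downgraded bytes as the lookup key when the call succeeds and fall back to the original UTF-8 string when it returns NULL, so that the same text yields the same key whichever way it was encoded.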