path: root/utf8.c
Commit message | Author | Age | Files | Lines
* One more patch for UTF8 | Inaba Hiroto | 2001-01-09 | 1 | -8/+0
    Message-ID: <3A59E510.52BAB5B9@st.rim.or.jp>
    UTF-8 fixes for 'x' and tr///.
    p4raw-id: //depot/perl@8378
* Do away with strncpy() and a fixed length buffer. | Jarkko Hietaniemi | 2001-01-05 | 1 | -4/+8
    p4raw-id: //depot/perl@8332
* Unify UTF-8 malformedness handling. | Jarkko Hietaniemi | 2001-01-05 | 1 | -48/+96
    p4raw-id: //depot/perl@8323
* Use the UTF8_XXX macros in is_utf8_char(), a performance nit | Jarkko Hietaniemi | 2001-01-02 | 1 | -6/+6
    in is_utf8_string().
    p4raw-id: //depot/perl@8300
* Bump up Larry's copyright. | Jarkko Hietaniemi | 2001-01-01 | 1 | -1/+1
    p4raw-id: //depot/perl@8289
* Signedness nit. | Jarkko Hietaniemi | 2000-12-30 | 1 | -1/+1
    p4raw-id: //depot/perl@8274
* more UTF8 test suites and an UTF8 patch | Inaba Hiroto | 2000-12-30 | 1 | -1/+5
    Message-ID: <3A4D722D.243AFD88@st.rim.or.jp>
    Just the patch part for now, and the pragma renamed as unicode::distinct.
    p4raw-id: //depot/perl@8267
* (Retracted by #8264) More join() testing which was good because | Jarkko Hietaniemi | 2000-12-29 | 1 | -4/+2
    it revealed a bug in #8248 (the UTF8_EIGHT_BIT_LO() was wrong).
    p4raw-id: //depot/perl@8249
* Do not return the Unicode replacement character if UTF-8 | Jarkko Hietaniemi | 2000-12-08 | 1 | -7/+9
    decoding goes awry, it should be up to the caller to decide.
    p4raw-id: //depot/perl@8042
* Re: ebcdic <-> ascii tables interjected in uv <-> utf8 considered harmful | Simon Cozens | 2000-12-08 | 1 | -8/+3
    Message-ID: <20001208133331.A11535@deep-dark-truthful-mirror.perlhacker.org>
    (The pp_hot part needed a rewrite.)
    p4raw-id: //depot/perl@8039
* Use the UTF8 macros a bit. They can't be used with abandon | Jarkko Hietaniemi | 2000-12-08 | 1 | -9/+20
    everywhere because we do generate illegal UTF-8 in some situations.
    This is of course naughty.
    p4raw-id: //depot/perl@8033
* Introduce macros for UTF8 decoding. | Jarkko Hietaniemi | 2000-12-08 | 1 | -14/+15
    p4raw-id: //depot/perl@8028
* Document utf8_to_uv() better. | Jarkko Hietaniemi | 2000-12-07 | 1 | -4/+6
    p4raw-id: //depot/perl@8024
* Document utf8_length(), utf8_distance(), and utf8_hop(). | Jarkko Hietaniemi | 2000-12-07 | 1 | -4/+21
    p4raw-id: //depot/perl@8023
* Split off the UTF-8 decoder tests, make them to check also | Jarkko Hietaniemi | 2000-12-05 | 1 | -7/+7
    the error message.
    p4raw-id: //depot/perl@7996
* dTHR is a nop in 5.6.0 onwards. Ergo, it can go. | Jarkko Hietaniemi | 2000-12-05 | 1 | -3/+0
    p4raw-id: //depot/perl@7984
* Make uv_to_utf8() to zero-terminate its output buffer, | Jarkko Hietaniemi | 2000-12-03 | 1 | -18/+26
    always use (at least) UTF8_MAXLEN + 1 U8s deep buffer.
    p4raw-id: //depot/perl@7967
* Get the three different space character classes right under utf8. | Jarkko Hietaniemi | 2000-12-01 | 1 | -1/+1
    p4raw-id: //depot/perl@7940
* Re: question about retlen in utf8.c:Perl_utf8_to_uv() | Peter Prymmer | 2000-11-30 | 1 | -2/+3
    Message-ID: <Pine.OSF.4.10.10011291233120.328738-100000@aspara.forte.com>
    plus regen perlapi.pod.
    p4raw-id: //depot/perl@7932
* This should have been part of #7872: no need to scan UTF-8 | Jarkko Hietaniemi | 2000-11-29 | 1 | -1/+1
    until eternity.
    p4raw-id: //depot/perl@7911
* No need to scan till infinity, 13 is enough. | Jarkko Hietaniemi | 2000-11-26 | 1 | -4/+4
    p4raw-id: //depot/perl@7872
* Make utf8_length() and utf8_distance() (the latter of which | Jarkko Hietaniemi | 2000-11-26 | 1 | -9/+19
    is unused at the moment) to be less forgiving about bad UTF-8.
    p4raw-id: //depot/perl@7869
* Introduce Perl_utf8_length(). Use it. | Jarkko Hietaniemi | 2000-11-18 | 1 | -0/+29
    p4raw-id: //depot/perl@7744
* hush warnings about malformed EBCDIC text | Peter Prymmer | 2000-11-15 | 1 | -0/+4
    Message-ID: <Pine.OSF.4.10.10011141500260.106218-100000@aspara.forte.com>
    p4raw-id: //depot/perl@7695
* Quit utf8_to_uv() instantly if curlen == 0. | Jarkko Hietaniemi | 2000-11-15 | 1 | -3/+10
    p4raw-id: //depot/perl@7693
* Use UINT64_C(). | Jens Hamisch | 2000-11-15 | 1 | -1/+1
    Subject: [ID 20001114.006] 5.7.0-7680 Solaris 8, 64 bit, utf8 patch
    Message-Id: <20001114191623.G20559@Strawberry.COM>
    p4raw-id: //depot/perl@7691
* [ID 20001113.003] utf8_to_uv on malformed utf returns wrong values | Yitzchak Scott-Thoennes | 2000-11-14 | 1 | -2/+2
    Message-Id: <200011132249.eADMnek09679@garcia.efn.org>
    p4raw-id: //depot/perl@7677
* Placate nervous compilers that see longer than ints switch()ing. | Jarkko Hietaniemi | 2000-11-13 | 1 | -1/+1
    p4raw-id: //depot/perl@7671
* Varargs don't always work too well if one puts an unsigned | Yitzchak Scott-Thoennes | 2000-11-07 | 1 | -1/+1
    char on the stack and pop an unsigned quad off the stack.
    Subject: Re: [ID 20001103.002] Not OK: perl v5.7.0 +DEVEL7523 on os2-64int-ld-2.30 (UNINSTALLED)
    Message-ID: <pxzB6gzkgKXY092yn@efn.org>
    p4raw-id: //depot/perl@7584
* printf UVs the correct way, noticed by Robin Barker. | Jarkko Hietaniemi | 2000-11-01 | 1 | -3/+3
    p4raw-id: //depot/perl@7509
* UTF-8 decoder tweak. | Jarkko Hietaniemi | 2000-10-29 | 1 | -1/+1
    p4raw-id: //depot/perl@7481
* Continue the internal UTF-8 API tweaking. | Jarkko Hietaniemi | 2000-10-25 | 1 | -30/+29
    Rename utf8_to_uv_chk() back to utf8_to_uv() because it's used
    much more than the simpler API, now called utf8_to_uv_simple().
    Still not quite happy with API, too much partial duplication
    of functionality.
    p4raw-id: //depot/perl@7439
* Allow poking holes at the UTF-8 decoding strictness. | Jarkko Hietaniemi | 2000-10-25 | 1 | -16/+25
    p4raw-id: //depot/perl@7438
* Rename UTF8LEN() to be UNISKIP(), too confusing to have | Jarkko Hietaniemi | 2000-10-25 | 1 | -3/+3
    UTF8LEN() and UTF8SKIP().
    p4raw-id: //depot/perl@7437
* Fix the bug reported in | Andreas König | 2000-10-24 | 1 | -10/+28
    Subject: Encode bug?
    Message-ID: <m3lmveqwh5.fsf@ak-71.mind.de>
    Also make is_utf8_char() stricter.
    p4raw-id: //depot/perl@7425
* Make the UTF-8 decoding stricter and more verbose when | Jarkko Hietaniemi | 2000-10-24 | 1 | -48/+119
    malformation happens. This involved adding an argument
    to utf8_to_uv_chk(), which involved changing its prototype,
    and prefer STRLEN over I32 for the UTF-8 length, which as
    a domino effect necessitated changing the prototypes of
    scan_bin(), scan_oct(), scan_hex(), and reg_uni().
    The stricter UTF-8 decoding checking uses Markus Kuhn's
    UTF-8 Decode Stress Tester from
    http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
    p4raw-id: //depot/perl@7416
* Thinko in #7222. | Jarkko Hietaniemi | 2000-10-13 | 1 | -1/+1
    p4raw-id: //depot/perl@7223
* Use UTF8SKIP(), from Simon Cozens. | Jarkko Hietaniemi | 2000-10-13 | 1 | -7/+1
    p4raw-id: //depot/perl@7222
* The HINT_BYTE patch is apparently unnecessary, retracted. | Jarkko Hietaniemi | 2000-10-06 | 1 | -4/+0
    p4raw-id: //depot/perl@7156
* Patch from Peter Prymmer to disable utf8 in EBCDIC platforms. | Jarkko Hietaniemi | 2000-10-06 | 1 | -0/+4
    p4raw-id: //depot/perl@7152
* Re-instate Perl_utf8_to_uv without checking parameter - added in change 7075. | Nick Ing-Simmons | 2000-09-30 | 1 | -14/+34
    i.e. rename Simon's function to Perl_utf8_to_uv_chk, change all calls
    to it to use new name and add Perl_utf8_to_uv() as a wrapper which
    calls it passing 0 to checking to get the warning.
    p4raw-id: //depot/perl@7096
* utf8.c apidoc | Simon Cozens | 2000-09-14 | 1 | -1/+1
    Message-ID: <20000914234657.A13953@deep-dark-truthful-mirror.perlhacker.org>
    p4raw-id: //depot/perl@7087
* Replace #7084 with | Spider Boardman | 2000-09-14 | 1 | -1/+2
    Subject: Re: perl@7078
    Message-Id: <200009142109.RAA03425@leggy.zk3.dec.com>
    p4raw-id: //depot/perl@7085
* UTF8-encoded version of 256 is 0xc4 0x80; test that a char is | Simon Cozens | 2000-09-14 | 1 | -2/+1
    convertible to bytes by checking it doesn't go above 0xc3
    Subject: Re: perl@7078
    Message-ID: <20000914205919.A11098@deep-dark-truthful-mirror.perlhacker.org>
    p4raw-id: //depot/perl@7084
* Batch of UTF-8 patches from Simon Cozens. | Jarkko Hietaniemi | 2000-09-14 | 1 | -8/+37
    p4raw-id: //depot/perl@7075
* Fix for | Marc Lehmann | 2000-09-07 | 1 | -1/+4
    Subject: [ID 20000903.001] \w in utf8-strings
    Message-Id: <E13VUS5-0000cv-00.pgcc-forever-2000-09-03-09-44-29@fuji>
    and various related nits.
    p4raw-id: //depot/perl@7030
* small apidoc fix | Marc Lehmann | 2000-09-07 | 1 | -1/+1
    Message-ID: <20000903051206.A5909@cerebro.laendle>
    p4raw-id: //depot/perl@7021
* Fix vec() / utf8 (was Re: bitvec ops still broken with utf8 -- or not?) | Mike Guy | 2000-09-01 | 1 | -13/+20
    Message-Id: <E13Utuf-0004Bw-00@draco.cus.cam.ac.uk>
    p4raw-id: //depot/perl@6988
* various syntax errors and such (not fixed: comp/require.t#22 coredump | Gurusamy Sarathy | 2000-08-01 | 1 | -1/+1
    on Windows)
    p4raw-id: //depot/perl@6476
* The swallow_bom() saga continues. The #23 of require.t | Jarkko Hietaniemi | 2000-07-31 | 1 | -22/+18
    (UTF16-LE) still fails (silently, no output) but the #22 (UTF16-BE)
    seems to be working now. The root of the failure may be in
    sv_gets(): is it UTF-16LE-aware, especially when it comes to
    line endings?
    p4raw-id: //depot/perl@6469