path: root/utf8.h
Commit message | Author | Age | Files | Lines
* Salvage bits and pieces from the experimental 'utf8 everywhere' | Jarkko Hietaniemi | 2001-05-31 | 1 | -4/+4
  patch: rename HINT_BYTE and IN_BYTE to HINT_BYTES and IN_BYTES to match the pragma name; various robustness cleanups.
  p4raw-id: //depot/perl@10339
* Typo in utf8.h | Jesús Quiroga | 2001-04-21 | 1 | -1/+1
  Message-Id: <5.0.2.1.1.20010421192107.01ce5a50@ix.netcorps.com>
  p4raw-id: //depot/perl@9775
* Integrate changes #9493,9494,9495,9496 from maintperl | Jarkko Hietaniemi | 2001-04-01 | 1 | -1/+1
  into mainline.
  fix a broken workaround for Borland compiler in change#4739 (caused weird "short reads" on DATA, which caused op/misc.t to fail)
  nits spotted by Borland compiler
  avoid redefinition warnings under Borland 5.02
  various nits identified by the Borland 5.5 compiler; remove suppression of a few warnings
  p4raw-link: @9496 on //depot/maint-5.6/perl: 9d05ad52b0aa7d1f7d147da0c4dbc14de5fe4a37
  p4raw-link: @9495 on //depot/maint-5.6/perl: 759997f1e719f33541bed70dd7f79bfa26a930b3
  p4raw-link: @9494 on //depot/maint-5.6/perl: 01b59bde1cb7ff62776f3b83c0f2575c79a950a6
  p4raw-link: @9493 on //depot/maint-5.6/perl: eea7051a8d4ef81c032143ab3193bc1240ab2e8f
  p4raw-link: @4739 on //depot/perl: c39cd00800303e8967294e98aa4c427a1872a251
  p4raw-id: //depot/perl@9497
  p4raw-integrated: from //depot/maint-5.6/perl@9492 'merge in' sv.c utf8.h (@9288..) toke.c (@9292..) ext/File/Glob/bsd_glob.c (@9415..) win32/makefile.mk (@9426..) win32/win32.h (@9494..)
* More EBCDIC stuff: | Nick Ing-Simmons | 2001-03-20 | 1 | -0/+4
  - Lose the extra level of function on ASCII.
  - spotted a chr(0) issue in sv.c
  - re-work of UTF-X tr/// ranges to work in Unicode space. Still issues with the "0xff is illegal UTF-8" hack.
  - Yet another ad hoc utf8 'upgrade' in op.c recoded (why do it once when you can do it all over the place :-(
  - Enable HINTS_UTF8 on EBCDIC - then ignore it in toke.c, need utf8.pm for swashes.
  - Simplified and commented scan_const() in toke.c
  Still something wrong with regexp and tr (swashes?).
  p4raw-id: //depot/perlio@9267
* More EBCDIC fixes. | Nick Ing-Simmons | 2001-03-19 | 1 | -1/+3
  p4raw-id: //depot/perlio@9246
* Infrastructure to use UTF-EBCDIC rather than UTF-8 as the internal | Nick Ing-Simmons | 2001-03-17 | 1 | -68/+69
  encoding on EBCDIC platforms. This has the property that U+0000..U+009F, i.e. a superset of ASCII, are invariant under the encoding. This is EBCDIC friendly as an encoded string can be looked at as being EBCDIC by lexer sprintf("%d",...) etc. in the same manner that a UTF-8 string can be considered ASCII on ASCII machines.
  - re-arrange utf8.h to get ASCII specific vs Unicode generic bits separate.
  - Add some more macros to comprehend different shift amounts and possible swizzle in UTF-EBCDIC vs UTF-8. Change utf8.c to use them.
  - add utfebcdic.h which provides UTF-EBCDIC versions of the macros, and conditionally #include it.
  EBCDIC build as yet untested. ASCII still fails the one test.
  p4raw-id: //depot/perlio@9185
* Minor naming change UTF8_IS_ASCII => UTF8_IS_INVARIANT | Nick Ing-Simmons | 2001-03-17 | 1 | -0/+1
  p4raw-id: //depot/perlio@9184
* EBCDIC Fixes. | Nick Ing-Simmons | 2001-03-16 | 1 | -9/+13
  p4raw-id: //depot/perlio@9180
* #ifdef'ed out code for 'USE_BYTES_DOWNGRADES' case. | Nick Ing-Simmons | 2001-03-12 | 1 | -0/+4
  p4raw-id: //depot/perlio@9110
* EBCDIC sanity - phase I | Nick Ing-Simmons | 2001-03-10 | 1 | -11/+7
  - rename utf8/uv functions to indicate what sort of uv they provide (uvuni/uvchr)
  - use utf8n_xxxx (cf. pvn) for forms which take length.
  - back out vN.N and $^V exceptions to e2a/a2e
  - make "locale" isxxx macros be uvchr (may be redundant?)
  Not clear yet that toUPPER_uni et al. return being handled correctly. The tr// and regexp stuff still needs an audit, assumption is they are working in Unicode space. Need to provide v5.6 names for XS modules (decide is uni or chr ?).
  p4raw-id: //depot/perlio@9096
* Re: Unicode/EBCDIC | Peter Prymmer | 2001-03-09 | 1 | -0/+19
  Message-ID: <Pine.OSF.4.10.10103081617390.377472-100000@aspara.forte.com>
  p4raw-id: //depot/perl@9082
* UTF-8 documentation. | Jarkko Hietaniemi | 2001-02-11 | 1 | -0/+16
  p4raw-id: //depot/perl@8770
* Macrofy a magic UTF-8 test. | Jarkko Hietaniemi | 2001-01-31 | 1 | -0/+1
  p4raw-id: //depot/perl@8647
* Unify UTF-8 malformedness handling. | Jarkko Hietaniemi | 2001-01-05 | 1 | -10/+12
  p4raw-id: //depot/perl@8323
* Bump up Larry's copyright. | Jarkko Hietaniemi | 2001-01-01 | 1 | -1/+1
  p4raw-id: //depot/perl@8289
* (Retracted by #8264) More join() testing which was good because | Jarkko Hietaniemi | 2000-12-29 | 1 | -3/+3
  it revealed a bug in #8248 (the UTF8_EIGHT_BIT_LO() was wrong).
  p4raw-id: //depot/perl@8249
* (Retracted by #8264) Externally: join() was still quite UTF-8-unaware. | Jarkko Hietaniemi | 2000-12-29 | 1 | -5/+8
  Internally: sv_catsv() wasn't quite okay on UTF-8, it assumed that the only cases to care about are byte+byte and byte+character.
  TODO: See how well pp_concat() could be implemented in terms of sv_catsv().
  p4raw-id: //depot/perl@8248
* Use the UTF8 macros a bit. They can't be used with abandon | Jarkko Hietaniemi | 2000-12-08 | 1 | -0/+5
  everywhere because we do generate illegal UTF-8 in some situations. This is of course naughty.
  p4raw-id: //depot/perl@8033
* Introduce macros for UTF8 decoding. | Jarkko Hietaniemi | 2000-12-08 | 1 | -1/+16
  p4raw-id: //depot/perl@8028
* UINT64_C() work continues. | Jarkko Hietaniemi | 2000-11-15 | 1 | -2/+0
  p4raw-id: //depot/perl@7700
* Use UINT64_C(). | Jens Hamisch | 2000-11-15 | 1 | -1/+5
  Subject: [ID 20001114.006] 5.7.0-7680 Solaris 8, 64 bit, utf8 patch
  Message-Id: <20001114191623.G20559@Strawberry.COM>
  p4raw-id: //depot/perl@7691
* [ID 20001113.003] utf8_to_uv on malformed utf returns wrong values | Yitzchak Scott-Thoennes | 2000-11-14 | 1 | -0/+2
  Message-Id: <200011132249.eADMnek09679@garcia.efn.org>
  p4raw-id: //depot/perl@7677
* Allow poking holes at the UTF-8 decoding strictness. | Jarkko Hietaniemi | 2000-10-25 | 1 | -1/+12
  p4raw-id: //depot/perl@7438
* Rename UTF8LEN() to be UNISKIP(), too confusing to have | Jarkko Hietaniemi | 2000-10-25 | 1 | -2/+2
  UTF8LEN() and UTF8SKIP().
  p4raw-id: //depot/perl@7437
* Make the UTF-8 decoding stricter and more verbose when | Jarkko Hietaniemi | 2000-10-24 | 1 | -1/+3
  malformation happens. This involved adding an argument to utf8_to_uv_chk(), which involved changing its prototype, and prefer STRLEN over I32 for the UTF-8 length, which as a domino effect necessitated changing the prototypes of scan_bin(), scan_oct(), scan_hex(), and reg_uni().
  The stricter UTF-8 decoding checking uses Markus Kuhn's UTF-8 Decode Stress Tester from http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
  p4raw-id: //depot/perl@7416
* Make ~(chr(a).chr(b)) eq chr(~a).chr(~b) on utf8. | Simon Cozens | 2000-10-15 | 1 | -0/+18
  Subject: [PATCH] Re: [ID 20000918.005] ~ on wide chars
  Message-ID: <20001014205213.A9645@pembro4.pmb.ox.ac.uk>
  p4raw-id: //depot/perl@7235
* Tweak #7153. | Jarkko Hietaniemi | 2000-10-06 | 1 | -2/+7
  p4raw-id: //depot/perl@7154
* Patch from Simon Cozens to avoid using utf8 routines in EBCDIC. | Jarkko Hietaniemi | 2000-10-06 | 1 | -2/+2
  p4raw-id: //depot/perl@7153
* allocate sufficient buffer sizes for 64-bit wide utf8 characters | Gurusamy Sarathy | 2000-02-19 | 1 | -0/+2
  permitted by change#5011 (from Gisle Aas)
  p4raw-link: @5011 on //depot/perl: 3c77ea2bace63b1ad27d15a6366cb938bdd158cb
  p4raw-id: //depot/perl@5136
* allow 64-bit utf8-encoded integers (from Ilya Zakharevich) | Gurusamy Sarathy | 2000-02-07 | 1 | -1/+2
  p4raw-id: //depot/perl@5011
* set SvUTF8 on vectors only if there are chars > 127; update copyright | Gurusamy Sarathy | 2000-02-06 | 1 | -1/+1
  years (from Gisle Aas)
  p4raw-id: //depot/perl@5009
* HINT_UTF8 is not propagated to the op tree anymore; add a | Gurusamy Sarathy | 2000-02-01 | 1 | -1/+1
  perlunicode.pod that reflects changes to unicode support so far
  p4raw-id: //depot/perl@4941
* runtime now looks at the SVf_UTF8 bit on the SV to decide | Gurusamy Sarathy | 2000-01-31 | 1 | -0/+16
  whether to use widechar semantics; lexer and RE engine continue to need "use utf8" to enable unicode awareness in literals and patterns (TODO: this needs to be fixed); $1 et al are marked SvUTF8 if the pattern was compiled for utf8 (TODO: propagating it from the data is probably better)
  p4raw-id: //depot/perl@4930
* Re-integrate mainline | Nick Ing-Simmons | 1999-09-18 | 1 | -0/+1
  Basic SvUTF8 stuff in headers, no functional changes yet.
  p4raw-id: //depot/utfperl@4193
* EXTERN_C declarations for global arrays in various | Gurusamy Sarathy | 1999-06-12 | 1 | -0/+4
  headers, so perl can be built even in C++ mode; win32 build fixups; regen headers
  p4raw-id: //depot/perl@3537
* update copyright years | Gurusamy Sarathy | 1999-03-22 | 1 | -1/+1
  p4raw-id: //depot/perl@3124
* s/Perl_utf8skip/PL_utf8skip/g | Gurusamy Sarathy | 1998-11-17 | 1 | -3/+3
  p4raw-id: //depot/perl@2241
* fix globals caught by change#1927; builds and tests on Solaris | Gurusamy Sarathy | 1998-10-06 | 1 | -1/+1
  p4raw-link: @1927 on //depot/perl: eb07465ebe1238598e948058857ec948c6697f86
  p4raw-id: //depot/perl@1936
* add new files to MANIFEST; add missing prototypes to proto.h; | Gurusamy Sarathy | 1998-07-26 | 1 | -3/+3
  s/PL_utf8skip/utf8skip/ for now, or we end up with Perl_PL_; add typecasts to silence warnings; tweaks for win32 builds
  p4raw-id: //depot/perl@1663
* Here are the long-expected Unicode/UTF-8 modifications. | Larry Wall | 1998-07-24 | 1 | -0/+27
  p4raw-id: //depot/utfperl@1651