path: root/utf8.c
Commit message / Author / Age / Files / Lines
* The _uni_display functions should not be in dump.c, since they are used during normal operation (S_not_a_number()).
  Jarkko Hietaniemi, 2001-11-19, 1 file, -0/+27, p4raw-id: //depot/perl@13099
* Quieten pgcc 2.91.66 worries.
  Jarkko Hietaniemi, 2001-11-14, 1 file, -1/+1, p4raw-id: //depot/perl@13008
* A few typo fixes.
  Message-Id: <200111120515.fAC5FIc74795@ventrue.corp.yahoo.com>
  Patching README.foo instead of pod/perlfoo.pod, not patching Math::BigInt (Tels will take care of that), dropping broken hv.c and sv.h patches, patching libnetcfg.PL and perldoc.PL instead of libnetcfg and perldoc, and patching ext/Digest/MD5/t/files.t since MD5.pm was changed.
  Jeffrey Friedl, 2001-11-12, 1 file, -1/+1, p4raw-id: //depot/perl@12954
* Add documentation.
  Jarkko Hietaniemi, 2001-11-02, 1 file, -0/+24, p4raw-id: //depot/perl@12808
* Unicode: add ToFold mapping. Not used yet, but basically a more useful mapping for caseless (aka case-ignoring) comparison than doing either lc($a) eq lc($b) or uc($a) eq uc($b); the full algorithm for creating the foldings uses equivalence classes, see http://www.unicode.org/unicode/reports/tr21/. Hopefully this feature will be used in //i. (The folding tables were introduced by #12689.)
  Jarkko Hietaniemi, 2001-11-02, 1 file, -3/+10, p4raw-id: //depot/perl@12807
* It is more logical to use %04"UVXf" than %"UVuf", since the Unicode standard prefers hex.
  Jarkko Hietaniemi, 2001-10-26, 1 file, -1/+1, p4raw-id: //depot/perl@12691
* Implement multicharacter case mappings, where a single Unicode character can be mapped into several.
  Jarkko Hietaniemi, 2001-10-21, 1 file, -23/+41, p4raw-id: //depot/perl@12546
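With multicharacter case mappings, a case-conversion routine can no longer return a single code point: U+00DF LATIN SMALL LETTER SHARP S uppercases to "SS", and the ligature U+FB03 uppercases to "FFI". The sketch below shows one way to shape such an API, with the caller supplying a buffer large enough for the longest expansion. It is an illustration only: `to_upper_full`, `special_upper_table` and `MAX_CASE_EXPANSION` are hypothetical names, not Perl's API, and the table holds just three entries from Unicode's SpecialCasing.txt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: a full (possibly one-to-many) uppercase mapping.
 * The longest full case mapping in Unicode expands to 3 code points. */
#define MAX_CASE_EXPANSION 3

struct special_upper {
    uint32_t from;                    /* source code point */
    size_t   len;                     /* number of code points it maps to */
    uint32_t to[MAX_CASE_EXPANSION];  /* the mapped code points */
};

/* A tiny illustrative subset of SpecialCasing.txt, not the full table. */
static const struct special_upper special_upper_table[] = {
    { 0x00DF, 2, { 0x0053, 0x0053 } },         /* SHARP S -> "SS" */
    { 0xFB00, 2, { 0x0046, 0x0046 } },         /* LIGATURE FF -> "FF" */
    { 0xFB03, 3, { 0x0046, 0x0046, 0x0049 } }, /* LIGATURE FFI -> "FFI" */
};

/* Writes the uppercase form of cp into out and returns how many
 * code points were written (1..MAX_CASE_EXPANSION). */
size_t to_upper_full(uint32_t cp, uint32_t out[MAX_CASE_EXPANSION])
{
    size_t i, k;

    assert(out != NULL);
    for (i = 0; i < sizeof special_upper_table / sizeof special_upper_table[0]; i++) {
        if (special_upper_table[i].from == cp) {
            for (k = 0; k < special_upper_table[i].len; k++)
                out[k] = special_upper_table[i].to[k];
            return special_upper_table[i].len;
        }
    }
    /* One-to-one fallback; this sketch only knows the ASCII letters. */
    out[0] = (cp >= 'a' && cp <= 'z') ? cp - ('a' - 'A') : cp;
    return 1;
}
```

Returning a count alongside a caller-owned buffer keeps the common one-to-one case cheap while still allowing expansion, which is the kind of "more than just one-character-to-one-character" mapping the log entries around this one prepare for.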
* Document the problem with the swash_fetch() API that affects more complex case conversions.
  Jarkko Hietaniemi, 2001-10-16, 1 file, -0/+6, p4raw-id: //depot/perl@12450
* Make the toupper/lower/title API for Unicode not right, but at least less wrong: prepare for the mapping being more than just one-character-to-one-character.
  Jarkko Hietaniemi, 2001-10-09, 1 file, -33/+24, p4raw-id: //depot/perl@12371
* Custom Ops
  Message-ID: <20010825174509.A5752@netthink.co.uk>
  I also added a fix to Opcode.pm to quiet test cases.
  Simon Cozens, 2001-08-27, 1 file, -1/+1, p4raw-id: //depot/perl@11756
* Salvage bits and pieces from the experimental 'utf8 everywhere' patch: rename HINT_BYTE and IN_BYTE to HINT_BYTES and IN_BYTES to match the pragma name; various robustness cleanups.
  Jarkko Hietaniemi, 2001-05-31, 1 file, -1/+1, p4raw-id: //depot/perl@10339
* More -Wall sweeping.
  Jarkko Hietaniemi, 2001-05-30, 1 file, -1/+1, p4raw-id: //depot/perl@10338
* Fix Perl_swash_init and Perl_swash_fetch to save ERRSV (= $@) before Perl_load_module/Perl_call_method, and restore the value afterwards if !SvTRUE(ERRSV). (From Inaba Hiroto.)
  Jarkko Hietaniemi, 2001-05-29, 1 file, -0/+14, p4raw-id: //depot/perl@10297
* In character classes, one couldn't have 0x80..0xff characters on the left-hand side if there were 0x100.. characters in the character class.
  Jarkko Hietaniemi, 2001-04-29, 1 file, -30/+41, p4raw-id: //depot/perl@9901
* A better fix for the \x{12345678} trouble, from NI-S.
  Jarkko Hietaniemi, 2001-04-19, 1 file, -11/+1, p4raw-id: //depot/perl@9755
* Workaround for the "\x{12345678}" plus s/(.)/$1/g plus ord/length bug noticed by Robin Houston; basically, the code detecting value wraparound behaved differently under different compilers and platforms. The workaround is to remove the overflow check for now; a real fix would be to do the overflow check (portably) right.
  Jarkko Hietaniemi, 2001-04-18, 1 file, -1/+11, p4raw-id: //depot/perl@9740
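The entry above leaves a portable overflow check as future work. The log does not show how utf8.c eventually did it; the sketch below is one common portable approach, assuming a 64-bit unsigned accumulator: test the bound *before* shifting, using only unsigned arithmetic, so no wraparound ever happens and every conforming compiler gives the same answer. `accumulate_continuation` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append the 6 payload bits of one UTF-8 continuation
 * byte (10xxxxxx) to *uv. Returns false, leaving *uv unchanged, if the shift
 * would push set bits off the top of the 64-bit accumulator. */
bool accumulate_continuation(uint64_t *uv, unsigned char byte)
{
    assert(uv != NULL);
    assert((byte & 0xC0) == 0x80);      /* caller must pass a continuation byte */

    /* Any value above UINT64_MAX >> 6 has a set bit among its top 6 bits,
     * which "<< 6" would discard; detect that before shifting. */
    if (*uv > (UINT64_MAX >> 6))
        return false;

    *uv = (*uv << 6) | (uint64_t)(byte & 0x3F);
    return true;
}
```

Because the comparison is made on the value before the shift, the check does not depend on how a particular compiler or platform behaves once a value has already wrapped.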
* Updates to apidoc in utf8.c
  Message-ID: <Pine.BSF.4.21.0104152037470.8946-100000@shell8.ba.best.com>
  Prymmer/Kahn, 2001-04-16, 1 file, -20/+25, p4raw-id: //depot/perl@9716
* Integrate perlio:
  [ 9400] More EBCDIC tweaks:
  - one more swash issue: &~(0xA0-1) did not do the right thing for UTF-EBCDIC, where &~(0x80-1) does for UTF-8.
  - add "use re 'asciirange'" to make [!-~] etc. work; use it in MIME::QuotedPrint, t/op/regexp.t and t/op/pat.t.
  - Choose a key for the t/op/each.t test which gets encoded.
  - Skip utf8decode if this is UTF-EBCDIC.
  p4raw-link: @9400 on //depot/perlio: daf0f78e031c718c75590ef9ef573756f805776e
  Jarkko Hietaniemi, 2001-03-28, 1 file, -1/+2, p4raw-id: //depot/perl@9407
* Integrate perlio:
  [ 9384] Various EBCDIC fixes:
  - major revelation that swash code is encoding-aware (or thought it was); now it is ;-)
  - With that out of the way, fix a slab of tr/// cases.
  - Fix Encode 'Unicode' to be true Unicode so tests pass.
  - As anticipated, Base64.xs needed tweaks.
  - Until tr/// works right, avoid old_encode64 in MIME tests.
  p4raw-link: @9384 on //depot/perlio: 5ad8ef521b3ffc4e6bbbb9941bc4940d442b56b2
  Jarkko Hietaniemi, 2001-03-27, 1 file, -3/+23, p4raw-id: //depot/perl@9389
* More EBCDIC stuff:
  - Lose the extra level of function on ASCII.
  - Spotted a chr(0) issue in sv.c.
  - Rework of UTF-X tr/// ranges to work in Unicode space. Still issues with the "0xff is illegal UTF-8" hack.
  - Yet another ad hoc utf8 'upgrade' in op.c recoded (why do it once when you can do it all over the place :-(
  - Enable HINTS_UTF8 on EBCDIC, then ignore it in toke.c; need utf8.pm for swashes.
  - Simplified and commented scan_const() in toke.c.
  Still something wrong with regexp and tr (swashes?).
  Nick Ing-Simmons, 2001-03-20, 1 file, -62/+75, p4raw-id: //depot/perlio@9267
* More EBCDIC fixes.
  Nick Ing-Simmons, 2001-03-19, 1 file, -21/+14, p4raw-id: //depot/perlio@9246
* UTF-X encoding invariance for Encode:
  - move Encode::utf8_encode to utf8::encode (likewise decode, upgrade, downgrade, valid)
  - move the XS code for those to universal.c (so they are in miniperl)
  - add utf8::unicode_to_native and its inverse to allow EBCDIC to work in true Unicode.
  - change ext/Encode/compile to use the above.
  - Fix t/lib/encode.t for the above.
  - Teach t/lib/b.t to expect -uutf8.
  - In utf8.c, look for SWASHNEW rather than just the utf8:: package to see if utf8.pm is needed.
  Nick Ing-Simmons, 2001-03-18, 1 file, -1/+2, p4raw-id: //depot/perlio@9198
* Correct #if EBCDIC side typos. Builds and passes many tests on OS390.
  Nick Ing-Simmons, 2001-03-17, 1 file, -1/+1, p4raw-id: //depot/perlio@9190
* Infrastructure to use UTF-EBCDIC rather than UTF-8 as the internal encoding on EBCDIC platforms. This has the property that U+0000..U+009F, i.e. a superset of ASCII, are invariant under the encoding. This is EBCDIC-friendly, as an encoded string can be looked at as being EBCDIC by the lexer, sprintf("%d",...), etc., in the same manner that a UTF-8 string can be considered ASCII on ASCII machines.
  - rearrange utf8.h to keep the ASCII-specific and Unicode-generic bits separate.
  - Add some more macros to comprehend different shift amounts and possible swizzle in UTF-EBCDIC vs UTF-8. Change utf8.c to use them.
  - add utfebcdic.h, which provides UTF-EBCDIC versions of the macros, and conditionally #include it.
  EBCDIC build as yet untested. ASCII still fails the one test.
  Nick Ing-Simmons, 2001-03-17, 1 file, -14/+37, p4raw-id: //depot/perlio@9185
* EBCDIC Fixes.
  Nick Ing-Simmons, 2001-03-16, 1 file, -15/+16, p4raw-id: //depot/perlio@9180
* Audit #ifdef EBCDIC and #ifndef ASCIIish; replace the latter with the former. Use ASCII_TO_NATIVE and NATIVE_TO_ASCII to avoid some #ifs.
  Nick Ing-Simmons, 2001-03-11, 1 file, -4/+0, p4raw-id: //depot/perlio@9105
* EBCDIC sanity, phase I:
  - rename utf8/uv functions to indicate what sort of uv they provide (uvuni/uvchr)
  - use utf8n_xxxx (cf. pvn) for forms which take a length.
  - back out vN.N and $^V exceptions to e2a/a2e
  - make "locale" isxxx macros be uvchr (may be redundant?)
  Not yet clear that the return values of toUPPER_uni et al. are handled correctly. The tr// and regexp stuff still needs an audit; the assumption is they are working in Unicode space. Need to provide v5.6 names for XS modules (decide whether uni or chr?).
  Nick Ing-Simmons, 2001-03-10, 1 file, -39/+113, p4raw-id: //depot/perlio@9096
* Re: Unicode/EBCDIC
  Message-ID: <Pine.OSF.4.10.10103081617390.377472-100000@aspara.forte.com>
  Peter Prymmer, 2001-03-09, 1 file, -1/+3, p4raw-id: //depot/perl@9082
* A comment tweak.
  Jarkko Hietaniemi, 2001-02-25, 1 file, -1/+1, p4raw-id: //depot/perl@8931
* Fix for "[ID 20010213.005] utf8 + localized hash elems + 64 bits?": the hash key got wrongly UTF8fied.
  Jarkko Hietaniemi, 2001-02-18, 1 file, -7/+9, p4raw-id: //depot/perl@8835
* UTF-8 tweaks.
  Jarkko Hietaniemi, 2001-02-18, 1 file, -4/+6, p4raw-id: //depot/perl@8827
* Macrofy a magic UTF-8 test.
  Jarkko Hietaniemi, 2001-01-31, 1 file, -1/+1, p4raw-id: //depot/perl@8647
* UTF-8 nit from Inaba Hiroto.
  Jarkko Hietaniemi, 2001-01-30, 1 file, -11/+7, p4raw-id: //depot/perl@8615
* Patch from Inaba Hiroto:
  - canonical UTF-8 hash keys: if a key string for a hash is UTF8-on, try to downgrade the string and use it if unicode::distinct is not in effect. For the task, I added a function bytes_from_utf8() to utf8.c. It might resemble utf8_to_bytes(), but the latter is not convenient for the task. Made a test for it and added it to t/op/each.t.
  - Changed do_print in doio.c to apply sv_utf8_(downgrade|upgrade) to the mortal copy of the argument SV, and changed t/io/utf8.t test 18, which expects print() to upgrade its argument.
  - re-implement sv_eq with bytes_from_utf8()
  - some bug fixes:
    - tr/// does not handle UTF8 range (\x{}-\x{})
    - \ before a raw UTF8 character produced a "Malformed UTF-8 character" warning.
    - "\x{100}\N{CENT SIGN}" is Malformed.
    Added tests for these 3.
  - and one silly bug (by me) with the qu operator.
  Jarkko Hietaniemi, 2001-01-28, 1 file, -0/+57, p4raw-id: //depot/perl@8583
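The "canonical UTF-8 hash keys" item relies on one idea: a UTF-8 string whose code points are all below 0x100 can be rewritten as one byte per character, so the same key hashes identically whether it arrived as bytes or as UTF-8. The log does not show bytes_from_utf8()'s signature, so the sketch below uses a hypothetical `try_downgrade_utf8` and assumes an ASCII platform (UTF-EBCDIC lays out bytes differently). On ASCII platforms, U+0080..U+00FF are exactly the two-byte sequences with lead byte 0xC2 or 0xC3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: downgrade UTF-8 to one byte per character when every
 * code point is below 0x100. dst must have room for len bytes. Returns false
 * (dst contents unspecified) if a code point >= 0x100 or a sequence this
 * sketch does not accept is found. ASCII platforms only. */
bool try_downgrade_utf8(const unsigned char *src, size_t len,
                        unsigned char *dst, size_t *dst_len)
{
    size_t i = 0, j = 0;

    assert(src != NULL && dst != NULL && dst_len != NULL);
    while (i < len) {
        unsigned char c = src[i];

        if (c < 0x80) {                          /* ASCII is invariant */
            dst[j++] = c;
            i++;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (src[i + 1] & 0xC0) == 0x80) {
            /* 110000xx 10yyyyyy encodes U+0080..U+00FF as xxyyyyyy. */
            dst[j++] = (unsigned char)(((c & 0x1F) << 6) | (src[i + 1] & 0x3F));
            i += 2;
        } else {
            return false;                        /* >= U+0100, or malformed */
        }
    }
    *dst_len = j;
    return true;
}
```

For example, the UTF-8 bytes `63 61 66 C3 A9` ("café") downgrade to the four bytes `63 61 66 E9`, the same bytes a Latin-1 "café" key already has; a key containing U+0100 (`C4 80`) cannot be downgraded and would stay in UTF-8.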
* Re: API Cleanup
  To: perl5-porters@perl.org
  Date: Tue, 16 Jan 2001 13:42:30 +0000
  Message-ID: <20010116134230.A13420@pembro26.pmb.ox.ac.uk>
  Subject: [PATCH] utf8.c documentation
  Date: Tue, 16 Jan 2001 13:52:48 +0000
  Message-ID: <20010116135248.A13496@pembro26.pmb.ox.ac.uk>
  Subject: Re: API Cleanup
  From: Simon Cozens <simon@cozens.net>
  Date: Tue, 16 Jan 2001 14:58:55 +0000
  Message-ID: <20010116145855.A13794@pembro26.pmb.ox.ac.uk>
  UTF-8 doc patches.
  Simon Cozens, 2001-01-16, 1 file, -12/+35, p4raw-id: //depot/perl@8452
* One more patch for UTF8
  Message-ID: <3A59E510.52BAB5B9@st.rim.or.jp>
  UTF-8 fixes for 'x' and tr///.
  Inaba Hiroto, 2001-01-09, 1 file, -8/+0, p4raw-id: //depot/perl@8378
* Do away with strncpy() and a fixed-length buffer.
  Jarkko Hietaniemi, 2001-01-05, 1 file, -4/+8, p4raw-id: //depot/perl@8332
* Unify UTF-8 malformedness handling.
  Jarkko Hietaniemi, 2001-01-05, 1 file, -48/+96, p4raw-id: //depot/perl@8323
* Use the UTF8_XXX macros in is_utf8_char(); a performance nit in is_utf8_string().
  Jarkko Hietaniemi, 2001-01-02, 1 file, -6/+6, p4raw-id: //depot/perl@8300
* Bump up Larry's copyright.
  Jarkko Hietaniemi, 2001-01-01, 1 file, -1/+1, p4raw-id: //depot/perl@8289
* Signedness nit.
  Jarkko Hietaniemi, 2000-12-30, 1 file, -1/+1, p4raw-id: //depot/perl@8274
* More UTF8 test suites and a UTF8 patch
  Message-ID: <3A4D722D.243AFD88@st.rim.or.jp>
  Just the patch part for now, and the pragma renamed to unicode::distinct.
  Inaba Hiroto, 2000-12-30, 1 file, -1/+5, p4raw-id: //depot/perl@8267
* (Retracted by #8264) More join() testing, which was good because it revealed a bug in #8248 (UTF8_EIGHT_BIT_LO() was wrong).
  Jarkko Hietaniemi, 2000-12-29, 1 file, -4/+2, p4raw-id: //depot/perl@8249
* Do not return the Unicode replacement character if UTF-8 decoding goes awry; it should be up to the caller to decide.
  Jarkko Hietaniemi, 2000-12-08, 1 file, -7/+9, p4raw-id: //depot/perl@8042
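Letting the caller decide means the decoder reports *what* went wrong instead of silently substituting U+FFFD. The sketch below illustrates that contract for standard one- to four-byte UTF-8. It is not utf8_to_uv() (whose real interface and flags are not shown in this log): `decode_utf8_char` and the `DECODE_*` codes are hypothetical, and for brevity it does not reject surrogates or values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes: the caller chooses the policy for each one
 * (substitute U+FFFD, warn, die, skip a byte). */
enum decode_status {
    DECODE_OK,
    DECODE_TRUNCATED,          /* input ends mid-sequence */
    DECODE_BAD_START,          /* byte cannot begin a sequence */
    DECODE_BAD_CONTINUATION,   /* expected 10xxxxxx, got something else */
    DECODE_OVERLONG            /* value encoded in more bytes than needed */
};

/* Decodes one character from s[0..len). On DECODE_OK, stores the code point
 * in *cp and the number of bytes used in *consumed; otherwise leaves both
 * untouched. */
enum decode_status decode_utf8_char(const unsigned char *s, size_t len,
                                    uint32_t *cp, size_t *consumed)
{
    size_t need, i;
    uint32_t uv, min;

    assert(s != NULL && cp != NULL && consumed != NULL);
    if (len == 0)
        return DECODE_TRUNCATED;
    if (s[0] < 0x80) {
        *cp = s[0];
        *consumed = 1;
        return DECODE_OK;
    }
    if ((s[0] & 0xE0) == 0xC0)      { need = 1; uv = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 2; uv = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 3; uv = s[0] & 0x07; min = 0x10000; }
    else
        return DECODE_BAD_START;

    if (len < need + 1)
        return DECODE_TRUNCATED;
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return DECODE_BAD_CONTINUATION;
        uv = (uv << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (uv < min)
        return DECODE_OVERLONG;

    *cp = uv;
    *consumed = need + 1;
    return DECODE_OK;
}
```

A caller that wants the old behaviour can still map any non-OK status to U+FFFD itself, while a validating caller can reject the input outright.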
* Re: ebcdic <-> ascii tables interjected in uv <-> utf8 considered harmful
  Message-ID: <20001208133331.A11535@deep-dark-truthful-mirror.perlhacker.org>
  (The pp_hot part needed a rewrite.)
  Simon Cozens, 2000-12-08, 1 file, -8/+3, p4raw-id: //depot/perl@8039
* Use the UTF8 macros a bit. They can't be used with abandon everywhere, because we do generate illegal UTF-8 in some situations. This is of course naughty.
  Jarkko Hietaniemi, 2000-12-08, 1 file, -9/+20, p4raw-id: //depot/perl@8033
* Introduce macros for UTF8 decoding.
  Jarkko Hietaniemi, 2000-12-08, 1 file, -14/+15, p4raw-id: //depot/perl@8028
* Document utf8_to_uv() better.
  Jarkko Hietaniemi, 2000-12-07, 1 file, -4/+6, p4raw-id: //depot/perl@8024
* Document utf8_length(), utf8_distance(), and utf8_hop().
  Jarkko Hietaniemi, 2000-12-07, 1 file, -4/+21, p4raw-id: //depot/perl@8023
* Split off the UTF-8 decoder tests, and make them also check the error message.
  Jarkko Hietaniemi, 2000-12-05, 1 file, -7/+7, p4raw-id: //depot/perl@7996