(Lots of Perl 5 source code archaeology was involved.)
Larry didn't make strangled noises when I showed him
the patch, either :-)
p4raw-id: //depot/perl@19242

p4raw-id: //depot/perl@18807

p4raw-id: //depot/perl@18801

but 0xFFFE quite wrong.
p4raw-id: //depot/perl@15762

p4raw-id: //depot/perl@15761

be Hugo), ballooned a bit... the goal is Larry's wish that
illegal Unicode (such as U+FFFF) by default doesn't warn,
since what if somebody WANTS to create illegal Unicode?
Now getting close to this in the regex runtime.
(Also, fix more of my fixation that BOM would be U+FFFE.)
p4raw-id: //depot/perl@15689

p4raw-id: //depot/perl@15148

p4raw-id: //depot/perl@14900

p4raw-id: //depot/perl@14758

p4raw-id: //depot/perl@14561

p4raw-id: //depot/perl@14391

"the same" means trouble (here s and 's')
What broke now was 841 and 842 of t/op/pat.t, because of the
ANYOF_UNICODE_FOLD_SHARP_S() in utf8.h, ccversion 5.0.1.0
(note that breakage happened only under cc_r and usethreads+
useithreads)
p4raw-id: //depot/perl@14379

p4raw-id: //depot/perl@14222

p4raw-id: //depot/perl@14114

enhance regex dumping code.
p4raw-id: //depot/perl@14096

p4raw-id: //depot/perl@13866

U+...FFFE, U+...FFFF, and characters beyond U+10FFFF
(the Unicode maximum code point) warnable offenses.
p4raw-id: //depot/perl@13823

p4raw-id: //depot/perl@13672

Message-ID: <3B9D23D6.90BCCC25@rowman.com>
p4raw-id: //depot/perl@11986

and the Perl will be built to do that by default (adding that
will break scripts having non-UTF-8 binary data, such as Latin-1.)
p4raw-id: //depot/perl@11656

p4raw-id: //depot/perl@11652

Message-Id: <200107061339.JAA12582@bottesini.harvard.edu>
p4raw-id: //depot/perl@11184

patch: rename HINT_BYTE and IN_BYTE to HINT_BYTES and IN_BYTES
to match the pragma name; various robustness cleanups.
p4raw-id: //depot/perl@10339

Message-Id: <5.0.2.1.1.20010421192107.01ce5a50@ix.netcorps.com>
p4raw-id: //depot/perl@9775

into mainline.
fix a broken workaround for Borland compiler in change#4739
(caused weird "short reads" on DATA, which caused op/misc.t to fail)
nits spotted by Borland compiler
avoid redefinition warnings under Borland 5.02
various nits identified by the Borland 5.5 compiler; remove suppression
of a few warnings
p4raw-link: @9496 on //depot/maint-5.6/perl: 9d05ad52b0aa7d1f7d147da0c4dbc14de5fe4a37
p4raw-link: @9495 on //depot/maint-5.6/perl: 759997f1e719f33541bed70dd7f79bfa26a930b3
p4raw-link: @9494 on //depot/maint-5.6/perl: 01b59bde1cb7ff62776f3b83c0f2575c79a950a6
p4raw-link: @9493 on //depot/maint-5.6/perl: eea7051a8d4ef81c032143ab3193bc1240ab2e8f
p4raw-link: @4739 on //depot/perl: c39cd00800303e8967294e98aa4c427a1872a251
p4raw-id: //depot/perl@9497
p4raw-integrated: from //depot/maint-5.6/perl@9492 'merge in' sv.c
utf8.h (@9288..) toke.c (@9292..) ext/File/Glob/bsd_glob.c
(@9415..) win32/makefile.mk (@9426..) win32/win32.h (@9494..)

- Lose the extra level of function on ASCII.
- spotted a chr(0) issue in sv.c
- re-work of UTF-X tr/// ranges to work in Unicode
space. Still issues with the "0xff is illegal UTF-8" hack.
- Yet another ad hoc utf8 'upgrade' in op.c recoded
(why do it once when you can do it all over the place :-(
- Enable HINTS_UTF8 on EBCDIC - then ignore it in toke.c,
need utf8.pm for swashes.
- Simplified and commented scan_const() in toke.c
Still something wrong with regexp and tr (swashes?).
p4raw-id: //depot/perlio@9267

p4raw-id: //depot/perlio@9246

encoding on EBCDIC platforms. This has the property that U+0000..U+009F, i.e.
a superset of ASCII, are invariant under the encoding. This is EBCDIC-friendly,
as an encoded string can be looked at as being EBCDIC by the lexer,
sprintf("%d",...), etc., in the same manner that a UTF-8 string can be
considered ASCII on ASCII machines.
- re-arrange utf8.h to get ASCII specific vs Unicode generic bits
separate.
- Add some more macros to comprehend different shift amounts and
possible swizzle in UTF-EBCDIC vs UTF-8. Change utf8.c to use them.
- add utfebcdic.h which provides UTF-EBCDIC versions of the macros,
and conditionally #include it.
EBCDIC build as yet untested. ASCII still fails the one test.
p4raw-id: //depot/perlio@9185

p4raw-id: //depot/perlio@9184

p4raw-id: //depot/perlio@9180

p4raw-id: //depot/perlio@9110

- rename utf8/uv functions to indicate what sort of uv they provide (uvuni/uvchr)
- use utf8n_xxxx (cf. pvn) for forms which take length.
- back out vN.N and $^V exceptions to e2a/a2e
- make "locale" isxxx macros be uvchr (may be redundant?)
Not clear yet that the return values of toUPPER_uni et al. are being handled correctly.
The tr// and regexp stuff still needs an audit; the assumption is that they are working
in Unicode space.
Need to provide v5.6 names for XS modules (decide: uni or chr?).
p4raw-id: //depot/perlio@9096

Message-ID: <Pine.OSF.4.10.10103081617390.377472-100000@aspara.forte.com>
p4raw-id: //depot/perl@9082

p4raw-id: //depot/perl@8770

p4raw-id: //depot/perl@8647

p4raw-id: //depot/perl@8323

p4raw-id: //depot/perl@8289

it revealed a bug in #8248 (the UTF8_EIGHT_BIT_LO() was wrong).
p4raw-id: //depot/perl@8249

Internally: sv_catsv() wasn't quite okay on UTF-8; it assumed
that the only cases to care about are byte+byte and byte+character.
TODO: See how well pp_concat() could be implemented in terms
of sv_catsv().
p4raw-id: //depot/perl@8248

everywhere because we do generate illegal UTF-8 in some situations.
This is of course naughty.
p4raw-id: //depot/perl@8033

p4raw-id: //depot/perl@8028

p4raw-id: //depot/perl@7700

Subject: [ID 20001114.006] 5.7.0-7680 Solaris 8, 64 bit, utf8 patch
Message-Id: <20001114191623.G20559@Strawberry.COM>
p4raw-id: //depot/perl@7691

Message-Id: <200011132249.eADMnek09679@garcia.efn.org>
p4raw-id: //depot/perl@7677

p4raw-id: //depot/perl@7438

UTF8LEN() and UTF8SKIP().
p4raw-id: //depot/perl@7437

malformation happens. This involved adding an argument
to utf8_to_uv_chk(), which involved changing its prototype,
and preferring STRLEN over I32 for the UTF-8 length, which as
a domino effect necessitated changing the prototypes of
scan_bin(), scan_oct(), scan_hex(), and reg_uni().
The stricter UTF-8 decoding checking uses Markus Kuhn's
UTF-8 Decode Stress Tester from
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
p4raw-id: //depot/perl@7416

Subject: [PATCH] Re: [ID 20000918.005] ~ on wide chars
Message-ID: <20001014205213.A9645@pembro4.pmb.ox.ac.uk>
p4raw-id: //depot/perl@7235

p4raw-id: //depot/perl@7154

p4raw-id: //depot/perl@7153