delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	It's UTF-8, not UTF8. (Note: not s/UTF-8/UTF8/,	Jarkko Hietaniemi	2003-09-12	1	-2/+2
\| \| \| \| \| \|	since that would break a lot of code.) Also few stray UTF16s, UTF32s, and "encoded in Unicode". p4raw-id: //depot/perl@21198
*	Fix up Larry's copyright statements to my best knowledge.	Jarkko Hietaniemi	2003-04-16	1	-1/+1
\| \| \| \| \| \| \|	(Lots of Perl 5 source code archaeology was involved.) Larry didn't make strangled noises when I showed him the patch, either :-) p4raw-id: //depot/perl@19242
*	Reverse copyright update (#18801) for files not changed in 2003.	Hugo van der Sanden	2003-03-02	1	-1/+1
\| \| \|	p4raw-id: //depot/perl@18807
*	Update all copyrights to 2003, from Jarkko	Hugo van der Sanden	2003-03-02	1	-1/+1
\| \| \|	p4raw-id: //depot/perl@18801
*	As noted by Philip Newton: nothing wrong with BOM,	Jarkko Hietaniemi	2002-04-06	1	-14/+12
\| \| \| \| \|	but 0xFFFE quite wrong. p4raw-id: //depot/perl@15762
*	Explain the "gaps" in the UTF-8 encoding.	Jarkko Hietaniemi	2002-04-06	1	-0/+4
\| \| \|	p4raw-id: //depot/perl@15761
*	What started as a small nit (the charnames test, nit found	Jarkko Hietaniemi	2002-04-02	1	-5/+5
\| \| \| \| \| \| \| \| \|	by Hugo), ballooned a bit... the goal is Larry's wish that illegal Unicode (such as U+FFFF) by default doesn't warn, since what if somebody WANTS to create illegal Unicode? Now getting close to this in the regex runtime. (Also, fix more of my fixation that BOM would be U+FFFE.) p4raw-id: //depot/perl@15689
*	Mysterious characters.	Jarkko Hietaniemi	2002-03-10	1	-6/+6
\| \| \|	p4raw-id: //depot/perl@15148
*	Update the UTF-8 explanation table.	Jarkko Hietaniemi	2002-02-27	1	-2/+25
\| \| \|	p4raw-id: //depot/perl@14900
*	Not extending enough.	Jarkko Hietaniemi	2002-02-19	1	-2/+4
\| \| \|	p4raw-id: //depot/perl@14758
*	EBCDIC: SHARP S is different.	Jarkko Hietaniemi	2002-02-05	1	-1/+14
\| \| \|	p4raw-id: //depot/perl@14561
*	Copyright++. (Not all the toplevel *.h have one, it seems.)	Jarkko Hietaniemi	2002-01-23	1	-1/+1
\| \| \|	p4raw-id: //depot/perl@14391
*	AIX cpp bug: having macro arguments and character constants	Jarkko Hietaniemi	2002-01-23	1	-7/+7
\| \| \| \| \| \| \| \| \|	"the same" means trouble (here s and 's') What broke now was 841 and 842 of t/op/pat.t, because of the ANYOF_UNICODE_FOLD_SHARP_S() in utf8.h, ccversion 5.0.1.0 (note that breakage happened only under cc_r and usethreads+ useithreads) p4raw-id: //depot/perl@14379
*	Sharp S as a special treat for our German UTF-8 testers :-)	Jarkko Hietaniemi	2002-01-12	1	-0/+8
\| \| \|	p4raw-id: //depot/perl@14222
*	More regex and utf8 debug dumping.	Jarkko Hietaniemi	2002-01-07	1	-0/+3
\| \| \|	p4raw-id: //depot/perl@14114
*	Finish up (ha!) the Unicode case folding;	Jarkko Hietaniemi	2002-01-05	1	-0/+2
\| \| \| \| \|	enhance regex dumping code. p4raw-id: //depot/perl@14096
*	The funky final sigma casefolding.	Jarkko Hietaniemi	2001-12-23	1	-0/+5
\| \| \|	p4raw-id: //depot/perl@13866
*	Make using U+FDD0..U+FDEF (noncharacters since Unicode 3.1),	Jarkko Hietaniemi	2001-12-21	1	-0/+11
\| \| \| \| \| \|	U+...FFFE, U+...FFFF, and characters beyond U+10FFFF (the Unicode maximum code point) warnable offenses. p4raw-id: //depot/perl@13823
*	Unadorned numbers evil.	Jarkko Hietaniemi	2001-12-13	1	-1/+6
\| \| \|	p4raw-id: //depot/perl@13672
*	PATCH Resubmission - was Re: [ID 20010902.001] v strings over 2*31 barf	John Peacock	2001-09-10	1	-1/+1
\| \| \| \| \|	Message-ID: <3B9D23D6.90BCCC25@rowman.com> p4raw-id: //depot/perl@11986
*	If you want you can now add -DUSE_UTF8_SCRIPTS to your cflags	Jarkko Hietaniemi	2001-08-12	1	-0/+9
\| \| \| \| \| \|	and the Perl will be built to do that by default (adding that will break scripts having non-UTF-8 binary data, such as Latin-1.) p4raw-id: //depot/perl@11656
*	There is no IN_UTF8.	Jarkko Hietaniemi	2001-08-12	1	-1/+0
\| \| \|	p4raw-id: //depot/perl@11652
*	QNX patch extended for NTO	Norton T. Allen	2001-07-06	1	-1/+3
\| \| \| \| \|	Message-Id: <200107061339.JAA12582@bottesini.harvard.edu> p4raw-id: //depot/perl@11184
*	Salvage bits and pieces from the experimental 'utf8 everywhere'	Jarkko Hietaniemi	2001-05-31	1	-4/+4
\| \| \| \| \| \|	patch: rename HINT_BYTE and IN_BYTE to HINT_BYTES and IN_BYTES to match the pragma name; various robustness cleanups. p4raw-id: //depot/perl@10339
*	Typo in utf8.h	Jesús Quiroga	2001-04-21	1	-1/+1
\| \| \| \| \|	Message-Id: <5.0.2.1.1.20010421192107.01ce5a50@ix.netcorps.com> p4raw-id: //depot/perl@9775
*	Integrate changes #9493,9494,9495,9496 from maintperl	Jarkko Hietaniemi	2001-04-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	into mainline. fix a broken workaround for Borland compiler in change#4739 (caused weird "short reads" on DATA, which caused op/misc.t to fail) nits spotted by Borland compiler avoid redefinition warnings under Borland 5.02 various nits identified by the Borland 5.5 compiler; remove suppression of a few warnings p4raw-link: @9496 on //depot/maint-5.6/perl: 9d05ad52b0aa7d1f7d147da0c4dbc14de5fe4a37 p4raw-link: @9495 on //depot/maint-5.6/perl: 759997f1e719f33541bed70dd7f79bfa26a930b3 p4raw-link: @9494 on //depot/maint-5.6/perl: 01b59bde1cb7ff62776f3b83c0f2575c79a950a6 p4raw-link: @9493 on //depot/maint-5.6/perl: eea7051a8d4ef81c032143ab3193bc1240ab2e8f p4raw-link: @4739 on //depot/perl: c39cd00800303e8967294e98aa4c427a1872a251 p4raw-id: //depot/perl@9497 p4raw-integrated: from //depot/maint-5.6/perl@9492 'merge in' sv.c utf8.h (@9288..) toke.c (@9292..) ext/File/Glob/bsd_glob.c (@9415..) win32/makefile.mk (@9426..) win32/win32.h (@9494..)
*	More EBCDIC stuff:	Nick Ing-Simmons	2001-03-20	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Lose the extra level of function on ASCII. - spotted a chr(0) issue in sv.c - re-work of UTF-X tr/// ranges to work in Unicode space. Still issues with the "0xff is illegal UTF-8" hack. - Yet another ad. hoc. utf8 'upgrade' in op.c recoded (why do it once when you can do it all over the place :-( - Enable HINTS_UTF8 on EBCDIC - then ignore it in toke.c, need utf8.pm for swashes. - Simplified and commented scan_const() in toke.c Still something wrong regexp and tr (swashes?). p4raw-id: //depot/perlio@9267
*	More EBCDIC fixes.	Nick Ing-Simmons	2001-03-19	1	-1/+3
\| \| \|	p4raw-id: //depot/perlio@9246
*	Infrastructure to use UTF-EBCDIC rather than UTF-8 as the internal	Nick Ing-Simmons	2001-03-17	1	-68/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	encoding on EBCDIC platforms. This has property that U+0000..U+009F i.e. a superset of ASCII are invariant under the encoding. This is EBCDIC friendly as an encoded string can be looked at as being EBCDIC by lexer sprintf("%d",...) etc. in same manner that a UTF-8 string can be considered ASCII on ASCII machines. - re-arrange utf8.h to get ASCII specific vs Unicode generic bits separate. - Add some more macros to comprehend different shift amounts and possible swizzle in UTF-EBCDIC vs UTF-8. Change utf8.c to use them. - add utfebcdic.h which provides UTF-EBCDIC versions of the macros, and conditionally #include it. EBCDIC build as yet untested. ASCII still fails the one test. p4raw-id: //depot/perlio@9185
*	Minor naming change UTF8_IS_ASCII => UTF8_IS_INVARIANT	Nick Ing-Simmons	2001-03-17	1	-0/+1
\| \| \|	p4raw-id: //depot/perlio@9184
*	EBCDIC Fixes.	Nick Ing-Simmons	2001-03-16	1	-9/+13
\| \| \|	p4raw-id: //depot/perlio@9180
*	#ifdef'ed out code for 'USE_BYTES_DOWNGRADES' case.	Nick Ing-Simmons	2001-03-12	1	-0/+4
\| \| \|	p4raw-id: //depot/perlio@9110
*	EBCDIC sanity - phase I	Nick Ing-Simmons	2001-03-10	1	-11/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- rename utf8/uv functions to indicate what sort of uv they provide (uvuni/uvchr) - use utf8n_xxxx (c.f. pvn) for forms which take length. - back out vN.N and $^V exceptions to e2a/a2e - make "locale" isxxx macros be uvchr (may be redundant?) Not clear yet that toUPPER_uni et. al. return being handled correctly. The tr// and regexp stuff still needs an audit, assumption is they are working in Unicode space. Need to provide v5.6 names for XS modules (decide is uni or chr ?). p4raw-id: //depot/perlio@9096
*	Re: Unicode/EBCDIC	Peter Prymmer	2001-03-09	1	-0/+19
\| \| \| \| \|	Message-ID: <Pine.OSF.4.10.10103081617390.377472-100000@aspara.forte.com> p4raw-id: //depot/perl@9082
*	UTF-8 documentation.	Jarkko Hietaniemi	2001-02-11	1	-0/+16
\| \| \|	p4raw-id: //depot/perl@8770
*	Macrofy a magic UTF-8 test.	Jarkko Hietaniemi	2001-01-31	1	-0/+1
\| \| \|	p4raw-id: //depot/perl@8647
*	Unify UTF-8 malformedness handling.	Jarkko Hietaniemi	2001-01-05	1	-10/+12
\| \| \|	p4raw-id: //depot/perl@8323
*	Bump up Larry's copyright.	Jarkko Hietaniemi	2001-01-01	1	-1/+1
\| \| \|	p4raw-id: //depot/perl@8289
*	(Retracted by #8264) More join() testing which was good because	Jarkko Hietaniemi	2000-12-29	1	-3/+3
\| \| \| \| \|	it revealed a bug in #8248 (the UTF8_EIGHT_BIT_LO() was wrong). p4raw-id: //depot/perl@8249
*	(Retracted by #8264) Externally: join() was still quite UTF-8-unaware.	Jarkko Hietaniemi	2000-12-29	1	-5/+8
\| \| \| \| \| \| \| \| \|	Internally: sv_catsv() wasn't quite okay on UTF-8, it assumed that the only cases to care about are byte+byte and byte+character. TODO: See how well pp_concat() could be implemented in terms of sv_catsv(). p4raw-id: //depot/perl@8248
*	Use the UTF8 macros a bit. They can't be used with abandon	Jarkko Hietaniemi	2000-12-08	1	-0/+5
\| \| \| \| \| \|	everywhere because we do generate illegal UTF-8 in some situations. This is of course naughty. p4raw-id: //depot/perl@8033
*	Introduce macros for UTF8 decoding.	Jarkko Hietaniemi	2000-12-08	1	-1/+16
\| \| \|	p4raw-id: //depot/perl@8028
*	UINT64_C() work continues.	Jarkko Hietaniemi	2000-11-15	1	-2/+0
\| \| \|	p4raw-id: //depot/perl@7700
*	Use UINT64_C().	Jens Hamisch	2000-11-15	1	-1/+5
\| \| \| \| \| \|	Subject: [ID 20001114.006] 5.7.0-7680 Solaris 8, 64 bit, utf8 patch Message-Id: <20001114191623.G20559@Strawberry.COM> p4raw-id: //depot/perl@7691
*	[ID 20001113.003] utf8_to_uv on malformed utf returns wrong values	Yitzchak Scott-Thoennes	2000-11-14	1	-0/+2
\| \| \| \| \|	Message-Id: <200011132249.eADMnek09679@garcia.efn.org> p4raw-id: //depot/perl@7677
*	Allow poking holes at the UTF-8 decoding strictness.	Jarkko Hietaniemi	2000-10-25	1	-1/+12
\| \| \|	p4raw-id: //depot/perl@7438
*	Rename UTF8LEN() to be UNISKIP(), too confusing to have	Jarkko Hietaniemi	2000-10-25	1	-2/+2
\| \| \| \| \|	UTF8LEN() and UTF8SKIP(). p4raw-id: //depot/perl@7437
*	Make the UTF-8 decoding stricter and more verbose when	Jarkko Hietaniemi	2000-10-24	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	malformation happens. This involved adding an argument to utf8_to_uv_chk(), which involved changing its prototype, and prefer STRLEN over I32 for the UTF-8 length, which as a domino effect necessitated changing the prototypes of scan_bin(), scan_oct(), scan_hex(), and reg_uni(). The stricter UTF-8 decoding checking uses Markus Kuhn's UTF-8 Decode Stress Tester from http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt p4raw-id: //depot/perl@7416
*	Make ~(chr(a).chr(b)) eq chr(~a).chr(~b) on utf8.	Simon Cozens	2000-10-15	1	-0/+18
\| \| \| \| \| \|	Subject: [PATCH] Re: [ID 20000918.005] ~ on wide chars Message-ID: <20001014205213.A9645@pembro4.pmb.ox.ac.uk> p4raw-id: //depot/perl@7235
*	Tweak #7153.	Jarkko Hietaniemi	2000-10-06	1	-2/+7
\| \| \|	p4raw-id: //depot/perl@7154