| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- canonical UTF-8 hash keys: if a key string for a hash is
UTF8-on, try to downgrade the string and use it if
unicode::distinct is not in effect.
For the task, I added a function bytes_from_utf8() to utf8.c.
It may resemble utf8_to_bytes(), but that function is not
convenient for the task.
Made a test for it and added it to t/op/each.t
- Changed do_print in doio.c to apply sv_utf8_(downgrade|upgrade) to
the mortal copy of the argument SV.
And changed t/io/utf8.t test 18 which expects print() to
upgrade its argument.
- re-implement sv_eq with bytes_from_utf8()
- some bug fixes
- tr/// does not handle UTF8 range (\x{}-\x{})
- \ before raw UTF8 character produced
"Malformed UTF-8 character" warning.
- "\x{100}\N{CENT SIGN}" is Malformed.
Added tests for these 3.
- and one silly bug (by me) with qu operator.
p4raw-id: //depot/perl@8583
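
The downgrade step described in the first item above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the code added to utf8.c: the name try_downgrade_utf8() and its signature are hypothetical, and the sketch only accepts characters below 0x100 encoded in one or two bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (NOT Perl's bytes_from_utf8()): try to convert a
 * UTF-8 buffer into one byte per character.
 * On success, returns a malloc'd buffer and stores its length in *out_len.
 * Returns NULL when the string cannot be downgraded (a character at or
 * above 0x100, or malformed input) or when allocation fails. */
unsigned char *try_downgrade_utf8(const unsigned char *src, size_t len,
                                  size_t *out_len)
{
    assert(src != NULL && out_len != NULL);

    /* The result is never longer than the input; +1 avoids malloc(0). */
    unsigned char *dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;

    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* Plain ASCII: copied unchanged. */
            dst[n++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (src[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence encoding U+0080..U+00FF. */
            dst[n++] = (unsigned char)(((c & 0x03) << 6) | (src[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Character >= 0x100 or malformed: keep the UTF-8 form. */
            free(dst);
            return NULL;
        }
    }
    *out_len = n;
    return dst;
}
```

With a helper of this shape, a caller can hash the downgraded bytes when the call succeeds and fall back to the original UTF-8 string when it returns NULL, so two keys that differ only in their encoding map to the same bytes.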
|
|
|
| |
p4raw-id: //depot/perl@8564
|
|
|
|
|
| |
Message-ID: <20010122021722.A9334@pembro26.pmb.ox.ac.uk>
p4raw-id: //depot/perl@8562
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- The substr lval was still not okay.
- Now pp_stringify and sv_setsv copy the source's UTF8 flag
even if IN_BYTE. pp_stringify is called from fold_constants
at optimization phase and "\x{100}" was made SvUTF8_off under
use bytes (the bytes pragma is for "byte semantics" and not
for "do not produce UTF8 data")
- New `qu' operator to generate UTF8 string explicitly.
Though I agree with the policy "0x00-0xff always produce bytes",
I sometimes want such a string to be encoded in UTF8.
I can use pack"U0a*" but it requires more typing and has
runtime overhead.
- Fix pp_regcomp bug uncovered by the "0x00-0xff always produce bytes"
change. The bug appears if a pm has the PMdf_UTF8 flag but the interpolated
string is not UTF8_on and has a char in 0x80-0xff.
TODO: document and test qu.
p4raw-id: //depot/perl@8439
|
|
|
|
|
| |
Message-Id: <200101122003.UAA29599@tempest.npl.co.uk>
p4raw-id: //depot/perl@8425
|
|
|
|
|
| |
Message-ID: <14941.16925.736415.785818@soda.csua.berkeley.edu>
p4raw-id: //depot/perl@8417
|
|
|
|
|
| |
Message-ID: <5930DC161690D2119667009027157547038123E1@madt009a.siemens.es>
p4raw-id: //depot/perl@8413
|
|
|
|
|
|
|
| |
Message-ID: <3A59E510.52BAB5B9@st.rim.or.jp>
UTF-8 fixes for 'x' and tr///.
p4raw-id: //depot/perl@8378
|
|
|
|
|
| |
bypassed by control flow.
p4raw-id: //depot/perl@8343
|
|
|
| |
p4raw-id: //depot/perl@8341
|
|
|
|
|
| |
raw 8-bit form to the UTF-8 string.
p4raw-id: //depot/perl@8330
|
|
|
|
|
|
|
|
| |
Message-ID: <5930DC161690D211966700902715754703738F96@madt009a.siemens.es>
UTF-8 parsing fix that seems to be needed for EBCDIC; it has no
effect in ASCII. (changed the strncpy() to Copy())
p4raw-id: //depot/perl@8329
|
|
|
| |
p4raw-id: //depot/perl@8328
|
|
|
| |
p4raw-id: //depot/perl@8323
|
|
|
|
|
|
|
| |
Message-ID: <5930DC161690D21196670090271575470370111A@madt009a.siemens.es>
The toke.c part only, patching embed.h and proto.h is futile.
p4raw-id: //depot/perl@8306
|
|
|
|
|
| |
Message-ID: <5930DC161690D211966700902715754703738AA6@madt009a.siemens.es>
p4raw-id: //depot/perl@8305
|
|
|
| |
p4raw-id: //depot/perl@8289
|
|
|
| |
p4raw-id: //depot/perlio@8272
|
|
|
|
|
| |
Message-ID: <20001227141244.A13344@deep-dark-truthful-mirror.perlhacker.org>
p4raw-id: //depot/perl@8239
|
|
|
|
|
|
| |
too much hassle (the interpret -Q as a function
where Q is not a known filetest part is left in).
p4raw-id: //depot/perl@8084
|
|
|
|
|
| |
was only testing this_utf8.
p4raw-id: //depot/perlio@8053
|
|
|
|
|
| |
i.e. the output string has one, but don't mess with source assumption.
p4raw-id: //depot/perlio@8052
|
|
|
| |
p4raw-id: //depot/perl@7984
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
[ 7971]
Quieten some noise in Win32 builds:
- win32.h is included after <sys/socket.h>, so need to
set Win32SCK_IS_STDSCK earlier to avoid re-defined noise in XSUB.h
- GCC (& MSVC?) have execv(...,const char *const *) so need a cast from char **.
[ 7970]
PERL_IMPLICIT_SYS compiles but does not work.
p4raw-link: @7971 on //depot/perlio: b4748376b6239962bd75b743e5a7b14788a2970c
p4raw-link: @7970 on //depot/perlio: adb71456d0ff53391c88789f315f1e66b14373d5
p4raw-id: //depot/perl@7972
|
| |
| |
| |
| |
| |
| |
| | |
- win32.h is included after <sys/socket.h>, so need to
set Win32SCK_IS_STDSCK earlier to avoid re-defined noise in XSUB.h
- GCC (& MSVC?) have execv(...,const char *const *) so need a cast from char **.
p4raw-id: //depot/perlio@7971
|
|/
|
|
|
| |
always use (at least) UTF8_MAXLEN + 1 U8s deep buffer.
p4raw-id: //depot/perl@7967
|
|
|
|
|
|
|
| |
Message-Id: <E142GRN-0003go-00@libra.cus.cam.ac.uk>
An extraneous argument.
p4raw-id: //depot/perl@7958
|
|
|
|
|
| |
"Ambiguous -f() resolved as a file test ..."
p4raw-id: //depot/perl@7944
|
|
|
|
|
| |
that also breaks using them as methods.
p4raw-id: //depot/perl@7943
|
|
|
|
|
| |
Reserve the short named string operator names.
p4raw-id: //depot/perl@7941
|
|
|
|
|
| |
Message-Id: <200011301427.OAA00030@tempest.npl.co.uk>
p4raw-id: //depot/perl@7935
|
|
|
|
|
| |
Message-ID: <20001129141545.A30864@pembro33.pmb.ox.ac.uk>
p4raw-id: //depot/perl@7916
|
|
|
| |
p4raw-id: //depot/perl@7816
|
|
|
|
|
| |
Message-Id: <200011132249.eADMnek09679@garcia.efn.org>
p4raw-id: //depot/perl@7677
|
|
|
|
|
|
| |
Subject: [PATCH] prototyped functions that should be overrideable
Message-ID: <Pine.OSF.4.21.0011031100470.17471-100000@home.kiski.net>
p4raw-id: //depot/perl@7600
|
|
|
| |
p4raw-id: //depot/perl@7582
|
|
|
|
|
|
|
|
|
|
| |
Subject: [ID 20000728.005] perl -P broken
Message-Id: <200007290019.RAA08484@dd.tc.fluke.com>
(hopefully). The fix is also not complete: it seems to break
BOM swallowing for libc5 systems, but until someone figures
out a way to do this without ftell(), this will do.
p4raw-id: //depot/perl@7570
|
|
|
|
|
| |
for a missing "use charnames" when using the \N{...}.
p4raw-id: //depot/perl@7557
|
|
|
|
|
|
| |
Subject: Re: \x{...} is confused
Message-ID: <20001029193648.A6287@pembro4.pmb.ox.ac.uk>
p4raw-id: //depot/perl@7485
|
|
|
| |
p4raw-id: //depot/perl@7465
|
|
|
|
|
|
|
|
| |
Rename utf8_to_uv_chk() back to utf8_to_uv() because it's
used much more than the simpler API, now called utf8_to_uv_simple().
Still not quite happy with API, too much partial duplication
of functionality.
p4raw-id: //depot/perl@7439
|
|
|
| |
p4raw-id: //depot/perl@7438
|
|
|
|
|
|
|
|
|
|
|
|
| |
malformation happens. This involved adding an argument
to utf8_to_uv_chk(), which involved changing its prototype,
and preferring STRLEN over I32 for the UTF-8 length, which as
a domino effect necessitated changing the prototypes of
scan_bin(), scan_oct(), scan_hex(), and reg_uni().
The stricter UTF-8 decoding checking uses Markus Kuhn's
UTF-8 Decode Stress Tester from
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
p4raw-id: //depot/perl@7416
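
For illustration, the kind of checks that stricter UTF-8 decoding implies can be sketched in C as below. This is an assumption-laden sketch, not Perl's utf8_to_uv_chk(): the name decode_utf8_strict(), its signature, and its exact rejection policy (overlong forms, stray continuation bytes, truncated sequences, surrogates, values above 0x10FFFF) are hypothetical. It uses size_t for lengths, in the same spirit as preferring STRLEN over I32.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strict decoder (NOT Perl's utf8_to_uv_chk()).
 * Decodes one character from s[0..len) and stores the number of bytes
 * used in *consumed. Returns the code point, or -1 on a malformation
 * (in which case *consumed is 0). */
long decode_utf8_strict(const unsigned char *s, size_t len, size_t *consumed)
{
    assert(s != NULL && consumed != NULL);
    *consumed = 0;
    if (len == 0)
        return -1;

    unsigned char c = s[0];
    size_t need;   /* number of continuation bytes expected */
    long cp;       /* code point accumulated so far */
    long min;      /* smallest value this length may encode */

    if (c < 0x80) {
        *consumed = 1;
        return c;
    } else if ((c & 0xE0) == 0xC0) {
        need = 1; cp = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {
        need = 2; cp = c & 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) {
        need = 3; cp = c & 0x07; min = 0x10000;
    } else {
        return -1;              /* stray continuation or invalid start byte */
    }

    if (len < need + 1)
        return -1;              /* sequence truncated by end of buffer */

    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;          /* expected a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min)
        return -1;              /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;              /* UTF-16 surrogate */
    if (cp > 0x10FFFF)
        return -1;              /* beyond the Unicode range */

    *consumed = need + 1;
    return cp;
}
```

A caller would typically advance by *consumed on success and decide separately how to warn or recover when -1 is returned.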
|
|
|
|
|
|
| |
Subject: [PATCH perl@7229] Rentrant parser and yylex()
Message-ID: <5930DC161690D211966700902715754702DA09CD@madt009a.siemens.es>
p4raw-id: //depot/perl@7382
|
|
|
|
|
|
| |
Subject: [PATCH perl@7229] Rentrant parser and yylex()
Message-ID: <5930DC161690D211966700902715754702DA09CD@madt009a.siemens.es>
p4raw-id: //depot/perl@7381
|
|
|
| |
p4raw-id: //depot/perl@7224
|
|
|
|
|
| |
Message-Id: <m3aed9ybrm.fsf@eik.g.aas.no>
p4raw-id: //depot/perl@7098
|
|
|
|
|
|
|
| |
i.e. rename Simon's function to Perl_utf8_to_uv_chk, change all calls to it
to use the new name, and add Perl_utf8_to_uv() as a wrapper which calls it,
passing 0 for checking to get the warning.
p4raw-id: //depot/perl@7096
|
|
|
| |
p4raw-id: //depot/perl@7093
|
|
|
|
|
| |
Message-Id: <200009142306.TAA20082@leggy.zk3.dec.com>
p4raw-id: //depot/perl@7090
|