path: root/lib/locale.pm
Commit message (author, date, files changed, lines -removed/+added)
* Strengthen cautions about locale use with threads (Karl Williamson, 2016-04-08, 1 file, -1/+12)
    This comes from our increased understanding of their perils, given ticket #127708.
* lib/locale.pm: Fix so works on platforms without LC_CTYPE (Karl Williamson, 2015-11-20, 1 file, -5/+4)
    These may not actually exist in the wild, but it is better to be general. This also adds an XXX comment about future possibilities.
* Increment $VERSION in lib/locale.pm. (James E Keenan, 2015-09-08, 1 file, -1/+1)
* lib/locale.pm: Add an assertion (Karl Williamson, 2015-09-08, 1 file, -2/+12)
    It turns out that the code assumes that the values for LC_CTYPE, LC_MESSAGES, ... are small non-negative numbers, as a bit position is reserved for each of these. It's better to make this assumption explicit rather than getting hard-to-find failures. (LC_ALL doesn't have to be of this form, and is in fact -1 on AIX.)
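    The assumption behind that assertion can be illustrated with a minimal Python sketch. This is not the code in lib/locale.pm; the names `HINT_BITS`, `category_bit`, and `categories_mask` are hypothetical, and the 32-bit width is an assumption made only for the example. It shows why a category value must be a small non-negative integer before it can be used as a bit position, and why a value such as -1 has to be rejected explicitly.

```python
import locale

# Assumed width of the word that holds one bit per category (hypothetical).
HINT_BITS = 32


def category_bit(category: int) -> int:
    """Return the single-bit mask reserved for one locale category.

    Raises ValueError when the category value cannot serve as a bit
    position, which makes the assumption explicit instead of silently
    producing a wrong mask.
    """
    if not 0 <= category < HINT_BITS:
        raise ValueError(f"category {category} cannot be used as a bit position")
    return 1 << category


def categories_mask(categories) -> int:
    """Combine several categories into one bit mask."""
    mask = 0
    for category in categories:
        mask |= category_bit(category)
    return mask


if __name__ == "__main__":
    # The numeric values of LC_CTYPE and LC_COLLATE are platform-dependent,
    # so the printed mask is too.
    print(bin(categories_mask([locale.LC_CTYPE, locale.LC_COLLATE])))
```

    A check of this kind turns a platform whose category constants are negative or very large into an immediate, readable error rather than a hard-to-find failure later.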
* Skip various locale tests when locales are not available (Karl Williamson, 2015-03-09, 1 file, -1/+1)
    It is possible to compile Perl without locales, and some platforms may not have them available properly. These tests were failing under these conditions. This commit uses the new infrastructure in loc_tools.pl to centralize the knowledge of how to determine if locales are available.
* Add 'locale' warning category (Karl Williamson, 2014-11-04, 1 file, -1/+1)
    This category will be used in future commits for warnings that are entirely because of locale issues.
* Locale tests assumed POSIX, not true in minitest. (Jarkko Hietaniemi, 2014-06-10, 1 file, -1/+6)
* Add parameters to "use locale" (Karl Williamson, 2014-06-05, 1 file, -20/+64)
    This commit allows one to specify to enable locale-awareness for only a specified subset of the locale categories. Thus you could make a section of code LC_MESSAGES aware, with no locale-awareness for the other categories.
* Work properly under UTF-8 LC_CTYPE locales (Karl Williamson, 2014-01-27, 1 file, -18/+0)
    This large (sorry, I couldn't figure out how to meaningfully split it up) commit causes Perl to fully support LC_CTYPE operations (case changing, character classification) in UTF-8 locales. As a side effect it resolves [perl #56820].

    The basics are easy, but there were a lot of details, and one troublesome edge case discussed below.

    What essentially happens is that when the locale is changed to a UTF-8 one, a global variable is set TRUE (FALSE when changed to a non-UTF-8 locale). Within the scope of 'use locale', this variable is checked, and if TRUE, the code that Perl uses for non-locale behavior is used instead of the code for locale behavior. Since Perl's internal representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.

    More work had to be done for regular expressions. There are three cases.

    1) The character classes \w, [[:punct:]] needed no extra work, as the changes fall out from the base work.

    2) Strings that are to be matched case-insensitively. These form EXACTFL regops (nodes). Notice that if such a string contains only above-Latin1 characters that match only themselves, the node can be downgraded to an EXACT-only node, which presents better optimization possibilities, as we now have a fixed string known at compile time to be required to be in the target string to match. Similarly, if all characters in the string match only other above-Latin1 characters case-insensitively, the node can be downgraded to a regular EXACTFU node (match, folding, using Unicode, not locale, rules). The code changes for this could be done without accepting UTF-8 locales fully, but there were edge cases which needed to be handled differently if I stopped there, so I continued on.

    In an EXACTFL node, all such characters are now folded at compile time (just as before this commit), while the other characters whose folds are locale-dependent are left unfolded. This means that they have to be folded at execution time based on the locale in effect at the moment. Again, this isn't a change from before. The difference is that now some of the folds that need to be done at execution time (in regexec) are potentially multi-char. Some of the code in regexec was trivial to extend to account for this because of existing infrastructure, but the part dealing with regex quantifiers had to have more work.

    Also the code that joins EXACTish nodes together had to be expanded to account for the possibility of multi-character folds within locale handling. This was fairly easy, because it already has infrastructure to handle these under somewhat different circumstances.

    3) In bracketed character classes, represented by ANYOF nodes, a new inversion list was created giving the characters that should be matched by this node when the runtime locale is UTF-8. The list is ignored except under that circumstance. To do this, I created a new ANYOF type which has an extra SV for the inversion list.

    The edge case that caused the most difficulty is folding involving the MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range character that folds to outside that range. The issue is that it doesn't naturally fall out that it will match the CAP MU. If we let the CAP MU fold to the small mu at compile time (which it can because both are above-Latin1 and so the fold is the same no matter what locale is in effect), it could appear that the regnode can be downgraded away from EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case-insensitively match the CAP MU. This could be special cased in regcomp and regexec, but I wanted to avoid that. Instead the mktables tables are set up to include the CAP MU as a character whose presence forbids the downgrading, so the special casing is in mktables, and not in the C code.
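    The MICRO SIGN edge case can be checked with a small Python sketch. It relies on Python's `str.casefold()`, which applies Unicode case folding, and is only an illustration of the folding facts the commit message states, not of how regcomp, regexec, or mktables handle them. The helper names `folds_outside_latin1` and `caseless_equal` are hypothetical.

```python
MICRO_SIGN = "\u00b5"        # MICRO SIGN, inside the 0-255 range
GREEK_SMALL_MU = "\u03bc"    # GREEK SMALL LETTER MU, above Latin1
GREEK_CAPITAL_MU = "\u039c"  # GREEK CAPITAL LETTER MU, above Latin1


def folds_outside_latin1(ch: str) -> bool:
    """True if any code point of the Unicode case fold of ch is above U+00FF."""
    return any(ord(c) > 0xFF for c in ch.casefold())


def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings under Unicode (not locale) case folding."""
    return a.casefold() == b.casefold()


# Code points in the 0-255 range whose fold leaves that range.
latin1_escapees = [cp for cp in range(0x100) if folds_outside_latin1(chr(cp))]

if __name__ == "__main__":
    print([hex(cp) for cp in latin1_escapees])
    print(caseless_equal(MICRO_SIGN, GREEK_CAPITAL_MU))
```

    Because the capital mu and the MICRO SIGN share the same fold, a string containing the capital mu cannot be treated as matching only other above-Latin1 characters, which is the reason the commit keeps such nodes from being downgraded.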
* lib/locale.pm: Pod corrections (Karl Williamson, 2014-01-23, 1 file, -5/+6)
* Allow 'use locale' on systems without locales. (Karl Williamson, 2014-01-23, 1 file, -13/+3)
    Instead of throwing an error, just go ahead and do the import. This will tell Perl internally to use the current underlying locale, which should be the C locale. Attempts to change the locale will fail.

    This differs slightly from Brian Fraser's patch, in that his didn't touch $^H, thus 'use locale' was a no-op. He has told me to apply this one, which does affect $^H. The advantage here is that now programs that are run on platforms with and without locales will behave similarly, and should run identically if the locale is not switched from the default.
* Enable perl core tests to pass when locale support is not available. (Jess Robinson, 2013-02-09, 1 file, -1/+16)
    use locale - this will now die if $Config{d_setlocale} is not true. All tests that use locale will skip if $Config{d_setlocale} is not true. This enables us to pass tests on Android, which uses ICU instead of locales.

    The committer removed trailing white space.
* Add :not_characters parameter to 'use locale' (Karl Williamson, 2012-01-21, 1 file, -2/+47)
    This adds the parameter handling, tests, and documentation for this new feature which allows locale and Unicode to play well with each other.
* locale.pm: Pod tweaks (Karl Williamson, 2012-01-21, 1 file, -6/+7)
* Integrate change #10412 from maintperl; locale is now (Jarkko Hietaniemi, 2001-06-03, 1 file, -1/+1)
    per-cop, not per-op; plus retweak the locale.t to always list the skipped utf8 locales.
    p4raw-link: @10412 on //depot/maint-5.6/perl: 71d0b827413df9e881d1c54d2d968823ed50c75b
    p4raw-id: //depot/perl@10413
    p4raw-edited: from //depot/maint-5.6/perl@10411 'edit in' t/pragma/locale.t (@8600..)
    p4raw-integrated: from //depot/maint-5.6/perl@10411 'merge in' lib/locale.pm (@5902..) opcode.h pp.sym pp_proto.h (@8620..) opcode.pl (@8998..) op.h perl.h (@9288..) pp_sys.c (@9524..) util.c (@9538..) embed.h (@9584..) op.c (@9950..) pp.c (@10091..) pp_ctl.c (@10100..)
* $VERSION crusade, strict, tests, etc... all over lib/ (Michael G. Schwern, 2000-12-06, 1 file, -0/+2)
    Message-ID: <20001205212328.C6473@blackrider.aocn.com>
    Carp::Heavy parts not very applicable because of recent changes.
    p4raw-id: //depot/perl@8013
* make hints available via globals in the respective pragmas to (Gurusamy Sarathy, 2000-03-04, 1 file, -2/+4)
    avoid duplicating the constants everywhere
    p4raw-id: //depot/perl@5527
* load base packages based on nonexistent $VERSION (Andreas König, 1999-05-05, 1 file, -0/+3)
    Message-ID: <sfcsob2m5ub.fsf@dubravka.in-berlin.de>
    Subject: Re: base.pm flaw
    p4raw-id: //depot/perl@3302
* [inseparable changes from patch from perl5.003_08 to perl5.003_09] (Perl 5 Porters, 1996-11-26, 1 file, -0/+33)
    CORE LANGUAGE CHANGES

    Subject: Lexical locales
    From: Chip Salzenberg <chip@atlantic.net>
    Files: too many to list

    make effectiveness of locales depend on C<use locale>

    Subject: Lexical scoping cleanup
    From: Chip Salzenberg <chip@atlantic.net>
    Files: many... but mostly perly.y and toke.c

    tighten scoping of lexical variables, somewhat on the new constructs and somewhat on the old

    Subject: memory corruption / security bug in sysread,syswrite + patch
    Date: Mon, 25 Nov 1996 21:46:31 +0200 (EET)
    From: Jarkko Hietaniemi <jhi@cc.hut.fi>
    Files: MANIFEST pod/perldiag.pod pod/perlfunc.pod pp_sys.c t/op/sysio.t
    Msg-ID: <199611251946.VAA30459@alpha.hut.fi>

    (applied based on p5p patch as commit d7090df90a9cb89c83787d916e40d92a616b146d)

    DOCUMENTATION

    Subject: perldiag documentation patch.
    Date: Wed, 20 Nov 96 16:07:28 GMT
    From: Paul Marquess <pmarquess@bfsec.bt.co.uk>
    Files: pod/perldiag.pod
    private-msgid: <9611201607.AA12729@claudius.bfsec.bt.co.uk>

    Subject: a missing perldiag entry
    Date: Thu, 21 Nov 1996 15:24:02 -0500
    From: Gurusamy Sarathy <gsar@engin.umich.edu>
    Files: pod/perldiag.pod
    private-msgid: <199611212024.PAA15758@aatma.engin.umich.edu>

    Subject: perlfunc patch
    Date: Wed, 20 Nov 96 14:04:08 GMT
    From: Paul Marquess <pmarquess@bfsec.bt.co.uk>
    Files: pod/perlfunc.pod

    Following on from the patch to make uc, lc etc default to $_ (as per Camel II), here is a followup patch to perlfunc that documents the change. I think I have documented all the other cases where $_ defaulting works as well.

    p5p-msgid: <9611201404.AA12477@claudius.bfsec.bt.co.uk>

    OTHER CORE CHANGES

    Subject: Properly prototype safe{malloc,calloc,realloc,free}.
    From: Chip Salzenberg <chip@atlantic.net>
    Files: proto.h

    Subject: UnixWare 2.1 fix for perl5.003_08 - cope with fp->_cnt < -1, allow debugging
    Date: Wed, 20 Nov 1996 14:27:06 +0100
    From: John Hughes <john@AtlanTech.COM>
    Files: sv.c

    UnixWare 2.1 has no fp->_base so most of the debugging stuff in sv_gets just core dumps.

    Also, for some unknown reason fp->_cnt is sometimes < -1, screwing up the initial SvGROW in sv_gets.

    Apart from that its io is std.

    p5p-msgid: <01BBD6EE.E915C860@malvinas.AtlanTech.COM>

    Subject: die -> croak
    Date: Thu, 21 Nov 1996 16:11:21 -0500
    From: Gurusamy Sarathy <gsar@engin.umich.edu>
    Files: pp_ctl.c
    private-msgid: <199611212111.QAA17070@aatma.engin.umich.edu>

    Subject: Cleanup of {,un}pack('w').
    From: Chip Salzenberg <chip@atlantic.net>
    Files: pp.c

    Subject: Cleanups from Ilya.
    From: Chip Salzenberg <chip@atlantic.net>
    Files: gv.c malloc.c pod/perlguts.pod pp_ctl.c

    Subject: Fix for unpack('w') on 64-bit systems.
    From: Chip Salzenberg <chip@atlantic.net>
    Files: pp.c

    Subject: Re: LC_NUMERIC support is ready + performance
    Date: Mon, 25 Nov 1996 22:08:27 -0500 (EST)
    From: Ilya Zakharevich <ilya@math.ohio-state.edu>
    Files: sv.c

    Chip Salzenberg writes:
    >
    > Having thought about the use of our own gcvt() and atof(), I've run
    > away in horror. It's just too hairy.
    >
    > So I've implemented the only viable alternative I know of: Toggling
    > LC_NUMERIC to/from "C" as needed.
    >
    > Patch follows.
    >
    > I think _09 is *very* close.

    Since _09 is going to be alpha anyway, I reiterate my question:

    Is there any reason to not include my hash/array performance patches in _09?

    Btw, here is the next performance patch. It makes PADTMP values stealable too. I do not do it by setting TEMP flags on them, since it would be a very distributed patch, and it would break some places which check for TEMP for some other reasons (yes, I checked ;-).

    This patch decreases *twice* the memory usage of
        perl -e '$a = "a" x 1e6; 1'

    Enjoy,

    p5p-msgid: <199611260308.WAA02677@monk.mps.ohio-state.edu>

    Subject: Hash key sharing improvements from Ilya.
    From: Chip Salzenberg <chip@atlantic.net>
    Files: hv.c hv.h proto.h

    Subject: Mortal stack pre-allocation from Ilya.
    From: Chip Salzenberg <chip@atlantic.net>
    Files: pp.c pp.h pp_ctl.c pp_hot.c pp_sys.c

    PORTABILITY

    Subject: VMS patches post-5.003_08
    Date: Fri, 22 Nov 1996 18:16:31 -0500 (EST)
    From: Charles Bailey <bailey@hmivax.humgen.upenn.edu>
    Files: lib/ExtUtils/MM_Unix.pm lib/ExtUtils/MM_VMS.pm lib/ExtUtils/MakeMaker.pm lib/File/Path.pm mg.c pp_ctl.c utils/h2xs.PL vms/config.vms vms/descrip.mms vms/gen_shrfls.pl vms/genconfig.pl vms/perlvms.pod vms/vms.c vms/vmsish.h

    Here're diffs to bring a base 5.003_08 up to the current VMS working sources. Nearly all of the changes are VMS-specific, and comprise miscellaneous bugfixes accumulated since 5.003_07, rather than any particular problem with 5.003_08. I'm posting them here since some of the patches change core files, and I'd like to ensure that I haven't accidentally created problems for anyone else. With these and a couple of the small patches already sent to p5p, 5.003_08 builds clean and passes all tests under VMS.

    Thanks, Chip, for all the work.

    p5p-msgid: <1996Nov22.181631.1603238@hmivax.humgen.upenn.edu>
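    The approach quoted in the 5.003_09 entry above, toggling LC_NUMERIC to and from "C" as needed, follows a save, switch, restore pattern. The sketch below shows that pattern in Python using the standard `locale` module; it is not the C code from sv.c, and the name `numeric_c_locale` is hypothetical. Python's own `float()` parsing does not depend on LC_NUMERIC, so the example only demonstrates the pattern itself.

```python
import contextlib
import locale


@contextlib.contextmanager
def numeric_c_locale():
    """Temporarily switch LC_NUMERIC to "C", restoring the old value on exit."""
    # Calling setlocale with no second argument only queries the setting.
    saved = locale.setlocale(locale.LC_NUMERIC)
    locale.setlocale(locale.LC_NUMERIC, "C")
    try:
        yield
    finally:
        # Restore even if the body raised, so callers never see a leaked "C".
        locale.setlocale(locale.LC_NUMERIC, saved)


if __name__ == "__main__":
    with numeric_c_locale():
        # Inside the block the decimal point is the one defined by "C".
        print(locale.localeconv()["decimal_point"])
```

    Note that `setlocale` changes process-wide state, so a toggle like this is not safe when other threads depend on the locale at the same time, which is the kind of hazard the 2016 entry about threads warns about.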