path: root/pod/perllocale.pod
Commit message | Author | Age | Files | Lines
* Remove documentation references to recent Configure taint changes | Paul "LeoNerd" Evans | 2022-05-19 | 1 | -1/+0
* Doc changes to reflect that perl might not support taint | Neil Bowers | 2022-04-20 | 1 | -5/+11
    The central doc change is in perlsec.pod. This not only explains that you can build a perl that doesn't support taint, but shows how you can check whether your perl supports taint or not.

    The other doc changes are mainly to note that taint might not be supported, and to refer the reader to perlsec for more details.
* perllocale: Add note about z/OS special behavior | Karl Williamson | 2022-03-28 | 1 | -3/+6
* perllocale: Formatting, grammar | Karl Williamson | 2022-03-28 | 1 | -7/+7
* perllocale: Remove stray markup | Karl Williamson | 2020-12-06 | 1 | -1/+1
* Fix typos | Samanta Navarro | 2020-10-03 | 1 | -1/+1
    For: https://github.com/Perl/perl5/pull/18201

    Committer: Samanta Navarro is now a Perl author. To keep 'make test_porting' happy: Increment $VERSION in several files. Regenerate uconfig.h via './perl -Ilib regen/uconfig_h.pl'.
* Fix a bunch of repeated-word typos | Dagfinn Ilmari Mannsåker | 2020-05-22 | 1 | -1/+1
    Mostly in comments and docs, but some in diagnostic messages and one case of 'or die die'.
* typo "set_locale" should be "setlocale" | Paul Marquess | 2020-05-12 | 1 | -1/+1
    (Committer changed author address to the already-known @cpan one, so this would pass porting tests.)
* Update links to perlrun to link to specific items | Dan Book | 2020-01-28 | 1 | -3/+4
* perllocale: Tweak cautionary text | Karl Williamson | 2019-12-23 | 1 | -3/+4
* Update documentation, readmes, comments, and utilities to reference the GitHub issue tracker | Dan Book | 2019-12-22 | 1 | -3/+5
    The perlbug utility and perlbug@perl.org should no longer be used to submit bug reports or patches.
* perllocale: Clarify text | Karl Williamson | 2019-12-20 | 1 | -1/+3
* Unicode.org is https, except for http://cldr.unicode.org | Max Maischein | 2019-10-11 | 1 | -3/+3
* (perl #134187) how do we know it's a Turkic locale | Tony Cook | 2019-07-01 | 1 | -3/+7
    Not by name.
* perllocale: Use L</Foo Bar>, not L<Foo Bar> | Karl Williamson | 2019-05-24 | 1 | -1/+1
* docs (incl. perldelta) for aa8a2baafac | Karl Williamson | 2019-03-06 | 1 | -0/+7
* Docs for new Turkic UTF-8 locale support | Karl Williamson | 2019-02-05 | 1 | -3/+8
* Allow forcing use of POSIX 2008 locale fcns | Karl Williamson | 2018-11-19 | 1 | -0/+5
    These thread-safe functions are not normally used on unthreaded builds, retaining the use of the library functions that have long been used. But, it is now possible to tell Configure to use them on unthreaded builds.
* perllocale: Improve pod appearance | Karl Williamson | 2018-07-16 | 1 | -17/+4
    Using Z<> instead of NBSP improves the pod spacing.
* perllocale: Update, clarify | Karl Williamson | 2018-03-12 | 1 | -4/+3
* Add thread-safe locale handling | Karl Williamson | 2018-02-18 | 1 | -21/+117
    This (large) commit allows locales to be used in threaded perls on platforms that support it. This includes recent Windows and Posix 2008 ones.
* perllocale: Wording/formatting nits | Karl Williamson | 2018-01-14 | 1 | -3/+3
* perllocale: Fix typo | Karl Williamson | 2018-01-14 | 1 | -1/+1
* I18N-Langinfo: Use new fcn Perl_langinfo() | Karl Williamson | 2017-09-09 | 1 | -0/+2
    This automatically fixes the bug where it always returned a dot for the decimal point character.
* perllocale: Rmv links to obsolete documents | Karl Williamson | 2017-08-12 | 1 | -5/+2
* perllocale: Clarifications, corrections, and nits | Karl Williamson | 2017-07-14 | 1 | -10/+13
* pod/perllocale: Add caution about incompatible locales | Karl Williamson | 2016-10-26 | 1 | -5/+6
    Some locales aren't compatible with Perl. Note the potential bad consequences of using them.
* pod/*: remove deprecated L<"section"> and L<section> syntax | Lukas Mai | 2016-06-11 | 1 | -17/+17
* Do better locale collation in UTF-8 locales | Karl Williamson | 2016-05-24 | 1 | -15/+39
    On some platforms, the libc strxfrm() works reasonably well on UTF-8 locales, giving a default collation ordering. It will assume that every string passed to it is in UTF-8. This commit changes Perl to make sure that strxfrm's expectations are met.

    Likewise under a non-UTF-8 locale, strxfrm is expecting a non-UTF-8 string. And this commit makes sure of that as well.

    So, simply meeting strxfrm's expectations allows Perl to start supporting default collation in UTF-8 locales, and fixes it to work on single-byte locales with UTF-8 input. (Unicode::Collate provides tailorable functionality and is portable to platforms where strxfrm isn't as intelligent, but is a much more heavy-weight solution that may not be needed for particular applications.)

    There is a problem in non-UTF-8 locales if the passed string contains code points representable only in UTF-8. This commit causes them to be changed, before being passed to strxfrm, into the highest collating character in the locale that doesn't require UTF-8. They then will sort the same as that character, which means after all other characters in the locale but that one. In strings that don't have that character, this will generally provide exactly correct operation. There still is a problem, if that character, in the given locale, combines with adjacent characters to form a specially weighted sequence. Then, the change of these above-255 code points into that character can skew the results. See the commit message for 6696cfa7cc3a0e1e0eab29a11ac131e6f5a3469e for more on this. But it is really an illegal situation to have above-255 code points in a single-byte locale, so this behavior is a reasonable degradation when given illegal input.

    If two transformed strings compare exactly equal, Perl already uses the un-transformed versions to break ties, and there, these faked-up strings will collate so the above-255 code points sort after everything else, and in code point order amongst themselves.
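The substitution and tie-breaking described in this commit message can be sketched in Python. This is a toy model, not Perl's implementation: `toy_xfrm`, `highest_collating_char`, and `collation_key` are hypothetical names, and the stand-in transform simply uses Latin-1 byte values as weights, so it fails on any code point above 255 just as a single-byte locale has no weight for one.

```python
def toy_xfrm(s: str) -> bytes:
    """Hypothetical single-byte transform: weights equal Latin-1 byte values."""
    return s.encode("latin-1")  # raises UnicodeEncodeError above U+00FF


def highest_collating_char(xfrm) -> str:
    """Return the code point in 1..255 whose transform sorts last."""
    return max((chr(cp) for cp in range(1, 256)), key=xfrm)


def collation_key(s: str, xfrm=toy_xfrm):
    """Sort key: transform of the substituted string, then the original string."""
    substitute = highest_collating_char(xfrm)
    # Above-255 code points are replaced before the transform is applied.
    prepared = "".join(substitute if ord(ch) > 255 else ch for ch in s)
    # The untransformed string breaks ties between equal transforms.
    return (xfrm(prepared), s)


words = ["b\u0101", "a", "b\u00ff", "b\u0100"]
ordered = sorted(words, key=collation_key)
```

In this toy locale the three strings beginning with "b" all transform identically, so their relative order comes entirely from the tie-break on the original strings.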
* perllocale: Change headings so two aren't identical | Karl Williamson | 2016-05-24 | 1 | -2/+2
    Two html anchors in this pod were identical, which isn't a problem unless you try to link to one of them, as the next commit does
* Change calculation of locale collation coefficients | Karl Williamson | 2016-05-24 | 1 | -0/+3
    Every time a new collation locale is set, two coefficients are calculated that are used in predicting how much space is needed in the transformation of a string by strxfrm(). The transformed string is roughly linear with the length of the input string, so we are calculating 'm' and 'b' such that

        transformed_length = m * input_length + b

    Space is allocated based on this prediction. If it is too small, the strxfrm() will fail, and we will have to increase the allotted amount and try again. It's better to get the prediction right to avoid multiple, expensive strxfrm() calls.

    Prior to this commit, the calculation was not rigorous, and failed on some platforms that don't have a fully conforming strxfrm().

    This commit changes to not panic if a locale has an apparent defective collation, but instead silently change to use C-locale collation. It could be argued that a warning should additionally be raised.

    This commit fixes [perl #121734].
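One way to picture the two coefficients is to fit the line through two sample transforms. The sketch below assumes that approach; the commit message does not say which samples Perl uses, and `estimate_coefficients`, `predicted_size`, and `toy_xfrm` are hypothetical names. The toy transform emits three weight bytes per character plus two separator bytes, so the fit is exact.

```python
import math


def estimate_coefficients(xfrm, unit="a", short_len=1, long_len=10):
    """Fit transformed_length = m * input_length + b from two sample strings."""
    short_out = len(xfrm(unit * short_len))
    long_out = len(xfrm(unit * long_len))
    m = (long_out - short_out) / (long_len - short_len)
    b = short_out - m * short_len
    return m, b


def predicted_size(m, b, input_length):
    """Buffer size to try first for an input of the given length."""
    return math.ceil(m * input_length + b)


def toy_xfrm(s):
    """Hypothetical transform: three weight bytes per character, two separators."""
    return b"\x02" * (3 * len(s)) + b"\x01\x01"


m, b = estimate_coefficients(toy_xfrm)
first_try = predicted_size(m, b, len("abcdefg"))
```

With a real strxfrm() the relationship is only roughly linear, so a caller would still need to grow the buffer and retry when the first prediction turns out too small.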
* Change mem_collxfrm() algorithm for embedded NULs | Karl Williamson | 2016-05-24 | 1 | -7/+8
    One of the problems in implementing Perl is that the C library routines forbid embedded NUL characters, which Perl accepts. This is true for the case of strxfrm() which handles collation under locale.

    The best solution as far as functionality goes, would be for Perl to write its own strxfrm replacement which would handle the specific needs of Perl. But that is not going to happen because of the huge complexity in handling it across many platforms. We would have to know the location and format of the locale definition files for every such platform. Some might follow POSIX guidelines, some might not.

    strxfrm creates a transformation of its input into a new string consisting of weight bytes. In the typical but general case, a 3 character NUL-terminated input string 'A B C 00' (spaces added for readability) gets transformed into something like:

        A¹ B¹ C¹ 01 A² B² C² 01 A³ B³ C³ 00

    where the superscripted characters are weights for the corresponding input characters. Superscript 1 represents (essentially) the primary sorting key; 2, the secondary, etc, for as many levels as the locale definition gives. The 01 byte is likely to be the separator between levels, but not necessarily, and there could be some other mechanisms used on various platforms.

    To handle embedded NULs, the simplest thing would be to just remove them before passing in to strxfrm(). Then they would be entirely ignored, which might not be what you want. You might want them to have some weight at the tertiary level, for example. It also causes problems because strxfrm is very context sensitive. The locale definition can define weights for specific sequences of any length (and the weights can be multi-byte), and by removing a NUL, two characters now become adjacent that weren't in the input, and they could now form one of those special sequences and thus throw things off.

    Another way to handle NULs, that seemingly ignores them, but actually doesn't, is the mechanism in use prior to this commit. The input string is split at the NULs, and the substrings are independently passed to strxfrm, and the results concatenated together. This doesn't work either. In our example 'A B C 00', suppose B is a NUL, and should have some weight at the tertiary level. What we want is:

        A¹ C¹ 01 A² C² 01 A³ B³ C³ 00

    But that's not at all what you get. Instead it is:

        A¹ 01 A² 01 A³ C¹ 01 C² 01 C³ 00

    The primary weight of C comes immediately after the tertiary weight of A, but more importantly, a NUL, instead of being ignored at the primary levels, is significant at all levels, so that "a\0c" would sort before "ab".

    Still another possibility is to replace the NUL with some other character before passing it to strxfrm. That was my original plan, to replace each NUL with the character that this code determines has the lowest collation order for the current locale. On strings that don't contain that character, the results would be as good as it gets for that locale. That character is likely to be ignored at higher weight levels, but have some small non-ignored weight at the lowest ones. And hopefully the character would rarely be encountered in practice. When it does happen, it and NUL would sort identically; hardly the end of the world. If the entire strings sorted identically, the NUL-containing one would come out before the other one, since the original Perl strings are used as a tie breaker.

    However, testing showed a problem with this. If that other character is part of a sequence that has special weighting, the results won't be correct. With gcc, U+00B4 ACUTE ACCENT is the lowest collating character in many UTF-8 locales. It combines in Romanian and Vietnamese with some other characters to change weights, and hence changing NULs into U+B4 screws things up.

    What I have finally come to do is a modification of this final approach, where the possible NUL replacements are limited to just characters that are controls in the locale. NULs are replaced by the lowest collating control. It would really be a defective locale if this control combined with some other character to form a special sequence. Often the character will be a 01, START OF HEADING. In the very unlikely case that there are absolutely no controls in the locale, 01 is used, because we have to replace it with something.

    The code added by this commit is mostly utf8-ready. A few commits from now will make Perl properly work with UTF-8 (if the platform supports it). But until that time, this isn't a full implementation; it only looks for the lowest-sorting control that is invariant, where the UTF8ness doesn't matter. The added tests are marked as TODO until then.
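The difference between the split-at-NUL approach and the replace-with-a-control approach can be reproduced with a toy three-level transform. Everything here is an assumption made for illustration: the weight table, the 01 separator, and the names `toy_strxfrm`, `split_at_nul`, and `replace_nul` are hypothetical, and the model only handles lowercase ASCII letters plus the 01 control.

```python
SEP = 1                  # level separator byte, as in the commit's example
LOWEST_CONTROL = "\x01"  # assumed lowest-collating control in this toy locale


def weights(ch):
    """Hypothetical (primary, secondary, tertiary) weights; None means ignored."""
    if ch == LOWEST_CONTROL:
        return (None, None, 2)             # counts only at the tertiary level
    return (ord(ch) - ord("a") + 2, 2, 2)  # lowercase ASCII letters only


def toy_strxfrm(s):
    """All primaries, separator, all secondaries, separator, all tertiaries."""
    if "\0" in s:
        raise ValueError("like a C string, the input cannot contain NUL")
    levels = ([], [], [])
    for ch in s:
        for level, weight in zip(levels, weights(ch)):
            if weight is not None:
                level.append(weight)
    return bytes(levels[0] + [SEP] + levels[1] + [SEP] + levels[2])


def split_at_nul(s):
    """Earlier mechanism: transform each NUL-separated piece, then concatenate."""
    return b"".join(toy_strxfrm(part) for part in s.split("\0"))


def replace_nul(s):
    """Mechanism from this commit: substitute the lowest collating control."""
    return toy_strxfrm(s.replace("\0", LOWEST_CONTROL))


split_sorts_first = split_at_nul("a\0c") < toy_strxfrm("ab")
replace_sorts_after = toy_strxfrm("ab") < replace_nul("a\0c")
```

In this model the split version places the first level separator directly after the primary weight of "a", which is why "a\0c" compares lower than "ab"; after substitution the NUL contributes only at the tertiary level.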
* perllocale: Document NUL collation handling | Karl Williamson | 2016-05-24 | 1 | -0/+10
    And add a TODO test, because this shortly will be improved upon
* perllocale: Unicode has changed their data; fix references | Karl Williamson | 2016-04-12 | 1 | -5/+11
    We say something here that is no longer true; update it.
* Strengthen cautions about locale use with threads | Karl Williamson | 2016-04-08 | 1 | -4/+16
    This comes from our increased understanding of their perils, given ticket #127708
* perllocale: Nits, update for 5.24 changes | Karl Williamson | 2016-03-11 | 1 | -25/+5
* Small typographical corrections to documentation. | SHIRAKATA Kentaro | 2015-04-18 | 1 | -2/+2
    Commit modifies 4 of 5 files in patch submitted by author in RT #124335.
* perllocale: Update for EBCDIC | Karl Williamson | 2015-04-03 | 1 | -3/+6
* perllocale: Nit | Karl Williamson | 2015-04-03 | 1 | -2/+2
* perllocale: Correctly document behavior | Karl Williamson | 2015-03-19 | 1 | -4/+15
* [perl #123814] replace grok_atou with grok_atoUV | Hugo van der Sanden | 2015-03-09 | 1 | -4/+3
    Some questions and loose ends:

    XXX gv.c:S_gv_magicalize - why are we using SSize_t for paren?
    XXX mg.c:Perl_magic_set - need appropriate error handling for $)
    XXX regcomp.c:S_reg - need to check if we do the right thing if parno was not grokked

    Perl_get_debug_opts should probably return something unsigned; not sure if that's something we can change.
* Corrections to spelling and grammatical errors. | Lajos Veres | 2015-01-28 | 1 | -1/+1
    Extracted from patch submitted by Lajos Veres in RT #123693.
* Add documentation for /n (non-capture) regexp flag. | Matthew Horsfall | 2014-12-30 | 1 | -1/+1
* Raise warning on multi-byte char in single-byte locale | Karl Williamson | 2014-12-29 | 1 | -0/+5
    See http://nntp.perl.org/group/perl.perl5.porters/211909

    Something is quite likely wrong with the logic if say in a Greek locale, Unicode characters (especially Greek ones) are encountered. The same character will be represented by two different code points. This warning alerts the user to this undesirable state of affairs.
* perllocale: Nits | Karl Williamson | 2014-12-29 | 1 | -11/+14
* Reinstate "Raise warnings for poorly supported locales" | Karl Williamson | 2014-11-14 | 1 | -2/+19
    This reverts commit 1244bd171b8d1fd4b6179e537f7b95c38bd8f099, thus reinstating commit 3d3a881c1b0eb9c855d257a2eea1f72666e30fbc.
* Revert "Raise warnings for poorly supported locales" | Karl Williamson | 2014-11-04 | 1 | -19/+2
    This reverts commit 3d3a881c1b0eb9c855d257a2eea1f72666e30fbc.

    Win32 with a 1252 code page was failing blead. Revert until I have time to look at it.
* perllocale: Nits and clarifications | Karl Williamson | 2014-11-04 | 1 | -9/+18
* Raise warnings for poorly supported locales | Karl Williamson | 2014-11-04 | 1 | -2/+19
    Perl only supports single-byte locales (except for UTF-8 ones), and has poor support for 7-bit locales that aren't supersets of ASCII (these should be exceedingly rare these days). This commit raises warnings in the new locale warning category when such a locale is entered.
* perllocale: Nit | Karl Williamson | 2014-09-19 | 1 | -1/+1