author Dominic Dunlop <domo@slipper.ip.lu> 1996-12-28 10:56:41 +0100
committer Chip Salzenberg <chip@atlantic.net> 1997-01-01 08:59:00 +1200
commit a034a98d8bfd0fd904012bd5227ce209aaaa0b26
tree e2afccfa1b455a13bf9d8e0ec521987c49b5c07c
parent e38874e2f3f61264e6d7b5d69540cdd51724e623
Locale-related pod patches, take 2
[Ahem. Had the wrong thing in the scratch-pad, didn't I? Please ignore my previous full posting of a slightly-tweaked perllocale.pod. This mail contains what I really meant to send.]

Herewith (quick, before _18 appears) locale-related patches to the documentation in perl5.003_17/pod. The main effect is to add locale-related information to pods other than perllocale.pod, although there are some tiny tweaks to that pod too. Produces no complaints from pod2man; not checked for layout since 5.003_13.

p5p-msgid: <v03007800aeea9e488b36@[194.51.248.77]>
-rw-r--r-- pod/perl.pod      4
-rw-r--r-- pod/perlform.pod 21
-rw-r--r-- pod/perlfunc.pod 29
-rw-r--r-- pod/perlop.pod   10
-rw-r--r-- pod/perlre.pod   14
-rw-r--r-- pod/perlsec.pod  32
6 files changed, 82 insertions, 28 deletions
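
Note on the quotemeta wording changed in pod/perlfunc.pod: the new text says that every character not matching C</[A-Za-z_0-9]/> is preceded by a backslash, regardless of locale settings, and that the same function implements the \Q escape. A minimal sketch of that behaviour follows. The sample string and variable names are hypothetical, chosen only for illustration, and the input is assumed to be plain ASCII.

```perl
use strict;
use warnings;

# Hypothetical sample input (plain ASCII), not taken from the patch.
my $raw = 'a.b_c-1 @x';

# Every character outside [A-Za-z_0-9] gets a leading backslash.
my $quoted = quotemeta($raw);
print "$quoted\n";

# \Q...\E in a double-quoted string applies the same function
# to the interpolated value.
my $same = "\Q$raw\E";
print $same eq $quoted ? "identical\n" : "different\n";

# Used in a pattern, the quoted form matches the raw text literally,
# so "." no longer acts as a wildcard.
print "literal match\n" if $raw =~ /^$quoted$/;
```

Because the set of escaped characters is fixed, the result for ASCII input should not vary with C<use locale>, unlike the case-mapping and C<\w> behaviour documented elsewhere in this patch.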
diff --git a/pod/perl.pod b/pod/perl.pod
index 76a0f269fe..7ac7094f57 100644
--- a/pod/perl.pod
+++ b/pod/perl.pod
@@ -267,8 +267,8 @@ directory. If PERL5LIB is defined, PERLLIB is not used.
=back
-Perl also has environment variables that control how Perl handles
-language-specific data. Please consult L<perllocale>.
+Perl also has environment variables that control how Perl handles data
+specific to particular natural languages. See L<perllocale>.
Apart from these, Perl uses no other environment variables, except
to make them available to the script being executed, and to child
diff --git a/pod/perlform.pod b/pod/perlform.pod
index 4fac1a69e3..b11936b534 100644
--- a/pod/perlform.pod
+++ b/pod/perlform.pod
@@ -72,7 +72,14 @@ separated by commas. The expressions are all evaluated in a list context
before the line is processed, so a single list expression could produce
multiple list elements. The expressions may be spread out to more than
one line if enclosed in braces. If so, the opening brace must be the first
-token on the first line.
+token on the first line. If an expression evaluates to a number with a
+decimal part, and if the corresponding picture specifies that the decimal
+part should appear in the output (that is, any picture except multiple "#"
+characters B<without> an embedded "."), the character used for the decimal
+point is B<always> determined by the current LC_NUMERIC locale. This
+means that, if, for example, the run-time environment happens to specify a
+German locale, "," will be used instead of the default ".". See
+L<perllocale> and L<"WARNINGS"> for more information.
Picture fields that begin with ^ rather than @ are treated specially.
With a # field, the field is blanked out if the value is undefined. For
@@ -306,10 +313,20 @@ is to printf(), do this:
END
print $string;
-=head1 WARNING
+=head1 WARNINGS
Lexical variables (declared with "my") are not visible within a
format unless the format is declared within the scope of the lexical
variable. (They weren't visible at all before version 5.001.) Furthermore,
lexical aliases will not be compiled correctly: see
L<perlfunc/my> for other issues.
+
+Formats are the only part of Perl which unconditionally use information
+from a program's locale; if a program's environment specifies an
+LC_NUMERIC locale, it is always used to specify the decimal point
+character in formatted output. Perl ignores all other aspects of locale
+handling unless the C<use locale> pragma is in effect. Formatted output
+cannot be controlled by C<use locale> because the pragma is tied to the
+block structure of the program, and, for historical reasons, formats
+exist outside that block structure. See L<perllocale> for further
+discussion of locale handling.
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index fe3da14929..c39dd29298 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -1544,7 +1544,7 @@ C<continue> block, if any, is not executed:
Returns an lowercased version of EXPR. This is the internal function
implementing the \L escape in double-quoted strings.
-Should respect any POSIX setlocale() settings.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.
If EXPR is omitted, uses $_.
@@ -1554,7 +1554,7 @@ If EXPR is omitted, uses $_.
Returns the value of EXPR with the first character lowercased. This is
the internal function implementing the \l escape in double-quoted strings.
-Should respect any POSIX setlocale() settings.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.
If EXPR is omitted, uses $_.
@@ -2106,8 +2106,10 @@ you will have to use a block returning its value instead:
=item printf FORMAT, LIST
-Equivalent to a "print FILEHANDLE sprintf(FORMAT, LIST)". The first argument
-of the list will be interpreted as the printf format.
+Equivalent to C<print FILEHANDLE sprintf(FORMAT, LIST)>. The first argument
+of the list will be interpreted as the printf format. If C<use locale> is
+in effect, the character used for the decimal point in formatted real numbers
+is affected by the LC_NUMERIC locale. See L<perllocale>.
=item prototype FUNCTION
@@ -2141,8 +2143,11 @@ Generalized quotes. See L<perlop>.
=item quotemeta
-Returns the value of EXPR with with all regular expression
-metacharacters backslashed. This is the internal function implementing
+Returns the value of EXPR with with all non-alphanumeric
+characters backslashed. (That is, all characters not matching
+C</[A-Za-z_0-9]/> will be preceded by a backslash in the
+returned string, regardless of any locale settings.)
+This is the internal function implementing
the \Q escape in double-quoted strings.
If EXPR is omitted, uses $_.
@@ -2674,6 +2679,9 @@ the subroutine not via @_ but as the package global variables $a and
$b (see example below). They are passed by reference, so don't
modify $a and $b. And don't try to declare them as lexicals either.
+When C<use locale> is in effect, C<sort LIST> sorts LIST according to the
+current collation locale. See L<perllocale>.
+
Examples:
# sort lexically
@@ -2888,7 +2896,10 @@ Returns a string formatted by the usual printf conventions of the C
language. See L<sprintf(3)> or L<printf(3)> on your system for details.
(The * character for an indirectly specified length is not
supported, but you can get the same effect by interpolating a variable
-into the pattern.) Some C libraries' implementations of sprintf() can
+into the pattern.) If C<use locale> is
+in effect, the character used for the decimal point in formatted real numbers
+is affected by the LC_NUMERIC locale. See L<perllocale>.
+Some C libraries' implementations of sprintf() can
dump core when fed ludicrous arguments.
=item sqrt EXPR
@@ -3243,7 +3254,7 @@ on your system.
Returns an uppercased version of EXPR. This is the internal function
implementing the \U escape in double-quoted strings.
-Should respect any POSIX setlocale() settings.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.
If EXPR is omitted, uses $_.
@@ -3253,7 +3264,7 @@ If EXPR is omitted, uses $_.
Returns the value of EXPR with the first character uppercased. This is
the internal function implementing the \u escape in double-quoted strings.
-Should respect any POSIX setlocale() settings.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.
If EXPR is omitted, uses $_.
diff --git a/pod/perlop.pod b/pod/perlop.pod
index a75cb4947d..a8f34c0e57 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -290,6 +290,9 @@ to the right argument.
Binary "cmp" returns -1, 0, or 1 depending on whether the left argument is stringwise
less than, equal to, or greater than the right argument.
+"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
+by the current locale if C<use locale> is in effect. See L<perllocale>.
+
=head2 Bitwise And
Binary "&" returns its operators ANDed together bit by bit.
@@ -580,6 +583,9 @@ are interpolated, as are the following sequences:
\E end case modification
\Q quote regexp metacharacters till \E
+If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
+and <\U> is taken from the current locale. See L<perllocale>.
+
Patterns are subject to an additional level of interpretation as a
regular expression. This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
@@ -619,6 +625,8 @@ C<!~> operator, the $_ string is searched. (The string specified with
C<=~> need not be an lvalue--it may be the result of an expression
evaluation, but remember the C<=~> binds rather tightly.) See also
L<perlre>.
+See L<perllocale> for discussion of additional considerations which apply
+when C<use locale> is in effect.
Options are:
@@ -769,6 +777,8 @@ at run-time. If you want the pattern compiled only once the first time
the variable is interpolated, use the C</o> option. If the pattern
evaluates to a null string, the last successfully executed regular
expression is used instead. See L<perlre> for further explanation on these.
+See L<perllocale> for discussion of additional considerations which apply
+when C<use locale> is in effect.
Options are:
diff --git a/pod/perlre.pod b/pod/perlre.pod
index ce054ec448..12f9f51016 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -19,6 +19,9 @@ the regular expression inside. These are:
Do case-insensitive pattern matching.
+If C<use locale> is in effect, the case map is taken from the current
+locale. See L<perllocale>.
+
=item m
Treat string as multiple lines. That is, change "^" and "$" from matching
@@ -136,6 +139,9 @@ also work:
\E end case modification (think vi)
\Q quote regexp metacharacters till \E
+If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
+and <\U> is taken from the current locale. See L<perllocale>.
+
In addition, Perl defines the following:
\w Match a "word" character (alphanumeric plus "_")
@@ -146,9 +152,11 @@ In addition, Perl defines the following:
\D Match a non-digit character
Note that C<\w> matches a single alphanumeric character, not a whole
-word. To match a word you'd need to say C<\w+>. You may use C<\w>,
-C<\W>, C<\s>, C<\S>, C<\d>, and C<\D> within character classes (though not
-as either end of a range).
+word. To match a word you'd need to say C<\w+>. If C<use locale> is in
+effect, the list of alphabetic characters generated by C<\w> is taken
+from the current locale. See L<perllocale>. You may use C<\w>, C<\W>,
+C<\s>, C<\S>, C<\d>, and C<\D> within character classes (though not as
+either end of a range).
Perl defines the following zero-width assertions:
diff --git a/pod/perlsec.pod b/pod/perlsec.pod
index 2b6972701f..69de8592b6 100644
--- a/pod/perlsec.pod
+++ b/pod/perlsec.pod
@@ -30,14 +30,14 @@ program more secure than the corresponding C program.
You may not use data derived from outside your program to affect something
else outside your program--at least, not by accident. All command-line
-arguments, environment variables, and file input are marked as "tainted".
-Tainted data may not be used directly or indirectly in any command that
-invokes a sub-shell, nor in any command that modifies files, directories,
-or processes. Any variable set within an expression that has previously
-referenced a tainted value itself becomes tainted, even if it is logically
-impossible for the tainted value to influence the variable. Because
-taintedness is associated with each scalar value, some elements of an
-array can be tainted and others not.
+arguments, environment variables, locale information (see L<perllocale>),
+and file input are marked as "tainted". Tainted data may not be used
+directly or indirectly in any command that invokes a sub-shell, nor in any
+command that modifies files, directories, or processes. Any variable set
+within an expression that has previously referenced a tainted value itself
+becomes tainted, even if it is logically impossible for the tainted value
+to influence the variable. Because taintedness is associated with each
+scalar value, some elements of an array can be tainted and others not.
For example:
@@ -107,10 +107,10 @@ mechanism is by referencing sub-patterns from a regular expression match.
Perl presumes that if you reference a substring using $1, $2, etc., that
you knew what you were doing when you wrote the pattern. That means using
a bit of thought--don't just blindly untaint anything, or you defeat the
-entire mechanism. It's better to verify that the variable has only
-good characters (for certain values of "good") rather than checking
-whether it has any bad characters. That's because it's far too easy to
-miss bad characters that you never thought of.
+entire mechanism. It's better to verify that the variable has only good
+characters (for certain values of "good") rather than checking whether it
+has any bad characters. That's because it's far too easy to miss bad
+characters that you never thought of.
Here's a test to make sure that the data contains nothing but "word"
characters (alphabetics, numerics, and underscores), a hyphen, an at sign,
@@ -131,6 +131,14 @@ Laundering data using regular expression is the I<ONLY> mechanism for
untainting dirty data, unless you use the strategy detailed below to fork
a child of lesser privilege.
+The example does not untaint $data if C<use locale> is in effect,
+because the characters matched by C<\w> are determined by the locale.
+Perl considers that locale definitions are untrustworthy because they
+contain data from outside the program. If you are writing a
+locale-aware program, and want to launder data with a regular expression
+containing C<\w>, put C<no locale> ahead of the expression in the same
+block. See L<perllocale/SECURITY> for further discussion and examples.
+
=head2 Cleaning Up Your Path
For "Insecure C<$ENV{PATH}>" messages, you need to set C<$ENV{'PATH'}> to a