author | Brian Fraser <fraserbn@gmail.com> | 2012-01-09 06:13:43 -0300 |
---|---|---|
committer | Brian Fraser <fraserbn@gmail.com> | 2012-01-13 01:51:09 -0300 |
commit | 9a7a15a3d572c0b11adb6a40a48fcf5d859c2453 (patch) | |
tree | 2a60942271bef61e30ffcb4781a94ca833194e27 | |
parent | 211e3a35cbe8854816789427ce145da78f571095 (diff) | |
pod updates for fc and \F
Branch: smoke-me/hugmeir/fc_keyword
-rw-r--r-- | pod/perlfunc.pod | 60 |
-rw-r--r-- | pod/perlop.pod | 12 |
-rw-r--r-- | pod/perlrebackslash.pod | 7 |
-rw-r--r-- | pod/perlreref.pod | 13 |
-rw-r--r-- | pod/perlunicode.pod | 7 |
5 files changed, 82 insertions, 17 deletions
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 56c74521e7..54b3a25f3c 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -107,8 +107,8 @@ than one place.
 =item Functions for SCALARs or strings
 X<scalar> X<string> X<character>

-C<chomp>, C<chop>, C<chr>, C<crypt>, C<hex>, C<index>, C<lc>, C<lcfirst>,
-C<length>, C<oct>, C<ord>, C<pack>, C<q//>, C<qq//>, C<reverse>,
+C<chomp>, C<chop>, C<chr>, C<crypt>, C<fc>, C<hex>, C<index>, C<lc>,
+C<lcfirst>, C<length>, C<oct>, C<ord>, C<pack>, C<q//>, C<qq//>, C<reverse>,
 C<rindex>, C<sprintf>, C<substr>, C<tr///>, C<uc>, C<ucfirst>, C<y///>

 =item Regular expressions and pattern matching
@@ -1995,6 +1995,54 @@ X<exp> X<exponential> X<antilog> X<antilogarithm> X<e>
 Returns I<e> (the natural logarithm base) to the power of EXPR.
 If EXPR is omitted, gives C<exp($_)>.

+=item fc EXPR
+X<fc> X<foldcase> X<casefold> X<fold-case> X<case-fold>
+
+=item fc
+
+Returns the casefolded version of EXPR. This is the internal function
+implementing the C<\F> escape in double-quoted strings.
+
+Casefolding is the process of mapping strings to a form where case
+differences are erased; comparing two strings in their casefolded
+form is effectively a way of asking if two strings are equal,
+regardless of case.
+
+Roughly, if you ever found yourself writing this
+
+    lc($this) eq lc($that) # Wrong!
+    # or
+    uc($this) eq uc($that) # Also wrong!
+    # or
+    $this =~ /\Q$that/i # Right!
+
+Now you can write
+
+    fc($this) eq fc($that)
+
+And get the correct results.
+
+Perl only implements the full form of casefolding.
+For further information on casefolding, refer to
+the Unicode Standard, specifically sections 3.13 C<Default Case Operations>,
+4.2 C<Case-Normative>, and 5.18 C<Case Mappings>,
+available at L<http://www.unicode.org/versions/latest/>, as well as the
+Case Charts available at L<http://www.unicode.org/charts/case/>.
+
+If EXPR is omitted, uses C<$_>.
+
+This function behaves the same way under various pragma, such as in a locale,
+as L</lc> does.
+
+While the Unicode Standard defines two additional forms of casefolding,
+one for Turkic languages and one that never maps one character into multiple
+characters, these are not provided by the Perl core; However, the CPAN module
+C<Unicode::Casing> may be used to provide an implementation.
+
+This keyword is available only when the C<"fc"> feature is enabled,
+or when prefixed with C<CORE::>; See L<feature>. Alternately,
+include a C<use v5.16> or later to the current scope.
+
 =item fcntl FILEHANDLE,FUNCTION,SCALAR
 X<fcntl>

@@ -6045,7 +6093,7 @@ Examples:
     @articles = sort {$a cmp $b} @files;

     # now case-insensitively
-    @articles = sort {uc($a) cmp uc($b)} @files;
+    @articles = sort {fc($a) cmp fc($b)} @files;

     # same thing in reversed order
     @articles = sort {$b cmp $a} @files;
@@ -6083,7 +6131,7 @@ Examples:
     my @new = sort {
         ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
                             ||
-                    uc($a) cmp uc($b)
+                    fc($a) cmp fc($b)
     } @old;

     # same thing, but much more efficiently;
@@ -6092,7 +6140,7 @@ Examples:
     my @nums = @caps = ();
     for (@old) {
         push @nums, ( /=(\d+)/ ? $1 : undef );
-        push @caps, uc($_);
+        push @caps, fc($_);
     }

     my @new = @old[ sort {
@@ -6107,7 +6155,7 @@ Examples:
     sort { $b->[1] <=> $a->[1]
                     ||
            $a->[2] cmp $b->[2]
-    } map { [$_, /=(\d+)/, uc($_)] } @old;
+    } map { [$_, /=(\d+)/, fc($_)] } @old;

     # using a prototype allows you to use any comparison subroutine
     # as a sort subroutine (including other package's subroutines)
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 80add657e2..0275346c84 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1493,17 +1493,18 @@ does have meaning in regular expression patterns in Perl, see L<perlre>.)

 The following escape sequences are available in constructs that interpolate,
 but not in transliterations.
-X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
+X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q> X<\F>

     \l  lowercase next character only
     \u  titlecase (not uppercase!) next character only
     \L  lowercase all characters till \E or end of string
     \U  uppercase all characters till \E or end of string
+    \F  foldcase all characters till \E or end of string
     \Q  quote non-word characters till \E or end of string
     \E  end either case modification or quoted section
         (whichever was last seen)

-C<\L>, C<\U>, and C<\Q> can stack, in which case you need one
+C<\L>, C<\U>, C<\F>, and C<\Q> can stack, in which case you need one
 C<\E> for each. For example:

     say "This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
@@ -1515,6 +1516,7 @@ If Unicode (for example, C<\N{}> or code points of 0x100 or
 beyond) is being used, the case map used by C<\l>, C<\L>, C<\u>, and
 C<\U> is as defined by Unicode. That means that case-mapping
 a single character can sometimes produce several characters.
+Under C<use locale>, C<\F> produces the same results as C<\L>.

 All systems use the virtual C<"\n"> to represent a line terminator,
 called a "newline". There is no such thing as an unvarying, physical
@@ -2611,7 +2613,7 @@ as a literal C<->.

 =item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>, C<<<"EOF">

-C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
+C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> (possibly paired with C<\E>) are
 converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
 is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
 The other escape sequences such as C<\200> and C<\t> and backslashed
@@ -2662,7 +2664,7 @@ Fortunately, it's usually correct for ambiguous cases.

 =item the replacement of C<s///>

-Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
+Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> and interpolation
 happens as with C<qq//> constructs.

 It is at this step that C<\1> is begrudgingly converted to C<$1> in
@@ -2673,7 +2675,7 @@ is emitted if the C<use warnings> pragma or the B<-w> command-line flag

 =item C<RE> in C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>,

-Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\E>,
+Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F>, C<\E>,
 and interpolation happens (almost) as with C<qq//> constructs.

 Processing of C<\N{...}> is also done here, and compiled into an intermediate
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index f9b963c96c..cc72a1f14e 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -75,6 +75,7 @@ as C<Not in [].>
  \e         Escape character.
  \E         Turn off \Q, \L and \U processing. Not in [].
  \f         Form feed.
+ \F         Foldcase till \E. Not in [].
  \g{}, \g1  Named, absolute or relative backreference. Not in []
  \G         Pos assertion. Not in [].
  \h         Character class for horizontal whitespace.
@@ -336,7 +337,11 @@ isn't a letter, digit, or underscore. This ensures that any character
 between C<\Q> and C<\E> shall be matched literally, not interpreted
 as a metacharacter by the regex engine.

-Mnemonic: I<L>owercase, I<U>ppercase, I<Q>uotemeta, I<E>nd.
+C<\F> can be used to casefold all characters following, up to the next C<\E>
+or the end of the pattern. It provides the functionality similar to
+the C<fc> function.
+
+Mnemonic: I<L>owercase, I<U>ppercase, I<F>old-case, I<Q>uotemeta, I<E>nd.

 =head4 Examples

diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index 11da56d98f..954a423759 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -107,6 +107,7 @@ These work as in normal strings.
    \u  Titlecase next character
    \L  Lowercase until \E
    \U  Uppercase until \E
+   \F  Foldcase until \E
    \Q  Disable pattern metacharacters until \E
    \E  End modification

@@ -299,6 +300,7 @@ Captured groups are numbered according to their I<opening> paren.
    lcfirst    Lowercase first char of a string
    uc         Uppercase a string
    ucfirst    Titlecase first char of a string
+   fc         Foldcase a string

    pos        Return or set current match position
    quotemeta  Quote metacharacters
@@ -307,8 +309,9 @@ Captured groups are numbered according to their I<opening> paren.

    split      Use a regex to split a string into parts

-The first four of these are like the escape sequences C<\L>, C<\l>,
-C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.
+The first five of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, C<\u>, and C<\F>. For Titlecase, see L</Titlecase>; For
+Foldcase, see L</Foldcase>.

 =head2 TERMINOLOGY

@@ -317,6 +320,12 @@ C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.
 Unicode concept which most often is equal to uppercase, but for
 certain characters like the German "sharp s" there is a difference.

+=head3 Foldcase
+
+Unicode form that is useful when comparing strings regardless of case,
+as certain characters have compex one-to-many case mappings. Primarily a
+variant of lowercase.
+
 =head1 AUTHOR

 Iain Truskett. Updated by the Perl 5 Porters.
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index b50ae93618..41498fcbe1 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -263,9 +263,10 @@ complement B<and> the full character-wide bit complement.
 =item *

 There is a CPAN module, L<Unicode::Casing>, which allows you to define
-your own mappings to be used in C<lc()>, C<lcfirst()>, C<uc()>, and
-C<ucfirst()> (or their double-quoted string inlined versions such as
-C<\U>). (Prior to Perl 5.16, this functionality was partially provided
+your own mappings to be used in C<lc()>, C<lcfirst()>, C<uc()>,
+C<ucfirst()>, and C<fc> (or their double-quoted string inlined
+versions such as C<\U>).
+(Prior to Perl 5.16, this functionality was partially provided
 in the Perl core, but suffered from a number of insurmountable
 drawbacks, so the CPAN module was written instead.)
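Illustration (not part of the patch): a minimal sketch of how the behaviour documented above might be exercised, assuming Perl 5.16 or later so that "use v5.16" enables both the "fc" feature and Unicode string semantics. The sample strings and variable names are hypothetical, chosen only to show a case where lc() and fc() disagree.

```perl
use v5.16;      # assumption: enables the "fc" feature and unicode_strings
use strict;
use warnings;

# Hypothetical sample data: U+00DF is LATIN SMALL LETTER SHARP S.
my $this = "stra\x{DF}e";
my $that = "STRASSE";

# lc() leaves U+00DF alone, so the lowercase forms differ.
my $lc_equal = lc($this) eq lc($that) ? 1 : 0;

# Full casefolding maps U+00DF to "ss", so the folded forms match.
my $fc_equal = fc($this) eq fc($that) ? 1 : 0;

# \F ... \E is the double-quoted-string counterpart of fc().
my $interp_equal = "\F$this\E" eq fc($this) ? 1 : 0;

# Case-insensitive sort, in the style of the updated perlfunc examples.
my @sorted = sort { fc($a) cmp fc($b) } qw(banana Cherry apple);

say "lc comparison equal: $lc_equal";
say "fc comparison equal: $fc_equal";
say "\\F matches fc():     $interp_equal";
say "sorted: @sorted";
```

On perls older than 5.16 the fc keyword and the \F escape are not available, so this sketch would not compile there.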