pod updates for fc and \F (smoke-me/hugmeir/fc_keyword)

author: Brian Fraser <fraserbn@gmail.com> 2012-01-09 06:13:43 -0300
committer: Brian Fraser <fraserbn@gmail.com> 2012-01-13 01:51:09 -0300
commit: 9a7a15a3d572c0b11adb6a40a48fcf5d859c2453 (patch)
tree: 2a60942271bef61e30ffcb4781a94ca833194e27
parent: 211e3a35cbe8854816789427ce145da78f571095 (diff)
download: perl-smoke-me/hugmeir/fc_keyword.tar.gz
5 files changed, 82 insertions, 17 deletions
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 56c74521e7..54b3a25f3c 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -107,8 +107,8 @@ than one place.
 =item Functions for SCALARs or strings
 X<scalar> X<string> X<character>
 
-C<chomp>, C<chop>, C<chr>, C<crypt>, C<hex>, C<index>, C<lc>, C<lcfirst>,
-C<length>, C<oct>, C<ord>, C<pack>, C<q//>, C<qq//>, C<reverse>,
+C<chomp>, C<chop>, C<chr>, C<crypt>, C<fc>, C<hex>, C<index>, C<lc>,
+C<lcfirst>, C<length>, C<oct>, C<ord>, C<pack>, C<q//>, C<qq//>, C<reverse>,
 C<rindex>, C<sprintf>, C<substr>, C<tr///>, C<uc>, C<ucfirst>, C<y///>
 
 =item Regular expressions and pattern matching
@@ -1995,6 +1995,54 @@ X<exp> X<exponential> X<antilog> X<antilogarithm> X<e>
 Returns I<e> (the natural logarithm base) to the power of EXPR.
 If EXPR is omitted, gives C<exp($_)>.
 
+=item fc EXPR
+X<fc> X<foldcase> X<casefold> X<fold-case> X<case-fold>
+
+=item fc
+
+Returns the casefolded version of EXPR.  This is the internal function
+implementing the C<\F> escape in double-quoted strings.
+
+Casefolding is the process of mapping strings to a form where case
+differences are erased; comparing two strings in their casefolded
+form is effectively a way of asking if two strings are equal,
+regardless of case.
+
+Roughly, if you ever found yourself writing this
+
+    lc($this) eq lc($that)  # Wrong!
+        # or
+    uc($this) eq uc($that)  # Also wrong!
+        # or
+    $this =~ /\Q$that/i     # Right!
+
+Now you can write
+
+    fc($this) eq fc($that)
+
+And get the correct results.
+
+Perl only implements the full form of casefolding.
+For further information on casefolding, refer to
+the Unicode Standard, specifically sections 3.13 C<Default Case Operations>, 
+4.2 C<Case-Normative>, and 5.18 C<Case Mappings>, 
+available at L<http://www.unicode.org/versions/latest/>, as well as the
+Case Charts available at L<http://www.unicode.org/charts/case/>.
+
+If EXPR is omitted, uses C<$_>.
+
+This function behaves the same way under various pragma, such as in a locale,
+as L</lc> does.
+
+While the Unicode Standard defines two additional forms of casefolding,
+one for Turkic languages and one that never maps one character into multiple
+characters, these are not provided by the Perl core; However, the CPAN module
+C<Unicode::Casing> may be used to provide an implementation.
+
+This keyword is available only when the C<"fc"> feature is enabled,
+or when prefixed with C<CORE::>; See L<feature>. Alternately,
+include a C<use v5.16> or later to the current scope.
+
 =item fcntl FILEHANDLE,FUNCTION,SCALAR
 X<fcntl>
 
@@ -6045,7 +6093,7 @@ Examples:
     @articles = sort {$a cmp $b} @files;
     
     # now case-insensitively
-    @articles = sort {uc($a) cmp uc($b)} @files;
+    @articles = sort {fc($a) cmp fc($b)} @files;
     
     # same thing in reversed order
     @articles = sort {$b cmp $a} @files;
@@ -6083,7 +6131,7 @@ Examples:
     my @new = sort {
         ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
                             ||
-                    uc($a)  cmp  uc($b)
+                    fc($a)  cmp  fc($b)
     } @old;
 
     # same thing, but much more efficiently;
@@ -6092,7 +6140,7 @@ Examples:
     my @nums = @caps = ();
     for (@old) {
         push @nums, ( /=(\d+)/ ? $1 : undef );
-        push @caps, uc($_);
+        push @caps, fc($_);
     }
 
     my @new = @old[ sort {
@@ -6107,7 +6155,7 @@ Examples:
            sort { $b->[1] <=> $a->[1]
                            ||
                   $a->[2] cmp $b->[2]
-           } map { [$_, /=(\d+)/, uc($_)] } @old;
+           } map { [$_, /=(\d+)/, fc($_)] } @old;
 
     # using a prototype allows you to use any comparison subroutine
     # as a sort subroutine (including other package's subroutines)
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 80add657e2..0275346c84 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1493,17 +1493,18 @@ does have meaning in regular expression patterns in Perl, see L<perlre>.)
 
 The following escape sequences are available in constructs that interpolate,
 but not in transliterations.
-X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
+X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q> X<\F>
 
     \l		lowercase next character only
     \u		titlecase (not uppercase!) next character only
     \L		lowercase all characters till \E or end of string
     \U		uppercase all characters till \E or end of string
+    \F		foldcase all characters till \E or end of string
     \Q		quote non-word characters till \E or end of string
     \E		end either case modification or quoted section
 		(whichever was last seen)
 
-C<\L>, C<\U>, and C<\Q> can stack, in which case you need one
+C<\L>, C<\U>, C<\F>, and C<\Q> can stack, in which case you need one
 C<\E> for each.  For example:
 
 	say "This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
@@ -1515,6 +1516,7 @@ If Unicode (for example, C<\N{}> or code points of 0x100 or
 beyond) is being used, the case map used by C<\l>, C<\L>, C<\u>, and
 C<\U> is as defined by Unicode.  That means that case-mapping
 a single character can sometimes produce several characters.
+Under C<use locale>, C<\F> produces the same results as C<\L>.
 
 All systems use the virtual C<"\n"> to represent a line terminator,
 called a "newline".  There is no such thing as an unvarying, physical
@@ -2611,7 +2613,7 @@ as a literal C<->.
 
 =item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>, C<<<"EOF">
 
-C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
+C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> (possibly paired with C<\E>) are
 converted to corresponding Perl constructs.  Thus, C<"$foo\Qbaz$bar">
 is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
 The other escape sequences such as C<\200> and C<\t> and backslashed
@@ -2662,7 +2664,7 @@ Fortunately, it's usually correct for ambiguous cases.
 
 =item the replacement of C<s///>
 
-Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
+Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> and interpolation
 happens as with C<qq//> constructs.
 
 It is at this step that C<\1> is begrudgingly converted to C<$1> in
@@ -2673,7 +2675,7 @@ is emitted if the C<use warnings> pragma or the B<-w> command-line flag
 
 =item C<RE> in C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>,
 
-Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\E>,
+Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F>, C<\E>,
 and interpolation happens (almost) as with C<qq//> constructs.
 
 Processing of C<\N{...}> is also done here, and compiled into an intermediate
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index f9b963c96c..cc72a1f14e 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -75,6 +75,7 @@ as C<Not in [].>
  \e                Escape character.
  \E                Turn off \Q, \L and \U processing.  Not in [].
  \f                Form feed.
+ \F                Foldcase till \E.  Not in [].
  \g{}, \g1         Named, absolute or relative backreference.  Not in []
  \G                Pos assertion.  Not in [].
  \h                Character class for horizontal whitespace.
@@ -336,7 +337,11 @@ isn't a letter, digit, or underscore. This ensures that any character
 between C<\Q> and C<\E> shall be matched literally, not interpreted
 as a metacharacter by the regex engine.
 
-Mnemonic: I<L>owercase, I<U>ppercase, I<Q>uotemeta, I<E>nd.
+C<\F> can be used to casefold all characters following, up to the next C<\E>
+or the end of the pattern. It provides the functionality similar to
+the C<fc> function.
+
+Mnemonic: I<L>owercase, I<U>ppercase, I<F>old-case, I<Q>uotemeta, I<E>nd.
 
 =head4 Examples
 
diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index 11da56d98f..954a423759 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -107,6 +107,7 @@ These work as in normal strings.
    \u  Titlecase next character
    \L  Lowercase until \E
    \U  Uppercase until \E
+   \F  Foldcase until \E
    \Q  Disable pattern metacharacters until \E
    \E  End modification
 
@@ -299,6 +300,7 @@ Captured groups are numbered according to their I<opening> paren.
    lcfirst     Lowercase first char of a string
    uc          Uppercase a string
    ucfirst     Titlecase first char of a string
+   fc          Foldcase a string
 
    pos         Return or set current match position
    quotemeta   Quote metacharacters
@@ -307,8 +309,9 @@ Captured groups are numbered according to their I<opening> paren.
 
    split       Use a regex to split a string into parts
 
-The first four of these are like the escape sequences C<\L>, C<\l>,
-C<\U>, and C<\u>.  For Titlecase, see L</Titlecase>.
+The first five of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, C<\u>, and C<\F>.  For Titlecase, see L</Titlecase>; For
+Foldcase, see L</Foldcase>.
 
 =head2 TERMINOLOGY
 
@@ -317,6 +320,12 @@ C<\U>, and C<\u>.  For Titlecase, see L</Titlecase>.
 Unicode concept which most often is equal to uppercase, but for
 certain characters like the German "sharp s" there is a difference.
 
+=head3 Foldcase
+
+Unicode form that is useful when comparing strings regardless of case,
+as certain characters have complex one-to-many case mappings. Primarily a
+variant of lowercase.
+
 =head1 AUTHOR
 
 Iain Truskett. Updated by the Perl 5 Porters.
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index b50ae93618..41498fcbe1 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -263,9 +263,10 @@ complement B<and> the full character-wide bit complement.
 =item *
 
 There is a CPAN module, L<Unicode::Casing>, which allows you to define
-your own mappings to be used in C<lc()>, C<lcfirst()>, C<uc()>, and
-C<ucfirst()> (or their double-quoted string inlined versions such as
-C<\U>).  (Prior to Perl 5.16, this functionality was partially provided
+your own mappings to be used in C<lc()>, C<lcfirst()>, C<uc()>,
+C<ucfirst()>, and C<fc> (or their double-quoted string inlined
+versions such as C<\U>).
+(Prior to Perl 5.16, this functionality was partially provided
 in the Perl core, but suffered from a number of insurmountable
 drawbacks, so the CPAN module was written instead.)
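Illustration (not part of the patch): a minimal sketch of the comparison that the new perlfunc entry for fc describes, assuming perl 5.16 or later. The sample strings and the variable names $lc_equal and $fc_equal are made up for this example. "use v5.16" is used because it enables both the "fc" and "unicode_strings" features, so that U+00DF (LATIN SMALL LETTER SHARP S) gets Unicode casefolding rather than ASCII-only rules.

```perl
use v5.16;      # enables the "fc" and "unicode_strings" features (and strict)
use warnings;

# Hypothetical sample data: the same word, once with U+00DF (sharp s)
# and once spelled out in capitals.
my $this = "stra\x{df}e";
my $that = "STRASSE";

# lc() leaves the sharp s untouched, so the lowercased strings differ.
my $lc_equal = lc($this) eq lc($that) ? 1 : 0;

# fc() applies full casefolding, which maps the sharp s to "ss",
# so both strings fold to "strasse".
my $fc_equal = fc($this) eq fc($that) ? 1 : 0;

say "lc comparison: ", ($lc_equal ? "equal" : "different");
say "fc comparison: ", ($fc_equal ? "equal" : "different");
```

Outside the scope of "use v5.16" or "use feature 'fc'", the same call can be written as CORE::fc(...), as the patch text notes.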