author Jeff Pinyan <japhy@pobox.com> 2004-04-14 13:01:38 -0400
committer Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2004-04-19 08:56:28 +0000
commit bac0b42524fd3607268d7139a21b07697a1c978b
tree 432e583d3de1bc2fa9d32e63fc59b743c101a704 /pod/perlunicode.pod
parent 0e788c723acb32640922903142ad781f29abd676
Re: [PATCH] lib/utf8_heavy.pl -- cascading classes and '&' support
From: "Jeff 'japhy' Pinyan" <japhy@perlmonk.org>
Message-ID: <Pine.LNX.4.44.0404141659480.11423-301000@perlmonk.org>
p4raw-id: //depot/perl@22713
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- pod/perlunicode.pod 57
1 files changed, 44 insertions, 13 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 7de87ac72c..0817bb36e9 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -632,10 +632,21 @@ And finally, C<scalar reverse()> reverses by character rather than by byte.
=head2 User-Defined Character Properties
You can define your own character properties by defining subroutines
-whose names begin with "In" or "Is". The subroutines must be defined
-in the C<main> package. The user-defined properties can be used in the
-regular expression C<\p> and C<\P> constructs. Note that the effect
-is compile-time and immutable once defined.
+whose names begin with "In" or "Is". The subroutines can be defined in
+any package. The user-defined properties can be used in the regular
+expression C<\p> and C<\P> constructs; if you are using a user-defined
+property from a package other than the one you are in, you must specify
+its package in the C<\p> or C<\P> construct.
+
+ # assuming property IsForeign defined in Lang::
+ package main; # property package name required
+ if ($txt =~ /\p{Lang::IsForeign}+/) { ... }
+
+ package Lang; # property package name not required
+ if ($txt =~ /\p{IsForeign}+/) { ... }
+
+
+Note that the effect is compile-time and immutable once defined.
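A minimal runnable sketch of the package rule, assuming a perl that includes this change. The property Lang::IsForeign (here just the capitals A-Z) and the helpers has_foreign and main_has_foreign are hypothetical names chosen for illustration:

```perl
use strict;
use warnings;

package Lang;

# Hypothetical property for illustration only: Latin capitals A-Z.
# Two hex code points separated by a tab denote a range.
sub IsForeign {
    return "0041\t005A\n";
}

# Hypothetical helper: inside package Lang the unqualified name is found.
sub has_foreign {
    my ($txt) = @_;
    return $txt =~ /\p{IsForeign}/ ? 1 : 0;
}

package main;

# Hypothetical helper: from package main the property must be
# qualified with the package it was defined in.
sub main_has_foreign {
    my ($txt) = @_;
    return $txt =~ /\p{Lang::IsForeign}/ ? 1 : 0;
}

print Lang::has_foreign("abcXyz"), "\n";    # prints 1 ("X" is in A-Z)
print main_has_foreign("abc xyz"), "\n";    # prints 0 (no capitals)
```

Defining the property subroutine before the patterns that use it keeps the sketch independent of exactly when a given perl resolves a user-defined property.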
The subroutines must return a specially-formatted string, with one
or more newline-separated lines. Each line must be one of the following:
@@ -650,23 +661,30 @@ tabular characters) denoting a range of Unicode code points to include.
=item *
Something to include, prefixed by "+": a built-in character
-property (prefixed by "utf8::"), to represent all the characters in that
-property; two hexadecimal code points for a range; or a single
-hexadecimal code point.
+property (prefixed by "utf8::") or a user-defined character property,
+to represent all the characters in that property; two hexadecimal code
+points for a range; or a single hexadecimal code point.
=item *
Something to exclude, prefixed by "-": an existing character
-property (prefixed by "utf8::"), for all the characters in that
-property; two hexadecimal code points for a range; or a single
-hexadecimal code point.
+property (prefixed by "utf8::") or a user-defined character property,
+to represent all the characters in that property; two hexadecimal code
+points for a range; or a single hexadecimal code point.
=item *
Something to negate, prefixed by "!": an existing character
-property (prefixed by "utf8::") for all the characters except the
-characters in the property; two hexadecimal code points for a range;
-or a single hexadecimal code point.
+property (prefixed by "utf8::") or a user-defined character property,
+for all the characters except the characters in that property; two
+hexadecimal code points for a range; or a single hexadecimal code point.
+
+=item *
+
+Something to intersect with, prefixed by "&": an existing character
+property (prefixed by "utf8::") or a user-defined character property,
+to keep only the characters that are also in that property; two
+hexadecimal code points for a range; or a single hexadecimal code point.
=back
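The line types can be combined in one definition. The following is a small sketch under stated assumptions: every property name in it (InAsciiLower, InAsciiVowel, InAsciiConsonant, InNotAsciiLower) is hypothetical, and it sticks to "+", "-" and "!" lines that name other properties rather than bare code points:

```perl
use strict;
use warnings;

# Hypothetical building blocks (illustrative names, not built-ins).
# A line with two tab-separated hex code points is a range.
sub InAsciiLower { return "0061\t007A\n" }    # a-z

# A line with a single hex code point includes just that character.
sub InAsciiVowel {
    return join '', map { "$_\n" } qw(0061 0065 0069 006F 0075);  # a e i o u
}

# "+" adds the characters of a property, "-" removes those of another:
# the result is a-z without the five vowels.
sub InAsciiConsonant {
    return "+main::InAsciiLower\n-main::InAsciiVowel\n";
}

# "!" adds every character that is NOT in the named property.
sub InNotAsciiLower {
    return "!main::InAsciiLower\n";
}

for my $ch (qw(b e Q)) {
    printf "%s: consonant=%d not_lower=%d\n", $ch,
        ($ch =~ /\p{InAsciiConsonant}/ ? 1 : 0),
        ($ch =~ /\p{InNotAsciiLower}/  ? 1 : 0);
}
```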
@@ -714,6 +732,19 @@ The negation is useful for defining (surprise!) negated classes.
END
}
+Intersection is useful for getting the common characters matched by
+two (or more) classes.
+
+ sub InFooAndBar {
+ return <<'END';
+ +main::InFoo
+ &main::InBar
+ END
+ }
+
+It's important to remember not to use "&" for the first set; that
+would be intersecting with nothing (resulting in an empty set).
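A sketch of intersection, and of the "&"-first pitfall, using two hypothetical classes; InUpperAZ, InHexLetter, InUpperHexLetter and InOnlyIntersect are illustrative names, not built-ins:

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only.
sub InUpperAZ   { return "0041\t005A\n" }                # A-Z
sub InHexLetter { return "0041\t0046\n0061\t0066\n" }    # A-F and a-f

# Start from one set with "+", then keep only what is also in the other.
sub InUpperHexLetter {
    return "+main::InUpperAZ\n&main::InHexLetter\n";     # A-F
}

# Pitfall: starting with "&" intersects with the (empty) running set,
# so this property matches nothing.
sub InOnlyIntersect {
    return "&main::InHexLetter\n";
}

my @hits = grep { /\p{InUpperHexLetter}/ } split //, "ABCGabcz";
print "@hits\n";    # prints "A B C"
```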
+
You can also define your own mappings to be used in the lc(),
lcfirst(), uc(), and ucfirst() (or their string-inlined versions).
The principle is the same: define subroutines in the C<main> package