author | Jeff Pinyan <japhy@pobox.com> | 2004-04-14 13:01:38 -0400 |
---|---|---|
committer | Rafael Garcia-Suarez <rgarciasuarez@gmail.com> | 2004-04-19 08:56:28 +0000 |
commit | bac0b42524fd3607268d7139a21b07697a1c978b (patch) | |
tree | 432e583d3de1bc2fa9d32e63fc59b743c101a704 /pod/perlunicode.pod | |
parent | 0e788c723acb32640922903142ad781f29abd676 (diff) | |
download | perl-bac0b42524fd3607268d7139a21b07697a1c978b.tar.gz |
Re: [PATCH] lib/utf8_heavy.pl -- cascading classes and '&' support
From: "Jeff 'japhy' Pinyan" <japhy@perlmonk.org>
Message-ID: <Pine.LNX.4.44.0404141659480.11423-301000@perlmonk.org>
p4raw-id: //depot/perl@22713
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- | pod/perlunicode.pod | 57 |
1 files changed, 44 insertions, 13 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 7de87ac72c..0817bb36e9 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -632,10 +632,21 @@ And finally, C<scalar reverse()> reverses by character rather than by byte.
 =head2 User-Defined Character Properties
 
 You can define your own character properties by defining subroutines
-whose names begin with "In" or "Is". The subroutines must be defined
-in the C<main> package. The user-defined properties can be used in the
-regular expression C<\p> and C<\P> constructs. Note that the effect
-is compile-time and immutable once defined.
+whose names begin with "In" or "Is". The subroutines can be defined in
+any package. The user-defined properties can be used in the regular
+expression C<\p> and C<\P> constructs; if you are using a user-defined
+property from a package other than the one you are in, you must specify
+its package in the C<\p> or C<\P> construct.
+
+    # assuming property IsForeign defined in Lang::
+    package main;  # property package name required
+    if ($txt =~ /\p{Lang::IsForeign}+/) { ... }
+
+    package Lang;  # property package name not required
+    if ($txt =~ /\p{IsForeign}+/) { ... }
+
+
+Note that the effect is compile-time and immutable once defined.
 
 The subroutines must return a specially-formatted string, with one
 or more newline-separated lines. Each line must be one of the following:
@@ -650,23 +661,30 @@ tabular characters) denoting a range of Unicode code points to include.
 =item *
 
 Something to include, prefixed by "+": a built-in character
-property (prefixed by "utf8::"), to represent all the characters in that
-property; two hexadecimal code points for a range; or a single
-hexadecimal code point.
+property (prefixed by "utf8::") or a user-defined character property,
+to represent all the characters in that property; two hexadecimal code
+points for a range; or a single hexadecimal code point.
 
 =item *
 
 Something to exclude, prefixed by "-": an existing character
-property (prefixed by "utf8::"), for all the characters in that
-property; two hexadecimal code points for a range; or a single
-hexadecimal code point.
+property (prefixed by "utf8::") or a user-defined character property,
+to represent all the characters in that property; two hexadecimal code
+points for a range; or a single hexadecimal code point.
 
 =item *
 
 Something to negate, prefixed "!": an existing character
-property (prefixed by "utf8::") for all the characters except the
-characters in the property; two hexadecimal code points for a range;
-or a single hexadecimal code point.
+property (prefixed by "utf8::") or a user-defined character property,
+to represent all the characters in that property; two hexadecimal code
+points for a range; or a single hexadecimal code point.
+
+=item *
+
+Something to intersect with, prefixed by "&": an existing character
+property (prefixed by "utf8::") or a user-defined character property,
+for all the characters except the characters in the property; two
+hexadecimal code points for a range; or a single hexadecimal code point.
 
 =back
 
@@ -714,6 +732,19 @@ The negation is useful for defining (surprise!) negated classes.
     END
     }
 
+Intersection is useful for getting the common characters matched by
+two (or more) classes.
+
+    sub InFooAndBar {
+        return <<'END';
+    +main::Foo
+    &main::Bar
+    END
+    }
+
+It's important to remember not to use "&" for the first set -- that
+would be intersecting with nothing (resulting in an empty set).
+
 You can also define your own mappings to be used in the lc(),
 lcfirst(), uc(), and ucfirst() (or their string-inlined versions).
 The principle is the same: define subroutines in the C<main> package
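The two behaviours this patch documents, package-qualified property names and "&" intersection, can be tried together in a small script. This is only a sketch under stated assumptions: the package `Lang` is borrowed from the patch's own example, while the property names `IsVowel`, `InFirstHalf`, and `InVowelFirstHalf` are hypothetical and chosen purely for illustration. It also assumes a perl recent enough to include this change.

```perl
use strict;
use warnings;

package Lang;

# Hypothetical property: the lowercase ASCII vowels, one hexadecimal
# code point per line.
sub IsVowel {
    return <<'END';
61
65
69
6F
75
END
}

# Hypothetical property: 'a' through 'm', written as a tab-separated
# range of two hexadecimal code points.
sub InFirstHalf {
    return "61\t6D\n";
}

# Intersection of the two properties above. The first line uses "+" so
# that the "&" line has a non-empty set to intersect with.
sub InVowelFirstHalf {
    return <<'END';
+Lang::IsVowel
&Lang::InFirstHalf
END
}

package main;

# Outside package Lang, the property name must be package-qualified.
my $txt  = "aeiou";
my @hits = $txt =~ /(\p{Lang::InVowelFirstHalf})/g;
print "@hits\n";
```

Only the vowels that also fall in the range 'a' through 'm' should be captured. Starting `InVowelFirstHalf` with the "&" line instead would intersect with nothing, which is the pitfall the patch text warns about.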