author: Jarkko Hietaniemi <jhi@iki.fi> 2002-04-21 21:24:07 +0000
committer: Jarkko Hietaniemi <jhi@iki.fi> 2002-04-21 21:24:07 +0000
commit: 237bad5b7686b69e853033a4269a1410b74d1ed4 (patch)
tree: b6e5bc3a0d025e941f9ec6c81e6386a29c09687c /pod
parent: e0a1f643a0cdeaeaadd4feec0912681a40607520 (diff)
One more way to do character class subtraction.
p4raw-id: //depot/perl@16052
Diffstat (limited to 'pod')
-rw-r--r-- pod/perlunicode.pod 9
1 files changed, 6 insertions, 3 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index f635013583..033c9ac5a9 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -615,7 +615,7 @@ And finally, C<scalar reverse()> reverses by character rather than by byte.
=back
-=head2 Defining your own character properties
+=head2 User-defined Character Properties
You can define your own character properties by defining subroutines
that have names beginning with "In" or "Is". The subroutines must be
@@ -724,7 +724,8 @@ Level 1 - Basic Unicode Support
[ 3] . \p{...} \P{...}
[ 4] now scripts (see UTR#24 Script Names) in addition to blocks
[ 5] have negation
- [ 6] can use look-ahead to emulate subtraction (*)
+ [ 6] can use regular expression look-ahead [a]
+ or user-defined character properties [b] to emulate subtraction
[ 7] include Letters in word characters
[ 8] note that perl does Full casefolding in matching, not Simple:
for example U+1F88 is equivalent with U+1F000 U+03B9,
@@ -737,7 +738,7 @@ Level 1 - Basic Unicode Support
(should also affect <>, $., and script line numbers)
(the \x{85}, \x{2028} and \x{2029} do match \s)
-(*) You can mimic class subtraction using lookahead.
+[a] You can mimic class subtraction using lookahead.
For example, what TR18 might write as
[{Greek}-[{UNASSIGNED}]]
@@ -753,6 +754,8 @@ But in this particular example, you probably really want
which will match assigned characters known to be part of the Greek script.
+[b] See L</User-defined Character Properties>.
+
=item *
Level 2 - Extended Unicode Support
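
The two ways of emulating class subtraction that this patch documents, look-ahead [a] and a user-defined character property [b], can be sketched side by side. This is a minimal illustration and not part of the commit: the property name InAssignedGreek is hypothetical, and the sketch assumes a reasonably recent perl in which \p{InGreek} names the Greek block and U+0378 is still an unassigned code point inside that block.

```perl
use strict;
use warnings;

# Hypothetical user-defined property (name chosen for this sketch only):
# the Greek block with unassigned code points subtracted.
# The name must begin with "In" or "Is", as the patched section requires.
sub InAssignedGreek {
    return <<'END';
+utf8::InGreek
-utf8::IsCn
END
}

my $alpha      = chr(0x03B1);  # GREEK SMALL LETTER ALPHA, an assigned character
my $unassigned = chr(0x0378);  # assumed unassigned, but inside the Greek block

for my $ch ($alpha, $unassigned) {
    # [a] look-ahead: reject unassigned code points, then require the block
    my $by_lookahead = $ch =~ /(?!\p{Unassigned})\p{InGreek}/ ? 1 : 0;

    # [b] user-defined property: the subtraction lives in the property itself
    my $by_property  = $ch =~ /\p{InAssignedGreek}/ ? 1 : 0;

    printf "U+%04X lookahead=%d property=%d\n",
        ord($ch), $by_lookahead, $by_property;
}
```

Both forms should agree on every code point. The look-ahead keeps the subtraction visible at the point of use, while the user-defined property gives it a reusable name.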