User-defined character properties were unintentionally
removed, noticed by Dan Kogai.

p4raw-id: //depot/perl@16012
author: Jarkko Hietaniemi <jhi@iki.fi> 2002-04-20 01:46:03 +0000
committer: Jarkko Hietaniemi <jhi@iki.fi> 2002-04-20 01:46:03 +0000
commit: 491fd90a109f6263a896300e5709e6fd255f075f
tree: 596ceeddf227da61927d12e4c2ce4c324fc43bbd /pod/perlunicode.pod
parent: ee081dd1f02934d943364e5d6bd4130bf9c3e0ad
1 file changed, 80 insertions, 0 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index af79344402..46080430a7 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -615,6 +615,86 @@ And finally, C<scalar reverse()> reverses by character rather than by byte.
 
 =back
 
+=head2 Defining your own character properties
+
+You can define your own character properties by defining subroutines
+that have names beginning with "In" or "Is".  The subroutines must be
+visible in the package that uses the properties.  The user-defined
+properties can be used in the regular expression C<\p> and C<\P>
+constructs.
+
+The subroutines must return a specially formatted string: one or more
+newline-separated lines.  Each line must be one of the following:
+
+=over 4
+
+=item *
+
+Two hexadecimal numbers separated by a tab, denoting a range
+of Unicode code points.
+
+=item *
+
+An existing character property prefixed by "+utf8::" to include
+all the characters in that property.
+
+=item *
+
+An existing character property prefixed by "-utf8::" to exclude
+all the characters in that property.
+
+=item *
+
+An existing character property prefixed by "!utf8::" to include
+all except the characters in that property.
+
+=back
+
+For example, to define a property that covers both the Japanese
+syllabaries (hiragana and katakana), you can define
+
+    sub InKana {
+	return <<'END';
+    3040    309F
+    30A0    30FF
+    END
+    }
+
+In real code the here-doc end marker must start at the beginning of the
+line and the hexadecimal numbers must be separated by a tab.  Now you
+can use C<\p{InKana}> and C<\P{InKana}>.
+
+You could also have used the existing block property names:
+
+    sub InKana {
+	return <<'END';
+    +utf8::InHiragana
+    +utf8::InKatakana
+    END
+    }
+
+Suppose you wanted to match only the allocated characters, not the
+raw block ranges: in other words, you want to remove the unassigned
+code points (general category Cn):
+
+    sub InKana {
+	return <<'END';
+    +utf8::InHiragana
+    +utf8::InKatakana
+    -utf8::IsCn
+    END
+    }
+
+The negation is useful for defining (surprise!) negated classes.
+
+    sub InNotKana {
+	return <<'END';
+    !utf8::InHiragana
+    -utf8::InKatakana
+    +utf8::IsCn
+    END
+    }
+
 =head2 Character encodings for input and output
 
 See L<Encode>.
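The following is a minimal runnable sketch of the property definitions added by the patch above. It assumes Perl 5.8 or later. The sub names InKanaRanges, InKanaBlocks and InAssignedKana, and the helper kana_props, are hypothetical. The patch calls each of these variants InKana, so they are renamed here to coexist in one file. InNotKana is the patch's own negated class. The sketch also assumes that U+3097 is an unassigned code point inside the Hiragana block, which holds for the Unicode versions we are aware of.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names: each sub is one of the patch's InKana variants,
# renamed so that all of them can be defined side by side.

# 1. Explicit ranges: two hex code points per line, separated by a tab.
#    A double-quoted "\t" guarantees a real tab character.
sub InKanaRanges {
    return "3040\t309F\n" .    # Hiragana block
           "30A0\t30FF\n";     # Katakana block
}

# 2. The same set, built from the existing block properties.
sub InKanaBlocks {
    return "+utf8::InHiragana\n+utf8::InKatakana\n";
}

# 3. The blocks minus the unassigned code points (General Category Cn).
sub InAssignedKana {
    return "+utf8::InHiragana\n+utf8::InKatakana\n-utf8::IsCn\n";
}

# 4. The negated class from the patch: everything except Hiragana,
#    minus Katakana, plus the unassigned code points.
sub InNotKana {
    return "!utf8::InHiragana\n-utf8::InKatakana\n+utf8::IsCn\n";
}

# Hypothetical helper: list which of the properties above a code point has.
sub kana_props {
    my ($cp) = @_;
    my $c = chr $cp;
    my @hits;
    push @hits, 'InKanaRanges'   if $c =~ /\p{InKanaRanges}/;
    push @hits, 'InKanaBlocks'   if $c =~ /\p{InKanaBlocks}/;
    push @hits, 'InAssignedKana' if $c =~ /\p{InAssignedKana}/;
    push @hits, 'InNotKana'      if $c =~ /\p{InNotKana}/;
    return @hits;
}

# U+3042 HIRAGANA LETTER A, U+30AB KATAKANA LETTER KA,
# U+3097 (unassigned slot in the Hiragana block), U+0041 LATIN CAPITAL A.
for my $cp (0x3042, 0x30AB, 0x3097, 0x0041) {
    printf "U+%04X: %s\n", $cp, join(', ', kana_props($cp)) || '(none)';
}
```

Returning the definition from a double-quoted string with an explicit "\t" is one way to sidestep the here-doc caveat about tabs and indentation. A here-doc with real tabs and its end marker at column zero should behave the same way.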