author Jarkko Hietaniemi <jhi@iki.fi> 2001-01-20 20:15:30 +0000
committer Jarkko Hietaniemi <jhi@iki.fi> 2001-01-20 20:15:30 +0000
commit 49cb94c67d828cadfe8cac24ae5955cf752eb2df
tree 52cd7eb3150a55b88b170492879696fbbd72bda9
parent 3384d91b59037da95d8c4ea56131ee567d0c261c
Document and test the new qu operator.
p4raw-id: //depot/perl@8485
-rw-r--r--  MANIFEST              1
-rw-r--r--  pod/perlfunc.pod      9
-rw-r--r--  pod/perlop.pod       66
-rw-r--r--  pod/perlre.pod        1
-rw-r--r--  pod/perlunicode.pod  14
-rw-r--r--  t/op/qu.t            24
6 files changed, 80 insertions, 35 deletions
diff --git a/MANIFEST b/MANIFEST
index 49813e86d1..4269c3cf4e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -1557,6 +1557,7 @@ t/op/pat.t See if esoteric patterns work
t/op/pos.t See if pos works
t/op/push.t See if push and pop work
t/op/pwent.t See if getpw*() functions work
+t/op/qu.t See if qu works
t/op/quotemeta.t See if quotemeta works
t/op/rand.t See if rand works
t/op/range.t See if .. works
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 82086e3c96..9228fdbb84 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -96,8 +96,9 @@ than one place.
=item Functions for SCALARs or strings
C<chomp>, C<chop>, C<chr>, C<crypt>, C<hex>, C<index>, C<lc>, C<lcfirst>,
-C<length>, C<oct>, C<ord>, C<pack>, C<q/STRING/>, C<qq/STRING/>, C<reverse>,
-C<rindex>, C<sprintf>, C<substr>, C<tr///>, C<uc>, C<ucfirst>, C<y///>
+C<length>, C<oct>, C<ord>, C<pack>, C<q/STRING/>, C<qq/STRING/>, C<qu/STRING/>,
+C<reverse>, C<rindex>, C<sprintf>, C<substr>, C<tr///>, C<uc>, C<ucfirst>,
+C<y///>
=item Regular expressions and pattern matching
@@ -3469,10 +3470,12 @@ but is more efficient. Returns the new number of elements in the array.
=item qr/STRING/
-=item qx/STRING/
+=item qu/STRING/
=item qw/STRING/
+=item qx/STRING/
+
Generalized quotes. See L<perlop/"Regexp Quote-Like Operators">.
=item quotemeta EXPR
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 0bb506ddc7..ebe52c568e 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -645,6 +645,7 @@ any pair of delimiters you choose.
Customary Generic Meaning Interpolates
'' q{} Literal no
"" qq{} Literal yes
+ qu{} Literal yes, Unicode
`` qx{} Command yes (unless '' is delimiter)
qw{} Word list no
// m{} Pattern match yes (unless '' is delimiter)
@@ -1011,6 +1012,44 @@ Options are:
See L<perlre> for additional information on valid syntax for STRING, and
for a detailed look at the semantics of regular expressions.
+=item qw/STRING/
+
+Evaluates to a list of the words extracted out of STRING, using embedded
+whitespace as the word delimiters. It can be understood as being roughly
+equivalent to:
+
+ split(' ', q/STRING/);
+
+the difference being that it generates a real list at compile time. So
+this expression:
+
+ qw(foo bar baz)
+
+is semantically equivalent to the list:
+
+ 'foo', 'bar', 'baz'
+
+Some frequently seen examples:
+
+ use POSIX qw( setlocale localeconv )
+ @EXPORT = qw( foo bar baz );
+
+A common mistake is to try to separate the words with comma or to
+put comments into a multi-line C<qw>-string. For this reason, the
+C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
+produces warnings if the STRING contains the "," or the "#" character.
+
+=item qu/STRING/
+
+Like L<qq> but generates Unicode for characters whose code points are
+128 (0x80) or greater. Such characters can be generated using
+the \xHH (for characters 0x80...0xff, or 128..255) and \x{HHH...}
+notations (for characters 0x100..., or 256 and greater).
+
+(In qq/STRING/, or "", both the \xHH and the \x{HHH...} generate
+bytes for the 0x80..0xff range (these bytes are host-dependent),
+and the \x{HHH...} can be used to generate Unicode.)
+
=item qx/STRING/
=item `STRING`
@@ -1092,33 +1131,6 @@ Just understand what you're getting yourself into.
See L<"I/O Operators"> for more discussion.
-=item qw/STRING/
-
-Evaluates to a list of the words extracted out of STRING, using embedded
-whitespace as the word delimiters. It can be understood as being roughly
-equivalent to:
-
- split(' ', q/STRING/);
-
-the difference being that it generates a real list at compile time. So
-this expression:
-
- qw(foo bar baz)
-
-is semantically equivalent to the list:
-
- 'foo', 'bar', 'baz'
-
-Some frequently seen examples:
-
- use POSIX qw( setlocale localeconv )
- @EXPORT = qw( foo bar baz );
-
-A common mistake is to try to separate the words with comma or to
-put comments into a multi-line C<qw>-string. For this reason, the
-C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
-produces warnings if the STRING contains the "," or the "#" character.
-
=item s/PATTERN/REPLACEMENT/egimosx
Searches a string for a pattern, and if found, replaces that pattern
diff --git a/pod/perlre.pod b/pod/perlre.pod
index c5ecb13c40..0c38ac7cba 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -179,6 +179,7 @@ In addition, Perl defines the following:
\X Match eXtended Unicode "combining character sequence",
equivalent to C<(?:\PM\pM*)>
\C Match a single C char (octet) even under utf8.
+ (Currently this does not work correctly.)
A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
Use C<\w+> to match a string of Perl-identifier characters (which isn't
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 30a4482260..b8bbc5707c 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -16,7 +16,8 @@ The following areas need further work.
There is currently no easy way to mark data read from a file or other
external source as being utf8. This will be one of the major areas of
-focus in the near future.
+focus in the near future. Unfortunately, it is unlikely that Perl
+5.6 and earlier will ever gain this capability.
=item Regular Expressions
@@ -66,7 +67,8 @@ or from literals and constants in the source text.
If the C<-C> command line switch is used, (or the ${^WIDE_SYSTEM_CALLS}
global flag is set to C<1>), all system calls will use the
corresponding wide character APIs. This is currently only implemented
-on Windows.
+on Windows as other platforms do not have a unified way of handling
+wide character APIs.
Regardless of the above, the C<bytes> pragma can always be used to force
byte semantics in a particular lexical scope. See L<bytes>.
@@ -127,8 +129,7 @@ attempt to canonicalize variable names for you.)
Regular expressions match characters instead of bytes. For instance,
"." matches a character instead of a byte. (However, the C<\C> pattern
-is provided to force a match a single byte ("C<char>" in C, hence
-C<\C>).)
+is available to force a match of a single byte ("C<char>" in C, hence C<\C>).)
=item *
@@ -216,7 +217,10 @@ And finally, C<scalar reverse()> reverses by character rather than by byte.
=head2 Character encodings for input and output
-[XXX: This feature is not yet implemented.]
+This feature is in the process of being implemented.
+
+(For Perl 5.6 and earlier the support is unlikely to be integrated
+into the core language, and some external module will be required.)
=head1 CAVEATS
diff --git a/t/op/qu.t b/t/op/qu.t
new file mode 100644
index 0000000000..280020445c
--- /dev/null
+++ b/t/op/qu.t
@@ -0,0 +1,24 @@
+print "1..6\n";
+
+my $foo = "foo";
+
+print "not " unless qu(abc$foo) eq "abcfoo";
+print "ok 1\n";
+
+# qu is always Unicode, even in EBCDIC, so \x41 is 'A' and \x{61} is 'a'.
+
+print "not " unless qu(abc\x41) eq "abcA";
+print "ok 2\n";
+
+print "not " unless qu(abc\x{61}$foo) eq "abcafoo";
+print "ok 3\n";
+
+print "not " unless qu(\x{41}\x{100}\x61\x{200}) eq "A\x{100}a\x{200}";
+print "ok 4\n";
+
+print "not " unless join(" ", unpack("C*", qu(\x80))) eq "194 128";
+print "ok 5\n";
+
+print "not " unless join(" ", unpack("C*", qu(\x{100}))) eq "196 128";
+print "ok 6\n";
+
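The byte values expected by tests 5 and 6 are the UTF-8 encodings of U+0080 and U+0100. The sketch below reproduces them without using qu, since that operator exists only in the development tree this patch targets. It assumes Perl 5.8 or later, where the built-in utf8::encode function is available; the helper name utf8_bytes is hypothetical and not part of the patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): return the UTF-8 byte
# values of a single code point as a space-separated decimal string.
sub utf8_bytes {
    my ($code_point) = @_;
    my $string = chr($code_point);   # one-character string
    utf8::encode($string);           # in place: characters -> UTF-8 bytes
    return join(" ", unpack("C*", $string));
}

print utf8_bytes(0x80),  "\n";   # 194 128, the value test 5 expects
print utf8_bytes(0x100), "\n";   # 196 128, the value test 6 expects
```

Code points below 0x80 encode to a single byte, which is why tests 2 and 3 can compare against plain ASCII literals on an ASCII platform.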