author | Jarkko Hietaniemi <jhi@iki.fi> | 2001-01-20 20:15:30 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2001-01-20 20:15:30 +0000 |
commit | 49cb94c67d828cadfe8cac24ae5955cf752eb2df (patch) | |
tree | 52cd7eb3150a55b88b170492879696fbbd72bda9 | |
parent | 3384d91b59037da95d8c4ea56131ee567d0c261c (diff) | |
Document and test the new qu operator.
p4raw-id: //depot/perl@8485
-rw-r--r-- | MANIFEST | 1 |
-rw-r--r-- | pod/perlfunc.pod | 9 |
-rw-r--r-- | pod/perlop.pod | 66 |
-rw-r--r-- | pod/perlre.pod | 1 |
-rw-r--r-- | pod/perlunicode.pod | 14 |
-rw-r--r-- | t/op/qu.t | 24 |
6 files changed, 80 insertions, 35 deletions
@@ -1557,6 +1557,7 @@ t/op/pat.t See if esoteric patterns work
 t/op/pos.t See if pos works
 t/op/push.t See if push and pop work
 t/op/pwent.t See if getpw*() functions work
+t/op/qu.t See if qu works
 t/op/quotemeta.t See if quotemeta works
 t/op/rand.t See if rand works
 t/op/range.t See if .. works
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 82086e3c96..9228fdbb84 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -96,8 +96,9 @@ than one place.
 =item Functions for SCALARs or strings
 
 C<chomp>, C<chop>, C<chr>, C<crypt>, C<hex>, C<index>, C<lc>, C<lcfirst>,
-C<length>, C<oct>, C<ord>, C<pack>, C<q/STRING/>, C<qq/STRING/>, C<reverse>,
-C<rindex>, C<sprintf>, C<substr>, C<tr///>, C<uc>, C<ucfirst>, C<y///>
+C<length>, C<oct>, C<ord>, C<pack>, C<q/STRING/>, C<qq/STRING/>, C<qu/STRING/>,
+C<reverse>, C<rindex>, C<sprintf>, C<substr>, C<tr///>, C<uc>, C<ucfirst>,
+C<y///>
 
 =item Regular expressions and pattern matching
 
@@ -3469,10 +3470,12 @@ but is more efficient. Returns the new number of elements in the array.
 
 =item qr/STRING/
 
-=item qx/STRING/
+=item qu/STRING/
 
 =item qw/STRING/
 
+=item qx/STRING/
+
 Generalized quotes. See L<perlop/"Regexp Quote-Like Operators">.
 
 =item quotemeta EXPR
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 0bb506ddc7..ebe52c568e 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -645,6 +645,7 @@ any pair of delimiters you choose.
     Customary  Generic        Meaning        Interpolates
         ''       q{}          Literal             no
         ""      qq{}          Literal             yes
+                qu{}          Literal             yes, Unicode
         ``      qx{}          Command             yes (unless '' is delimiter)
                 qw{}         Word list            no
         //       m{}       Pattern match          yes (unless '' is delimiter)
@@ -1011,6 +1012,44 @@ Options are:
 See L<perlre> for additional information on valid syntax for STRING, and
 for a detailed look at the semantics of regular expressions.
 
+=item qw/STRING/
+
+Evaluates to a list of the words extracted out of STRING, using embedded
+whitespace as the word delimiters. It can be understood as being roughly
+equivalent to:
+
+    split(' ', q/STRING/);
+
+the difference being that it generates a real list at compile time. So
+this expression:
+
+    qw(foo bar baz)
+
+is semantically equivalent to the list:
+
+    'foo', 'bar', 'baz'
+
+Some frequently seen examples:
+
+    use POSIX qw( setlocale localeconv )
+    @EXPORT = qw( foo bar baz );
+
+A common mistake is to try to separate the words with comma or to
+put comments into a multi-line C<qw>-string. For this reason, the
+C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
+produces warnings if the STRING contains the "," or the "#" character.
+
+=item qu/STRING/
+
+Like L<qq> but generates Unicode for characters whose code points are
+greater than 128, or 0x80. Such characters can be generated using
+the \xHH (for characters 0x80...0xff, or 128..255) and \x{HHH...}
+notations (for characters 0x100..., or greater than 256).
+
+(In qq/STRING/, or "", both the \xHH and the \x{HHH...} generate
+bytes for the 0x80..0xff range (these bytes are host-dependent),
+and the \x{HHH...} can be used to generate Unicode.)
+
 =item qx/STRING/
 
 =item `STRING`
@@ -1092,33 +1131,6 @@ Just understand what you're getting yourself into.
 
 See L<"I/O Operators"> for more discussion.
 
-=item qw/STRING/
-
-Evaluates to a list of the words extracted out of STRING, using embedded
-whitespace as the word delimiters. It can be understood as being roughly
-equivalent to:
-
-    split(' ', q/STRING/);
-
-the difference being that it generates a real list at compile time. So
-this expression:
-
-    qw(foo bar baz)
-
-is semantically equivalent to the list:
-
-    'foo', 'bar', 'baz'
-
-Some frequently seen examples:
-
-    use POSIX qw( setlocale localeconv )
-    @EXPORT = qw( foo bar baz );
-
-A common mistake is to try to separate the words with comma or to
-put comments into a multi-line C<qw>-string. For this reason, the
-C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
-produces warnings if the STRING contains the "," or the "#" character.
-
 =item s/PATTERN/REPLACEMENT/egimosx
 
 Searches a string for a pattern, and if found, replaces that pattern
diff --git a/pod/perlre.pod b/pod/perlre.pod
index c5ecb13c40..0c38ac7cba 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -179,6 +179,7 @@ In addition, Perl defines the following:
     \X Match eXtended Unicode "combining character sequence",
        equivalent to C<(?:\PM\pM*)>
     \C Match a single C char (octet) even under utf8.
+       (Currently this does not work correctly.)
 
 A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
 Use C<\w+> to match a string of Perl-identifier characters (which isn't
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 30a4482260..b8bbc5707c 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -16,7 +16,8 @@ The following areas need further work.
 
 There is currently no easy way to mark data read from a file or other
 external source as being utf8. This will be one of the major areas of
-focus in the near future.
+focus in the near future. Unfortunately it is unlikely that the Perl
+5.6 and earlier will ever gain this capability.
 
 =item Regular Expressions
 
@@ -66,7 +67,8 @@ or from literals and constants in the source text.
 If the C<-C> command line switch is used, (or the ${^WIDE_SYSTEM_CALLS}
 global flag is set to C<1>), all system calls will use the
 corresponding wide character APIs. This is currently only implemented
-on Windows.
+on Windows as other platforms do not have a unified way of handling
+wide character APIs.
 
 Regardless of the above, the C<bytes> pragma can always be used to force
 byte semantics in a particular lexical scope. See L<bytes>.
@@ -127,8 +129,7 @@ attempt to canonicalize variable names for you.)
 
 Regular expressions match characters instead of bytes. For instance,
 "." matches a character instead of a byte. (However, the C<\C> pattern
-is provided to force a match a single byte ("C<char>" in C, hence
-C<\C>).)
+is available to force a match a single byte ("C<char>" in C, hence C<\C>).)
 
 =item *
 
@@ -216,7 +217,10 @@ And finally, C<scalar reverse()> reverses by character rather than by byte.
 
 =head2 Character encodings for input and output
 
-[XXX: This feature is not yet implemented.]
+This feature is in the process of getting implemented.
+
+(For Perl 5.6 and earlier the support is unlikely to get integrated
+to the core language and some external module will be required.)
 
 =head1 CAVEATS
 
diff --git a/t/op/qu.t b/t/op/qu.t
new file mode 100644
index 0000000000..280020445c
--- /dev/null
+++ b/t/op/qu.t
@@ -0,0 +1,24 @@
+print "1..6\n";
+
+my $foo = "foo";
+
+print "not " unless qu(abc$foo) eq "abcfoo";
+print "ok 1\n";
+
+# qu is always Unicode, even in EBCDIC, so \x41 is 'A' and \x{61} is 'a'.
+
+print "not " unless qu(abc\x41) eq "abcA";
+print "ok 2\n";
+
+print "not " unless qu(abc\x{61}$foo) eq "abcafoo";
+print "ok 3\n";
+
+print "not " unless qu(\x{41}\x{100}\x61\x{200}) eq "A\x{100}a\x{200}";
+print "ok 4\n";
+
+print "not " unless join(" ", unpack("C*", qu(\x80))) eq "194 128";
+print "ok 5\n";
+
+print "not " unless join(" ", unpack("C*", qu(\x{100}))) eq "196 128";
+print "ok 6\n";
+
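
Note on tests 5 and 6 of t/op/qu.t: the expected strings "194 128" and "196 128" are the UTF-8 byte sequences for U+0080 and U+0100. The following is a minimal sketch, not part of this commit, that reproduces those values without the qu operator. It assumes an ASCII-based platform and a perl that provides the core function utf8::encode, which the patch itself does not use; utf8_bytes is a hypothetical helper name chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the commit): return the UTF-8 byte values
# of a character string, joined by spaces, in the same format that
# tests 5 and 6 compare against.
sub utf8_bytes {
    my ($chars) = @_;
    my $bytes = $chars;       # copy, because utf8::encode works in place
    utf8::encode($bytes);     # characters -> UTF-8 octets
    return join(" ", unpack("C*", $bytes));
}

print utf8_bytes("\x80"), "\n";       # 194 128
print utf8_bytes("\x{100}"), "\n";    # 196 128
```

The sketch only illustrates where the expected byte values come from; it does not exercise qu itself, which is what the test file in this commit does.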