summaryrefslogtreecommitdiff
path: root/pod/perldata.pod
diff options
context:
space:
mode:
Diffstat (limited to 'pod/perldata.pod')
-rw-r--r--pod/perldata.pod408
1 files changed, 408 insertions, 0 deletions
diff --git a/pod/perldata.pod b/pod/perldata.pod
new file mode 100644
index 0000000000..6b4f7a4053
--- /dev/null
+++ b/pod/perldata.pod
@@ -0,0 +1,408 @@
+=head1 NAME
+
+perldata - Perl data structures
+
+=head1 DESCRIPTION
+
+=head2 Variable names
+
+Perl has three data structures: scalars, arrays of scalars, and
+associative arrays of scalars, known as "hashes". Normal arrays are
+indexed by number, starting with 0. (Negative subscripts count from
+the end.) Hash arrays are indexed by string.
+
+Scalar values are always named with '$', even when referring to a scalar
+that is part of an array. It works like the English word "the". Thus
+we have:
+
+ $days # the simple scalar value "days"
+ $days[28] # the 29th element of array @days
+ $days{'Feb'} # the 'Feb' value from hash %days
+ $#days # the last index of array @days
+
+but entire arrays or array slices are denoted by '@', which works much like
+the word "these" or "those":
+
+ @days # ($days[0], $days[1],... $days[n])
+ @days[3,4,5] # same as @days[3..5]
+ @days{'a','c'} # same as ($days{'a'},$days{'c'})
+
+and entire hashes are denoted by '%':
+
+ %days # (key1, val1, key2, val2 ...)
+
+In addition, subroutines are named with an initial '&', though this is
+optional when it's otherwise unambiguous (just as "do" is often
+redundant in English). Symbol table entries can be named with an
+initial '*', but you don't really care about that yet.
+
+Every variable type has its own namespace. You can, without fear of
+conflict, use the same name for a scalar variable, an array, or a hash
+(or, for that matter, a filehandle, a subroutine name, or a label).
+This means that $foo and @foo are two different variables. It also
+means that $foo[1] is a part of @foo, not a part of $foo. This may
+seem a bit weird, but that's okay, because it is weird.
+
+Since variable and array references always start with '$', '@', or '%',
+the "reserved" words aren't in fact reserved with respect to variable
+names. (They ARE reserved with respect to labels and filehandles,
+however, which don't have an initial special character. You can't have
+a filehandle named "log", for instance. Hint: you could say
+C<open(LOG,'logfile')> rather than C<open(log,'logfile')>. Using uppercase
+filehandles also improves readability and protects you from conflict
+with future reserved words.) Case I<IS> significant--"FOO", "Foo" and
+"foo" are all different names. Names that start with a letter or
+underscore may also contain digits and underscores.
+
+It is possible to replace such an alphanumeric name with an expression
+that returns a reference to an object of that type. For a description
+of this, see L<perlref>.
+
+Names that start with a digit may only contain more digits. Names
+which do not start with a letter, underscore, or digit are limited to
+one character, e.g. "$%" or "$$". (Most of these one character names
+have a predefined significance to Perl. For instance, $$ is the
+current process id.)
+
+=head2 Context
+
+The interpretation of operations and values in Perl sometimes depends
+on the requirements of the context around the operation or value.
+There are two major contexts: scalar and list. Certain operations
+return list values in contexts wanting a list, and scalar values
+otherwise. (If this is true of an operation it will be mentioned in
+the documentation for that operation.) In other words, Perl overloads
+certain operations based on whether the expected return value is
+singular or plural. (Some words in English work this way, like "fish"
+and "sheep".)
+
+In a reciprocal fashion, an operation provides either a scalar or a
+list context to each of its arguments. For example, if you say
+
+ int( <STDIN> )
+
+the integer operation provides a scalar context for the <STDIN>
+operator, which responds by reading one line from STDIN and passing it
+back to the integer operation, which will then find the integer value
+of that line and return that. If, on the other hand, you say
+
+ sort( <STDIN> )
+
+then the sort operation provides a list context for <STDIN>, which
+will proceed to read every line available up to the end of file, and
+pass that list of lines back to the sort routine, which will then
+sort those lines and return them as a list to whatever the context
+of the sort was.
+
+Assignment is a little bit special in that it uses its left argument to
+determine the context for the right argument. Assignment to a scalar
+evaluates the righthand side in a scalar context, while assignment to
+an array or array slice evaluates the righthand side in a list
+context. Assignment to a list also evaluates the righthand side in a
+list context.
+
+User defined subroutines may choose to care whether they are being
+called in a scalar or list context, but most subroutines do not
+need to care, because scalars are automatically interpolated into
+lists. See L<perlfunc/wantarray>.
+
+=head2 Scalar values
+
+Scalar variables may contain various kinds of singular data, such as
+numbers, strings and references. In general, conversion from one form
+to another is transparent. (A scalar may not contain multiple values,
+but may contain a reference to an array or hash containing multiple
+values.) Because of the automatic conversion of scalars, operations and
+functions that return scalars don't need to care (and, in fact, can't
+care) whether the context is looking for a string or a number.
+
+A scalar value is interpreted as TRUE in the Boolean sense if it is not
+the null string or the number 0 (or its string equivalent, "0"). The
+Boolean context is just a special kind of scalar context.
+
+There are actually two varieties of null scalars: defined and
+undefined. Undefined null scalars are returned when there is no real
+value for something, such as when there was an error, or at end of
+file, or when you refer to an uninitialized variable or element of an
+array. An undefined null scalar may become defined the first time you
+use it as if it were defined, but prior to that you can use the
+defined() operator to determine whether the value is defined or not.
+
+The length of an array is a scalar value. You may find the length of
+array @days by evaluating C<$#days>, as in B<csh>. (Actually, it's not
+the length of the array, it's the subscript of the last element, since
+there is (ordinarily) a 0th element.) Assigning to C<$#days> changes the
+length of the array. Shortening an array by this method destroys
+intervening values. Lengthening an array that was previously shortened
+I<NO LONGER> recovers the values that were in those elements. (It used to
+in Perl 4, but we had to break this make to make sure destructors were
+called when expected.) You can also gain some measure of efficiency by
+preextending an array that is going to get big. (You can also extend
+an array by assigning to an element that is off the end of the array.)
+You can truncate an array down to nothing by assigning the null list ()
+to it. The following are equivalent:
+
+ @whatever = ();
+ $#whatever = $[ - 1;
+
+If you evaluate a named array in a scalar context, it returns the length of
+the array. (Note that this is not true of lists, which return the
+last value, like the C comma operator.) The following is always true:
+
+ scalar(@whatever) == $#whatever - $[ + 1;
+
+Version 5 of Perl changed the semantics of $[: files that don't set
+the value of $[ no longer need to worry about whether another
+file changed its value. (In other words, use of $[ is deprecated.)
+So in general you can just assume that
+
+ scalar(@whatever) == $#whatever + 1;
+
+If you evaluate a hash in a scalar context, it returns a value which is
+true if and only if the hash contains any key/value pairs. (If there
+are any key/value pairs, the value returned is a string consisting of
+the number of used buckets and the number of allocated buckets, separated
+by a slash. This is pretty much only useful to find out whether Perl's
+(compiled in) hashing algorithm is performing poorly on your data set.
+For example, you stick 10,000 things in a hash, but evaluating %HASH in
+scalar context reveals "1/16", which means only one out of sixteen buckets
+has been touched, and presumably contains all 10,000 of your items. This
+isn't supposed to happen.)
+
+=head2 Scalar value constructors
+
+Numeric literals are specified in any of the customary floating point or
+integer formats:
+
+
+ 12345
+ 12345.67
+ .23E-10
+ 0xffff # hex
+ 0377 # octal
+ 4_294_967_296 # underline for legibility
+
+String literals are delimited by either single or double quotes. They
+work much like shell quotes: double-quoted string literals are subject
+to backslash and variable substitution; single-quoted strings are not
+(except for "C<\'>" and "C<\\>"). The usual Unix backslash rules apply for making
+characters such as newline, tab, etc., as well as some more exotic
+forms. See L<perlop/qq> for a list.
+
+You can also embed newlines directly in your strings, i.e. they can end
+on a different line than they begin. This is nice, but if you forget
+your trailing quote, the error will not be reported until Perl finds
+another line containing the quote character, which may be much further
+on in the script. Variable substitution inside strings is limited to
+scalar variables, arrays, and array slices. (In other words,
+identifiers beginning with $ or @, followed by an optional bracketed
+expression as a subscript.) The following code segment prints out "The
+price is $100."
+
+ $Price = '$100'; # not interpreted
+ print "The price is $Price.\n"; # interpreted
+
+As in some shells, you can put curly brackets around the identifier to
+delimit it from following alphanumerics. Also note that a
+single-quoted string must be separated from a preceding word by a
+space, since single quote is a valid (though discouraged) character in
+an identifier (see L<perlmod/Packages>).
+
+Two special literals are __LINE__ and __FILE__, which represent the
+current line number and filename at that point in your program. They
+may only be used as separate tokens; they will not be interpolated into
+strings. In addition, the token __END__ may be used to indicate the
+logical end of the script before the actual end of file. Any following
+text is ignored, but may be read via the DATA filehandle. (The DATA
+filehandle may read data only from the main script, but not from any
+required file or evaluated string.) The two control characters ^D and
+^Z are synonyms for __END__.
+
+A word that doesn't have any other interpretation in the grammar will
+be treated as if it were a quoted string. These are known as
+"barewords". As with filehandles and labels, a bareword that consists
+entirely of lowercase letters risks conflict with future reserved
+words, and if you use the B<-w> switch, Perl will warn you about any
+such words. Some people may wish to outlaw barewords entirely. If you
+say
+
+ use strict 'subs';
+
+then any bareword that would NOT be interpreted as a subroutine call
+produces a compile-time error instead. The restriction lasts to the
+end of the enclosing block. An inner block may countermand this
+by saying C<no strict 'subs'>.
+
+Array variables are interpolated into double-quoted strings by joining all
+the elements of the array with the delimiter specified in the C<$">
+variable, space by default. The following are equivalent:
+
+ $temp = join($",@ARGV);
+ system "echo $temp";
+
+ system "echo @ARGV";
+
+Within search patterns (which also undergo double-quotish substitution)
+there is a bad ambiguity: Is C</$foo[bar]/> to be interpreted as
+C</${foo}[bar]/> (where C<[bar]> is a character class for the regular
+expression) or as C</${foo[bar]}/> (where C<[bar]> is the subscript to array
+@foo)? If @foo doesn't otherwise exist, then it's obviously a
+character class. If @foo exists, Perl takes a good guess about C<[bar]>,
+and is almost always right. If it does guess wrong, or if you're just
+plain paranoid, you can force the correct interpretation with curly
+brackets as above.
+
+A line-oriented form of quoting is based on the shell "here-doc" syntax.
+Following a C<E<lt>E<lt>> you specify a string to terminate the quoted material,
+and all lines following the current line down to the terminating string
+are the value of the item. The terminating string may be either an
+identifier (a word), or some quoted text. If quoted, the type of
+quotes you use determines the treatment of the text, just as in regular
+quoting. An unquoted identifier works like double quotes. There must
+be no space between the C<E<lt>E<lt>> and the identifier. (If you put a space it
+will be treated as a null identifier, which is valid, and matches the
+first blank line--see the Merry Christmas example below.) The terminating
+string must appear by itself (unquoted and with no surrounding
+whitespace) on the terminating line.
+
+ print <<EOF; # same as above
+ The price is $Price.
+ EOF
+
+ print <<"EOF"; # same as above
+ The price is $Price.
+ EOF
+
+ print << x 10; # Legal but discouraged. Use <<"".
+ Merry Christmas!
+
+ print <<`EOC`; # execute commands
+ echo hi there
+ echo lo there
+ EOC
+
+ print <<"foo", <<"bar"; # you can stack them
+ I said foo.
+ foo
+ I said bar.
+ bar
+
+ myfunc(<<"THIS", 23, <<'THAT'');
+ Here's a line
+ or two.
+ THIS
+ and here another.
+ THAT
+
+Just don't forget that you have to put a semicolon on the end
+to finish the statement, as Perl doesn't know you're not going to
+try to do this:
+
+ print <<ABC
+ 179231
+ ABC
+ + 20;
+
+
+=head2 List value constructors
+
+List values are denoted by separating individual values by commas
+(and enclosing the list in parentheses where precedence requires it):
+
+ (LIST)
+
+In a context not requiring an list value, the value of the list
+literal is the value of the final element, as with the C comma operator.
+For example,
+
+ @foo = ('cc', '-E', $bar);
+
+assigns the entire list value to array foo, but
+
+ $foo = ('cc', '-E', $bar);
+
+assigns the value of variable bar to variable foo. Note that the value
+of an actual array in a scalar context is the length of the array; the
+following assigns to $foo the value 3:
+
+ @foo = ('cc', '-E', $bar);
+ $foo = @foo; # $foo gets 3
+
+You may have an optional comma before the closing parenthesis of an
+list literal, so that you can say:
+
+ @foo = (
+ 1,
+ 2,
+ 3,
+ );
+
+LISTs do automatic interpolation of sublists. That is, when a LIST is
+evaluated, each element of the list is evaluated in a list context, and
+the resulting list value is interpolated into LIST just as if each
+individual element were a member of LIST. Thus arrays lose their
+identity in a LIST--the list
+
+ (@foo,@bar,&SomeSub)
+
+contains all the elements of @foo followed by all the elements of @bar,
+followed by all the elements returned by the subroutine named SomeSub.
+To make a list reference that does I<NOT> interpolate, see L<perlref>.
+
+The null list is represented by (). Interpolating it in a list
+has no effect. Thus ((),(),()) is equivalent to (). Similarly,
+interpolating an array with no elements is the same as if no
+array had been interpolated at that point.
+
+A list value may also be subscripted like a normal array. You must
+put the list in parentheses to avoid ambiguity. Examples:
+
+ # Stat returns list value.
+ $time = (stat($file))[8];
+
+ # Find a hex digit.
+ $hexdigit = ('a','b','c','d','e','f')[$digit-10];
+
+ # A "reverse comma operator".
+ return (pop(@foo),pop(@foo))[0];
+
+Lists may be assigned to if and only if each element of the list
+is legal to assign to:
+
+ ($a, $b, $c) = (1, 2, 3);
+
+ ($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00);
+
+The final element may be an array or a hash:
+
+ ($a, $b, @rest) = split;
+ local($a, $b, %rest) = @_;
+
+You can actually put an array anywhere in the list, but the first array
+in the list will soak up all the values, and anything after it will get
+a null value. This may be useful in a local() or my().
+
+A hash literal contains pairs of values to be interpreted
+as a key and a value:
+
+ # same as map assignment above
+ %map = ('red',0x00f,'blue',0x0f0,'green',0xf00);
+
+It is often more readable to use the C<=E<gt>> operator between key/value pairs
+(the C<=E<gt>> operator is actually nothing more than a more visually
+distinctive synonym for a comma):
+
+ %map = (
+ 'red' => 0x00f,
+ 'blue' => 0x0f0,
+ 'green' => 0xf00,
+ );
+
+Array assignment in a scalar context returns the number of elements
+produced by the expression on the right side of the assignment:
+
+ $x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2
+
+This is very handy when you want to do a list assignment in a Boolean
+context, since most list functions return a null list when finished,
+which when assigned produces a 0, which is interpreted as FALSE.