Diffstat (limited to 'pod/perlref.pod')
-rw-r--r-- | pod/perlref.pod | 332 |
1 files changed, 332 insertions, 0 deletions
diff --git a/pod/perlref.pod b/pod/perlref.pod
new file mode 100644
index 0000000000..0ad25dfe66
--- /dev/null
+++ b/pod/perlref.pod
@@ -0,0 +1,332 @@
+=head1 NAME
+
+perlref - Perl references and nested data structures
+
+=head1 DESCRIPTION
+
+In Perl 4 it was difficult to represent complex data structures, because
+all references had to be symbolic, and even that was difficult to do when
+you wanted to refer to a variable rather than a symbol table entry. Perl
+5 not only makes it easier to use symbolic references to variables, but
+lets you have "hard" references to any piece of data. Any scalar may hold
+a hard reference. Since arrays and hashes contain scalars, you can now
+easily build arrays of arrays, arrays of hashes, hashes of arrays, arrays
+of hashes of functions, and so on.
+
+Hard references are smart--they keep track of reference counts for you,
+automatically freeing the thing referred to when its reference count
+goes to zero. If that thing happens to be an object, the object is
+destructed. See L<perlobj> for more about objects. (In a sense,
+everything in Perl is an object, but we usually reserve the word for
+references to objects that have been officially "blessed" into a class package.)
+
+A symbolic reference contains the name of a variable, just as a
+symbolic link in the filesystem merely contains the name of a file.
+The C<*glob> notation is a kind of symbolic reference. Hard references
+are more like hard links in the filesystem: merely another way
+of getting at the same underlying object, irrespective of its name.
+
+"Hard" references are easy to use in Perl. There is just one
+overriding principle: Perl does no implicit referencing or
+dereferencing. When a scalar is holding a reference, it always behaves
+as a scalar. It doesn't magically start being an array or a hash
+unless you tell it so explicitly by dereferencing it.
+
+References can be constructed in several ways.
+
+=over 4
+
+=item 1.
+
+By using the backslash operator on a variable, subroutine, or value.
+(This works much like the & (address-of) operator works in C.) Note
+that this typically creates I<ANOTHER> reference to a variable, since
+there's already a reference to the variable in the symbol table. But
+the symbol table reference might go away, and you'll still have the
+reference that the backslash returned. Here are some examples:
+
+    $scalarref = \$foo;
+    $arrayref  = \@ARGV;
+    $hashref   = \%ENV;
+    $coderef   = \&handler;
+
+=item 2.
+
+A reference to an anonymous array can be constructed using square
+brackets:
+
+    $arrayref = [1, 2, ['a', 'b', 'c']];
+
+Here we've constructed a reference to an anonymous array of three elements
+whose final element is itself a reference to another anonymous array of three
+elements. (The multidimensional syntax described later can be used to
+access this. For example, after the above, $arrayref->[2][1] would have
+the value "b".)
+
+=item 3.
+
+A reference to an anonymous hash can be constructed using curly
+brackets:
+
+    $hashref = {
+        'Adam'  => 'Eve',
+        'Clyde' => 'Bonnie',
+    };
+
+Anonymous hash and array constructors can be intermixed freely to
+produce as complicated a structure as you want. The multidimensional
+syntax described below works for these too. The values above are
+literals, but variables and expressions would work just as well, because
+assignment operators in Perl (even within local() or my()) are executable
+statements, not compile-time declarations.
+
+Because curly brackets (braces) are used for several other things
+including BLOCKs, you may occasionally have to disambiguate braces at the
+beginning of a statement by putting a C<+> or a C<return> in front so
+that Perl realizes the opening brace isn't starting a BLOCK. The economy and
+mnemonic value of using curlies is deemed worth this occasional extra
+hassle.
+
+For example, if you wanted a function to make a new hash and return a
+reference to it, you have these options:
+
+    sub hashem {        { @_ } }   # silently wrong
+    sub hashem {       +{ @_ } }   # ok
+    sub hashem { return { @_ } }   # ok
+
+=item 4.
+
+A reference to an anonymous subroutine can be constructed by using
+C<sub> without a subname:
+
+    $coderef = sub { print "Boink!\n" };
+
+Note the presence of the semicolon. Except for the fact that the code
+inside isn't executed immediately, a C<sub {}> is not so much a
+declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no
+matter how many times you execute that line (unless you're in an
+C<eval("...")>), C<$coderef> will still have a reference to the I<SAME>
+anonymous subroutine.)
+
+For those who worry about these things, the current implementation
+uses shallow binding of local() variables; my() variables are not
+accessible. This precludes true closures. However, you can work
+around this with a run-time (rather than a compile-time) eval():
+
+    {
+        my $x = time;
+        $coderef = eval "sub { \$x }";
+    }
+
+Normally, if you'd used just C<sub{}> or even C<eval{}>, your new sub
+would only have been able to access the global $x. But because you've
+used a run-time eval(), this will not only generate a brand new subroutine
+reference each time the eval() runs, but will also grant access to the my()
+variable lexically above it rather than the global one. The particular $x
+accessed will be different for each new sub you create. This mechanism
+yields deep binding of variables. (If you don't know what closures, deep
+binding, or shallow binding are, don't worry too much about it.)
+
+=item 5.
+
+References are often returned by special subroutines called constructors.
+Perl objects are just references to a special kind of object that happens to know
+which package it's associated with. Constructors are just special
+subroutines that know how to create that association. They do so by
+starting with an ordinary reference, and it remains an ordinary reference
+even while it's also being an object. Constructors are customarily
+named new(), but don't have to be:
+
+    $objref = new Doggie (Tail => 'short', Ears => 'long');
+
+=item 6.
+
+References of the appropriate type can spring into existence if you
+dereference them in a context that assumes they exist. Since we haven't
+talked about dereferencing yet, we can't show you any examples yet.
+
+=back
+
+That's it for creating references. By now you're probably dying to
+know how to use references to get back to your long-lost data. There
+are several basic methods.
+
+=over 4
+
+=item 1.
+
+Anywhere you'd put an identifier as part of a variable or subroutine
+name, you can replace the identifier with a simple scalar variable
+containing a reference of the correct type:
+
+    $bar = $$scalarref;
+    push(@$arrayref, $filename);
+    $$arrayref[0] = "January";
+    $$hashref{"KEY"} = "VALUE";
+    &$coderef(1,2,3);
+
+It's important to understand that we are specifically I<NOT> dereferencing
+C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the
+scalar variable happens I<BEFORE> it does any key lookups. Anything more
+complicated than a simple scalar variable must use methods 2 or 3 below.
+However, a "simple scalar" includes an identifier that itself uses method
+1 recursively. Therefore, the following prints "howdy".
+
+    $refrefref = \\\"howdy";
+    print $$$$refrefref;
+
+=item 2.
+
+Anywhere you'd put an identifier as part of a variable or subroutine
+name, you can replace the identifier with a BLOCK returning a reference
+of the correct type. In other words, the previous examples could be
+written like this:
+
+    $bar = ${$scalarref};
+    push(@{$arrayref}, $filename);
+    ${$arrayref}[0] = "January";
+    ${$hashref}{"KEY"} = "VALUE";
+    &{$coderef}(1,2,3);
+
+Admittedly, it's a little silly to use the curlies in this case, but
+the BLOCK can contain any arbitrary expression, in particular,
+subscripted expressions:
+
+    &{ $dispatch{$index} }(1,2,3);    # call correct routine
+
+Because the curlies can be omitted for the simple case of C<$$x>,
+people often make the mistake of viewing the dereferencing symbols as
+proper operators, and wonder about their precedence. If they were,
+though, you could use parens instead of braces. That's not the case.
+Consider the difference below; case 0 is a short-hand version of case 1,
+I<NOT> case 2:
+
+    $$hashref{"KEY"}     = "VALUE";     # CASE 0
+    ${$hashref}{"KEY"}   = "VALUE";     # CASE 1
+    ${$hashref{"KEY"}}   = "VALUE";     # CASE 2
+    ${$hashref->{"KEY"}} = "VALUE";     # CASE 3
+
+Case 2 is also deceptive in that you're accessing a variable
+called %hashref, not dereferencing through $hashref to the hash
+it's presumably referencing. That would be case 3.
+
+=item 3.
+
+The case of individual array elements arises often enough that it gets
+cumbersome to use method 2. As a form of syntactic sugar, the two
+element assignments in the examples above can be written:
+
+    $arrayref->[0] = "January";
+    $hashref->{"KEY"} = "VALUE";
+
+The left side of the arrow can be any expression returning a reference,
+including a previous dereference. Note that C<$array[$x]> is I<NOT> the
+same thing as C<$array-E<gt>[$x]> here:
+
+    $array[$x]->{"foo"}->[0] = "January";
+
+This is one of the cases we mentioned earlier in which references could
+spring into existence when in an lvalue context. Before this
+statement, C<$array[$x]> may have been undefined. If so, it's
+automatically defined with a hash reference so that we can look up
+C<{"foo"}> in it. Likewise C<$array[$x]-E<gt>{"foo"}> will automatically get
+defined with an array reference so that we can look up C<[0]> in it.
+
+One more thing here. The arrow is optional I<BETWEEN> bracket
+subscripts, so you can shrink the above down to
+
+    $array[$x]{"foo"}[0] = "January";
+
+Which, in the degenerate case of using only ordinary arrays, gives you
+multidimensional arrays just like C's:
+
+    $score[$x][$y][$z] += 42;
+
+Well, okay, not entirely like C's arrays, actually. C doesn't know how
+to grow its arrays on demand. Perl does.
+
+=item 4.
+
+If a reference happens to be a reference to an object, then there are
+probably methods to access the things referred to, and you should probably
+stick to those methods unless you're in the class package that defines the
+object's methods. In other words, be nice, and don't violate the object's
+encapsulation without a very good reason. Perl does not enforce
+encapsulation. We are not totalitarians here. We do expect some basic
+civility though.
+
+=back
+
+The ref() operator may be used to determine what type of thing the
+reference is pointing to. See L<perlfunc>.
+
+The bless() operator may be used to associate a reference with a package
+functioning as an object class. See L<perlobj>.
+
+A type glob may be dereferenced the same way a reference can, since
+the dereference syntax always indicates the kind of reference desired.
+So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable.
+
+Here's a trick for interpolating a subroutine call into a string:
+
+    print "My sub returned ${\mysub(1,2,3)}\n";
+
+The way it works is that when the C<${...}> is seen in the double-quoted
+string, it's evaluated as a block. The block executes the call to
+C<mysub(1,2,3)>, and then takes a reference to the result. So the whole block
+returns a reference to a scalar, which is then dereferenced by C<${...}>
+and stuck into the double-quoted string.
+
+=head2 Symbolic references
+
+We said that references spring into existence as necessary if they are
+undefined, but we didn't say what happens if a value used as a
+reference is already defined, but I<ISN'T> a hard reference. If you
+use it as a reference in this case, it'll be treated as a symbolic
+reference. That is, the value of the scalar is taken to be the I<NAME>
+of a variable, rather than a direct link to a (possibly) anonymous
+value.
+
+People frequently expect it to work like this. So it does.
+
+    $name = "foo";
+    $$name = 1;                 # Sets $foo
+    ${$name} = 2;               # Sets $foo
+    ${$name x 2} = 3;           # Sets $foofoo
+    $name->[0] = 4;             # Sets $foo[0]
+    @$name = ();                # Clears @foo
+    &$name();                   # Calls &foo() (as in Perl 4)
+    $pack = "THAT";
+    ${"${pack}::$name"} = 5;    # Sets $THAT::foo without eval
+
+This is very powerful, and slightly dangerous, in that it's possible
+to intend (with the utmost sincerity) to use a hard reference, and
+accidentally use a symbolic reference instead. To protect against
+that, you can say
+
+    use strict 'refs';
+
+and then only hard references will be allowed for the rest of the enclosing
+block. An inner block may countermand that with
+
+    no strict 'refs';
+
+Only package variables are visible to symbolic references. Lexical
+variables (declared with my()) aren't in a symbol table, and thus are
+invisible to this mechanism. For example:
+
+    local($value) = 10;
+    $ref = "value";
+    {
+        my $value = 20;
+        print $$ref;
+    }
+
+This will still print 10, not 20. Remember that local() affects package
+variables, which are all "global" to the package.
+
+=head2 Further Reading
+
+Besides the obvious documents, source code can be instructive.
+Some rather pathological examples of the use of references can be found
+in the F<t/op/ref.t> regression test in the Perl source directory.
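The constructs this file walks through can be tried together in one short script. What follows is only an illustrative sketch, not part of the file being added. The variable names (`@months`, `$nested`, `%dispatch`, `@array`) are hypothetical. It assumes a current perl 5 run under `use strict` and `use warnings`, so the symbolic reference is confined to a `no strict 'refs'` block. It sticks to the syntax described above: backslash and anonymous constructors, the `$$x`, `${...}` and arrow dereferences, `&{...}(...)` calls, autovivification in lvalue context, and `ref()`.

```perl
use strict;
use warnings;

# Construction by backslash on a named variable.
my @months   = ('Jan', 'Feb');
my $arrayref = \@months;

# Anonymous array and hash constructors, nested freely.
my $nested  = [1, 2, ['a', 'b', 'c']];
my $hashref = { 'Adam' => 'Eve', 'Clyde' => 'Bonnie' };

# Anonymous subroutines stored in a (hypothetical) dispatch table.
my %dispatch = (
    add => sub { return $_[0] + $_[1] },
    mul => sub { return $_[0] * $_[1] },
);

# Dereferencing method 1: a simple scalar in place of the identifier.
$$arrayref[0] = 'January';                    # modifies $months[0]

# Dereferencing method 2: a BLOCK returning a reference.
my $second  = ${$arrayref}[1];
my $sum     = &{ $dispatch{add} }(2, 3);      # BLOCK holds a subscripted expression
my $product = &{ $dispatch{mul} }(4, 5);

# Dereferencing method 3: the arrow, optional between bracket subscripts.
my $inner   = $nested->[2][1];
my $partner = $hashref->{'Clyde'};

# Autovivification: $array[1] becomes a hash ref, then {foo} an array ref.
my @array;
$array[1]{'foo'}[0] = 'January';

# ref() reports what kind of thing each reference points to.
my @kinds = map { ref($_) } ($arrayref, $hashref, $dispatch{add}, \$sum);

# A symbolic reference can only reach a package variable, and only
# where strict refs has been switched off.
our $foo = 0;
{
    no strict 'refs';
    my $name = 'foo';
    ${$name} = 42;                            # sets $main::foo
}

print "$months[0] $second $inner $partner $sum $product\n";
print "$array[1]{foo}[0] @kinds $foo\n";
```

Run as-is, this prints `January Feb b Bonnie 5 20` followed by `January ARRAY HASH CODE SCALAR 42`. Deleting the `no strict 'refs'` line makes perl die at run time on the `${$name}` assignment. That is the protection the `use strict 'refs'` paragraph describes.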