author: Gurusamy Sarathy <gsar@cpan.org> 1998-11-28 17:21:07 +0000
committer: Gurusamy Sarathy <gsar@cpan.org> 1998-11-28 17:21:07 +0000
commit: a1e2a3203e4b30744c9b7c687f0438326033e3c3
tree: 943da214521e0f861963d9bef658eaf764f639ff /pod/perlreftut.pod
parent: a1ea39dc8940632216c22b20b6b3596817204581
add perlreftut.pod
p4raw-id: //depot/perl@2357
Diffstat (limited to 'pod/perlreftut.pod')
-rw-r--r-- pod/perlreftut.pod 397
1 files changed, 397 insertions, 0 deletions
diff --git a/pod/perlreftut.pod b/pod/perlreftut.pod
new file mode 100644
index 0000000000..2fac79df00
--- /dev/null
+++ b/pod/perlreftut.pod
@@ -0,0 +1,397 @@
+
+=head1 NAME
+
+perlreftut - Mark's very short tutorial about references
+
+=head1 DESCRIPTION
+
+One of the most important new features in Perl 5 was the capability to
+manage complicated data structures like multidimensional arrays and
+nested hashes. To enable these, Perl 5 introduced a feature called
+`references', and using references is the key to managing complicated,
+structured data in Perl. Unfortunately, there's a lot of funny syntax
+to learn, and the main manual page can be hard to follow. The manual
+is quite complete, and sometimes people find that a problem, because it
+can be hard to tell what is important and what isn't.
+
+Fortunately, you only need to know 10% of what's in the main page to get
+90% of the benefit. This page will show you that 10%.
+
+=head1 Who Needs Complicated Data Structures?
+
+One problem that came up all the time in Perl 4 was how to represent a
+hash whose values were lists. Perl 4 had hashes, of course, but the
+values had to be scalars; they couldn't be lists.
+
+Why would you want a hash of lists? Let's take a simple example: You
+have a file of city and state names, like this:
+
+ Chicago, Illinois
+ New York, New York
+ Albany, New York
+ Springfield, Illinois
+ Trenton, New Jersey
+ Evanston, Illinois
+
+and you want to produce an output like this, with each state mentioned
+once, and then an alphabetical list of the cities in that state:
+
+ Illinois: Chicago, Evanston, Springfield.
+ New Jersey: Trenton.
+ New York: Albany, New York.
+
+The natural way to do this is to have a hash whose keys are state
+names. Associated with each state name key is a list of the cities in
+that state. Each time you read a line of input, split it into a state
+and a city, look up the list of cities already known to be in that
+state, and append the new city to the list. When you're done reading
+the input, iterate over the hash as usual, sorting each list of cities
+before you print it out.
+
+If hash values can't be lists, you lose. In Perl 4, hash values can't
+be lists; they can only be strings. You lose. You'd probably have to
+combine all the cities into a single string somehow, and then when
+time came to write the output, you'd have to break the string into a
+list, sort the list, and turn it back into a string. This is messy
+and error-prone. And it's frustrating, because Perl already has
+perfectly good lists that would solve the problem if only you could
+use them.
+
+=head1 The Solution
+
+Unfortunately, by the time Perl 5 rolled around, we were already stuck
+with this design: Hash values must be scalars. The solution to this is
+references.
+
+A reference is a scalar value that I<refers to> an entire array or an
+entire hash (or to just about anything else). Names are one kind of
+reference that you're already familiar with. Think of the President:
+a messy, inconvenient bag of blood and bones. But to talk about him,
+or to represent him in a computer program, all you need is the easy,
+convenient scalar string "Bill Clinton".
+
+References in Perl are like names for arrays and hashes. They're
+Perl's private, internal names, so you can be sure they're
+unambiguous. Unlike "Bill Clinton", a reference only refers to one
+thing, and you always know what it refers to. If you have a reference
+to an array, you can recover the entire array from it. If you have a
+reference to a hash, you can recover the entire hash. But the
+reference is still an easy, compact scalar value.
+
+You can't have a hash whose values are arrays; hash values can only be
+scalars. We're stuck with that. But a single reference can refer to
+an entire array, and references are scalars, so you can have a hash of
+references to arrays, and it'll act a lot like a hash of arrays, and
+it'll be just as useful as a hash of arrays.
+
+We'll come back to this city-state problem later, after we've seen
+some syntax for managing references.
+
+
+=head1 Syntax
+
+There are just two ways to make a reference, and just two ways to use
+it once you have it.
+
+=head2 Making References
+
+B<Make Rule 1>
+
+If you put a C<\> in front of a variable, you get a
+reference to that variable.
+
+ $aref = \@array; # $aref now holds a reference to @array
+ $href = \%hash; # $href now holds a reference to %hash
+
+Once the reference is stored in a variable like $aref or $href, you
+can copy it or store it just the same as any other scalar value:
+
+ $xy = $aref; # $xy now holds a reference to @array
+ $p[3] = $href; # $p[3] now holds a reference to %hash
+ $z = $p[3]; # $z now holds a reference to %hash
+
+
+These examples show how to make references to variables with names.
+Sometimes you want to make an array or a hash that doesn't have a
+name. This is analogous to the way you like to be able to use the
+string C<"\n"> or the number 80 without having to store it in a named
+variable first.
+
+B<Make Rule 2>
+
+C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
+that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a
+reference to that hash.
+
+ $aref = [ 1, "foo", undef, 13 ];
+ # $aref now holds a reference to an array
+
+ $href = { APR => 4, AUG => 8 };
+ # $href now holds a reference to a hash
+
+
+The references you get from rule 2 are the same kind of
+references that you get from rule 1:
+
+ # This:
+ $aref = [ 1, 2, 3 ];
+
+ # Does the same as this:
+ @array = (1, 2, 3);
+ $aref = \@array;
+
+
+The first line is an abbreviation for the following two lines, except
+that it doesn't create the superfluous array variable C<@array>.
+
+
+=head2 Using References
+
+What can you do with a reference once you have it? It's a scalar
+value, and we've seen that you can store it as a scalar and get it back
+again just like any scalar. There are just two more ways to use it:
+
+B<Use Rule 1>
+
+If C<$aref> contains a reference to an array, then you
+can put C<{$aref}> anywhere you would normally put the name of an
+array. For example, C<@{$aref}> instead of C<@array>.
+
+Here are some examples of that:
+
+Arrays:
+
+
+ @a               @{$aref}              An array
+ reverse @a       reverse @{$aref}      Reverse the array
+ $a[3]            ${$aref}[3]           An element of the array
+ $a[3] = 17;      ${$aref}[3] = 17      Assigning an element
+
+
+On each line are two expressions that do the same thing. The
+left-hand versions operate on the array C<@a>, and the right-hand
+versions operate on the array that is referred to by C<$aref>, but
+once they find the array they're operating on, they do the same things
+to the arrays.
+
+Using a hash reference is I<exactly> the same:
+
+ %h               %{$href}              A hash
+ keys %h          keys %{$href}         Get the keys from the hash
+ $h{'red'}        ${$href}{'red'}       An element of the hash
+ $h{'red'} = 17   ${$href}{'red'} = 17  Assigning an element
+
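+The two tables above can be tried out in a small program. This is only
+a sketch: the names C<@colors> and C<%count> are invented for the
+illustration and do not appear elsewhere in this tutorial.
+
+ my @colors = ('red', 'green', 'blue');
+ my %count  = (red => 1, blue => 2);
+ my $aref   = \@colors;                # Make Rule 1
+ my $href   = \%count;                 # Make Rule 1
+
+ ${$aref}[3] = 'yellow';               # same as $colors[3] = 'yellow'
+ ${$href}{'green'} = 17;               # same as $count{'green'} = 17
+
+ # {$aref} and {$href} stand where the names 'colors' and 'count' would go
+ print join(', ', reverse @{$aref}), "\n";    # yellow, blue, green, red
+ print join(', ', sort keys %{$href}), "\n";  # blue, green, red
+
+Each line that goes through C<$aref> or C<$href> reads or changes the
+original C<@colors> and C<%count>, because the references refer to
+those variables rather than to copies of them.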
+
+B<Use Rule 2>
+
+C<${$aref}[3]> is too hard to read, so you can write C<$aref-E<gt>[3]>
+instead.
+
+C<${$href}{red}> is too hard to read, so you can write
+C<$href-E<gt>{red}> instead.
+
+Most often, when you have an array or a hash, you want to get or set a
+single element from it. C<${$aref}[3]> and C<${$href}{'red'}> have
+too much punctuation, and Perl lets you abbreviate.
+
+If C<$aref> holds a reference to an array, then C<$aref-E<gt>[3]> is
+the fourth element of the array. Don't confuse this with C<$aref[3]>,
+which is the fourth element of a totally different array, one
+deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the
+same way that C<$item> and C<@item> are.
+
+Similarly, C<$href-E<gt>{'red'}> is part of the hash referred to by
+the scalar variable C<$href>, perhaps even one with no name.
+C<$href{'red'}> is part of the deceptively named C<%href> hash. It's
+easy to leave out the C<-E<gt>>, and if you do, you'll get
+bizarre results when your program gets array and hash elements out of
+totally unexpected hashes and arrays that weren't the ones you wanted
+to use.
+
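+Here is a minimal sketch of that mistake. The array C<@aref> is
+declared on purpose, purely for illustration, so that both spellings
+run and the difference is visible.
+
+ my @aref = ('wrong', 'array');     # a named array that happens to be @aref
+ my $aref = [ 'right', 'array' ];   # an unrelated scalar holding a reference
+
+ print $aref->[0], "\n";            # right: follows the reference in $aref
+ print ${$aref}[0], "\n";           # right: the long form of the same thing
+ print $aref[0], "\n";              # wrong: element 0 of the named array @aref
+
+If no array named C<@aref> has been declared and the program runs
+under C<use strict>, the third form is rejected at compile time, which
+is one way to catch a forgotten arrow early.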
+
+=head1 An Example
+
+Let's see a quick example of how all this is useful.
+
+First, remember that C<[1, 2, 3]> makes an anonymous array containing
+C<(1, 2, 3)>, and gives you a reference to that array.
+
+Now think about
+
+ @a = ( [1, 2, 3],
+        [4, 5, 6],
+        [7, 8, 9]
+      );
+
+@a is an array with three elements, and each one is a reference to
+another array.
+
+C<$a[1]> is one of these references. It refers to an array, the array
+containing C<(4, 5, 6)>, and because it is a reference to an array,
+B<USE RULE 2> says that we can write C<$a[1]-E<gt>[2]> to get the
+third element from that array. C<$a[1]-E<gt>[2]> is the 6.
+Similarly, C<$a[0]-E<gt>[1]> is the 2. What we have here is like a
+two-dimensional array; you can write C<$a[ROW]-E<gt>[COLUMN]> to get
+or set the element in any row and any column of the array.
+
+The notation still looks a little cumbersome, so there's one more
+abbreviation:
+
+=head1 Arrow Rule
+
+In between two B<subscripts>, the arrow is optional.
+
+Instead of C<$a[1]-E<gt>[2]>, we can write C<$a[1][2]>; it means the
+same thing. Instead of C<$a[0]-E<gt>[1]>, we can write C<$a[0][1]>;
+it means the same thing.
+
+Now it really looks like two-dimensional arrays!
+
+You can see why the arrows are important. Without them, we would have
+had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For
+three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
+the unreadable C<${${$x[2]}[3]}[5]>.
+
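+As a short sketch, the three spellings can be compared on the array
+from the example above. The loop variable C<$row> is a name chosen
+just for this illustration.
+
+ my @a = ( [1, 2, 3],
+           [4, 5, 6],
+           [7, 8, 9] );
+
+ $a[1][2] = 60;                     # Arrow Rule: same as $a[1]->[2] = 60
+ print ${$a[1]}[2], "\n";           # 60, written with Use Rule 1
+ print $a[1]->[2], "\n";            # 60, written with Use Rule 2
+
+ foreach my $row (@a) {             # each $row is a reference to one inner array
+     print join(' ', @{$row}), "\n";
+ }
+
+The arrow is optional only between subscripts. If C<$m> holds a
+reference to C<@a>, the first arrow in C<$m-E<gt>[1][2]> is still
+required, for the reasons given under B<Use Rule 2>.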
+
+=head1 Solution
+
+Here's the answer to the problem I posed at the beginning of the
+article, of reformatting a file of city and state names.
+
+  1  while (<>) {
+  2    chomp;
+  3    my ($city, $state) = split /, /;
+  4    push @{$table{$state}}, $city;
+  5  }
+  6
+  7  foreach $state (sort keys %table) {
+  8    print "$state: ";
+  9    my @cities = @{$table{$state}};
+ 10    print join ', ', sort @cities;
+ 11    print ".\n";
+ 12  }
+
+
+The program has two pieces: Lines 1--5 read the input and build a
+data structure, and lines 7--12 analyze the data and print out the
+report.
+
+In the first part, line 4 is the important one. We're going to have a
+hash, C<%table>, whose keys are state names, and whose values are
+(references to) arrays of city names. After acquiring a city and
+state name, the program looks up C<$table{$state}>, which holds (a
+reference to) the list of cities seen in that state so far. Line 4 is
+totally analogous to
+
+ push @array, $city;
+
+except that the name C<array> has been replaced by the reference
+C<{$table{$state}}>. The C<push> adds a city name to the end of the
+referred-to array.
+
+In the second part, line 9 is the important one. Again,
+C<$table{$state}> is (a reference to) the list of cities in the state, so
+we can recover the original list, and copy it into the array C<@cities>,
+by using C<@{$table{$state}}>. Line 9 is totally analogous to
+
+ @cities = @array;
+
+except that the name C<array> has been replaced by the reference
+C<{$table{$state}}>. The C<@> tells Perl to get the entire array.
+
+The rest of the program is just familiar uses of C<chomp>, C<split>, C<sort>,
+and C<print>, and doesn't involve references at all.
+
+There's one fine point I skipped. Suppose the program has just read
+the first line in its input that happens to mention the state of Ohio.
+Control is at line 4, C<$state> is C<'Ohio'>, and C<$city> is
+C<'Cleveland'>. Since this is the first city in Ohio,
+C<$table{$state}> is undefined---in fact there isn't an C<'Ohio'> key
+in C<%table> at all. What does line 4 do here?
+
+ 4 push @{$table{$state}}, $city;
+
+
+This is Perl, so it does the exact right thing. It sees that you want
+to push C<Cleveland> onto an array that doesn't exist, so it helpfully
+makes a new, empty, anonymous array for you, installs it in the table,
+and then pushes C<Cleveland> onto it. This is called `autovivification'.
+
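+To see this in isolation, here is a minimal sketch that starts from an
+empty hash. The commented-out lines show roughly what you would have
+to write by hand without autovivification; they are an illustration,
+not a description of Perl's internals.
+
+ my %table;                               # empty: no 'Ohio' key yet
+ push @{$table{'Ohio'}}, 'Cleveland';     # the array is created on demand
+
+ # Roughly what you would otherwise write yourself:
+ #   $table{'Ohio'} = [] unless defined $table{'Ohio'};
+ #   push @{$table{'Ohio'}}, 'Cleveland';
+
+ push @{$table{'Ohio'}}, 'Akron';         # the array exists now; just append
+ print scalar(@{$table{'Ohio'}}), "\n";   # 2
+ print join(', ', @{$table{'Ohio'}}), "\n";   # Cleveland, Akron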
+
+=head1 The Rest
+
+I promised to give you 90% of the benefit with 10% of the details, and
+that means I left out 90% of the details. Now that you have an
+overview of the important parts, it should be easier to read the
+L<perlref> manual page, which discusses 100% of the details.
+
+Some of the highlights of L<perlref>:
+
+=over 4
+
+=item *
+
+You can make references to anything, including scalars, functions, and
+other references.
+
+=item *
+
+In B<USE RULE 1>, you can often omit the curly braces. For example,
+C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
+C<${$aref}[1]>. If you're just starting out, you might want to adopt
+the habit of always including the curly braces.
+
+=item *
+
+To see if a variable contains a reference, use the `ref' function.
+It returns true if its argument is a reference. Actually it's a
+little better than that: It returns C<HASH> for hash references and
+C<ARRAY> for array references.
+
+=item *
+
+If you try to use a reference like a string, you get strings like
+
+ ARRAY(0x80f5dec) or HASH(0x826afc0)
+
+If you ever see a string that looks like this, you'll know you
+printed out a reference by mistake.
+
+A side effect of this representation is that you can use C<eq> to see
+if two references refer to the same thing. (But you should usually use
+C<==> instead because it's much faster.)
+
+=item *
+
+You can use a string as if it were a reference. If you use the string
+C<"foo"> as an array reference, it's taken to be a reference to the
+array C<@foo>. This is called a I<soft reference> or I<symbolic reference>.
+
+=back
+
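+Two of these highlights, the C<ref> function and comparing references
+with C<==>, can be sketched as follows. The variable names are invented
+for the example.
+
+ my @array = (1, 2, 3);
+ my $r1   = \@array;
+ my $r2   = \@array;                # a second reference to the same array
+ my $r3   = [ 1, 2, 3 ];            # a different array with the same contents
+ my $href = { APR => 4 };
+
+ print ref($r1), "\n";              # ARRAY
+ print ref($href), "\n";            # HASH
+ print "not a reference\n" unless ref('plain string');
+
+ my $same  = ($r1 == $r2) ? 'same' : 'different';
+ my $other = ($r1 == $r3) ? 'same' : 'different';
+ print "$same $other\n";            # same different
+
+Note that C<==> asks whether two references refer to the same thing,
+not whether the things they refer to have equal contents.
+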
+You might prefer to go on to L<perllol> instead of L<perlref>; it
+discusses lists of lists and multidimensional arrays in detail. After
+that, you should move on to L<perldsc>; it's a Data Structure Cookbook
+that shows recipes for using and printing out arrays of hashes, hashes
+of arrays, and other kinds of data.
+
+=head1 Summary
+
+Everyone needs compound data structures, and in Perl the way you get
+them is with references. There are four important rules for managing
+references: Two for making references and two for using them. Once
+you know these rules you can do most of the important things you need
+to do with references.
+
+=head1 Credits
+
+Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref@plover.com>)
+
+This article originally appeared in I<The Perl Journal> volume 3, #2.
+Reprinted with permission.
+
+The original title was I<Understand References Today>.
+
+
+=cut
+