add perlreftut.pod

p4raw-id: //depot/perl@2357
author: Gurusamy Sarathy <gsar@cpan.org> 1998-11-28 17:21:07 +0000
committer: Gurusamy Sarathy <gsar@cpan.org> 1998-11-28 17:21:07 +0000
commit: a1e2a3203e4b30744c9b7c687f0438326033e3c3 (patch)
tree: 943da214521e0f861963d9bef658eaf764f639ff /pod/perlreftut.pod
parent: a1ea39dc8940632216c22b20b6b3596817204581 (diff)
download: perl-a1e2a3203e4b30744c9b7c687f0438326033e3c3.tar.gz
1 files changed, 397 insertions, 0 deletions
diff --git a/pod/perlreftut.pod b/pod/perlreftut.pod
new file mode 100644
index 0000000000..2fac79df00
--- /dev/null
+++ b/pod/perlreftut.pod
@@ -0,0 +1,397 @@
+
+=head1 NAME
+
+perlreftut - Mark's very short tutorial about references
+
+=head1 DESCRIPTION
+
+One of the most important new features in Perl 5 was the capability to
+manage complicated data structures like multidimensional arrays and
+nested hashes.  To enable these, Perl 5 introduced a feature called
+`references', and using references is the key to managing complicated,
+structured data in Perl.  Unfortunately, there's a lot of funny syntax
+to learn, and the main manual page can be hard to follow.  The manual
+is quite complete, and sometimes people find that a problem, because it
+can be hard to tell what is important and what isn't.
+
+Fortunately, you only need to know 10% of what's in the main page to get
+90% of the benefit.  This page will show you that 10%.
+
+=head1 Who Needs Complicated Data Structures?
+
+One problem that came up all the time in Perl 4 was how to represent a
+hash whose values were lists.  Perl 4 had hashes, of course, but the
+values had to be scalars; they couldn't be lists.  
+
+Why would you want a hash of lists?  Let's take a simple example: You
+have a file of city and state names, like this:
+
+	Chicago, Illinois
+	New York, New York
+	Albany, New York
+	Springfield, Illinois
+	Trenton, New Jersey
+	Evanston, Illinois
+
+and you want to produce an output like this, with each state mentioned
+once, and then an alphabetical list of the cities in that state:
+
+	Illinois: Chicago, Evanston, Springfield.
+	New Jersey: Trenton.
+	New York: Albany, New York.
+
+The natural way to do this is to have a hash whose keys are state
+names.  Associated with each state name key is a list of the cities in
+that state.  Each time you read a line of input, split it into a state
+and a city, look up the list of cities already known to be in that
+state, and append the new city to the list.  When you're done reading
+the input, iterate over the hash as usual, sorting each list of cities
+before you print it out.
+
+If hash values can't be lists, you lose.  In Perl 4, hash values can't
+be lists; they can only be strings.  You lose.  You'd probably have to
+combine all the cities into a single string somehow, and then when
+time came to write the output, you'd have to break the string into a
+list, sort the list, and turn it back into a string.  This is messy
+and error-prone.  And it's frustrating, because Perl already has
+perfectly good lists that would solve the problem if only you could
+use them.
+
+=head1 The Solution
+
+Unfortunately, by the time Perl 5 rolled around, we were already stuck
+with this design: Hash values must be scalars.  The solution to this is
+references.
+
+A reference is a scalar value that I<refers to> an entire array or an
+entire hash (or to just about anything else).  Names are one kind of
+reference that you're already familiar with.  Think of the President:
+a messy, inconvenient bag of blood and bones.  But to talk about him,
+or to represent him in a computer program, all you need is the easy,
+convenient scalar string "Bill Clinton".
+
+References in Perl are like names for arrays and hashes.  They're
+Perl's private, internal names, so you can be sure they're
+unambiguous.  Unlike "Bill Clinton", a reference only refers to one
+thing, and you always know what it refers to.  If you have a reference
+to an array, you can recover the entire array from it.  If you have a
+reference to a hash, you can recover the entire hash.  But the
+reference is still an easy, compact scalar value.
+
+You can't have a hash whose values are arrays; hash values can only be
+scalars.  We're stuck with that.  But a single reference can refer to
+an entire array, and references are scalars, so you can have a hash of
+references to arrays, and it'll act a lot like a hash of arrays, and
+it'll be just as useful as a hash of arrays.
+
+We'll come back to this city-state problem later, after we've seen
+some syntax for managing references.
+
+
+=head1 Syntax
+
+There are just two ways to make a reference, and just two ways to use
+it once you have it.
+
+=head2 Making References
+
+B<Make Rule 1>
+
+If you put a C<\> in front of a variable, you get a
+reference to that variable.
+
+    $aref = \@array;         # $aref now holds a reference to @array
+    $href = \%hash;          # $href now holds a reference to %hash
+
+Once the reference is stored in a variable like $aref or $href, you
+can copy it or store it just the same as any other scalar value:
+
+    $xy = $aref;             # $xy now holds a reference to @array
+    $p[3] = $href;           # $p[3] now holds a reference to %hash
+    $z = $p[3];              # $z now holds a reference to %hash
+
+
+These examples show how to make references to variables with names.
+Sometimes you want to make an array or a hash that doesn't have a
+name.  This is analogous to the way you like to be able to use the
+string C<"\n"> or the number 80 without having to store it in a named
+variable first.
+
+B<Make Rule 2>
+
+C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
+that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a
+reference to that hash.
+
+    $aref = [ 1, "foo", undef, 13 ];  
+    # $aref now holds a reference to an array
+
+    $href = { APR => 4, AUG => 8 };   
+    # $href now holds a reference to a hash
+
+
+The references you get from rule 2 are the same kind of
+references that you get from rule 1:
+
+	# This:
+	$aref = [ 1, 2, 3 ];
+
+	# Does the same as this:
+	@array = (1, 2, 3);
+	$aref = \@array;
+
+
+The first line is an abbreviation for the following two lines, except
+that it doesn't create the superfluous array variable C<@array>.
+
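As a rough sketch of how the two Make Rules differ in practice (the variable names C<$named> and C<$anon> are illustrative, not from the article): a reference made with C<\> refers to the named array itself, while C<[ ... ]> builds a separate anonymous array.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

my $named = \@array;       # Make Rule 1: refers to @array itself
my $anon  = [ @array ];    # Make Rule 2: a new anonymous array holding the same items

push @array, 4;            # modify the named array

# @{...} turns a reference back into an array (covered under Use Rule 1).
my $named_count = scalar @{$named};   # sees the push
my $anon_count  = scalar @{$anon};    # unaffected: it is a separate array

print "named: $named_count, anonymous: $anon_count\n";   # named: 4, anonymous: 3
```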
+
+=head2 Using References
+
+What can you do with a reference once you have it?  It's a scalar
+value, and we've seen that you can store it as a scalar and get it back
+again just like any scalar.  There are just two more ways to use it:
+
+B<Use Rule 1>
+
+If C<$aref> contains a reference to an array, then you
+can put C<{$aref}> anywhere you would normally put the name of an
+array.  For example, C<@{$aref}> instead of C<@array>.
+
+Here are some examples of that:
+
+Arrays:
+
+
+	@a		@{$aref}		An array
+	reverse @a	reverse @{$aref}	Reverse the array
+	$a[3]		${$aref}[3]		An element of the array
+	$a[3] = 17;	${$aref}[3] = 17	Assigning an element
+
+
+On each line are two expressions that do the same thing.  The
+left-hand versions operate on the array C<@a>, and the right-hand
+versions operate on the array that is referred to by C<$aref>, but
+once they find the array they're operating on, they do the same things
+to the arrays.
+
+Using a hash reference is I<exactly> the same:
+
+	%h		%{$href}	      A hash
+	keys %h		keys %{$href}	      Get the keys from the hash
+	$h{'red'}	${$href}{'red'}	      An element of the hash
+	$h{'red'} = 17	${$href}{'red'} = 17  Assigning an element
+
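The two tables above can be tried out directly. This is a minimal sketch with made-up sample data; each line notes the plain-variable form it corresponds to.

```perl
use strict;
use warnings;

my @a    = (10, 20, 30, 40);
my $aref = \@a;

my @reversed = reverse @{$aref};     # same as: reverse @a
${$aref}[3]  = 17;                   # same as: $a[3] = 17

my %h    = (red => 1, blue => 2);
my $href = \%h;

my @keys = sort keys %{$href};       # same as: sort keys %h
${$href}{'red'} = 17;                # same as: $h{'red'} = 17

print "a[3] is $a[3], h{red} is $h{red}\n";   # a[3] is 17, h{red} is 17
print "reversed: @reversed\n";                # reversed: 40 30 20 10
print "keys: @keys\n";                        # keys: blue red
```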
+
+B<Use Rule 2>
+
+C<${$aref}[3]> is too hard to read, so you can write C<$aref-E<gt>[3]>
+instead.
+
+C<${$href}{red}> is too hard to read, so you can write
+C<$href-E<gt>{red}> instead.
+
+Most often, when you have an array or a hash, you want to get or set a
+single element from it.  C<${$aref}[3]> and C<${$href}{'red'}> have
+too much punctuation, and Perl lets you abbreviate.
+
+If C<$aref> holds a reference to an array, then C<$aref-E<gt>[3]> is
+the fourth element of the array.  Don't confuse this with C<$aref[3]>,
+which is the fourth element of a totally different array, one
+deceptively named C<@aref>.  C<$aref> and C<@aref> are unrelated the
+same way that C<$item> and C<@item> are.
+
+Similarly, C<$href-E<gt>{'red'}> is part of the hash referred to by
+the scalar variable C<$href>, perhaps even one with no name.
+C<$href{'red'}> is part of the deceptively named C<%href> hash.  It's
+easy to forget the C<-E<gt>>, and if you do, you'll get
+bizarre results when your program gets array and hash elements out of
+totally unexpected hashes and arrays that weren't the ones you wanted
+to use.
+
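The following sketch illustrates the pitfall described above. It assumes a hypothetical array C<@aref> that happens to share its name with the scalar C<$aref>; the contents of both are invented for the example.

```perl
use strict;
use warnings;

my $aref = [ 'a', 'b', 'c', 'd' ];    # reference to an anonymous array
my @aref = ( 'w', 'x', 'y', 'z' );    # unrelated array that shares the name

my $long  = ${$aref}[3];    # Use Rule 1 form
my $short = $aref->[3];     # Use Rule 2 form, same element
my $other = $aref[3];       # element of @aref, not of the referenced array

print "long=$long short=$short other=$other\n";   # long=d short=d other=z
```

Under C<use strict>, leaving out the arrow when no C<@aref> has been declared is reported at compile time, which is one reason to enable it.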
+
+=head1 An Example
+
+Let's see a quick example of how all this is useful.
+
+First, remember that C<[1, 2, 3]> makes an anonymous array containing
+C<(1, 2, 3)>, and gives you a reference to that array.
+
+Now think about
+
+	@a = ( [1, 2, 3],
+	       [4, 5, 6],
+	       [7, 8, 9]
+	     );
+
+@a is an array with three elements, and each one is a reference to
+another array.
+
+C<$a[1]> is one of these references.  It refers to an array, the array
+containing C<(4, 5, 6)>, and because it is a reference to an array,
+B<USE RULE 2> says that we can write C<$a[1]-E<gt>[2]> to get the
+third element from that array.  C<$a[1]-E<gt>[2]> is the 6.
+Similarly, C<$a[0]-E<gt>[1]> is the 2.  What we have here is like a
+two-dimensional array; you can write C<$a[ROW]-E<gt>[COLUMN]> to get
+or set the element in any row and any column of the array.
+
+The notation still looks a little cumbersome, so there's one more
+abbreviation:  
+
+=head1 Arrow Rule
+
+In between two B<subscripts>, the arrow is optional.
+
+Instead of C<$a[1]-E<gt>[2]>, we can write C<$a[1][2]>; it means the
+same thing.  Instead of C<$a[0]-E<gt>[1]>, we can write C<$a[0][1]>;
+it means the same thing.
+
+Now it really looks like two-dimensional arrays!
+
+You can see why the arrows are important.  Without them, we would have
+had to write C<${$a[1]}[2]> instead of C<$a[1][2]>.  For
+three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
+the unreadable C<${${$x[2]}[3]}[5]>.
+
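To check that the three notations really name the same element, here is a small sketch using the C<@a> from the example; the scalar names are illustrative only.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9],
        );

my $with_arrow    = $a[1]->[2];     # 6
my $without_arrow = $a[1][2];       # 6, arrow omitted between subscripts
my $with_braces   = ${$a[1]}[2];    # 6, Use Rule 1 form

$a[0][1] = 20;                      # set row 0, column 1

print join(' ', $with_arrow, $without_arrow, $with_braces, $a[0][1]), "\n";   # 6 6 6 20
```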
+
+=head1 Solution
+
+Here's the answer to the problem I posed at the beginning of the
+article, of reformatting a file of city and state names.
+
+    1   while (<>) {
+    2     chomp;
+    3     my ($city, $state) = split /, /;
+    4     push @{$table{$state}}, $city;
+    5   }
+    6
+    7   foreach $state (sort keys %table) {
+    8     print "$state: ";
+    9     my @cities = @{$table{$state}};
+   10     print join ', ', sort @cities;
+   11     print ".\n";
+   12   }
+
+
+The program has two pieces:  Lines 1--5 read the input and build a
+data structure, and lines 7--12 analyze the data and print out the
+report.  
+
+In the first part, line 4 is the important one.  We're going to have a
+hash, C<%table>, whose keys are state names, and whose values are
+(references to) arrays of city names.  After acquiring a city and
+state name, the program looks up C<$table{$state}>, which holds (a
+reference to) the list of cities seen in that state so far.  Line 4 is
+totally analogous to
+
+	push @array, $city;
+
+except that the name C<array> has been replaced by the reference
+C<{$table{$state}}>.  The C<push> adds a city name to the end of the
+referred-to array.
+
+In the second part, line 9 is the important one.  Again,
+C<$table{$state}> is (a reference to) the list of cities in the state, so
+we can recover the original list, and copy it into the array C<@cities>,
+by using C<@{$table{$state}}>.  Line 9 is totally analogous to
+
+	@cities = @array;
+
+except that the name C<array> has been replaced by the reference
+C<{$table{$state}}>.  The C<@> tells Perl to get the entire array.
+
+The rest of the program is just familiar uses of C<chomp>, C<split>, C<sort>,
+and C<print>, and doesn't involve references at all.
+
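For experimenting without an input file, the same logic can be run against the sample data from the start of the article. This is a sketch under a few assumptions: the C<@lines> array and the C<$report> string are additions for the sake of a self-contained example, and C<%table> is declared with C<my> so that it runs under C<use strict>.

```perl
use strict;
use warnings;

# Sample input from the start of the article, held in memory.
my @lines = (
    "Chicago, Illinois",
    "New York, New York",
    "Albany, New York",
    "Springfield, Illinois",
    "Trenton, New Jersey",
    "Evanston, Illinois",
);

my %table;    # state name => reference to an array of city names
for my $line (@lines) {
    my ($city, $state) = split /, /, $line;
    push @{$table{$state}}, $city;
}

my $report = '';
for my $state (sort keys %table) {
    my @cities = sort @{$table{$state}};
    $report .= "$state: " . join(', ', @cities) . ".\n";
}
print $report;
```

Collecting the output in a string rather than printing piecemeal makes the result easy to compare against the expected report.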
+There's one fine point I skipped.  Suppose the program has just read
+the first line in its input that happens to mention the state of Ohio.
+Control is at line 4, C<$state> is C<'Ohio'>, and C<$city> is
+C<'Cleveland'>.  Since this is the first city in Ohio,
+C<$table{$state}> is undefined---in fact there isn't an C<'Ohio'> key
+in C<%table> at all.  What does line 4 do here?
+
+    4     push @{$table{$state}}, $city;
+
+
+This is Perl, so it does the exact right thing.  It sees that you want
+to push C<Cleveland> onto an array that doesn't exist, so it helpfully
+makes a new, empty, anonymous array for you, installs it in the table,
+and then pushes C<Cleveland> onto it.  This is called `autovivification'.
+
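A small sketch makes the effect visible by checking for the key before and after the C<push>. The scalar names are illustrative, and C<ref> is the built-in described in the next section.

```perl
use strict;
use warnings;

my %table;
my $before = exists $table{'Ohio'} ? 1 : 0;    # no 'Ohio' key yet

push @{$table{'Ohio'}}, 'Cleveland';           # Perl creates the array here

my $after = exists $table{'Ohio'} ? 1 : 0;
my $kind  = ref $table{'Ohio'};                # 'ARRAY'
my $first = $table{'Ohio'}[0];                 # 'Cleveland'

print "before=$before after=$after kind=$kind first=$first\n";
# before=0 after=1 kind=ARRAY first=Cleveland
```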
+
+=head1 The Rest
+
+I promised to give you 90% of the benefit with 10% of the details, and
+that means I left out 90% of the details.  Now that you have an
+overview of the important parts, it should be easier to read the
+L<perlref> manual page, which discusses 100% of the details.
+
+Some of the highlights of L<perlref>:
+
+=over 4
+
+=item *
+
+You can make references to anything, including scalars, functions, and
+other references.
+
+=item *
+
+In B<USE RULE 1>, you can often omit the curly braces.  For example,
+C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
+C<${$aref}[1]>.  If you're just starting out, you might want to adopt
+the habit of always including the curly braces.
+
+=item * 
+
+To see if a variable contains a reference, use the `ref' function.
+It returns true if its argument is a reference.  Actually it's a
+little better than that:  It returns HASH for hash references and
+ARRAY for array references.
+
+=item * 
+
+If you try to use a reference like a string, you get strings like
+
+	ARRAY(0x80f5dec)   or    HASH(0x826afc0)
+
+If you ever see a string that looks like this, you'll know you
+printed out a reference by mistake.
+
+A side effect of this representation is that you can use C<eq> to see
+if two references refer to the same thing.  (But you should usually use
+C<==> instead because it's much faster.)
+
+=item *
+
+You can use a string as if it were a reference.  If you use the string
+C<"foo"> as an array reference, it's taken to be a reference to the
+array C<@foo>.  This is called a I<soft reference> or I<symbolic reference>.
+
+=back
+
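Several of the highlights above can be sketched in a few lines: C<ref>, the brace-free C<@$aref> form, comparing references with C<==>, and what a reference looks like as a string. All variable names here are illustrative, and the printed address will differ from run to run.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $same  = \@array;          # second reference to the same array
my $other = [ 1, 2, 3 ];      # different array with equal contents
my $href  = { APR => 4 };

my $aref_kind  = ref $aref;       # 'ARRAY'
my $href_kind  = ref $href;       # 'HASH'
my $plain_kind = ref 'a string';  # '' (false): not a reference

my $count = @$aref;               # braces omitted; scalar context gives the count

my $is_same  = ($aref == $same)  ? 1 : 0;   # 1: both refer to @array
my $is_other = ($aref == $other) ? 1 : 0;   # 0: equal contents, different array

print "$aref_kind $href_kind [$plain_kind] $count $is_same $is_other\n";
print "as a string: $aref\n";     # ARRAY(0x...), the address varies
```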
+You might prefer to go on to L<perllol> instead of L<perlref>; it
+discusses lists of lists and multidimensional arrays in detail.  After
+that, you should move on to L<perldsc>; it's a Data Structure Cookbook
+that shows recipes for using and printing out arrays of hashes, hashes
+of arrays, and other kinds of data.
+
+=head1 Summary
+
+Everyone needs compound data structures, and in Perl the way you get
+them is with references.  There are four important rules for managing
+references: Two for making references and two for using them.  Once
+you know these rules you can do most of the important things you need
+to do with references.
+
+=head1 Credits
+
+Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref@plover.com>)
+
+This article originally appeared in I<The Perl Journal> volume 3, #2.
+Reprinted with permission.
+
+The original title was I<Understand References Today>.
+
+
+=cut
+