author | Gurusamy Sarathy <gsar@cpan.org> | 1998-11-28 17:21:07 +0000 |
---|---|---|
committer | Gurusamy Sarathy <gsar@cpan.org> | 1998-11-28 17:21:07 +0000 |
commit | a1e2a3203e4b30744c9b7c687f0438326033e3c3 (patch) | |
tree | 943da214521e0f861963d9bef658eaf764f639ff /pod/perlreftut.pod | |
parent | a1ea39dc8940632216c22b20b6b3596817204581 (diff) | |
download | perl-a1e2a3203e4b30744c9b7c687f0438326033e3c3.tar.gz |
add perlreftut.pod
p4raw-id: //depot/perl@2357
Diffstat (limited to 'pod/perlreftut.pod')
-rw-r--r-- | pod/perlreftut.pod | 397 |
1 files changed, 397 insertions, 0 deletions
diff --git a/pod/perlreftut.pod b/pod/perlreftut.pod
new file mode 100644
index 0000000000..2fac79df00
--- /dev/null
+++ b/pod/perlreftut.pod
@@ -0,0 +1,397 @@
+
+=head1 NAME
+
+perlreftut - Mark's very short tutorial about references
+
+=head1 DESCRIPTION
+
+One of the most important new features in Perl 5 was the capability to
+manage complicated data structures like multidimensional arrays and
+nested hashes. To enable these, Perl 5 introduced a feature called
+`references', and using references is the key to managing complicated,
+structured data in Perl. Unfortunately, there's a lot of funny syntax
+to learn, and the main manual page can be hard to follow. The manual
+is quite complete, and sometimes people find that a problem, because it
+can be hard to tell what is important and what isn't.
+
+Fortunately, you only need to know 10% of what's in the main page to get
+90% of the benefit. This page will show you that 10%.
+
+=head1 Who Needs Complicated Data Structures?
+
+One problem that came up all the time in Perl 4 was how to represent a
+hash whose values were lists. Perl 4 had hashes, of course, but the
+values had to be scalars; they couldn't be lists.
+
+Why would you want a hash of lists? Let's take a simple example: You
+have a file of city and state names, like this:
+
+    Chicago, Illinois
+    New York, New York
+    Albany, New York
+    Springfield, Illinois
+    Trenton, New Jersey
+    Evanston, Illinois
+
+and you want to produce an output like this, with each state mentioned
+once, and then an alphabetical list of the cities in that state:
+
+    Illinois: Chicago, Evanston, Springfield.
+    New Jersey: Trenton.
+    New York: Albany, New York.
+
+The natural way to do this is to have a hash whose keys are state
+names. Associated with each state name key is a list of the cities in
+that state. Each time you read a line of input, split it into a state
+and a city, look up the list of cities already known to be in that
+state, and append the new city to the list. When you're done reading
+the input, iterate over the hash as usual, sorting each list of cities
+before you print it out.
+
+If hash values can't be lists, you lose. In Perl 4, hash values can't
+be lists; they can only be strings. You lose. You'd probably have to
+combine all the cities into a single string somehow, and then when
+time came to write the output, you'd have to break the string into a
+list, sort the list, and turn it back into a string. This is messy
+and error-prone. And it's frustrating, because Perl already has
+perfectly good lists that would solve the problem if only you could
+use them.
+
+=head1 The Solution
+
+Unfortunately, by the time Perl 5 rolled around, we were already stuck
+with this design: Hash values must be scalars. The solution to this is
+references.
+
+A reference is a scalar value that I<refers to> an entire array or an
+entire hash (or to just about anything else.) Names are one kind of
+reference that you're already familiar with. Think of the President:
+a messy, inconvenient bag of blood and bones. But to talk about him,
+or to represent him in a computer program, all you need is the easy,
+convenient scalar string "Bill Clinton".
+
+References in Perl are like names for arrays and hashes. They're
+Perl's private, internal names, so you can be sure they're
+unambiguous. Unlike "Bill Clinton", a reference only refers to one
+thing, and you always know what it refers to. If you have a reference
+to an array, you can recover the entire array from it. If you have a
+reference to a hash, you can recover the entire hash. But the
+reference is still an easy, compact scalar value.
+
+You can't have a hash whose values are arrays; hash values can only be
+scalars. We're stuck with that. But a single reference can refer to
+an entire array, and references are scalars, so you can have a hash of
+references to arrays, and it'll act a lot like a hash of arrays, and
+it'll be just as useful as a hash of arrays.
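As a brief preview of a hash of references to arrays, here is a minimal sketch using the city and state names from the sample file. It is not part of the committed file, and the names %cities_in, @illinois and @new_york are made up for illustration.

```perl
use strict;
use warnings;

# Two ordinary arrays, one per state (illustrative data from the sample file).
my @illinois = ('Chicago', 'Springfield', 'Evanston');
my @new_york = ('New York', 'Albany');

# Each hash value is a single scalar: a reference to one of the arrays.
my %cities_in = (
    'Illinois' => \@illinois,
    'New York' => \@new_york,
);

# Follow each reference back to the whole array it refers to.
for my $state (sort keys %cities_in) {
    my @cities = sort @{ $cities_in{$state} };
    print "$state: ", join(', ', @cities), ".\n";
}
```

The hash itself still holds only scalars, which is why this works within the "hash values must be scalars" restriction described above.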
+
+We'll come back to this city-state problem later, after we've seen
+some syntax for managing references.
+
+
+=head1 Syntax
+
+There are just two ways to make a reference, and just two ways to use
+it once you have it.
+
+=head2 Making References
+
+B<Make Rule 1>
+
+If you put a C<\> in front of a variable, you get a
+reference to that variable.
+
+    $aref = \@array;         # $aref now holds a reference to @array
+    $href = \%hash;          # $href now holds a reference to %hash
+
+Once the reference is stored in a variable like $aref or $href, you
+can copy it or store it just the same as any other scalar value:
+
+    $xy = $aref;             # $xy now holds a reference to @array
+    $p[3] = $href;           # $p[3] now holds a reference to %hash
+    $z = $p[3];              # $z now holds a reference to %hash
+
+
+These examples show how to make references to variables with names.
+Sometimes you want to make an array or a hash that doesn't have a
+name. This is analogous to the way you like to be able to use the
+string C<"\n"> or the number 80 without having to store it in a named
+variable first.
+
+B<Make Rule 2>
+
+C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
+that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a
+reference to that hash.
+
+    $aref = [ 1, "foo", undef, 13 ];
+    # $aref now holds a reference to an array
+
+    $href = { APR => 4, AUG => 8 };
+    # $href now holds a reference to a hash
+
+
+The references you get from rule 2 are the same kind of
+references that you get from rule 1:
+
+    # This:
+    $aref = [ 1, 2, 3 ];
+
+    # Does the same as this:
+    @array = (1, 2, 3);
+    $aref = \@array;
+
+
+The first line is an abbreviation for the following two lines, except
+that it doesn't create the superfluous array variable C<@array>.
+
+
+=head2 Using References
+
+What can you do with a reference once you have it? It's a scalar
+value, and we've seen that you can store it as a scalar and get it back
+again just like any scalar. There are just two more ways to use it:
+
+B<Use Rule 1>
+
+If C<$aref> contains a reference to an array, then you
+can put C<{$aref}> anywhere you would normally put the name of an
+array. For example, C<@{$aref}> instead of C<@array>.
+
+Here are some examples of that:
+
+Arrays:
+
+
+    @a                @{$aref}                An array
+    reverse @a        reverse @{$aref}        Reverse the array
+    $a[3]             ${$aref}[3]             An element of the array
+    $a[3] = 17;       ${$aref}[3] = 17        Assigning an element
+
+
+On each line are two expressions that do the same thing. The
+left-hand versions operate on the array C<@a>, and the right-hand
+versions operate on the array that is referred to by C<$aref>, but
+once they find the array they're operating on, they do the same things
+to the arrays.
+
+Using a hash reference is I<exactly> the same:
+
+    %h                %{$href}                A hash
+    keys %h           keys %{$href}           Get the keys from the hash
+    $h{'red'}         ${$href}{'red'}         An element of the hash
+    $h{'red'} = 17    ${$href}{'red'} = 17    Assigning an element
+
+
+B<Use Rule 2>
+
+C<${$aref}[3]> is too hard to read, so you can write C<$aref-E<gt>[3]>
+instead.
+
+C<${$href}{red}> is too hard to read, so you can write
+C<$href-E<gt>{red}> instead.
+
+Most often, when you have an array or a hash, you want to get or set a
+single element from it. C<${$aref}[3]> and C<${$href}{'red'}> have
+too much punctuation, and Perl lets you abbreviate.
+
+If C<$aref> holds a reference to an array, then C<$aref-E<gt>[3]> is
+the fourth element of the array. Don't confuse this with C<$aref[3]>,
+which is the fourth element of a totally different array, one
+deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the
+same way that C<$item> and C<@item> are.
+
+Similarly, C<$href-E<gt>{'red'}> is part of the hash referred to by
+the scalar variable C<$href>, perhaps even one with no name.
+C<$href{'red'}> is part of the deceptively named C<%href> hash. It's
+easy to leave out the C<-E<gt>> by mistake, and if you do, you'll get
+bizarre results when your program gets array and hash elements out of
+totally unexpected hashes and arrays that weren't the ones you wanted
+to use.
+
+
+=head1 An Example
+
+Let's see a quick example of how all this is useful.
+
+First, remember that C<[1, 2, 3]> makes an anonymous array containing
+C<(1, 2, 3)>, and gives you a reference to that array.
+
+Now think about
+
+    @a = ( [1, 2, 3],
+           [4, 5, 6],
+           [7, 8, 9]
+         );
+
+@a is an array with three elements, and each one is a reference to
+another array.
+
+C<$a[1]> is one of these references. It refers to an array, the array
+containing C<(4, 5, 6)>, and because it is a reference to an array,
+B<USE RULE 2> says that we can write C<$a[1]-E<gt>[2]> to get the
+third element from that array. C<$a[1]-E<gt>[2]> is the 6.
+Similarly, C<$a[0]-E<gt>[1]> is the 2. What we have here is like a
+two-dimensional array; you can write C<$a[ROW]-E<gt>[COLUMN]> to get
+or set the element in any row and any column of the array.
+
+The notation still looks a little cumbersome, so there's one more
+abbreviation:
+
+=head1 Arrow Rule
+
+In between two B<subscripts>, the arrow is optional.
+
+Instead of C<$a[1]-E<gt>[2]>, we can write C<$a[1][2]>; it means the
+same thing. Instead of C<$a[0]-E<gt>[1]>, we can write C<$a[0][1]>;
+it means the same thing.
+
+Now it really looks like two-dimensional arrays!
+
+You can see why the arrows are important. Without them, we would have
+had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For
+three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
+the unreadable C<${${$x[2]}[3]}[5]>.
+
+
+=head1 Solution
+
+Here's the answer to the problem I posed at the beginning of the
+article, of reformatting a file of city and state names.
+
+    1   while (<>) {
+    2     chomp;
+    3     my ($city, $state) = split /, /;
+    4     push @{$table{$state}}, $city;
+    5   }
+    6
+    7   foreach $state (sort keys %table) {
+    8     print "$state: ";
+    9     my @cities = @{$table{$state}};
+   10     print join ', ', sort @cities;
+   11     print ".\n";
+   12   }
+
+
+The program has two pieces: Lines 1--5 read the input and build a
+data structure, and lines 7--12 analyze the data and print out the
+report.
+
+In the first part, line 4 is the important one. We're going to have a
+hash, C<%table>, whose keys are state names, and whose values are
+(references to) arrays of city names. After acquiring a city and
+state name, the program looks up C<$table{$state}>, which holds (a
+reference to) the list of cities seen in that state so far. Line 4 is
+totally analogous to
+
+    push @array, $city;
+
+except that the name C<array> has been replaced by the reference
+C<{$table{$state}}>. The C<push> adds a city name to the end of the
+referred-to array.
+
+In the second part, line 9 is the important one. Again,
+C<$table{$state}> is (a reference to) the list of cities in the state, so
+we can recover the original list, and copy it into the array C<@cities>,
+by using C<@{$table{$state}}>. Line 9 is totally analogous to
+
+    @cities = @array;
+
+except that the name C<array> has been replaced by the reference
+C<{$table{$state}}>. The C<@> tells Perl to get the entire array.
+
+The rest of the program is just familiar uses of C<chomp>, C<split>, C<sort>,
+and C<print>, and doesn't involve references at all.
+
+There's one fine point I skipped. Suppose the program has just read
+the first line in its input that happens to mention the state of Ohio.
+Control is at line 4, C<$state> is C<'Ohio'>, and C<$city> is
+C<'Cleveland'>. Since this is the first city in Ohio,
+C<$table{$state}> is undefined---in fact there isn't an C<'Ohio'> key
+in C<%table> at all. What does line 4 do here?
+
+    4     push @{$table{$state}}, $city;
+
+
+This is Perl, so it does the exact right thing. It sees that you want
+to push C<Cleveland> onto an array that doesn't exist, so it helpfully
+makes a new, empty, anonymous array for you, installs it in the table,
+and then pushes C<Cleveland> onto it. This is called `autovivification'.
+
+
+=head1 The Rest
+
+I promised to give you 90% of the benefit with 10% of the details, and
+that means I left out 90% of the details. Now that you have an
+overview of the important parts, it should be easier to read the
+L<perlref> manual page, which discusses 100% of the details.
+
+Some of the highlights of L<perlref>:
+
+=over 4
+
+=item *
+
+You can make references to anything, including scalars, functions, and
+other references.
+
+=item *
+
+In B<USE RULE 1>, you can often omit the curly braces. For example,
+C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
+C<${$aref}[1]>. If you're just starting out, you might want to adopt
+the habit of always including the curly braces.
+
+=item *
+
+To see if a variable contains a reference, use the `ref' function.
+It returns true if its argument is a reference. Actually it's a
+little better than that: It returns HASH for hash references and
+ARRAY for array references.
+
+=item *
+
+If you try to use a reference like a string, you get strings like
+
+    ARRAY(0x80f5dec)    or    HASH(0x826afc0)
+
+If you ever see a string that looks like this, you'll know you
+printed out a reference by mistake.
+
+A side effect of this representation is that you can use C<eq> to see
+if two references refer to the same thing. (But you should usually use
+C<==> instead because it's much faster.)
+
+=item *
+
+You can use a string as if it were a reference. If you use the string
+C<"foo"> as an array reference, it's taken to be a reference to the
+array C<@foo>. This is called a I<soft reference> or I<symbolic reference>.
+
+=back
+
+You might prefer to go on to L<perllol> instead of L<perlref>; it
+discusses lists of lists and multidimensional arrays in detail. After
+that, you should move on to L<perldsc>; it's a Data Structure Cookbook
+that shows recipes for using and printing out arrays of hashes, hashes
+of arrays, and other kinds of data.
+
+=head1 Summary
+
+Everyone needs compound data structures, and in Perl the way you get
+them is with references. There are four important rules for managing
+references: Two for making references and two for using them. Once
+you know these rules you can do most of the important things you need
+to do with references.
+
+=head1 Credits
+
+Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref@plover.com>)
+
+This article originally appeared in I<The Perl Journal> volume 3, #2.
+Reprinted with permission.
+
+The original title was I<Understand References Today>.
+
+
+=cut
+
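The four rules from the summary, together with the arrow rule, autovivification and the ref function, can be tried out in one short script. This is a minimal sketch rather than part of the committed file: the variable names @grid and %table are chosen for illustration, and it assumes strict and warnings are switched on, which the article's own listing does not do.

```perl
use strict;
use warnings;

# Make Rule 1: a backslash in front of a variable gives a reference to it.
my @array = (1, 2, 3);
my $aref  = \@array;

# Make Rule 2: [ ITEMS ] and { ITEMS } build anonymous arrays and hashes.
my $href = { APR => 4, AUG => 8 };
my @grid = ( [1, 2, 3], [4, 5, 6], [7, 8, 9] );

# Use Rule 1: {$aref} stands in wherever an array name would go.
my @reversed = reverse @{$aref};
print "reversed: @reversed\n";

# Use Rule 2: the arrow form reaches a single element.
$aref->[0] = 17;                       # also changes $array[0]
print "first element is now $array[0]\n";
print "AUG: $href->{AUG}\n";

# Arrow Rule: between two subscripts the arrow is optional.
print "same element\n" if $grid[1]->[2] == $grid[1][2];

# Autovivification: pushing through a missing hash entry creates the array.
my %table;
push @{ $table{'Ohio'} }, 'Cleveland';
print "Ohio: @{ $table{'Ohio'} }\n";

# ref() reports what kind of thing a reference refers to.
print ref($aref), " ", ref($href), "\n";
```

Because $aref refers to @array rather than to a copy, the assignment through the arrow is visible in $array[0] as well.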