author | Karl Williamson <khw@cpan.org> | 2016-12-26 10:36:39 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2016-12-26 10:54:23 -0700 |
commit | 28285ecfd802f93d99836921dc9e2830a4636554 (patch) | |
tree | 4b804ef20a0078cf76ee6caf2c2d9e22acdf17fe /pod/perlretut.pod | |
parent | d720149d59afad1fa0ae15d5f092fdc47bd1a4f7 (diff) | |
perlretut: Add some introductory remarks
In re-reading this, I realized that it assumed some basic knowledge, so
this commit adds text to explain what formerly was assumed.
Diffstat (limited to 'pod/perlretut.pod')
-rw-r--r-- | pod/perlretut.pod | 39 |
1 files changed, 30 insertions, 9 deletions
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 734ca5cfb3..d74276c91d 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -17,21 +17,42 @@ expressions display an efficiency and flexibility unknown in most
 other computer languages. Mastering even the basics of regular
 expressions will allow you to manipulate text with surprising ease.
 
-What is a regular expression? A regular expression is simply a string
-that describes a pattern. Patterns are in common use these days;
+What is a regular expression? At its most basic, a regular expression
+is a template that is used to determine if a string has certain
+characteristics. The string is most often some text, such as a line,
+sentence, web page, or even a whole book, but less commonly it could be
+some binary data as well.
+Suppose we want to determine if the text in variable, C<$var> contains
+the sequence of characters C<m> C<u> C<s> C<h> C<r> C<o> C<o> C<m>
+(blanks added for legibility). We can write in Perl
+
+    $var =~ m/mushroom/
+
+The value of this expression will be TRUE if C<$var> contains that
+sequence of characters, and FALSE otherwise. The portion enclosed in
+C<"E<sol>"> characters denotes the characteristic we are looking for.
+We use the term I<pattern> for it. The process of looking to see if the
+pattern occurs in the string is called I<matching>, and the C<"=~">
+operator along with the C<"m//"> tell Perl to try to match the pattern
+against the string. Note that the pattern is also a string, but a very
+special kind of one, as we will see. Patterns are in common use these
+days;
 examples are the patterns typed into a search engine to find web pages
 and the patterns used to list files in a directory, e.g., C<ls *.txt>
 or C<dir *.*>. In Perl, the patterns described by regular expressions
-are used to search strings, extract desired parts of strings, and to
-do search and replace operations.
+are used not only to search strings, but to also extract desired parts
+of strings, and to do search and replace operations.
 
 Regular expressions have the undeserved reputation of being abstract
-and difficult to understand. Regular expressions are constructed using
+and difficult to understand. This really stems simply because the
+notation used to express them tends to be terse and dense, and not
+because of inherent complexity. We recommend using the C<"/x"> regular
+expression modifier (described below) along with plenty of white space
+to make them less dense, and easier to read. Regular expressions are
+constructed using
 simple concepts like conditionals and loops and are no more difficult
 to understand than the corresponding C<if> conditionals and C<while>
-loops in the Perl language itself. In fact, the main challenge in
-learning regular expressions is just getting used to the terse
-notation used to express these concepts.
+loops in the Perl language itself.
 
 This tutorial flattens the learning curve by discussing regular
 expression concepts, along with their notation, one at a time and with
@@ -58,7 +79,7 @@ find things that, while legal, may not be what you intended.
 =head2 Simple word matching
 
 The simplest regexp is simply a word, or more generally, a string of
-characters. A regexp consisting of a word matches any string that
+characters. A regexp consisting of just a word matches any string that
 contains that word:
 
     "Hello World" =~ /World/; # matches
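
For readers of this commit, the two ideas the new text introduces (the C<=~> binding with C<m//>, and the C</x> modifier) can be tried with a minimal sketch like the one below. It is not part of the patch: the helper names C<has_mushroom> and C<has_mushroom_x> and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from perlretut): returns 1 if the
# argument contains the character sequence "mushroom", 0 otherwise.
sub has_mushroom {
    my ($var) = @_;
    return $var =~ m/mushroom/ ? 1 : 0;
}

# Same test, written with the /x modifier.  Under /x, unescaped white
# space inside the pattern is ignored, so the pattern can be spaced
# out for readability without changing what it matches.
sub has_mushroom_x {
    my ($var) = @_;
    return $var =~ m/ mush room /x ? 1 : 0;
}

# Made-up sample strings, one that contains the sequence and one that
# does not.
for my $var ("wild mushroom soup", "plain toast") {
    printf "%-20s plain=%d spaced=%d\n",
        $var, has_mushroom($var), has_mushroom_x($var);
}
```

Both helpers should agree on every input, since the white space in the second pattern is ignored under C</x>; the spaced form only becomes useful once patterns grow long enough to be hard to read.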