FAQ sync.

p4raw-id: //depot/perl@12800
author: Jarkko Hietaniemi <jhi@iki.fi> 2001-11-01 14:06:04 +0000
committer: Jarkko Hietaniemi <jhi@iki.fi> 2001-11-01 14:06:04 +0000
commit: e67d034ea5e4ea58629b3083c2d0b12e43640519 (patch)
tree: 8a7fc3223c314d32b1d940b54310c0bc98c5ecd9 /pod/perlfaq9.pod
parent: 6ec9efeca46af8ccad8021f3fbd9ab7f1721da05 (diff)
download: perl-e67d034ea5e4ea58629b3083c2d0b12e43640519.tar.gz
1 files changed, 22 insertions, 16 deletions
diff --git a/pod/perlfaq9.pod b/pod/perlfaq9.pod
index f7b81d5e5a..62e3069c41 100644
--- a/pod/perlfaq9.pod
+++ b/pod/perlfaq9.pod
@@ -1,6 +1,6 @@
 =head1 NAME
 
-perlfaq9 - Networking ($Revision: 1.3 $, $Date: 2001/10/16 13:27:22 $)
+perlfaq9 - Networking ($Revision: 1.4 $, $Date: 2001/10/31 23:54:56 $)
 
 =head1 DESCRIPTION
 
@@ -143,22 +143,28 @@ on text like this:
 
 =head2 How do I extract URLs?
 
-A quick but imperfect approach is
+You can easily extract all sorts of URLs from HTML with
+C<HTML::SimpleLinkExtor> which handles anchors, images, objects,
+frames, and many other tags that can contain a URL.  If you need 
+anything more complex, you can create your own subclass of 
+C<HTML::LinkExtor> or C<HTML::Parser>.  You might even use 
+C<HTML::SimpleLinkExtor> as an example for something specifically
+suited to your needs.
+
+Less complete solutions involving regular expressions can save 
+you a lot of processing time if you know that the input is simple.  One
+solution from Tom Christiansen runs 100 times faster than most
+module based approaches but only extracts URLs from anchors where the first
+attribute is HREF and there are no other attributes. 
+
+        #!/usr/bin/perl -n00
+        # qxurl - tchrist@perl.com
+        print "$2\n" while m{
+            < \s*
+              A \s+ HREF \s* = \s* (["']) (.*?) \1
+            \s* >
+        }gsix;
 
-    #!/usr/bin/perl -n00
-    # qxurl - tchrist@perl.com
-    print "$2\n" while m{
-	< \s*
-	  A \s+ HREF \s* = \s* (["']) (.*?) \1
-	\s* >
-    }gsix;
-
-This version does not adjust relative URLs, understand alternate
-bases, deal with HTML comments, deal with HREF and NAME attributes
-in the same tag, understand extra qualifiers like TARGET, or accept
-URLs themselves as arguments.  It also runs about 100x faster than a
-more "complete" solution using the LWP suite of modules, such as the
-http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.
 
 =head2 How do I download a file from the user's machine?  How do I open a file on another machine?
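
To see what the regular expression added by this patch does and does not pick up, here is a minimal sketch that applies the same pattern to a small in-memory sample. The helper name `extract_simple_hrefs` and the sample HTML are assumptions made for illustration; they are not part of the patch or of the FAQ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the patch): run the qxurl pattern over a
# string and return every captured HREF value.
sub extract_simple_hrefs {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix) {
        push @urls, $2;    # $2 is the text between the matching quotes
    }
    return @urls;
}

# Illustrative sample input, made up for this sketch.
my $html = <<'HTML';
<a href="http://www.perl.com/">HREF first, no other attributes</a>
<A HREF='relative/page.html'>single quotes, relative URL</A>
<a name="top" href="http://example.com/a">HREF is not the first attribute</a>
HTML

print "$_\n" for extract_simple_hrefs($html);
# prints:
#   http://www.perl.com/
#   relative/page.html
```

The third anchor is skipped because HREF is not its first attribute, and the relative URL is printed as found rather than resolved against a base, which matches the limitations the old and new FAQ text describe.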