author | Jarkko Hietaniemi <jhi@iki.fi> | 2001-11-01 14:06:04 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2001-11-01 14:06:04 +0000 |
commit | e67d034ea5e4ea58629b3083c2d0b12e43640519 (patch) | |
tree | 8a7fc3223c314d32b1d940b54310c0bc98c5ecd9 /pod/perlfaq9.pod | |
parent | 6ec9efeca46af8ccad8021f3fbd9ab7f1721da05 (diff) | |
download | perl-e67d034ea5e4ea58629b3083c2d0b12e43640519.tar.gz |
FAQ sync.
p4raw-id: //depot/perl@12800
Diffstat (limited to 'pod/perlfaq9.pod')
-rw-r--r-- | pod/perlfaq9.pod | 38 |
1 files changed, 22 insertions, 16 deletions
diff --git a/pod/perlfaq9.pod b/pod/perlfaq9.pod
index f7b81d5e5a..62e3069c41 100644
--- a/pod/perlfaq9.pod
+++ b/pod/perlfaq9.pod
@@ -1,6 +1,6 @@
 =head1 NAME
 
-perlfaq9 - Networking ($Revision: 1.3 $, $Date: 2001/10/16 13:27:22 $)
+perlfaq9 - Networking ($Revision: 1.4 $, $Date: 2001/10/31 23:54:56 $)
 
 =head1 DESCRIPTION
 
@@ -143,22 +143,28 @@ on text like this:
 
 =head2 How do I extract URLs?
 
-A quick but imperfect approach is
+You can easily extract all sorts of URLs from HTML with
+C<HTML::SimpleLinkExtor> which handles anchors, images, objects,
+frames, and many other tags that can contain a URL. If you need
+anything more complex, you can create your own subclass of
+C<HTML::LinkExtor> or C<HTML::Parser>. You might even use
+C<HTML::SimpleLinkExtor> as an example for something specifically
+suited to your needs.
+
+Less complete solutions involving regular expressions can save
+you a lot of processing time if you know that the input is simple. One
+solution from Tom Christiansen runs 100 times faster than most
+module based approaches but only extracts URLs from anchors where the first
+attribute is HREF and there are no other attributes.
+
+        #!/usr/bin/perl -n00
+        # qxurl - tchrist@perl.com
+        print "$2\n" while m{
+            < \s*
+              A \s+ HREF \s* = \s* (["']) (.*?) \1
+            \s* >
+        }gsix;
 
-    #!/usr/bin/perl -n00
-    # qxurl - tchrist@perl.com
-    print "$2\n" while m{
-        < \s*
-          A \s+ HREF \s* = \s* (["']) (.*?) \1
-        \s* >
-    }gsix;
-
-This version does not adjust relative URLs, understand alternate
-bases, deal with HTML comments, deal with HREF and NAME attributes
-in the same tag, understand extra qualifiers like TARGET, or accept
-URLs themselves as arguments. It also runs about 100x faster than a
-more "complete" solution using the LWP suite of modules, such as the
-http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.
 
 =head2 How do I download a file from the user's machine? How do I open a file on another machine?
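
The restriction stated in the new FAQ text (HREF must be the first attribute of the anchor) can be checked with core Perl alone. The following is a minimal sketch, not part of the commit: `extract_urls` is a hypothetical helper that wraps the qxurl pattern from the diff, and the sample HTML is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every URL the qxurl pattern captures in $text.
sub extract_urls {
    my ($text) = @_;
    my @urls;
    # Same pattern as qxurl: $1 is the opening quote, $2 the quoted value.
    push @urls, $2 while $text =~ m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;
    return @urls;
}

# Invented sample input, assumed here only to exercise the pattern.
my $html = <<'END_HTML';
<a href="http://www.perl.com/">HREF first and alone: extracted</a>
<A HREF='docs/index.html'>single quotes, relative URL: extracted as-is</A>
<a name="top" href="http://example.org/">NAME precedes HREF: not extracted</a>
END_HTML

# Prints the first two URLs only; the third anchor never matches.
print "$_\n" for extract_urls($html);
```

On this input the sketch prints `http://www.perl.com/` and `docs/index.html`. The relative URL comes back unresolved, which matches the removed paragraph's note that the pattern does not adjust relative URLs.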