author Lorry Tar Creator <lorry-tar-importer@lorry> 2013-05-08 22:21:52 +0000
committer Lorry Tar Creator <lorry-tar-importer@lorry> 2013-05-08 22:21:52 +0000
commit 2f253cfc85ffd55a8acb988e91f0bc5ab348124c
tree 4734ccd522c71dd455879162006742002f8c1565 /Changes
Diffstat (limited to 'Changes')
-rw-r--r-- Changes 1664
1 files changed, 1664 insertions, 0 deletions
diff --git a/Changes b/Changes
new file mode 100644
index 0000000..933d43c
--- /dev/null
+++ b/Changes
@@ -0,0 +1,1664 @@
+_______________________________________________________________________________
+2013-05-09 Release 3.71
+
+Gisle Aas (1):
+ Transform ':' in headers to '-' [RT#80524]
+
+
+_______________________________________________________________________________
+2013-03-28 Release 3.70
+
+François Perrad (1):
+ Fix for cross-compiling with Buildroot
+
+Gisle Aas (1):
+ Comment typo fix
+
+Yves Orton (1):
+ Fix Issue #3 / RT #84144: HTML::Entities::decode_entities() needs
+ to call SV_CHECK_THINKFIRST() before checking READONLY flag
+
+
+_______________________________________________________________________________
+2011-10-15 Release 3.69
+
+Gisle Aas (4):
+ Documentation fix; encode_utf8 mixup [RT#71151]
+ Make it clearer that there are 2 (actually 3) options for handling "UTF-8 garbage"
+ Github is the official repo
+ Can't be bothered to try to fix the failures that occur on perl-5.6
+
+Barbie (1):
+ fix to TokeParser to correctly handle option configuration
+
+Jon Jensen (1):
+ Aesthetic change: remove extra ;
+
+Ville Skyttä (1):
+ Trim surrounding whitespace from extracted URLs.
+
+
+_______________________________________________________________________________
+2010-09-01 Release 3.68
+
+Gisle Aas (1):
+ Declare the encoding of the POD to be utf8
+
+
+_______________________________________________________________________________
+2010-08-17 Release 3.67
+
+Nicholas Clark (1):
+ bleadperl 2154eca7 breaks HTML::Parser 3.66 [RT#60368]
+
+
+_______________________________________________________________________________
+2010-07-09 Release 3.66
+
+Gisle Aas (1):
+ Fix entity decoding in utf8_mode for the title header
+
+
+_______________________________________________________________________________
+2010-04-04 Release 3.65
+
+Gisle Aas (1):
+ Eliminate buggy entities_decode_old
+
+Salvatore Bonaccorso (1):
+ Fixed endianness typo [RT#50811]
+
+Ville Skyttä (1):
+ Documentation fixes.
+
+
+_______________________________________________________________________________
+2009-10-25 Release 3.64
+
+Gisle Aas (5):
+ Convert files to UTF-8
+ Don't allow decode_entities() to generate illegal Unicode chars
+ Copyright 2009
+ Remove redundant (repeated) test
+ Make parse_file() method use 3-arg open [RT#49434]
+
+
+
+_______________________________________________________________________________
+2009-10-22 Release 3.63
+
+Gisle Aas (2):
+ Take more care to prepare the char range for encode_entities [RT#50170]
+ decode_entities confused by trailing incomplete entity
+
+
+
+_______________________________________________________________________________
+2009-08-13 Release 3.62
+
+Ville Skyttä (4):
+ HTTP::Header doc typo fix.
+ Do not bother tracking style or script, they're ignored.
+ Bring HTML 5 head elements up to date with WD-html5-20090423.
+ Improve HeadParser performance.
+
+Gisle Aas (1):
+ Doc patch: Make it clearer what the return value from ->parse is
+
+
+
+_______________________________________________________________________________
+2009-06-20 Release 3.61
+
+Gisle Aas (2):
+ Test that triggers the crash that Chip fixed
+ Complete documented list of literal tags
+
+Chip Salzenberg (1):
+ Avoid crash (referenced pend_text instead of skipped_text)
+
+Antonio Radici (1):
+ Reference HTML::LinkExtor [RT#43164]
+
+
+
+_______________________________________________________________________________
+2009-02-09 Release 3.60
+
+Ville Skytta (5):
+ Spelling fixes.
+ Test multi-value headers.
+ Documentation improvements.
+ Do not terminate head parsing on the <object> element (added in HTML 4.0).
+ Add support for HTML 5 <meta charset> and new HEAD elements.
+
+Damyan Ivanov (1):
+ Short description of the htextsub example
+
+Mike South (1):
+ Suppress warning when encode_entities is called with undef [RT#27567]
+
+Zefram (1):
+ HTML::Parser doesn't compile with perl 5.8.0.
+
+
+
+_______________________________________________________________________________
+2008-11-24 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.59
+
+ Restore perl-5.6 compatibility for HTML::HeadParser.
+
+ Improved META.yml
+
+
+
+_______________________________________________________________________________
+2008-11-17 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.58
+
+ Suppress "Parsing of undecoded UTF-8 will give garbage" warning
+ with attr_encoded [RT#29089]
+
+ HTML::HeadParser:
+ - Recognize the Unicode BOM in utf8_mode as well [RT#27522]
+ - Avoid ending up with '/' keys attribute in Link headers.
+
+
+
+_______________________________________________________________________________
+2008-11-16 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.57
+
+ The <iframe> element content is now parsed in literal mode.
+
+ Parsing of <script> and <style> content ends on the first end tag
+ even when that tag was in a quoted string. That seems to be the
+ behaviour of all modern browsers.
+
+ Implement backquote() attribute as requested by Alex Kapranoff.
+
+ Test and documentation tweaks from Alex Kapranoff.
+
+
+
+_______________________________________________________________________________
+2007-01-12 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.56
+
+ Cloning of parser state for compatibility with threads.
+ Fixed by Bo Lindbergh <blgl@hagernas.com>.
+
+ Don't require whitespace between declaration tokens.
+ <http://rt.cpan.org/Ticket/Display.html?id=20864>
+
+
+
+_______________________________________________________________________________
+2006-07-10 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.55
+
+ Treat <> at the end of document as text. Used to be
+ reported as a comment.
+
+ Improved Firefox compatibility for bad HTML:
+ - Unclosed <script>, <style> are now treated as empty tags.
+ - Unclosed <textarea>, <xmp> and <plaintext> treat rest as text.
+ - Unclosed <title> closes at next tag.
+
+ Make <!a'b> a comment by itself.
+
+
+
+_______________________________________________________________________________
+2006-04-28 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.54
+
+ Yaakov Belch discovered yet another issue with <script> parsing.
+ Enabling of 'empty_element_tags' got the parser confused
+ if it found such a tag for elements that are normally parsed
+ in literal mode. Of these <script src="..."/> is the only
+ one likely to be found in documents.
+ <http://rt.cpan.org//Ticket/Display.html?id=18965>
+
+
+
+_______________________________________________________________________________
+2006-04-27 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.53
+
+ When ignore_element was enabled it got confused if the
+ corresponding tags did not nest properly; the end tag
+ was treated as if it was a start tag.
+ Found and fixed by Yaakov Belch <code@yaakovnet.net>.
+ <http://rt.cpan.org/Ticket/Display.html?id=18936>
+
+
+
+_______________________________________________________________________________
+2006-04-26 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.52
+
+ Make sure the 'start_document' fires exactly once for
+ each document parsed. For earlier releases it did not
+ fire at all for empty documents and could fire multiple
+ times if parse was called with empty chunks.
+
+ Documentation tweaks and typo fixes.
+
+
+
+_______________________________________________________________________________
+2006-03-22 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.51
+
+ Named entities outside the Latin-1 range are now only expanded
+ when properly terminated with ";". This makes HTML::Parser
+ compatible with Firefox/Konqueror/MSIE when it comes to how these
+ entities are expanded in attribute values. Firefox does expand
+ unterminated non-Latin-1 entities in plain text, so here
+ HTML::Parser only stays compatible with Konqueror/MSIE.
+ Fixes <http://rt.cpan.org/Ticket/Display.html?id=17962>.
+
+ Fixed some documentation typos spotted by <william@knowmad.com>.
+ <http://rt.cpan.org/Ticket/Display.html?id=18062>
+
+
+
+_______________________________________________________________________________
+2006-02-14 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.50
+
+ The 3.49 release didn't compile with VC++ because it mixed code
+ and declarations. Fixed by Steve Hay <steve.hay@uk.radan.com>.
+
+
+
+_______________________________________________________________________________
+2006-02-08 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.49
+
+ Events could sometimes still fire after a handler has signaled eof.
+
+ Marked_sections with text ending in square bracket parsed wrong.
+ Fix provided by <paul.bijnens@xplanation.com>.
+ <http://rt.cpan.org/Ticket/Display.html?id=16749>
+
+
+
+_______________________________________________________________________________
+2005-12-02 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.48
+
+ Enabling empty_element_tags by default for HTML::TokeParser
+ was a mistake. Reverted that change.
+ <http://rt.cpan.org/Ticket/Display.html?id=16164>
+
+ When processing a document with "marked_sections => 1", the
+ skipped text missed the first 3 bytes "<![".
+ <http://rt.cpan.org/Ticket/Display.html?id=16207>
+
+
+
+2005-11-22 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.47
+
+ Added empty_element_tags and xml_pic configuration
+ options. These make it possible to enable these XML
+ features without enabling the full XML-mode.
+
+ The empty_element_tags is enabled by default for
+ HTML::TokeParser.
+
+
+
+2005-10-24 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.46
+
+ Don't try to treat a literal &nbsp; as space.
+ This breaks Unicode parsing.
+ <http://rt.cpan.org/Ticket/Display.html?id=15068>
+
+ The unbroken_text option is now on by default
+ for HTML::TokeParser.
+
+ HTML::Entities::encode will now encode "'" by default.
+
+ Improved report/ignore_tags documentation by
+ Norbert Kiesel <nkiesel@tbdnetworks.com>.
+
+ Test suite now uses Test::More, by
+ Norbert Kiesel <nkiesel@tbdnetworks.com>.
+
+ Fix HTML::Entities typo spotted by
+ Stefan Funke <bundy@adm.arcor.net>.
+
+ Faster load time with XSLoader (perl-5.6 or better now required).
+
+ Fixed POD markup errors in some of the modules.
+
+
+
+2005-01-06 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.45
+
+ Fix stack memory leak caused by missing PUTBACK. Only
+ code that used $p->parse(\&cb) form was affected.
+ Fix provided by Gurusamy Sarathy <gsar@sophos.com>.
+
+
+
+2004-12-28 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.44
+
+ Fix confusion about nested quotes in <script> and <style> text.
+
+
+
+2004-12-06 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.43
+
+ The SvUTF8 flag was not propagated correctly when replacing
+ unterminated entities.
+
+ Fixed test failure because of missing binmode on Windows.
+
+
+
+2004-12-04 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.42
+
+ Avoid sv_catpvn_utf8_upgrade() as that macro was not
+ available in perl-5.8.0.
+ Patch by Reed Russell <Russell.Reed@acxiom.com>.
+
+ Add casts to suppress compilation warnings for char/U8
+ mismatches.
+
+ HTML::HeadParser will always push new header values.
+ This makes sure we never lose old header values.
+
+
+
+2004-11-30 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.41
+
+ Fix unresolved symbol error with perl-5.005.
+
+
+
+2004-11-29 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.40
+
+ Make utf8_mode only available on perl-5.8 or better. It produced
+ garbage with older versions of perl.
+
+ Emit warning if entities are decoded and something in the first
+ chunk looks like hi-bit UTF-8. Previously this warning was only
+ triggered for documents with BOM.
+
+
+
+2004-11-23 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.39_92
+
+ More documentation of the Unicode issues. Moved around HTML::Parser
+ documentation a bit.
+
+ New boolean option; $p->utf8_mode to allow parsing of raw UTF-8.
+
+ Documented that HTML::Entities::decode_entities() can take multiple
+ arguments.
+
+ Unterminated entities are now decoded in text (compatibility
+ with MSIE misfeature).
+
+ Document HTML::Entities::_decode_entities(); this variation of the
+ decode_entities() function has been available for a long time, but
+ has not been documented until now.
+
+ HTML::Entities::_decode_entities() can now be told to try to
+ expand unterminated entities.
+
+ Simplified Makefile.PL
+
+
+
+2004-11-23 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.39_91
+
+ The HTML::HeadParser will skip Unicode BOM. Previously it
+ would consider the <head> section done when it saw the BOM.
+
+ The parser will look for Unicode BOM and give appropriate
+ warnings if the form found indicates trouble.
+
+ If no matching end tag is found for <script>, <style>, <xmp>,
+ <title>, <textarea> then generate one where the next tag
+ starts.
+
+ For <script> and <style> recognize quoted strings and don't
+ end the element if the corresponding end tag is found
+ inside such a string.
+
+
+
+2004-11-17 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.39_90
+
+ The <title> element is now parsed in literal mode, which
+ means that other tags are not recognized until </title> has
+ been seen.
+
+ Unicode support for perl-5.8 and better.
+
+ Decoding Unicode entities always enabled; no longer a compile
+ time option.
+
+ Propagation of UTF8 state on strings.
+ Patch contributed by John Gardiner Myers <jgmyers@proofpoint.com>.
+
+ Calculate offsets and lengths in chars for Unicode strings.
+
+ Fixed link typo in the HTML::TokeParser documentation.
+
+
+
+2004-11-11 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.38
+
+ New boolean option; $p->closing_plaintext
+ Contributed by Alex Kapranoff <alex@kapranoff.ru>
+
+
+
+2004-11-10 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.37
+
+ Improved handling of HTML encoded surrogate pairs and illegally
+ encoded Unicode; <http://rt.cpan.org/Ticket/Display.html?id=7785>.
+ Patch by John Gardiner Myers <jgmyers@proofpoint.com>.
+
+ Avoid generating bad UTF8 strings when decoding entities
+ representing chars beyond #255 in 8-bit strings. Such bad
+ UTF8 sometimes made perl-5.8.5 and older segfault.
+
+ Undocument v2 style subclassing in synopsis section.
+
+ Internal cleanup:
+
+ Make 'gcc -Wall' happier.
+
+ Avoid modification of PVs during parsing of attrspec.
+ Another patch by John Gardiner Myers.
+
+
+
+2004-04-01 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.36
+
+ Improved MSIE/Mozilla compatibility. If the same attribute
+ name repeats for a start tag, use the first value instead
+ of the last. Patch by Nick Duffek <html-parser@duffek.com>.
+ <http://rt.cpan.org/Ticket/Display.html?id=5472>
+
+
+
+2003-12-12 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.35
+
+ Documentation fixes by Paul Croome <Paul.Croome@softwareag.com>.
+
+ Removed redundant dSP.
+
+
+
+2003-10-27 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.34
+
+ Fix segfault that happened when the parse callback caused
+ the stack to get reallocated. The original bug report was
+ <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=217616>
+
+
+
+2003-10-14 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.33
+
+ Perl 5.005 or better is now required. For some reason we get
+ a test failure with perl-5.004 and I don't really feel like
+ debugging that perl any more. Details about this failure can
+ be found at <http://rt.cpan.org/Ticket/Display.html?id=4065>.
+
+ New HTML::TokeParser method called 'get_phrase'. It returns
+ all current text while ignoring any phrase-level markup.
+
+ The HTML::TokeParser method 'get_text' now expands skipped
+ non-phrase-level tags as a single space.
+
+
+
+2003-10-10 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.32
+
+ If the document parsed ended with some kind of unterminated markup,
+ then the parser state was not reset properly and this piece of markup
+ would show up in the beginning of the next document parsed.
+ <http://rt.cpan.org/Ticket/Display.html?id=3954>
+
+ The get_text and get_trimmed_text methods of HTML::TokeParser can
+ now take multiple end tags as argument. Patch by <siegmann@tinbergen.nl>
+ at <http://rt.cpan.org/Ticket/Display.html?id=3166>.
+
+ Various documentation tweaks.
+
+ Included another example program: hdump
+
+
+
+2003-08-19 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.31
+
+ The -DDEBUGGING fix in 3.30 was not really there :-(
+
+
+
+2003-08-17 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.30
+
+ The previous release failed to compile on a -DDEBUGGING perl
+ like the one provided by Redhat 9.
+
+ Got rid of references to perl-5.7.
+
+ Further fixes to avoid warnings from Visual C.
+ Patch by Steve Hay <steve.hay@uk.radan.com>.
+
+
+
+2003-08-14 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.29
+
+ Setting xml_mode now implies strict_names also for end tags.
+
+ Avoid warning from Visual C. Patch by <gsar@activestate.com>.
+
+ 64-bit fix from Doug Larrick <doug@ties.org>
+ http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=195500
+
+ Try to parse similar to Mozilla/MSIE in certain edge cases.
+ All these are outside of the official definition of HTML but
+ HTML spam often tries to take advantage of these.
+
+ - New configuration attribute 'strict_end'. Unless enabled
+ we will allow end tags to contain extra words or stuff
+ that look like attributes before the '>'. This means that
+ tags like these:
+
+ </foo foo="<ignored>">
+ </foo ignored>
+ </foo ">" ignored>
+
+ are now all parsed as a 'foo' end tag instead of text.
+ Even if the extra stuff looks like attributes they will not
+ be reported if requested via the 'attr' or 'tokens' argspecs
+ for the 'end' handler.
+
+ - Parse '</:comment>' and '</ comment>' as comments unless
+ strict_comment is enabled. Previous versions of the parser
+ would report these as text. If these comments contain
+ quoted words prefixed by space or '=' these words can
+ contain '>' without terminating the comment.
+
+ - Parse '<! "<>" foo>' as comment containing ' "<>" foo'.
+ Previous versions of the parser would terminate the comment
+ at the first '>' and report the rest as text.
+
+ - Legacy comment mode: Treat comments as terminated by a
+ lone '>' if no '-->' is found before eof.
+
+ - Incomplete tag at eof is reported as a 'comment' instead
+ of 'text' unless strict_comment is enabled.
+
+
+
+2003-04-16 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.28
+
+ When 'strict_comment' is off (which it is by default)
+ treat anything that matches <!...> a comment.
+
+ Should now be more efficient on threaded perls.
+
+
+
+2003-01-18 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.27
+
+ Typo fixes to the documentation.
+
+ HTML::Entities::encode_entities_numeric contributed
+ by Sean M. Burke <sburke@cpan.org>.
+
+ Included one more example program 'hlc' that shows
+ how to downcase all tags in an HTML file.
+
+
+
+2002-03-17 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.26
+
+ Avoid core dump in some cases where the callback croaks.
+ The perl_call_method and perl_call_sv calls need the G_EVAL flag
+ to be safe.
+
+ New parser attributes; 'attr_encoded' and 'case_sensitive'.
+ Contributed by Guy Albertelli II <guy@albertelli.com>.
+
+ HTML::Entities
+ - don't encode \r by default as suggested by Sean M. Burke.
+
+ HTML::HeadParser
+ - ignore empty http-equiv
+ - allow multiple <link> elements. Patch by
+ Timur I. Bakeyev <timur@gnu.org>
+
+ Avoid warnings from bleadperl on the uentities test.
+
+
+
+2001-05-11 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.25
+
+ Minor tweaks for build failures on perl5.004_04, perl-5.6.0,
+ and for macro clash under Windows.
+
+ Improved parsing of <plaintext>... :-)
+
+
+
+2001-05-09 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.24
+
+ $p->parse(CODE)
+
+ New events: start_document, end_document
+
+ New argspecs: skipped_text, offset_end
+
+ The offset/line/column counters were not properly reset
+ after eof.
+
+
+
+2001-05-01 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.23
+
+ The $p->ignore_elements filter did not work as it should if
+ handlers for start/end events were not registered.
+
+
+
+2001-04-17 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.22
+
+ The <textarea> element is now parsed in literal mode, i.e. no other tags
+ recognized until the </textarea> tag is seen. Unlike other literal elements,
+ the text content is not 'cdata'.
+
+ The XML &apos; entity is decoded. The apos-char itself is still encoded as
+ &#39; as &apos; is not really an HTML entity, and not recognized by many HTML
+ browsers.
+
+
+
+2001-04-10 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.21
+
+ Fix a memory leak which occurred when using filter methods.
+
+ Avoid a few compiler warnings (DEC C):
+ - Trailing comma found in enumerator list
+ - "unsigned char" is not compatible with "const char".
+
+ Doc update.
+
+
+
+2001-04-02 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.20
+
+ Some minor documentation updates.
+
+
+
+2001-03-30 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.19_94
+
+ Implemented 'tag', 'line', 'column' argspecs.
+
+ HTML::PullParser doc update.
+ eg/hform is an example of HTML::PullParser usage.
+
+
+
+2001-03-27 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.19_93
+
+ Shorten 'report_only_tags' to 'report_tags'.
+ I think it reads better.
+
+ Bleadperl portability fixes.
+
+
+
+2001-03-25 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.19_92
+
+ HTML::HeadParser made more efficient by using 'ignore_elements'.
+
+ HTML::LinkExtor made more efficient by using 'report_only_tags'.
+
+ HTML::TokeParser generalized into HTML::PullParser. HTML::PullParser
+ only supports the get_token/unget_token interface of HTML::TokeParser,
+ but is more flexible because the information that makes up a token
+ is customisable. HTML::TokeParser is made into an HTML::PullParser
+ subclass.
+
+
+
+2001-03-19 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.19_91
+
+ Array references can be passed to the filter methods. Makes it easier
+ to use them as constructor options.
+
+ Example programs updated to use filters.
+
+ Reset ignored_element state on EOF.
+
+ Documentation updates.
+
+ The netscape_buggy_comment() method now generates mandatory warning
+ about its deprecation.
+
+
+
+2001-03-13 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.19_90
+
+ This is a developer-only release. It contains some new
+ experimental features. The interface to these might still change.
+
+ Implemented filters to reduce the numbers of callbacks generated:
+ - $p->ignore_tags()
+ - $p->report_only_tags()
+ - $p->ignore_elements()
+
+ New @attr argspec. Less overhead than 'attr' and allows
+ compatibility with XML::Parser style start events.
+
+ The whole argspec can be wrapped up in @{...} to signal
+ flattening. Only makes a difference when the target is an
+ array.
+
+
+
+2001-03-09 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.19
+
+ Avoid the entity2char global. That should make the module
+ more thread safe. Patch by Gurusamy Sarathy <gsar@ActiveState.com>.
+
+
+
+2001-02-24 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.18
+
+ There was a C++ style comment left in util.c. Strict C
+ compilers do not like that kind of stuff.
+
+
+
+2001-02-23 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.17
+
+ The 3.16 release broke MULTIPLICITY builds. Fixed.
+
+
+
+2001-02-22 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.16
+
+ The unbroken_text option now works across ignored tags.
+
+ Fix casting of pointers on some 64 bit platforms.
+
+ Fix decoding of Unicode entities. Only optionally available for
+ perl-5.7.0 or better.
+
+ Expose internal decode_entities() function at the Perl level.
+
+ Reindented some code.
+
+
+
+2000-12-26 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.15
+
+ HTML::TokeParser's get_tag() method now takes multiple
+ tags to match. Hopefully the documentation is also a bit clearer.
+
+ #define PERL_NO_GET_CONTEXT: Should speed up things for thread
+ enabled versions of perl.
+
+ Quote some more entities that also happen to be perl keywords.
+ This avoids warnings on perl-5.004.
+
+ Unicode entities only triggered for perl-5.7.0 or higher.
+
+
+
+2000-12-03 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.14
+
+ If a handler triggered by flushing text at eof called the
+ eof method then infinite recursion occurred. Fixed.
+ Bug discovered by Jonathan Stowe <gellyfish@gellyfish.com>.
+
+ Allow <!doctype ...> to be parsed as declaration.
+
+
+
+2000-09-17 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.13
+
+ Experimental support for decoding of Unicode entities.
+
+
+
+2000-09-14 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.12
+
+ Some tweaks to get it to compile with "Optimierender Microsoft (R)
+ 32-Bit C/C++-Compiler, Version 12.00.8168, fuer x86."
+ Patch by Matthias Waldorf <matthias.waldorf@zoom.de>.
+
+ HTML::Entities documentation spelling patch by
+ David Dyck <dcd@tc.fluke.com>.
+
+
+
+2000-08-22 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.11
+
+ HTML::LinkExtor and eg/hrefsub now obtain %linkElements from
+ the HTML::Tagset module.
+
+
+
+2000-06-29 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.10
+
+ Avoid core dump when stack gets relocated as the result of
+ text handler invocation while $p->unbroken_text is enabled.
+ Needed to refresh the stack pointer.
+
+
+
+2000-06-28 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.09
+
+ Avoid core dump if somebody clobbers the aliased $self argument of
+ a handler.
+
+ HTML::TokeParser documentation update suggested by
+ Paul Makepeace <Paul.Makepeace@realprogrammers.com>.
+
+
+
+2000-05-23 Gisle Aas <gisle@ActiveState.com>
+
+ Release 3.08
+
+ Fix core dump for large start tags.
+ Bug spotted by Alexander Fraser <green795@hotmail.com>
+
+ Added yet another example program: eg/hanchors
+
+ Typo fix by Jamie McCarthy <jamie@mccarthy.org>
+
+
+
+2000-03-20 Gisle Aas <gisle@aas.no>
+
+ Release 3.07
+
+ Fix perl5.004 builds (was broken in 3.06)
+
+ Declaration parsing mode now only triggers for <!DOCTYPE ...> and
+ <!ENTITY ...>. Based on patch by la mouton <kero@3sheep.com>.
+
+
+
+2000-03-06 Gisle Aas <gisle@aas.no>
+
+ Release 3.06
+
+ Multi-threading/MULTIPLICITY compilation fix.
+ Both Doug MacEachern <dougm@pobox.com> and
+ Matthias Urlichs <smurf@noris.net> provided a patch.
+
+ Avoid some "statement not reached" warnings from picky
+ compilers.
+
+ Remove final commas in enums as ANSI C does not allow
+ them and some compilers actually care.
+ Patch by James Walden <jamesw@ichips.intel.com>
+
+ Added eg/htextsub example program.
+
+
+
+2000-01-22 Gisle Aas <gisle@aas.no>
+
+ Release 3.05
+
+ Implemented $p->unbroken_text option
+
+ Don't parse content of certain HTML elements as CDATA when
+ xml_mode is enabled.
+
+ Offset was reported with wrong sign for text at end of chunk.
+
+
+
+2000-01-15 Gisle Aas <gisle@aas.no>
+
+ Release 3.04
+
+ Backed out 3.03-patch that checked for legal handler and attribute
+ names in the HTML::Parser constructor.
+
+ Documentation typo fixed by Michael.
+
+
+
+2000-01-14 Gisle Aas <gisle@aas.no>
+
+ Release 3.03
+
+ We did not get out of comment mode for comments ending with an
+ odd number of "-" before ">". Patch by la mouton <kero@3sheep.com>
+
+ Documentation patch by Michael.
+
+
+
+1999-12-21 Gisle Aas <gisle@aas.no>
+
+ Release 3.02
+
+ Hide ~-magic IV-pointer to 'struct p_state' behind a reference.
+ This allows copying of the internal _hparser_xs_state element, and
+ will make HTML-Tree-0.61 work again.
+
+ Introduced $p->init() which might be useful for subclasses that
+ only want the initialization part of the constructor.
+
+ Filled out DIAGNOSTICS section of the HTML::Parser POD.
+
+
+
+1999-12-19 Gisle Aas <gisle@aas.no>
+
+ Release 3.01
+
+ Rely on ~-magic instead of a DESTROY method to deallocate
+ the internal 'struct p_state'. This avoids memory leaks
+ when people simply wipe out the content of the object hash.
+
+ One of the assertions in hparser.c had opposite logic. This made
+ the parser fail when compiled with a -DDEBUGGING perl.
+
+ Don't assume any specific order of hash keys in the t/cases.t.
+ This test failed with some newer development releases of perl.
+
+
+
+1999-12-14 Gisle Aas <gisle@aas.no>
+
+ Release 3.00
+
+ Documentation update (most of it from Michael)
+
+ Minor patch to eg/hstrip so that it uses a "" handler
+ instead of &ignore.
+
+ Test suite patches from Michael
+
+
+
+1999-12-13 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_96
+
+ Patches from Michael:
+
+ - A handler of "" means that the event will be ignored.
+ More efficient than using 'sub {}' as handler.
+
+ - Don't use a perl hash for looking up argspec keywords.
+
+ - Documentation tweaks.
+
+
+
+1999-12-09 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_95 (this is a 3.00 candidate)
+
+ Fixed core dump when "<" was followed by an 8-bit character.
+ Spotted and test case provided by Doug MacEachern. Doug had
+ been running HTML-Parser-XS through more than 1 million URLs that
+ had been downloaded via LWP.
+
+ Handlers can now invoke $p->eof to request the parsing to terminate.
+ HTML::HeadParser has been simplified by taking advantage of this.
+ Also added a title-extraction example that uses this.
+
+ Michael once again fixed my bad English in the HTML::Parser
+ documentation.
+
+ netscape_buggy_comment will carp instead of warn
+
+ updated TODO/README
+
+ Documented that HTML::Filter is deprecated.
+
+ Made backslash reserved in literal argspec strings.
+
+ Added several new test scripts.
+
+
+
+1999-12-08 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_94 (should almost be a 3.00 candidate)
+
+ Renamed 'cdata_flag' as 'is_cdata'.
+
+ Dropped support for wrapping callback handler and argspec
+ in an array and passing a reference to $p->handler. It
+ created ambiguities when you want to pass an array as
+ handler destination and not update argspec. The wrapping
+ for constructor arguments is unchanged.
+
+ Reworked the documentation after updates from Michael.
+
+ Simplified internal check_handler(). It should probably simply
+ be inlined in handler() again.
+
+ Added argspec 'length' and 'undef'
+
+ Fix statement-less label. Fix suggested by Matthew Langford
+ <langfml@Eng.Auburn.EDU>.
+
+ Added two more example programs: eg/hstrip and eg/htext.
+
+ Various minor patches from Michael.
+
+
+
+1999-12-07 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_93
+
+ Documentation update
+
+ $p->bool_attr_value renamed as $p->boolean_attribute_value
+
+ Internal renaming: attrspec --> argspec
+
+ Introduced internal 'enum argcode' in hparser.c
+
+ Added eg/hrefsub
+
+
+
+1999-12-05 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_92
+
+ More documentation patches from Michael
+
+ Renamed 'token1' as 'token0' as suggested by Michael
+
+ For artificial end tags we now report 'tokens', but not 'tokenpos'.
+
+ Boolean attribute values show up as (0, 0) in 'tokenpos' now.
+
+ If $p->bool_attr_value is set it will influence 'tokens'
+
+ Fix for core dump when parsing <a "> when $p->strict_names(0).
+ Based on fix by Michael.
+
+ Will av_extend() the tokens/tokenpos arrays.
+
+ New test suite script by Michael: t/attrspec.t
+
+
+
+1999-12-04 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_91
+
+ Implemented attrspec 'offset'
+
+ Documentation patch from Michael
+
+ Some more cleanup/updated TODO
+
+
+
+1999-12-03 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_90 (first beta for 3.00)
+
+ Using "realloc" as a parameter name in grow_tokens created
+ problems for some people. Fix by Paul Schinder <schinder@pobox.com>
+
+ Patch by Michael that makes array handler destinations really work.
+
+ Patch by Michael that makes HTML::TokeParser use this. This gave a
+ speedup of about 80%.
+
+ Patch by Michael that makes t/cases into a real test.
+
+ Small HTML::Parser documentation patch by Michael.
+
+ Renamed attrspec 'origtext' to 'text' and 'decoded_text' to 'dtext'
+
+ Split up Parser.xs. Moved stuff into hparser.c and util.c
+
+ Dropped html_ prefix from internal parser functions.
+
+ Renamed internal function html_handle() as report_event().
+
+
+
+1999-12-02 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_17
+
+ HTML::Parser documentation patch from Michael.
+
+ Fix memory leaks in html_handler()
+
+ Patch that makes an array legal as handler destination.
+ Also from Michael.
+
+ The end of a marked section does not eat the following newline
+ any more.
+
+ The artificial end event for empty tag in xml_mode did not
+ report an empty origtext.
+
+ New constructor option: 'api_version'
+
+
+
+1999-12-01 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_16
+
+ Support "event" in argspec. It expands to the name of the
+ handler (minus "default").
+
+ Fix core dump for large start tags. The tokens_grow() routine
+ needed an adjustment. Added test for this: t/largetags.t.
+
+
+
+1999-11-30 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_15
+
+ Major restructuring/simplification of callback interface based on
+ initial work by Michael. The main news is that you now need to
+ tell what arguments you want to be provided to your callbacks.
+
+ The following parser options have been eliminated:
+
+ $p->decode_text_entities
+ $p->keep_case
+ $p->v2_compat
+ $p->pass_self
+ $p->attr_pos
+
+
+
+1999-11-26 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_14
+
+ Documentation update by Michael A. Chase.
+
+ Fix for declaration parsing by Michael A. Chase.
+
+ Workaround for perl5.004_05 bug. Can't return &PL_sv_undef.
+
+
+
+1999-11-22 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_13
+
+ New Parser.pm POD based on initial work by Michael A. Chase.
+ All new features should now be described.
+
+ $p->callback(start => undef) will not reset the callback.
+
+ $p->xml_mode() did not parse attributes correctly because the
+ HCTYPE_NOT_SPACE_EQ_SLASH_GT flag was never set.
+
+ A few more tests.
+
+
+
+1999-11-18 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_12
+
+ Implemented $p->attr_pos attribute. This causes attr positions
+ within $origtext of the start tag to be reported instead of the
+ attribute values. The positions are reported as 4 numbers; end of
+ previous attr, start of this attr, start of attr value, and end of
+ attr. This should make substr() manipulations of $origtext easy.
+
+ Implemented $p->unbroken_text attribute. This makes sure that
+ text segments are never broken and given back as separate text
+ callbacks. It delays text callbacks until some other markup
+ has been recognized.
+
+ More English corrections by Michael A. Chase.
+
+ HTML::LinkExtor now recognizes even more URI attributes as
+ suggested by Sean M. Burke <sburke@netadventure.net>
+
+ Completed marked sections support. It is also now a compile
+ time decision if you want this supported or not. The only
+ drawback of enabling it should be a possible parsing speed
+ reduction. I have not measured this yet.
+
+ The keys for callbacks initialized in the constructor are now
+ suffixed with "_cb".
+
+ Renamed $p->pass_cbdata to $p->pass_self.
+
+ Added magic number to the p_state struct.
+
+
+
+1999-11-17 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_11
+
+ Don't leak $@ modifications from HTML::Parser constructor.
+
+ Included HTML::Parser POD.
+
+ Marked sections almost work. CDATA and RCDATA should work.
+
+ For tags that take us into literal_mode; <script>, <style>,
+ <xmp>, we did not recognize the end tag unless it was written
+ in all lower case.
+
+
+
+1999-11-16 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_10
+
+ The mkhctype and mkpfunc scripts were using \z inside RE. This
+ did not work for perl5.004. Replaced them with plain old
+ dollar signs.
+
+
+
+1999-11-15 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_09
+
+ Grammar fixes by Michael A. Chase <mchase@ix.netcom.com>
+
+ Some more test suite patches for Win32 by Michael A. Chase
+ <mchase@ix.netcom.com>
+
+ Implemented $p->strict_names attribute. By default we now
+ allow almost anything in tag and attribute names. This is much
+ closer to the behaviour of some popular browsers. This allows us
+ to parse broken tags like this example from the LWP mailing list:
+ <IMG ALIGN=MIDDLE SRC=newprevlstGr.gif ALT=[PREV LIST] BORDER=0>
+
+ Introduced some tables in "hctype.h" and "pfunc.h". These
+ are built by the corresponding "mk..." script.
+
+
+
+1999-11-10 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_08
+
+ Make Parser.xs compile on perl5.004_05 too.
+
+ New callback called 'default'. This will be called for any
+ document text no other callback shows an interest in.
+
+ Patch by Michael A. Chase <mchase@ix.netcom.com> that should
+ help clean up files for the test suite on Win32.
+
+ Can now set up various attributes with key/value pairs passed to
+ the constructor.
+
+ $p->parse_file() will open the file in binmode()
+
+ Pass complete processing instruction tag as second argument
+ to process callback.
+
+ New boolean attribute v2_compat. This influences how attributes
+ are reported for start tags.
+
+ HTML::Filter now filters processing instructions too.
+
+ Faster HTML::LinkExtor by taking advantage of the new
+ callback interface. The module now also uses URI.pm (instead
+ of the old URI::URL) to absolutize URIs.
+
+ Faster HTML::TokeParser by taking advantage of new
+ accum interface.
+
+
+
+1999-11-09 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_07
+
+ Entities in attribute values are now always expanded.
+
+ If you set the $p->decode_text_entities to a true value, then
+ you don't have to decode the text yourself.
+
+ In xml_mode we don't report empty element tags as a start tag
+ with an extra parameter any more. Instead we generate an artificial
+ end tag.
+
+ 'xml_mode' now implies 'keep_case'.
+
+ The parser now keeps its own copy of the bool_attr_value value.
+
+ Avoid memory leak for text callbacks
+
+ Avoid using ERROR as a goto label.
+
+ Introduced common internal accessor function for all boolean parser
+ attributes.
+
+ Tweaks to make Parser.xs compile under perl5.004.
+
+
+
+1999-11-08 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_06
+
+ Internal fast decode_entities(). By using it we are able to make
+ the HTML::Entities::decode function 6 times faster than the old one
+ implemented in pure Perl.
+
+ $p->bool_attr_value() can be set to influence the value that
+ boolean attributes will be assigned. The default is to assign
+ a value identical to the attribute name.
+
+ Processing instructions are reported as "PI" in @accum
+
+ $p->xml_mode(1) modifies how processing instructions are terminated
+ and allows "/>" at the end of start tags.
+
+ Turn off optimizations when compiling with gcc on Solaris. Avoids
+ what we believe to be a compiler bug. Should probably figure out
+ which versions of gcc have this bug.
+
+
+
+1999-11-05 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_05
+
+ The previous release did not even compile. I forgot to try 'make test'
+ before uploading.
+
+
+
+1999-11-05 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_04
+
+ Generalized <XMP>-support to cover all literal parsing. Currently
+ activated for <script>, <style>, <xmp> and <plaintext>.
+
+
+
+1999-11-05 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_03
+
+ <XMP>-support.
+
+ Allow ":" in tag and attribute names
+
+ Include rest of the HTML::* files from the old HTML::Parser
+ package. This should make testing easier.
+
+
+
+1999-11-04 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_02
+
+ Implemented keep_case() option. If this attribute is true, then
+ we don't lowercase tag and attribute names.
+
+ Implemented accum() that takes an array reference. Tokens are
+ pushed onto this array instead of sent to callbacks.
+
+ Implemented strict_comment().
+
+
+
+1999-11-03 Gisle Aas <gisle@aas.no>
+
+ Release 2.99_01
+
+ Baseline of XS implementation
+
+
+
+1999-11-05 Gisle Aas <gisle@aas.no>
+
+ Release 2.25
+
+ Allow ":" in attribute names as a workaround for Microsoft Excel
+ 2000 which generates such files.
+
+ Issue a deprecation warning if the netscape_buggy_comment() method is
+ used. The method is used in strict_comment().
+
+ Avoid duplication of parse_file() method in HTML::HeadParser.
+
+
+
+1999-10-29 Gisle Aas <gisle@aas.no>
+
+ Release 2.24
+
+ $p->parse_file() will not close a handle passed to it any more.
+ If passed a filename that can't be opened it will return undef
+ instead of raising an exception, and strings like "*STDIN" are not
+ treated as globs any more.
+
+ HTML::LinkExtor knows about the background attribute of <table>.
+ Patch by Clinton Wong <clintdw@netcom.com>
+
+ HTML::TokeParser will parse large inline strings much faster now.
+ The string holding the document must not be changed during parsing.
+
+
+
+1999-06-09 Gisle Aas <gisle@aas.no>
+
+ Release 2.23
+
+ Documentation updates.
+
+
+
+1998-12-18 Gisle Aas <aas@sn.no>
+
+ Release 2.22
+
+ Protect HTML::HeadParser from evil $SIG{__DIE__} hooks.
+
+
+
+1998-11-13 Gisle Aas <aas@sn.no>
+
+ Release 2.21
+
+ HTML::TokeParser can now parse strings directly and does the
+ right thing if you pass it a GLOB. Based on patch by
+ Sami Itkonen <si@iki.fi>.
+
+ HTML::Parser now allows space before and after "--" in Netscape
+ comments. Patch by Peter Orbaek <poe@daimi.au.dk>.
+
+
+
+1998-07-08 Gisle Aas <aas@sn.no>
+
+ Release 2.20
+
+ Added HTML::TokeParser. Check it out!
+
+
+
+1998-07-07 Gisle Aas <aas@sn.no>
+
+ Release 2.19
+
+ Don't end a text chunk with space when we try to avoid breaking up
+ words.
+
+
+
+1998-06-22 Gisle Aas <aas@sn.no>
+
+ Release 2.18
+
+ HTML::HeadParser->parse_file will now stop parsing when the
+ <body> starts as it should.
+
+ HTML::LinkExtor more easily subclassable by introducing the
+ $self->_found_link method.
+
+
+
+1998-04-28 Gisle Aas <aas@sn.no>
+
+ Release 2.17
+
+ Never split words (a sequence of non-space) between two invocations
+ of $self->text. This is just a simplification of the code that tried
+ not to break entities.
+
+ HTML::Parser->parse_file now uses smaller chunks as already
+ suggested by the HTML::Parser documentation.
+
+
+
+1998-04-02 Gisle Aas <aas@sn.no>
+
+ Release 2.16
+
+ The HTML::Parser could sometimes break hex entities (like &#xFFFF;)
+ in the middle.
+
+ Removed remaining forced dependencies on libwww-perl modules. It
+ means that all tests should now pass, even if libwww-perl was not
+ installed previously.
+
+ More tests.
+
+
+
+1998-04-01 Gisle Aas <aas@sn.no>
+
+ Release 2.14, HTML::* modules unbundled from libwww-perl-5.22.