diff options
Diffstat (limited to 'Changes')
-rw-r--r-- | Changes | 1664 |
1 files changed, 1664 insertions, 0 deletions
@@ -0,0 +1,1664 @@ +_______________________________________________________________________________ +2013-05-09 Release 3.71 + +Gisle Aas (1): + Transform ':' in headers to '-' [RT#80524] + + +_______________________________________________________________________________ +2013-03-28 Release 3.70 + +François Perrad (1): + Fix for cross-compiling with Buildroot + +Gisle Aas (1): + Comment typo fix + +Yves Orton (1): + Fix Issue #3 / RT #84144: HTML::Entities::decode_entities() needs + to call SV_CHECK_THINKFIRST() before checking READONLY flag + + +_______________________________________________________________________________ +2011-10-15 Release 3.69 + +Gisle Aas (4): + Documentation fix; encode_utf8 mixup [RT#71151] + Make it clearer that there are 2 (actually 3) options for handing "UTF-8 garbage" + Github is the official repo + Can't be bothered to try to fix the failures that occur on perl-5.6 + +Barbie (1): + fix to TokeParser to correctly handle option configuration + +Jon Jensen (1): + Aesthetic change: remove extra ; + +Ville Skyttä (1): + Trim surrounding whitespace from extracted URLs. + + +_______________________________________________________________________________ +2010-09-01 Release 3.68 + +Gisle Aas (1): + Declare the encoding of the POD to be utf8 + + +_______________________________________________________________________________ +2010-08-17 Release 3.67 + +Nicholas Clark (1): + bleadperl 2154eca7 breaks HTML::Parser 3.66 [RT#60368] + + +_______________________________________________________________________________ +2010-07-09 Release 3.66 + +Gisle Aas (1): + Fix entity decoding in utf8_mode for the title header + + +_______________________________________________________________________________ +2010-04-04 Release 3.65 + +Gisle Aas (1): + Eliminate buggy entities_decode_old + +Salvatore Bonaccorso (1): + Fixed endianness typo [RT#50811] + +Ville Skyttä (1): + Documentation fixes. + + +_______________________________________________________________________________ +2009-10-25 Release 3.64 + +Gisle Aas (5): + Convert files to UTF-8 + Don't allow decode_entities() to generate illegal Unicode chars + Copyright 2009 + Remove rendundant (repeated) test + Make parse_file() method use 3-arg open [RT#49434] + + + +_______________________________________________________________________________ +2009-10-22 Release 3.63 + +Gisle Aas (2): + Take more care to prepare the char range for encode_entities [RT#50170] + decode_entities confused by trailing incomplete entity + + + +_______________________________________________________________________________ +2009-08-13 Release 3.62 + +Ville Skyttä (4): + HTTP::Header doc typo fix. + Do not bother tracking style or script, they're ignored. + Bring HTML 5 head elements up to date with WD-html5-20090423. + Improve HeadParser performance. + +Gisle Aas (1): + Doc patch: Make it clearer what the return value from ->parse is + + + +_______________________________________________________________________________ +2009-06-20 Release 3.61 + +Gisle Aas (2): + Test that triggers the crash that Chip fixed + Complete documented list of literal tags + +Chip Salzenberg (1): + Avoid crash (referenced pend_text instead of skipped_text) + +Antonio Radici (1): + Reference HTML::LinkExttor [RT#43164] + + + +_______________________________________________________________________________ +2009-02-09 Release 3.60 + +Ville Skytta (5): + Spelling fixes. + Test multi-value headers. + Documentation improvements. + Do not terminate head parsing on the <object> element (added in HTML 4.0). + Add support for HTML 5 <meta charset> and new HEAD elements. + +Damyan Ivanov (1): + Short description of the htextsub example + +Mike South (1): + Suppress warning when encode_entities is called with undef [RT#27567] + +Zefram (1): + HTML::Parser doesn't compile with perl 5.8.0. + + + +_______________________________________________________________________________ +2008-11-24 Gisle Aas <gisle@ActiveState.com> + + Release 3.59 + + Restore perl-5.6 compatibility for HTML::HeadParser. + + Improved META.yml + + + +_______________________________________________________________________________ +2008-11-17 Gisle Aas <gisle@ActiveState.com> + + Release 3.58 + + Suppress "Parsing of undecoded UTF-8 will give garbage" warning + with attr_encoded [RT#29089] + + HTML::HeadParser: + - Recognize the Unicode BOM in utf8_mode as well [RT#27522] + - Avoid ending up with '/' keys attribute in Link headers. + + + +_______________________________________________________________________________ +2008-11-16 Gisle Aas <gisle@ActiveState.com> + + Release 3.57 + + The <iframe> element content is now parsed in literal mode. + + Parsing of <script> and <style> content ends on the first end tag + even when that tag was in a quoted string. That seems to be the + behaviour of all modern browsers. + + Implement backquote() attribute as requested by Alex Kapranoff. + + Test and documentation tweaks from Alex Kapranoff. + + + +_______________________________________________________________________________ +2007-01-12 Gisle Aas <gisle@ActiveState.com> + + Release 3.56 + + Cloning of parser state for compatibility with threads. + Fixed by Bo Lindbergh <blgl@hagernas.com>. + + Don't require whitespace between declaration tokens. + <http://rt.cpan.org/Ticket/Display.html?id=20864> + + + +_______________________________________________________________________________ +2006-07-10 Gisle Aas <gisle@ActiveState.com> + + Release 3.55 + + Treat <> at the end of document as text. Used to be + reported as a comment. + + Improved Firefox compatibility for bad HTML: + - Unclosed <script>, <style> are now treated as empty tags. + - Unclosed <textarea>, <xmp> and <plaintext> treat rest as text. + - Unclosed <title> closes at next tag. + + Make <!a'b> a comment by itself. + + + +_______________________________________________________________________________ +2006-04-28 Gisle Aas <gisle@ActiveState.com> + + Release 3.54 + + Yaakov Belch discovered yet another issue with <script> parsing. + Enabling of 'empty_element_tags' got the parser confused + if it found such a tag for elements that are normally parsed + in literal mode. Of these <script src="..."/> is the only + one likely to be found in documents. + <http://rt.cpan.org//Ticket/Display.html?id=18965> + + + +_______________________________________________________________________________ +2006-04-27 Gisle Aas <gisle@ActiveState.com> + + Release 3.53 + + When ignore_element was enabled it got confused if the + corresponding tags did not nest properly; the end tag + was treated it as if it was a start tag. + Found and fixed by Yaakov Belch <code@yaakovnet.net>. + <http://rt.cpan.org/Ticket/Display.html?id=18936> + + + +_______________________________________________________________________________ +2006-04-26 Gisle Aas <gisle@ActiveState.com> + + Release 3.52 + + Make sure the 'start_document' fires exactly once for + each document parsed. For earlier releases it did not + fire at all for empty documents and could fire multiple + times if parse was called with empty chunks. + + Documentation tweaks and typo fixes. + + + +_______________________________________________________________________________ +2006-03-22 Gisle Aas <gisle@ActiveState.com> + + Release 3.51 + + Named entities outside the Latin-1 range are now only expanded + when properly terminated with ";". This makes HTML::Parser + compatible with Firefox/Konqueror/MSIE when it comes to how these + entities are expanded in attribute values. Firefox does expand + unterminated non-Latin-1 entities in plain text, so here + HTML::Parser only stays compatible with Konqueror/MSIE. + Fixes <http://rt.cpan.org/Ticket/Display.html?id=17962>. + + Fixed some documentation typos spotted by <william@knowmad.com>. + <http://rt.cpan.org/Ticket/Display.html?id=18062> + + + +_______________________________________________________________________________ +2006-02-14 Gisle Aas <gisle@ActiveState.com> + + Release 3.50 + + The 3.49 release didn't compile with VC++ because it mixed code + and declarations. Fixed by Steve Hay <steve.hay@uk.radan.com>. + + + +_______________________________________________________________________________ +2006-02-08 Gisle Aas <gisle@ActiveState.com> + + Release 3.49 + + Events could sometimes still fire after a handler has signaled eof. + + Marked_sections with text ending in square bracket parsed wrong. + Fix provided by <paul.bijnens@xplanation.com>. + <http://rt.cpan.org/Ticket/Display.html?id=16749> + + + +_______________________________________________________________________________ +2005-12-02 Gisle Aas <gisle@ActiveState.com> + + Release 3.48 + + Enabling empty_element_tags by default for HTML::TokeParser + was a mistake. Reverted that change. + <http://rt.cpan.org/Ticket/Display.html?id=16164> + + When processing a document with "marked_sections => 1", the + skipped text missed the first 3 bytes "<![". + <http://rt.cpan.org/Ticket/Display.html?id=16207> + + + +2005-11-22 Gisle Aas <gisle@ActiveState.com> + + Release 3.47 + + Added empty_element_tags and xml_pic configuration + options. These make it possible to enable these XML + features without enabling the full XML-mode. + + The empty_element_tags is enabled by default for + HTML::TokeParser. + + + +2005-10-24 Gisle Aas <gisle@ActiveState.com> + + Release 3.46 + + Don't try to treat an literal as space. + This breaks Unicode parsing. + <http://rt.cpan.org/Ticket/Display.html?id=15068> + + The unbroken_text option is now on by default + for HTML::TokeParser. + + HTML::Entities::encode will now encode "'" by default. + + Improved report/ignore_tags documentation by + Norbert Kiesel <nkiesel@tbdnetworks.com>. + + Test suite now use Test::More, by + Norbert Kiesel <nkiesel@tbdnetworks.com>. + + Fix HTML::Entities typo spotted by + Stefan Funke <bundy@adm.arcor.net>. + + Faster load time with XSLoader (perl-5.6 or better now required). + + Fixed POD markup errors in some of the modules. + + + +2005-01-06 Gisle Aas <gisle@ActiveState.com> + + Release 3.45 + + Fix stack memory leak caused by missing PUTBACK. Only + code that used $p->parse(\&cb) form was affected. + Fix provided by Gurusamy Sarathy <gsar@sophos.com>. + + + +2004-12-28 Gisle Aas <gisle@ActiveState.com> + + Release 3.44 + + Fix confusion about nested quotes in <script> and <style> text. + + + +2004-12-06 Gisle Aas <gisle@ActiveState.com> + + Release 3.43 + + The SvUTF8 flag was not propagated correctly when replacing + unterminated entities. + + Fixed test failure because of missing binmode on Windows. + + + +2004-12-04 Gisle Aas <gisle@ActiveState.com> + + Release 3.42 + + Avoid sv_catpvn_utf8_upgrade() as that macro was not + available in perl-5.8.0. + Patch by Reed Russell <Russell.Reed@acxiom.com>. + + Add casts to suppress compilation warnings for char/U8 + mismatches. + + HTML::HeadParser will always push new header values. + This make sure we never loose old header values. + + + +2004-11-30 Gisle Aas <gisle@ActiveState.com> + + Release 3.41 + + Fix unresolved symbol error with perl-5.005. + + + +2004-11-29 Gisle Aas <gisle@ActiveState.com> + + Release 3.40 + + Make utf8_mode only available on perl-5.8 or better. It produced + garbage with older versions of perl. + + Emit warning if entities are decoded and something in the first + chunk looks like hi-bit UTF-8. Previously this warning was only + triggered for documents with BOM. + + + +2004-11-23 Gisle Aas <gisle@ActiveState.com> + + Release 3.39_92 + + More documentation of the Unicode issues. Moved around HTML::Parser + documentation a bit. + + New boolean option; $p->utf8_mode to allow parsing of raw UTF-8. + + Documented that HTML::Entities::decode_entities() can take multiple + arguments. + + Unterminated entities are now decoded in text (compatibility + with MSIE misfeature). + + Document HTML::Entities::_decode_entities(); this variation of the + decode_entities() function has been available for a long time, but + have not been documented until now. + + HTML::Entities::_decode_entities() can now be told to try to + expand unterminated entities. + + Simplified Makefile.PL + + + +2004-11-23 Gisle Aas <gisle@ActiveState.com> + + Release 3.39_91 + + The HTML::HeadParser will skip Unicode BOM. Previously it + would consider the <head> section done when it saw the BOM. + + The parser will look for Unicode BOM and give appropriate + warnings if the form found indicate trouble. + + If no matching end tag is found for <script>, <style>, <xmp> + <title>, <textarea> then generate one where the next tag + starts. + + For <script> and <style> recognize quoted strings and don't + consider end element if the corresponding end tag is found + inside such a string. + + + +2004-11-17 Gisle Aas <gisle@ActiveState.com> + + Release 3.39_90 + + The <title> element is now parsed in literal mode, which + means that other tags are not recognized until </title> has + been seen. + + Unicode support for perl-5.8 and better. + + Decoding Unicode entities always enabled; no longer a compile + time option. + + Propagation of UTF8 state on strings. + Patch contributed by John Gardiner Myers <jgmyers@proofpoint.com>. + + Calculate offsets and lengths in chars for Unicode strings. + + Fixed link typo in the HTML::TokeParser documentation. + + + +2004-11-11 Gisle Aas <gisle@ActiveState.com> + + Release 3.38 + + New boolean option; $p->closing_plaintext + Contributed by Alex Kapranoff <alex@kapranoff.ru> + + + +2004-11-10 Gisle Aas <gisle@ActiveState.com> + + Release 3.37 + + Improved handling of HTML encoded surrogate pairs and illegally + encoded Unicode; <http://rt.cpan.org/Ticket/Display.html?id=7785>. + Patch by John Gardiner Myers <jgmyers@proofpoint.com>. + + Avoid generating bad UTF8 strings when decoding entities + representing chars beyond #255 in 8-bit strings. Such bad + UTF8 sometimes made perl-5.8.5 and older segfault. + + Undocument v2 style subclassing in synopsis section. + + Internal cleanup: + + Make 'gcc -Wall' happier. + + Avoid modification of PVs during parsing of attrspec. + Another patch by John Gardiner Myers. + + + +2004-04-01 Gisle Aas <gisle@ActiveState.com> + + Release 3.36 + + Improved MSIE/Mozilla compatibility. If the same attribute + name repeats for a start tag, use the first value instead + of the last. Patch by Nick Duffek <html-parser@duffek.com>. + <http://rt.cpan.org/Ticket/Display.html?id=5472> + + + +2003-12-12 Gisle Aas <gisle@ActiveState.com> + + Release 3.35 + + Documentation fixes by Paul Croome <Paul.Croome@softwareag.com>. + + Removed redundant dSP. + + + +2003-10-27 Gisle Aas <gisle@ActiveState.com> + + Release 3.34 + + Fix segfault that happened when the parse callback caused + the stack to get reallocated. The original bug report was + <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=217616> + + + +2003-10-14 Gisle Aas <gisle@ActiveState.com> + + Release 3.33 + + Perl 5.005 or better is now required. For some reason we get + a test failure with perl-5.004 and I don't really feel like + debugging that perl any more. Details about this failure can + be found at <http://rt.cpan.org/Ticket/Display.html?id=4065>. + + New HTML::TokeParser method called 'get_phrase'. It returns + all current text while ignoring any phrase-level markup. + + The HTML::TokeParser method 'get_text' now expands skipped + non-phrase-level tags as a single space. + + + +2003-10-10 Gisle Aas <gisle@ActiveState.com> + + Release 3.32 + + If the document parsed ended with some kind of unterminated markup, + then the parser state was not reset properly and this piece of markup + would show up in the beginning of the next document parsed. + <http://rt.cpan.org/Ticket/Display.html?id=3954> + + The get_text and get_trimmed_text methods of HTML::TokeParser can + now take multiple end tags as argument. Patch by <siegmann@tinbergen.nl> + at <http://rt.cpan.org/Ticket/Display.html?id=3166>. + + Various documentation tweaks. + + Included another example program: hdump + + + +2003-08-19 Gisle Aas <gisle@ActiveState.com> + + Release 3.31 + + The -DDEBUGGING fix in 3.30 was not really there :-( + + + +2003-08-17 Gisle Aas <gisle@ActiveState.com> + + Release 3.30 + + The previous release failed to compile on a -DDEBUGGING perl + like the one provided by Redhat 9. + + Got rid of references to perl-5.7. + + Further fixes to avoid warnings from Visual C. + Patch by Steve Hay <steve.hay@uk.radan.com>. + + + +2003-08-14 Gisle Aas <gisle@ActiveState.com> + + Release 3.29 + + Setting xml_mode now implies strict_names also for end tags. + + Avoid warning from Visual C. Patch by <gsar@activestate.com>. + + 64-bit fix from Doug Larrick <doug@ties.org> + http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=195500 + + Try to parse similar to Mozilla/MSIE in certain edge cases. + All these are outside of the official definition of HTML but + HTML spam often tries to take advantage of these. + + - New configuration attribute 'strict_end'. Unless enabled + we will allow end tags to contain extra words or stuff + that look like attributes before the '>'. This means that + tags like these: + + </foo foo="<ignored>"> + </foo ignored> + </foo ">" ignored> + + are now all parsed as a 'foo' end tag instead of text. + Even if the extra stuff looks like attributes they will not + be reported if requested via the 'attr' or 'tokens' argspecs + for the 'end' handler. + + - Parse '</:comment>' and '</ comment>' as comments unless + strict_comment is enabled. Previous versions of the parser + would report these as text. If these comments contain + quoted words prefixed by space or '=' these words can + contain '>' without terminating the comment. + + - Parse '<! "<>" foo>' as comment containing ' "<>" foo'. + Previous versions of the parser would terminate the comment + at the first '>' and report the rest as text. + + - Legacy comment mode: Parse with comments terminated with a + lone '>' if no '-->' is found before eof. + + - Incomplete tag at eof is reported as a 'comment' instead + of 'text' unless strict_comment is enabled. + + + +2003-04-16 Gisle Aas <gisle@ActiveState.com> + + Release 3.28 + + When 'strict_comment' is off (which it is by default) + treat anything that matches <!...> a comment. + + Should now be more efficient on threaded perls. + + + +2003-01-18 Gisle Aas <gisle@ActiveState.com> + + Release 3.27 + + Typo fixes to the documentation. + + HTML::Entities::escape_entities_numeric contributed + by Sean M. Burke <sburke@cpan.org>. + + Included one more example program 'hlc' that show + how to downcase all tags in an HTML file. + + + +2002-03-17 Gisle Aas <gisle@ActiveState.com> + + Release 3.26 + + Avoid core dump in some cases where the callback croaks. + The perl_call_method and perl_call_sv needs G_EVAL flag + to be safe. + + New parser attributes; 'attr_encoded' and 'case_sensitive'. + Contributed by Guy Albertelli II <guy@albertelli.com>. + + HTML::Entities + - don't encode \r by default as suggested by Sean M. Burke. + + HTML::HeadParser + - ignore empty http-equiv + - allow multiple <link> elements. Patch by + Timur I. Bakeyev <timur@gnu.org> + + Avoid warnings from bleadperl on the uentities test. + + + +2001-05-11 Gisle Aas <gisle@ActiveState.com> + + Release 3.25 + + Minor tweaks for build failures on perl5.004_04, perl-5.6.0, + and for macro clash under Windows. + + Improved parsing of <plaintext>... :-) + + + +2001-05-09 Gisle Aas <gisle@ActiveState.com> + + Release 3.24 + + $p->parse(CODE) + + New events: start_document, end_document + + New argspecs: skipped_text, offset_end + + The offset/line/column counters was not properly reset + after eof. + + + +2001-05-01 Gisle Aas <gisle@ActiveState.com> + + Release 3.23 + + If the $p->ignore_elements filter did not work as it should if + handlers for start/end events was not registered. + + + +2001-04-17 Gisle Aas <gisle@ActiveState.com> + + Release 3.22 + + The <textarea> element is now parsed in literal mode, i.e. no other tags + recognized until the </textarea> tag is seen. Unlike other literal elements, + the text content is not 'cdata'. + + The XML ' entity is decoded. It apos-char itself is still encoded as + ' as ' is not really an HTML tag, and not recognized by many HTML + browsers. + + + +2001-04-10 Gisle Aas <gisle@ActiveState.com> + + Release 3.21 + + Fix a memory leak which occurred when using filter methods. + + Avoid a few compiler warnings (DEC C): + - Trailing comma found in enumerator list + - "unsigned char" is not compatible with "const char". + + Doc update. + + + +2001-04-02 Gisle Aas <gisle@ActiveState.com> + + Release 3.20 + + Some minor documentation updates. + + + +2001-03-30 Gisle Aas <gisle@ActiveState.com> + + Release 3.19_94 + + Implemented 'tag', 'line', 'column' argspecs. + + HTML::PullParser doc update. + eg/hform is an example of HTML::PullParser usage. + + + +2001-03-27 Gisle Aas <gisle@ActiveState.com> + + Release 3.19_93 + + Shorten 'report_only_tags' to 'report_tags'. + I think it reads better. + + Bleadperl portability fixes. + + + +2001-03-25 Gisle Aas <gisle@ActiveState.com> + + Release 3.19_92 + + HTML::HeadParser made more efficient by using 'ignore_elements'. + + HTML::LinkExtor made more efficient by using 'report_only_tags'. + + HTML::TokeParser generalized into HTML::PullParser. HTML::PullParser + only support the get_token/unget_token interface of HTML::TokeParser, + but is more flexible because the information that make up an token + is customisable. HTML::TokeParser is made into an HTML::PullParser + subclass. + + + +2001-03-19 Gisle Aas <gisle@ActiveState.com> + + Release 3.19_91 + + Array references can be passed to the filter methods. Makes it easier + to use them as constructor options. + + Example programs updated to use filters. + + Reset ignored_element state on EOF. + + Documentation updates. + + The netscape_buggy_comment() method now generates mandatory warning + about its deprecation. + + + +2001-03-13 Gisle Aas <gisle@ActiveState.com> + + Release 3.19_90 + + This is an developer only release. It contains some new + experimental features. The interface to these might still change. + + Implemented filters to reduce the numbers of callbacks generated: + - $p->ignore_tags() + - $p->report_only_tags() + - $p->ignore_elements() + + New @attr argspec. Less overhead than 'attr' and allow + compatibility with XML::Parser style start events. + + The whole argspec can be wrapped up in @{...} to signal + flattening. Only makes a difference when the target is an + array. + + + +2001-03-09 Gisle Aas <gisle@ActiveState.com> + + Release 3.19 + + Avoid the entity2char global. That should make the module + more thread safe. Patch by Gurusamy Sarathy <gsar@ActiveState.com>. + + + +2001-02-24 Gisle Aas <gisle@ActiveState.com> + + Release 3.18 + + There was a C++ style comment left in util.c. Strict C + compilers do not like that kind of stuff. + + + +2001-02-23 Gisle Aas <gisle@ActiveState.com> + + Release 3.17 + + The 3.16 release broke MULTIPLICITY builds. Fixed. + + + +2001-02-22 Gisle Aas <gisle@ActiveState.com> + + Release 3.16 + + The unbroken_text option now works across ignored tags. + + Fix casting of pointers on some 64 bit platforms. + + Fix decoding of Unicode entities. Only optionally available for + perl-5.7.0 or better. + + Expose internal decode_entities() function at the Perl level. + + Reindented some code. + + + +2000-12-26 Gisle Aas <gisle@ActiveState.com> + + Release 3.15 + + HTML::TokeParser's get_tag() method now takes multiple + tags to match. Hopefully the documentation is also a bit clearer. + + #define PERL_NO_GET_CONTEXT: Should speed up things for thread + enabled versions of perl. + + Quote some more entities that also happens to be perl keywords. + This avoids warnings on perl-5.004. + + Unicode entities only triggered for perl-5.7.0 or higher. + + + +2000-12-03 Gisle Aas <gisle@ActiveState.com> + + Release 3.14 + + If a handler triggered by flushing text at eof called the + eof method then infinite recursion occurred. Fixed. + Bug discovered by Jonathan Stowe <gellyfish@gellyfish.com>. + + Allow <!doctype ...> to be parsed as declaration. + + + +2000-09-17 Gisle Aas <gisle@ActiveState.com> + + Release 3.13 + + Experimental support for decoding of Unicode entities. + + + +2000-09-14 Gisle Aas <gisle@ActiveState.com> + + Release 3.12 + + Some tweaks to get it to compile with "Optimierender Microsoft (R) + 32-Bit C/C++-Compiler, Version 12.00.8168, fuer x86." + Patch by Matthias Waldorf <matthias.waldorf@zoom.de>. + + HTML::Entities documentation spelling patch by + David Dyck <dcd@tc.fluke.com>. + + + +2000-08-22 Gisle Aas <gisle@ActiveState.com> + + Release 3.11 + + HTML::LinkExtor and eg/hrefsub now obtain %linkElements from + the HTML::Tagset module. + + + +2000-06-29 Gisle Aas <gisle@ActiveState.com> + + Release 3.10 + + Avoid core dump when stack gets relocated as the result of + text handler invocation while $p->unbroken_text is enabled. + Needed to refresh the stack pointer. + + + +2000-06-28 Gisle Aas <gisle@ActiveState.com> + + Release 3.09 + + Avoid core dump if somebody clobbers the aliased $self argument of + a handler. + + HTML::TokeParser documentation update suggested by + Paul Makepeace <Paul.Makepeace@realprogrammers.com>. + + + +2000-05-23 Gisle Aas <gisle@ActiveState.com> + + Release 3.08 + + Fix core dump for large start tags. + Bug spotted by Alexander Fraser <green795@hotmail.com> + + Added yet another example program: eg/hanchors + + Typo fix by Jamie McCarthy <jamie@mccarthy.org> + + + +2000-03-20 Gisle Aas <gisle@aas.no> + + Release 3.07 + + Fix perl5.004 builds (was broken in 3.06) + + Declaration parsing mode now only triggers for <!DOCTYPE ...> and + <!ENTITY ...>. Based on patch by la mouton <kero@3sheep.com>. + + + +2000-03-06 Gisle Aas <gisle@aas.no> + + Release 3.06 + + Multi-threading/MULTIPLICITY compilation fix. + Both Doug MacEachern <dougm@pobox.com> and + Matthias Urlichs <smurf@noris.net> provided a patch. + + Avoid some "statement not reached" warnings from picky + compilers. + + Remove final commas in enums as ANSI C does not allow + them and some compilers actually care. + Patch by James Walden <jamesw@ichips.intel.com> + + Added eg/htextsub example program. + + + +2000-01-22 Gisle Aas <gisle@aas.no> + + Release 3.05 + + Implemented $p->unbroken_text option + + Don't parse content of certain HTML elements as CDATA when + xml_mode is enabled. + + Offset was reported with wrong sign for text at end of chunk. + + + +2000-01-15 Gisle Aas <gisle@aas.no> + + Release 3.04 + + Backed out 3.03-patch that checked for legal handler and attribute + names in the HTML::Parser constructor. + + Documentation typo fixed by Michael. + + + +2000-01-14 Gisle Aas <gisle@aas.no> + + Release 3.03 + + We did not get out of comment mode for comments ending with an + odd number of "-" before ">". Patch by la mouton <kero@3sheep.com> + + Documentation patch by Michael. + + + +1999-12-21 Gisle Aas <gisle@aas.no> + + Release 3.02 + + Hide ~-magic IV-pointer to 'struct p_state' behind a reference. + This allow copying of the internal _hparser_xs_state element, and + will make HTML-Tree-0.61 work again. + + Introduced $p->init() which might be useful for subclasses that + only want the initialization part of the constructor. + + Filled out DIAGNOSTICS section of the HTML::Parser POD. + + + +1999-12-19 Gisle Aas <gisle@aas.no> + + Release 3.01 + + Rely on ~-magic instead of a DESTROY method to deallocate + the internal 'struct p_state'. This avoid memory leaks + when people simply wipe of the content of the object hash. + + One of the assertion in hparser.c had opposite logic. This made + the parser fail when compiled with a -DDEBUGGING perl. + + Don't assume any specific order of hash keys in the t/cases.t. + This test failed with some newer development releases of perl. + + + +1999-12-14 Gisle Aas <gisle@aas.no> + + Release 3.00 + + Documentation update (most of it from Michael) + + Minor patch to eg/hstrip so that it use a "" handler + instead of &ignore. + + Test suite patches from Michael + + + +1999-12-13 Gisle Aas <gisle@aas.no> + + Release 2.99_96 + + Patches from Michael: + + - A handler of "" means that the event will be ignored. + More efficient than using 'sub {}' as handler. + + - Don't use a perl hash for looking up argspec keywords. + + - Documentation tweaks. + + + +1999-12-09 Gisle Aas <gisle@aas.no> + + Release 2.99_95 (this is a 3.00 candidate) + + Fixed core dump when "<" was followed by an 8-bit character. + Spotted and test case provided by Doug MacEachern. Doug had + been running HTML-Parser-XS through more that 1 million urls that + had been downloaded via LWP. + + Handlers can now invoke $p->eof to request the parsing to terminate. + HTML::HeadParser has been simplified by taking advantage of this. + Also added a title-extraction example that uses this. + + Michael once again fixed my bad English in the HTML::Parser + documentation. + + netscape_buggy_comment will carp instead of warn + + updated TODO/README + + Documented that HTML::Filter is depreciated. + + Made backslash reserved in literal argspec strings. + + Added several new test scripts. + + + +1999-12-08 Gisle Aas <gisle@aas.no> + + Release 2.99_94 (should almost be a 3.00 candidate) + + Renamed 'cdata_flag' as 'is_cdata'. + + Dropped support for wrapping callback handler and argspec + in an array and passing a reference to $p->handler. It + created ambiguities when you want to pass a array as + handler destination and not update argspec. The wrapping + for constructor arguments are unchanged. + + Reworked the documentation after updates from Michael. + + Simplified internal check_handler(). It should probably simply + be inlined in handler() again. + + Added argspec 'length' and 'undef' + + Fix statement-less label. Fix suggested by Matthew Langford + <langfml@Eng.Auburn.EDU>. + + Added two more example programs: eg/hstrip and eg/htext. + + Various minor patches from Michael. + + + +1999-12-07 Gisle Aas <gisle@aas.no> + + Release 2.99_93 + + Documentation update + + $p->bool_attr_value renamed as $p->boolean_attribute_value + + Internal renaming: attrspec --> argspec + + Introduced internal 'enum argcode' in hparser.c + + Added eg/hrefsub + + + +1999-12-05 Gisle Aas <gisle@aas.no> + + Release 2.99_92 + + More documentation patches from Michael + + Renamed 'token1' as 'token0' as suggested by Michael + + For artificial end tags we now report 'tokens', but not 'tokenpos'. + + Boolean attribute values show up as (0, 0) in 'tokenpos' now. + + If $p->bool_attr_value is set it will influence 'tokens' + + Fix for core dump when parsing <a "> when $p->strict_names(0). + Based on fix by Michael. + + Will av_extend() the tokens/tokenspos arrays. + + New test suite script by Michael: t/attrspec.t + + + +1999-12-04 Gisle Aas <gisle@aas.no> + + Release 2.99_91 + + Implemented attrspec 'offset' + + Documentation patch from Michael + + Some more cleanup/updated TODO + + + +1999-12-03 Gisle Aas <gisle@aas.no> + + Release 2.99_90 (first beta for 3.00) + + Using "realloc" as a parameter name in grow_tokens created + problems for some people. Fix by Paul Schinder <schinder@pobox.com> + + Patch by Michael that makes array handler destinations really work. + + Patch by Michael that make HTML::TokeParser use this. This gave a + a speedup of about 80%. + + Patch by Michael that makes t/cases into a real test. + + Small HTML::Parser documentation patch by Michael. + + Renamed attrspec 'origtext' to 'text' and 'decoded_text' to 'dtext' + + Split up Parser.xs. Moved stuff into hparser.c and util.c + + Dropped html_ prefix from internal parser functions. + + Renamed internal function html_handle() as report_event(). + + + +1999-12-02 Gisle Aas <gisle@aas.no> + + Release 2.99_17 + + HTML::Parser documentation patch from Michael. + + Fix memory leaks in html_handler() + + Patch that makes an array legal as handler destination. + Also from Michael. + + The end of marked sections does not eat successive newline + any more. + + The artificial end event for empty tag in xml_mode did not + report an empty origtext. + + New constructor option: 'api_version' + + + +1999-12-01 Gisle Aas <gisle@aas.no> + + Release 2.99_16 + + Support "event" in argspec. It expands to the name of the + handler (minus "default"). + + Fix core dump for large start tags. The tokens_grow() routine + needed an adjustment. Added test for this; t/largstags.t. + + + +1999-11-30 Gisle Aas <gisle@aas.no> + + Release 2.99_15 + + Major restructuring/simplification of callback interface based on + initial work by Michael. The main news is that you now need to + tell what arguments you want to be provided to your callbacks. + + The following parser options has been eliminated: + + $p->decode_text_entities + $p->keep_case + $p->v2_compat + $p->pass_self + $p->attr_pos + + + +1999-11-26 Gisle Aas <gisle@aas.no> + + Release 2.99_14 + + Documentation update by Michael A. Chase. + + Fix for declaration parsing by Michael A. Chase. + + Workaround for perl5.004_05 bug. Can't return &PL_sv_undef. + + + +1999-11-22 Gisle Aas <gisle@aas.no> + + Release 2.99_13 + + New Parser.pm POD based on initial work by Michael A. Chase. + All new features should now be described. + + $p->callback(start => undef) will not reset the callback. + + $p->xml_mode() did not parse attributes correct because + HCTYPE_NOT_SPACE_EQ_SLASH_GT flag was never set. + + A few more tests. + + + +1999-11-18 Gisle Aas <gisle@aas.no> + + Release 2.99_12 + + Implemented $p->attr_pos attribute. This causes attr positions + within $origtext of the start tag to be reported instead of the + attribute values. The positions are reported as 4 numbers; end of + previous attr, start of this attr, start of attr value, and end of + attr. This should make substr() manipulations of $origtext easy. + + Implemented $p->unbroken_text attribute. This makes sure that + text segments are never broken and given back as separate text + callbacks. It delays text callbacks until some other markup + has been recognized. + + More English corrections by Michael A. Chase. + + HTML::LinkExtor now recognizes even more URI attributes as + suggested by Sean M. Burke <sburke@netadventure.net> + + Completed marked sections support. It is also now a compile + time decision if you want this supported or not. The only + drawback of enabling it should be a possible parsing speed + reduction. I have not measured this yet. + + The keys for callbacks initialized in the constructor are now + suffixed with "_cb". + + Renamed $p->pass_cbdata to $p->pass_self. + + Added magic number to the p_state struct. + + + +1999-11-17 Gisle Aas <gisle@aas.no> + + Release 2.99_11 + + Don't leak $@ modifications from HTML::Parser constructor. + + Included HTML::Parser POD. + + Marked sections almost work. CDATA and RCDATA should work. + + For tags that take us into literal_mode; <script>, <style>, + <xmp>, we did not recognize the end tag unless it was written + in all lower case. + + + +1999-11-16 Gisle Aas <gisle@aas.no> + + Release 2.99_10 + + The mkhctype and mkpfunc scripts were using \z inside RE. This + did not work for perl5.004. Replaced them with plain old + dollar signs. + + + +1999-11-15 Gisle Aas <gisle@aas.no> + + Release 2.99_09 + + Grammar fixes by Michael A. Chase <mchase@ix.netcom.com> + + Some more test suite patches for Win32 by Michael A. Chase + <mchase@ix.netcom.com> + + Implemented $p->strict_names attribute. By default we now + allow almost anything in tag and attribute names. This is much + closer to the behaviour of some popular browsers. This allows us + to parse broken tags like this example from the LWP mailing list: + <IMG ALIGN=MIDDLE SRC=newprevlstGr.gif ALT=[PREV LIST] BORDER=0> + + Introduced some tables in "hctype.h" and "pfunc.h". These + are built by the corresponding "mk..." script. + + + +1999-11-10 Gisle Aas <gisle@aas.no> + + Release 2.99_08 + + Make Parser.xs compile on perl5.004_05 too. + + New callback called 'default'. This will be called for any + document text no other callback shows an interest in. + + Patch by Michael A. Chase <mchase@ix.netcom.com> that should + help clean up files for the test suite on Win32. + + Can now set up various attributes with key/value pairs passed to + the constructor. + + $p->parse_file() will open the file in binmode() + + Pass complete processing instruction tag as second argument + to process callback. + + New boolean attribute v2_compat. This influences how attributes + are reported for start tags. + + HTML::Filter now filters process instructions too. + + Faster HTML::LinkExtor by taking advantage of the new + callback interface. The module now also uses URI.pm (instead + of the old URI::URL) to absolutize URIs. + + Faster HTML::TokeParser by taking advantage of new + accum interface. + + + +1999-11-09 Gisle Aas <gisle@aas.no> + + Release 2.99_07 + + Entities in attribute values are now always expanded. + + If you set the $p->decode_text_entities to a true value, then + you don't have to decode the text yourself. + + In xml_mode we don't report empty element tags as a start tag + with an extra parameter any more. Instead we generate an artificial + end tag. + + 'xml_mode' now implies 'keep_case'. + + The parser now keeps its own copy of the bool_attr_value value. + + Avoid memory leak for text callbacks + + Avoid using ERROR as a goto label. + + Introduced common internal accessor function for all boolean parser + attributes. + + Tweaks to make Parser.xs compile under perl5.004. + + + +1999-11-08 Gisle Aas <gisle@aas.no> + + Release 2.99_06 + + Internal fast decode_entities(). By using it we are able to make + the HTML::Entities::decode function 6 times faster than the old one + implemented in pure Perl. + + $p->bool_attr_value() can be set to influence the value that + boolean attributes will be assigned. The default is to assign + a value identical to the attribute name. + + Process instructions are reported as "PI" in @accum + + $p->xml_mode(1) modifies how processing instructions are terminated + and allows "/>" at the end of start tags. + + Turn off optimizations when compiling with gcc on Solaris. Avoids + what we believe to be a compiler bug. Should probably figure out + which versions of gcc have this bug. + + + +1999-11-05 Gisle Aas <gisle@aas.no> + + Release 2.99_05 + + The previous release did not even compile. I forgot to try 'make test' + before uploading. + + + +1999-11-05 Gisle Aas <gisle@aas.no> + + Release 2.99_04 + + Generalized <XMP>-support to cover all literal parsing. Currently + activated for <script>, <style>, <xmp> and <plaintext>. + + + +1999-11-05 Gisle Aas <gisle@aas.no> + + Release 2.99_03 + + <XMP>-support. + + Allow ":" in tag and attribute names + + Include rest of the HTML::* files from the old HTML::Parser + package. This should make testing easier. + + + +1999-11-04 Gisle Aas <gisle@aas.no> + + Release 2.99_02 + + Implemented keep_case() option. If this attribute is true, then + we don't lowercase tag and attribute names. + + Implemented accum() that takes an array reference. Tokens are + pushed onto this array instead of sent to callbacks. + + Implemented strict_comment(). + + + +1999-11-03 Gisle Aas <gisle@aas.no> + + Release 2.99_01 + + Baseline of XS implementation + + + +1999-11-05 Gisle Aas <gisle@aas.no> + + Release 2.25 + + Allow ":" in attribute names as a workaround for Microsoft Excel + 2000 which generates such files. + + Make deprecate warning if netscape_buggy_comment() method is + used. The method is used in strict_comment(). + + Avoid duplication of parse_file() method in HTML::HeadParser. + + + +1999-10-29 Gisle Aas <gisle@aas.no> + + Release 2.24 + + $p->parse_file() will not close a handle passed to it any more. + If passed a filename that can't be opened it will return undef + instead of raising an exception, and strings like "*STDIN" are not + treated as globs any more. + + HTML::LinkExtor knows about background attribute of <tables>. + Patch by Clinton Wong <clintdw@netcom.com> + + HTML::TokeParser will parse large inline strings much faster now. + The string holding the document must not be changed during parsing. + + + +1999-06-09 Gisle Aas <gisle@aas.no> + + Release 2.23 + + Documentation updates. + + + +1998-12-18 Gisle Aas <aas@sn.no> + + Release 2.22 + + Protect HTML::HeadParser from evil $SIG{__DIE__} hooks. + + + +1998-11-13 Gisle Aas <aas@sn.no> + + Release 2.21 + + HTML::TokeParser can now parse strings directly and does the + right thing if you pass it a GLOB. Based on patch by + Sami Itkonen <si@iki.fi>. + + HTML::Parser now allows space before and after "--" in Netscape + comments. Patch by Peter Orbaek <poe@daimi.au.dk>. + + + +1998-07-08 Gisle Aas <aas@sn.no> + + Release 2.20 + + Added HTML::TokeParser. Check it out! + + + +1998-07-07 Gisle Aas <aas@sn.no> + + Release 2.19 + + Don't end a text chunk with space when we try to avoid breaking up + words. + + + +1998-06-22 Gisle Aas <aas@sn.no> + + Release 2.18 + + HTML::HeadParser->parse_file will now stop parsing when the + <body> starts as it should. + + HTML::LinkExtor more easily subclassable by introducing the + $self->_found_link method. + + + +1998-04-28 Gisle Aas <aas@sn.no> + + Release 2.17 + + Never split words (a sequence of non-space) between two invocations + of $self->text. This is just a simplification of the code that tried + not to break entities. + + HTML::Parser->parse_file now use smaller chunks as already + suggested by the HTML::Parser documentation. + + + +1998-04-02 Gisle Aas <aas@sn.no> + + Release 2.16 + + The HTML::Parser could some times break hex entities (like ) + in the middle. + + Removed remaining forced dependencies on libwww-perl modules. It + means that all tests should now pass, even if libwww-perl was not + installed previously. + + More tests. + + + +1998-04-01 Gisle Aas <aas@sn.no> + + Release 2.14, HTML::* modules unbundled from libwww-perl-5.22. |