Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Fixed code that was causing deprecation warnings in recent Python 3 versions. Includes a patch from Ville Skyttä. [bug=1778909] [bug=1689496] | Leonard Richardson | 2018-07-14 | 1 | -3/+3 |
| | |||||
* | Indentation change contributed by Pranav Salunke. | Leonard Richardson | 2016-12-19 | 1 | -2/+2 |
|\ | |||||
| * | Minor change. Extra indent for character so it looks nicer. | Pranav Salunke | 2016-04-06 | 1 | -2/+2 |
| | | |||||
* | | Use a dedicated logger instead of the root logger. [bug=1511661] | Leonard Richardson | 2016-07-17 | 1 | -1/+1 |
| | | |||||
* | | Use a dedicated logger instead of the root logger. [bug=1511661] | Leonard Richardson | 2016-07-17 | 1 | -3/+4 |
| | | |||||
* | | Removed imports to pdb, since pdb is not available in some environments. [bug=1491700] | Leonard Richardson | 2016-07-16 | 1 | -1/+0 |
| | | |||||
* | | Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file. | Leonard Richardson | 2016-07-16 | 1 | -0/+2 |
|/ | |||||
* | Add a __license__ statement to all source files. | Leonard Richardson | 2015-09-28 | 1 | -0/+1 |
| | |||||
* | Unicode data cannot have a byte-order mark. Returning early stops a warning from happening. | Leonard Richardson | 2015-07-03 | 1 | -0/+3 |
| | |||||
* | Added an exclude_encodings argument to UnicodeDammit and to the Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408] | Leonard Richardson | 2015-06-27 | 1 | -3/+9 |
| | |||||
* | Added a sanity check helper method that makes sure all the elements of a tree are properly connected via .next_element and .previous_element. | Leonard Richardson | 2015-06-26 | 1 | -1/+2 |
| | |||||
* | Fixed a crash in Unicode, Dammit's encoding detector when the name of the encoding itself contained invalid bytes. [bug=1360913] | Leonard Richardson | 2015-06-25 | 1 | -1/+1 |
| | |||||
* | Fixed a bug that caused Unicode data put into UnicodeDammit to return None instead of the original data. [bug=1214983] | Leonard Richardson | 2013-10-02 | 1 | -6/+9 |
| | |||||
* | Inlined some commonly called code to save a function call. | Leonard Richardson | 2013-06-03 | 1 | -4/+4 |
| | |||||
* | Limit how much of the document is searched via regular expression for a declared encoding. | Leonard Richardson | 2013-06-03 | 1 | -4/+11 |
| | |||||
* | Turns out we had two bits of code to strip byte-order marks. | Leonard Richardson | 2013-06-02 | 1 | -34/+43 |
| | |||||
* | It turns out most of the untested code wasn't doing anything useful. | Leonard Richardson | 2013-06-02 | 1 | -108/+20 |
| | |||||
* | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson | 2013-05-31 | 1 | -5/+16 |
| | |||||
* | Refactored code a bit. | Leonard Richardson | 2013-05-30 | 1 | -14/+13 |
| | |||||
* | Split out the code that guesses at encodings from the code that tries to decode a bytestring based on those encodings. This is necessary because lxml wants to do the decoding itself. | Leonard Richardson | 2013-05-30 | 1 | -128/+189 |
| | |||||
* | The default XML formatter will now replace ampersands even if they appear to be part of entities. That is, "&lt;" will become "&amp;lt;". [bug=1182183] | Leonard Richardson | 2013-05-20 | 1 | -0/+25 |
| | |||||
* | Doc fixes. | Leonard Richardson | 2012-11-03 | 1 | -1/+0 |
| | |||||
* | Fixed cchardet import. | Leonard Richardson | 2012-08-17 | 1 | -3/+3 |
| | |||||
* | Mentioned cchardet in docs. | Leonard Richardson | 2012-07-03 | 1 | -1/+1 |
| | |||||
* | When sniffing encodings, if the cchardet library is installed, use it instead of chardet. It's much faster. [bug=1020748] | Leonard Richardson | 2012-07-03 | 1 | -10/+22 |
| | |||||
* | Use logging.warning() instead of warnings.warn() to notify the user that characters were replaced with REPLACEMENT CHARACTER. [bug=1013862] | Leonard Richardson | 2012-07-03 | 1 | -4/+3 |
| | |||||
* | Comments, processing instructions, document type declarations, and markup declarations are now treated as preformatted strings, the way CData blocks are. [bug=1001025] Also in this commit: renamed detwingle method to detwingle(). | Leonard Richardson | 2012-05-24 | 1 | -11/+18 |
| | |||||
* | Fixed the handling of &quot; with the built-in parser. [bug=993871] | Leonard Richardson | 2012-05-03 | 1 | -7/+7 |
| | |||||
* | Added experimental support for fixing Windows-1252 characters embedded in UTF-8 documents. | Leonard Richardson | 2012-04-27 | 1 | -0/+196 |
| | |||||
* | Fixed a bug in decoding data that contained a byte-order mark, such as data encoded in UTF-16LE. [bug=988980] | Leonard Richardson | 2012-04-26 | 1 | -20/+28 |
| | |||||
* | Unicode, Dammit now has an option to turn MS smart quotes into ASCII characters. | Leonard Richardson | 2012-04-16 | 1 | -8/+148 |
| | |||||
* | Attribute values are now run through the provided output formatter. Previously they were always run through the 'minimal' formatter. [bug=980237] | Leonard Richardson | 2012-04-16 | 1 | -33/+37 |
| | |||||
* | Issue a warning if characters were replaced with REPLACEMENT CHARACTER during Unicode conversion. | Leonard Richardson | 2012-02-16 | 1 | -0/+5 |
| | |||||
* | As a last-ditch attempt to turn data into Unicode, use errors=replace instead of errors=strict. | Leonard Richardson | 2012-02-09 | 1 | -9/+25 |
| | |||||
* | Unicode, Dammit now detects the encoding in HTML 5-style <meta> tags like <meta charset="utf-8" />. [bug=837268] | Leonard Richardson | 2012-02-09 | 1 | -2/+4 |
| | |||||
* | Minor Unicode, Dammit cleanup. | Leonard Richardson | 2012-02-09 | 1 | -11/+11 |
| | |||||
* | Improved Unicode, Dammit's behavior when you give it Unicode to begin with. | Leonard Richardson | 2012-02-09 | 1 | -2/+4 |
| | |||||
* | Various changes so most tests pass on Python 3. | Thomas Kluyver | 2011-06-29 | 1 | -33/+33 |
| | |||||
* | OK, figured that out. | Leonard Richardson | 2011-05-21 | 1 | -7/+6 |
|\ | |||||
| * | Changed dammit.py to require fewer changes to be Python 3 compatible. | Leonard Richardson | 2011-05-21 | 1 | -7/+6 |
| | | |||||
* | | PEP8ifying | Aaron DeVore | 2011-03-05 | 1 | -45/+46 |
|/ | |||||
* | Added a tree builder for the built-in HTMLParser, and tests. | Leonard Richardson | 2011-02-27 | 1 | -3/+5 |
| | |||||
* | Renamed the beautifulsoup module to bs4 to save typing. | Leonard Richardson | 2011-02-27 | 1 | -0/+410 |
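
Several entries in this log describe one decoding strategy: try the candidate encodings strictly, skip any the caller has excluded, fall back to errors=replace as a last resort, and report replaced characters through a dedicated logger instead of the root logger. The sketch below illustrates that idea using only the Python standard library. It is not the code from these commits: the function name `decode_with_fallback`, its return shape, and the logger name `dammit_sketch` are all hypothetical, and the real module also handles byte-order marks, declared encodings, and chardet/cchardet detection, which are omitted here.

```python
import logging

# Hypothetical logger name: a dedicated logger, so applications can
# configure or silence these messages without touching the root logger.
logger = logging.getLogger("dammit_sketch")


def decode_with_fallback(data, candidate_encodings, exclude_encodings=()):
    """Illustrative only. Returns (text, encoding, contains_replacements)."""
    # Data that is already Unicode is returned unchanged.
    if isinstance(data, str):
        return data, None, False

    # Drop encodings the caller knows to be wrong (case-insensitive).
    excluded = {name.lower() for name in exclude_encodings}
    candidates = [e for e in candidate_encodings if e.lower() not in excluded]

    # First pass: strict decoding, so a wrong guess fails instead of
    # silently producing garbage.
    for encoding in candidates:
        try:
            return data.decode(encoding, "strict"), encoding, False
        except (UnicodeDecodeError, LookupError):
            continue

    # Last-ditch pass: undecodable bytes become U+FFFD REPLACEMENT CHARACTER.
    for encoding in candidates:
        try:
            text = data.decode(encoding, "replace")
        except LookupError:
            continue  # unknown codec name
        replaced = "\ufffd" in text
        if replaced:
            logger.warning(
                "Some characters could not be decoded as %s and were "
                "replaced with REPLACEMENT CHARACTER.", encoding)
        return text, encoding, replaced

    raise ValueError("no usable candidate encoding")
```

For example, `decode_with_fallback(b"caf\xe9", ["utf-8", "latin-1"])` fails the strict UTF-8 pass and succeeds with latin-1, while passing `exclude_encodings=["latin-1"]` leaves only UTF-8 and forces the errors=replace fallback, which logs a warning.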