summaryrefslogtreecommitdiff
path: root/TODO.txt
blob: e57d799f8cda3c09cce6b161590d87722632e2b0 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Optimizations
-------------

The html5lib tree builder doesn't use the standard tree-building API,
which worries me. (This may also be why the tree builder doesn't
support SoupStrainers, but I think that has more to do with the fact
that the html5lib tree builder is constantly rearranging the tree, and
will crash if something it parsed earlier didn't actually make it into
the tree.)

markup_attr_map can be optimized since it's always a map now.

CDATA
-----

The elementtree XMLParser has a strip_cdata argument that, when set to
False, should allow Beautiful Soup to preserve CDATA sections instead
of treating them as text. Except it doesn't. (This argument is also
present for HTMLParser, and also does nothing there.)

Currently, htm5lib converts CDATA sections into comments. An
as-yet-unreleased version of html5lib changes the parser's handling of
CDATA sections to allow CDATA sections in tags like <svg> and
<math>. The HTML5TreeBuilder will need to be updated to create CData
objects instead of Comment objects in this situation.