TODO.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42

Bugs
----

* I think whitespace may not be processed correctly.

* html5lib doesn't support SoupStrainers, which is OK, but there
  should be a warning about it.

Big features
------------

* Add namespace support.

Optimizations
-------------

markup_attr_map can be optimized since it's always a map now.

BS3 features not yet ported
---------------------------

* In BS3, "soup.aTag" is the same as 'soup.find("a")'. This lets you
locate a tag called (let's say) "find" with attribute
access. "soup.find" won't do what you want, but "soup.findTag" will.

This still works In BS4 but it's deprecated. I could make
"soup.find_tag" work the same way as "soup.find('find')", but I don't
think it's worth it.

CDATA
-----

The elementtree XMLParser has a strip_cdata argument that, when set to
False, should allow Beautiful Soup to preserve CDATA sections instead
of treating them as text. Except it doesn't. (This argument is also
present for HTMLParser, and also does nothing there.)

Currently, htm5lib converts CDATA sections into comments. An
as-yet-unreleased version of html5lib changes the parser's handling of
CDATA sections to allow CDATA sections in tags like <svg> and
<math>. The HTML5TreeBuilder will need to be updated to create CData
objects instead of Comment objects in this situation.