From b3aa1fe88487ea8fbd4533d410d2fa26962ed608 Mon Sep 17 00:00:00 2001
From: Leonard Richardson
Date: Mon, 24 Dec 2018 09:00:11 -0500
Subject: Rewrote select() documentation and namespace example.

---
 doc/source/index.rst | 74 +++++++++++++++++++++++++---------------------------
 1 file changed, 36 insertions(+), 38 deletions(-)

diff --git a/doc/source/index.rst b/doc/source/index.rst
index f1a006e..2977029 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -1662,10 +1662,22 @@ tag it contains.
 CSS selectors
 -------------
 
-Beautiful Soup supports a large number of CSS selectors via `Soup Sieve
-`_, only a small portion will be
-discussed here. To select tags, just pass a string into the ``.select()`` method
-of a ``Tag`` object or the ``BeautifulSoup`` object itself.
+As of version 4.7.0, Beautiful Soup supports most CSS4 selectors via
+the `SoupSieve `_
+project. If you installed Beautiful Soup through ``pip``, SoupSieve
+was installed at the same time, so you don't have to do anything extra.
+
+``BeautifulSoup`` has a ``.select()`` method which uses SoupSieve to
+run a CSS selector against a parsed document and return all the
+matching elements. ``Tag`` has a similar method which runs a CSS
+selector against the contents of a single tag.
+
+(Earlier versions of Beautiful Soup also have the ``.select()``
+method, but only the most commonly-used CSS selectors are supported.)
+
+The SoupSieve `documentation
+`_ lists all the currently
+supported CSS selectors, but here are some of the basics:
 
 You can find tags::
 
@@ -1762,49 +1774,35 @@ Find tags by attribute value::
  soup.select('a[href*=".com/el"]')
  # [Elsie]
 
-Match language codes::
-
- multilingual_markup = """
-  Hello
-  Howdy, y'all
-  Pip-pip, old fruit
-  Bonjour mes amis
- """
- multilingual_soup = BeautifulSoup(multilingual_markup)
- multilingual_soup.select('p[lang|=en]')
- # [Hello,
- #  Howdy, y'all,
- #  Pip-pip, old fruit]
-
-Find only the first tag that matches a selector::
+There's also a method called ``select_one()``, which finds only the
+first tag that matches a selector::
 
  soup.select_one(".sister")
  # Elsie
 
-You can also use namespaces as well, provided you define namespaces::
+If you've parsed XML that defines namespaces, you can use them in CSS
+selectors. You just have to pass a dictionary of the namespace
+mappings into ``select()``::
 
- import bs4
- xml = """
- ...
+ from bs4 import BeautifulSoup
+ xml = """
+  I'm in namespace 1
+  I'm in namespace 2
  """
- namespaces = {"xyz": "http://namespaceuri.com/namespace"}
- soup = bs4.BeautifulSoup(xml, "lxml-xml")
- soup.select("xyz|el")
- # [...]
-
-As Soup Sieve is inlcuded with Beautiful soup, you can also use it directly
-on ``BeautifulSoup`` and ``Tag`` objects.
+ soup = BeautifulSoup(xml, "xml")
 
-This is all a convenience for users who know the CSS selector syntax. You
-can do all this stuff with the Beautiful Soup API. And if CSS
-selectors are all you need, you might as well use lxml directly: it's
-a lot faster. But this lets you `combine` complex CSS selectors with the
-Beautiful Soup API.
+ soup.select("child")
+ # [I'm in namespace 1, I'm in namespace 2]
 
-To learn more about all the CSS selectors supported, or to learn how to use
-SoupSieve's API directly, checkout its `documentation
-`_.
+ namespaces = dict(ns1="http://namespace1/", ns2="http://namespace2/")
+ soup.select("ns1|child", namespaces=namespaces)
+ # [I'm in namespace 1]
 
+All of this is a convenience for people who know the CSS selector
+syntax. You can do all this stuff with the Beautiful Soup API. And if
+CSS selectors are all you need, you should parse the document
+with lxml: it's a lot faster. But this lets you `combine` CSS
+selectors with the Beautiful Soup API.
 
 Modifying the tree
 ==================
-- 
cgit v1.2.1
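
The rewritten documentation describes three behaviours: ``select()`` on a ``BeautifulSoup`` object searches the whole document, ``select()`` on a ``Tag`` searches only that tag's contents, and ``select_one()`` returns just the first match. The following is a minimal sketch of those three calls, not part of the patch itself. It assumes Beautiful Soup 4.7.0 or later with SoupSieve installed, and it uses the built-in ``html.parser`` so that lxml is not required. The markup and the variable names are hypothetical, invented for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this illustration only.
html = """
<div id="links">
  <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
</div>
<p class="story">No links here.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# select() on the BeautifulSoup object runs the selector against the
# whole parsed document and returns a list of every matching tag.
all_sisters = soup.select("a.sister")

# select() on a Tag runs the selector against that tag's contents only.
links_div = soup.select_one("#links")
div_links = links_div.select("a[href$='lacie']")

# select_one() returns only the first matching tag, or None if the
# selector matches nothing.
first_sister = soup.select_one(".sister")
missing = soup.select_one("a.brother")

print([a.get_text() for a in all_sisters])
print([a["id"] for a in div_links])
print(first_sister["id"], missing)
```

Because ``select()`` always returns a list and ``select_one()`` may return ``None``, code that uses ``select_one()`` should check the result before indexing into it.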