书名：Python Web Scraping Cookbook
作者名：Michael Heydt
本章字数：122字
更新时间：2025-02-26 12:46:23

How to do it...

Now let's start playing with XPath and CSS selectors. The following selects all <tr> elements with a class equal to "planet":

In [2]: [(v, v.xpath("@name")) for v in tree.cssselect('tr.planet')]
Out[2]:
[(<Element tr at 0x10d3a2278>, ['Mercury']),
 (<Element tr at 0x10c16ed18>, ['Venus']),
 (<Element tr at 0x10e445688>, ['Earth']),
 (<Element tr at 0x10e477228>, ['Mars']),
 (<Element tr at 0x10e477408>, ['Jupiter']),
 (<Element tr at 0x10e477458>, ['Saturn']),
 (<Element tr at 0x10e4774a8>, ['Uranus']),
 (<Element tr at 0x10e4774f8>, ['Neptune']),
 (<Element tr at 0x10e477548>, ['Pluto'])]

Data for the Earth can be found in several ways. The following gets the row based on id:

In [3]: tr = tree.cssselect("tr#planet3")
   ...: tr[0], tr[0].xpath("./td[2]/text()")[0].strip()
   ...:
Out[3]: (<Element tr at 0x10e445688>, 'Earth')

The following uses an attribute with a specific value:

In [4]: tr = tree.cssselect("tr[name='Pluto']")
   ...: tr[0], tr[0].xpath("td[2]/text()")[0].strip()
   ...:
Out[5]: (<Element tr at 0x10e477548>, 'Pluto')

Note that unlike XPath, the @ symbol need not be used to specify an attribute.