- Python Web Scraping Cookbook
- Michael Heydt
- 122字
- 2021-06-30 18:44:02
How to do it...
Now let's start playing with XPath and CSS selectors. The following selects all <tr> elements with a class equal to "planet":
In [2]: [(v, v.xpath("@name")) for v in tree.cssselect('tr.planet')]
Out[2]:
[(<Element tr at 0x10d3a2278>, ['Mercury']),
(<Element tr at 0x10c16ed18>, ['Venus']),
(<Element tr at 0x10e445688>, ['Earth']),
(<Element tr at 0x10e477228>, ['Mars']),
(<Element tr at 0x10e477408>, ['Jupiter']),
(<Element tr at 0x10e477458>, ['Saturn']),
(<Element tr at 0x10e4774a8>, ['Uranus']),
(<Element tr at 0x10e4774f8>, ['Neptune']),
(<Element tr at 0x10e477548>, ['Pluto'])]
Data for the Earth can be found in several ways. The following gets the row based on id:
In [3]: tr = tree.cssselect("tr#planet3")
...: tr[0], tr[0].xpath("./td[2]/text()")[0].strip()
...:
Out[3]: (<Element tr at 0x10e445688>, 'Earth')
The following uses an attribute with a specific value:
In [4]: tr = tree.cssselect("tr[name='Pluto']")
...: tr[0], tr[0].xpath("td[2]/text()")[0].strip()
...:
Out[5]: (<Element tr at 0x10e477548>, 'Pluto')
Note that unlike XPath, the @ symbol need not be used to specify an attribute.