- Python Programming Blueprints
- Daniel Furtado, Marcus Pennington
- 2021-06-24 18:53:47
Adding helper methods
To start with, we need to import some packages:
import re
from weatherterm.core import Forecast
from weatherterm.core import Request
from weatherterm.core import Unit
from weatherterm.core import UnitConverter
And in the initializer, we are going to add the following code:
self._base_url = 'http://weather.com/weather/{forecast}/l/{area}'
self._request = Request(self._base_url)
self._temp_regex = re.compile(r'([0-9]+)\D{,2}([0-9]+)')
self._only_digits_regex = re.compile('[0-9]+')
self._unit_converter = UnitConverter(Unit.FAHRENHEIT)
In the initializer, we define the URL template we are going to use to perform requests to the weather website; then, we create a Request object. This is the object that will perform the requests for us.
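As a minimal sketch of how the URL template might be filled in, the following snippet calls str.format on it directly. The forecast value and the area code are placeholders assumed for illustration; the text does not show how the Request object builds the final URL.

```python
# URL template copied from the initializer.
base_url = 'http://weather.com/weather/{forecast}/l/{area}'

# Hypothetical values: the real forecast names and area codes depend
# on the weather website and are only assumed here.
url = base_url.format(forecast='today', area='EXAMPLE_AREA_CODE')
print(url)
```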
The two regular expressions are used only when parsing the temperatures in today's weather forecast.
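To get a feel for the temperature pattern, here is a small sketch that runs it against a sample string. The sample text is an assumption; the actual markup returned by the website may differ.

```python
import re

# Same pattern as in the initializer: two numbers separated by at most
# two non-digit characters.
temp_regex = re.compile(r'([0-9]+)\D{,2}([0-9]+)')

# Assumed sample text: a high and a low temperature, each followed by
# a degree sign.
match = temp_regex.search('75°63°')
high, low = match.groups()
print(high, low)
```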
We also define a UnitConverter object and set the default unit to Fahrenheit.
Now, we are ready to add two methods that will be responsible for searching for HTML elements with a certain class and returning their contents. The first method is called _get_data:
def _get_data(self, container, search_items):
    scraped_data = {}
    for key, value in search_items.items():
        result = container.find(value, class_=key)
        data = None if result is None else result.get_text()
        if data is not None:
            scraped_data[key] = data
    return scraped_data
The idea of this method is to search a container for items that match some criteria. The container is just a DOM element in the HTML, and search_items is a dictionary where each key is a CSS class and each value is the type of the HTML element. It can be a DIV, a SPAN, or any other element that you wish to get the value from.
The method loops through search_items.items() and uses the find method to look for each element within the container. If the element is found, we use get_text to extract its text and add it to the dictionary under the CSS class name. The dictionary is returned when there are no more items to search for.
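The lookup logic can be tried without a real page. The sketch below is an illustration only: FakeTag is a hypothetical stand-in that mimics just the find and get_text calls used above, and the class names in the criteria are made up.

```python
class FakeTag:
    """Hypothetical stand-in for a parsed HTML element."""

    def __init__(self, name, css_class, text='', children=()):
        self.name = name
        self.css_class = css_class
        self.text = text
        self.children = list(children)

    def find(self, name, class_=None):
        # Depth-first search of the descendants for the first element
        # with the given tag name and CSS class.
        for child in self.children:
            if child.name == name and child.css_class == class_:
                return child
            found = child.find(name, class_=class_)
            if found is not None:
                return found
        return None

    def get_text(self):
        return self.text


def get_data(container, search_items):
    # Same logic as _get_data, written as a plain function.
    scraped_data = {}
    for key, value in search_items.items():
        result = container.find(value, class_=key)
        if result is not None:
            scraped_data[key] = result.get_text()
    return scraped_data


container = FakeTag('div', 'card', children=[
    FakeTag('div', 'temp', text='75°'),
    FakeTag('span', 'phrase', text='Sunny'),
])

# Keys are CSS classes, values are element types; 'missing' has no match.
criteria = {'temp': 'div', 'phrase': 'span', 'missing': 'p'}
data = get_data(container, criteria)
print(data)
```

Note that the class with no matching element is simply left out of the result instead of being stored as None.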
The second method that we will implement is the _parse method. This will make use of the _get_data method that we just implemented:
def _parse(self, container, criteria):
    results = [self._get_data(item, criteria)
               for item in container.children]
    return [result for result in results if result]
Here, we also get a container and criteria, as in the _get_data method. The container is a DOM element and the criteria is a dictionary of the nodes that we want to find. The first comprehension goes through all of the container's child elements and passes each one to the _get_data method.
The results will be a list of dictionaries with all the items that have been found, and we will only return the dictionaries that are not empty.
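The filtering step relies on the fact that an empty dictionary is falsy in Python. A minimal sketch, with made-up intermediate results standing in for what _get_data might return for each child:

```python
# Assumed intermediate results: one dictionary per child element,
# empty when nothing in that child matched the criteria.
results = [
    {},
    {'temp': '75°', 'phrase': 'Sunny'},
    {},
    {'temp': '63°'},
]

# Empty dictionaries evaluate to False, so they are dropped here.
non_empty = [result for result in results if result]
print(non_empty)
```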
There are only two more helper methods we need to implement in order to get today's weather forecast in place. Let's implement a method called _clear_str_number:
def _clear_str_number(self, str_number):
    result = self._only_digits_regex.match(str_number)
    return '--' if result is None else result.group()
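This method uses a regular expression to make sure that only digits are returned. Because match only looks at the beginning of the string, it returns the leading digits, or '--' if the string does not start with a digit.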
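The behavior is easy to check in isolation. The sketch below rewrites the method as a plain function; the input strings are assumed examples.

```python
import re

only_digits_regex = re.compile('[0-9]+')


def clear_str_number(str_number):
    # match() anchors at the start of the string, so only leading
    # digits are picked up.
    result = only_digits_regex.match(str_number)
    return '--' if result is None else result.group()


with_unit = clear_str_number('75°')     # digits followed by a symbol
no_digits = clear_str_number('--')      # no digits at all
late_digit = clear_str_number('N/A 7')  # digit is not at the start
print(with_unit, no_digits, late_digit)
```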
And the last method that needs to be implemented is the _get_additional_info method:
def _get_additional_info(self, content):
    data = tuple(item.td.span.get_text()
                 for item in content.table.tbody.children)
    return data[:2]
This method loops through the table rows, getting the text of the span in the first cell of each row. The generator expression yields a lot of information about the weather, but we are only interested in the first two values: the wind and the humidity.
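The attribute chain and the slice can be illustrated with simple stand-in objects. This is only a sketch: make_row and the sample values are hypothetical, and SimpleNamespace replaces the parsed HTML elements.

```python
from types import SimpleNamespace


def make_row(text):
    # Hypothetical stand-in for a table row whose first cell holds a span.
    span = SimpleNamespace(get_text=lambda: text)
    return SimpleNamespace(td=SimpleNamespace(span=span))


def get_additional_info(content):
    # Same logic as _get_additional_info, written as a plain function.
    data = tuple(item.td.span.get_text()
                 for item in content.table.tbody.children)
    return data[:2]


# Assumed sample rows; only the first two values are kept.
rows = [make_row('W 5 mph'), make_row('60%'), make_row('55°')]
content = SimpleNamespace(
    table=SimpleNamespace(tbody=SimpleNamespace(children=rows)))

info = get_additional_info(content)
print(info)
```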