- Python Web Scraping Cookbook
- Michael Heydt
- 2021-06-30 18:44:10
How to do it - posting messages to an AWS queue
The 03/create_messages.py file contains code to read the planets data and to post the URL in the MoreInfo property to an SQS queue:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import boto3
import botocore
# declare our keys (normally, don't hard code this)
access_key = "<your AWS access key ID>"
access_secret_key = "<your AWS secret access key>"
# create sqs client
sqs = boto3.client('sqs', "us-west-2",
                   aws_access_key_id=access_key,
                   aws_secret_access_key=access_secret_key)
# create / open the SQS queue
queue = sqs.create_queue(QueueName="PlanetMoreInfo")
print(queue)
# read and parse the planets HTML
html = urlopen("http://127.0.0.1:8080/pages/planets.html")
bsobj = BeautifulSoup(html, "lxml")
planets = []
planet_rows = bsobj.html.body.div.table.findAll("tr", {"class": "planet"})
for i in planet_rows:
    tds = i.findAll("td")
    # get the URL
    more_info_url = tds[5].findAll("a")[0]["href"].strip()
    # send the URL to the queue
    sqs.send_message(QueueUrl=queue["QueueUrl"],
                     MessageBody=more_info_url)
    print("Sent %s to %s" % (more_info_url, queue["QueueUrl"]))
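The comment in the listing notes that keys should not normally be hard-coded. One common alternative, sketched here rather than taken from the book, is to read them from environment variables. The helper names `sqs_client_kwargs` and `make_sqs_client` are hypothetical; `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the variable names boto3 itself looks for when no keys are passed explicitly.

```python
import os


def sqs_client_kwargs(environ=os.environ, region="us-west-2"):
    """Build keyword arguments for boto3.client('sqs', ...) from a mapping.

    Hypothetical helper: raises KeyError naming the missing variable,
    so the script fails before any request is made.
    """
    return {
        "region_name": region,
        "aws_access_key_id": environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": environ["AWS_SECRET_ACCESS_KEY"],
    }


def make_sqs_client():
    """Create the SQS client from the environment (assumes boto3 is installed)."""
    import boto3  # imported lazily so the helper above works without boto3

    return boto3.client("sqs", **sqs_client_kwargs())
```

Replacing the `boto3.client(...)` call in the script with `sqs = make_sqs_client()` would leave the rest of the listing unchanged.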
Run the code in a terminal and you will see output similar to the following:
{'QueueUrl': 'https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo', 'ResponseMetadata': {'RequestId': '2aad7964-292a-5bf6-b838-2b7a5007af22', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 28 Aug 2017 20:02:53 GMT', 'content-type': 'text/xml', 'content-length': '336', 'connection': 'keep-alive', 'x-amzn-requestid': '2aad7964-292a-5bf6-b838-2b7a5007af22'}, 'RetryAttempts': 0}}
Sent https://en.wikipedia.org/wiki/Mercury_(planet) to https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
Sent https://en.wikipedia.org/wiki/Venus to https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
Sent https://en.wikipedia.org/wiki/Earth to https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
Sent https://en.wikipedia.org/wiki/Mars to https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
Sent https://en.wikipedia.org/wiki/Jupiter to https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
Sent https://en.wikipedia.org/wiki/Saturn to https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
Sent https://en.wikipedia.org/wiki/Uranus to https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
Sent https://en.wikipedia.org/wiki/Neptune to https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
Sent https://en.wikipedia.org/wiki/Pluto to https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
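Each `Sent` line above corresponds to one `send_message` request. As a possible variation, not part of the recipe, the URLs could be collected first and posted with `send_message_batch`, which accepts at most 10 entries per call. The sketch below assumes the URLs are already in a list; `batches` and `send_urls_in_batches` are hypothetical helper names, and `sqs_client` is any object exposing boto3's SQS client interface.

```python
def batches(items, size=10):
    """Yield consecutive slices of items, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_urls_in_batches(sqs_client, queue_url, urls):
    """Post urls to the queue in batches of up to 10; return any failed entries."""
    failed = []
    for batch in batches(urls):
        # Each entry needs an Id that is unique within its batch
        entries = [
            {"Id": str(index), "MessageBody": url}
            for index, url in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(
            QueueUrl=queue_url, Entries=entries
        )
        # "Failed" may be absent when every entry succeeded
        failed.extend(response.get("Failed", []))
    return failed
```

With the recipe's names, a call would look like `send_urls_in_batches(sqs, queue["QueueUrl"], urls)`, where `urls` is the list of MoreInfo links gathered in the loop.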
Now go into the AWS SQS console. You should see the queue has been created and that it holds 9 messages:
Figure: The Queue in SQS
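The same check can be made from code instead of the console. The following is a minimal sketch, assuming the `sqs` client and `queue` response from the script are still in scope; `approximate_message_count` is a hypothetical helper name, while `get_queue_attributes` and the `ApproximateNumberOfMessages` attribute are part of the SQS API.

```python
def approximate_message_count(sqs_client, queue_url):
    """Return SQS's approximate count of visible messages in the queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings
    return int(response["Attributes"]["ApproximateNumberOfMessages"])
```

Calling `approximate_message_count(sqs, queue["QueueUrl"])` shortly after running the script should report the messages that were posted. As the attribute name indicates, the figure is approximate, so it may lag briefly behind the sends.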