
Web Scraping

With this project I attempted to write a small Python script to scrape weather information from this page: https://paulbosker.com/2022/08/21/weather/. My intention was to schedule the scraper to run at some interval (hourly?) and have it save the results to a CSV file on the web server. With that historical weather data, I would be able to present it in graphical form on this website.

There were a couple things I struggled with:

  1. The web server runs in a Docker container, which makes it more difficult to save files in the appropriate places. I’m sure there is a way to accomplish this, but I gave up researching it after an hour.
  2. I also struggled with the chart on the website: I wanted it to be dynamic enough to read the updated CSV file directly, rather than my having to upload the file to the web server manually.
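
For the second point, one possible direction is to have whatever draws the chart parse the CSV each time it runs, so new rows show up without any manual step. The following is only a minimal sketch: `load_series` and `COLUMNS` are hypothetical names, the column order is assumed to match the row written by the first script below, and the sample rows are made-up values rather than real readings.

```python
import csv
import io

# Assumed column order, matching the row appended by the first script.
COLUMNS = ["timestamp", "temperature", "pressure", "humidity", "wind", "rain"]

def load_series(csv_file):
    """Read scraper rows from an open file and return one list per column."""
    series = {name: [] for name in COLUMNS}
    for row in csv.reader(csv_file):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        for name, value in zip(COLUMNS, row):
            series[name].append(value.strip())
    return series

# Usage with in-memory sample data (made-up values, not real readings).
sample = io.StringIO(
    "2022-08-21 10:00,71.2,29.92,55,4.1,0.0\n"
    "\n"
    "2022-08-21 11:00,73.0,29.90,52,5.3,0.0\n"
)
series = load_series(sample)
print(series["temperature"])
```

Each list in `series` can then be handed to a charting library as one data series. Passing an open file object instead of a path keeps the helper easy to try out without touching the real file.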
 
Below is the Python code used to scrape my website:
 
import requests
from bs4 import BeautifulSoup

url = "https://paulbosker.com/2022/08/21/weather/"
outfl = ""

result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")

temp = doc.find_all(class_="lws-widget-big-value")[0]
press = doc.find_all(class_="lws-widget-med-value")[0]
humid = doc.find_all(class_="lws-widget-med-value")[1]
wind = doc.find_all(class_="lws-widget-big-value")[1]
rain = doc.find_all(class_="lws-widget-small-value-up")[2]
htemp = doc.find_all(class_="lws-widget-small-value-up")[0]
gust = doc.find_all(class_="lws-widget-small-value-up")[1]
ltemp = doc.find_all(class_="lws-widget-small-value-down")[0]
dt = doc.find_all(class_="elementor-shortcode")[0]

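# Each loop walks the element's children and keeps the last one
# (normally the text inside the element).
for t in temp:
    pass
for p in press:
    pass
for h in humid:
    pass
for w in wind:
    pass
for r in rain:
    pass
for ht in htemp:
    pass
for g in gust:
    pass
for lt in ltemp:
    pass
for d in dt:
    pass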

outfl = str(d) + "," + str(t) + "," + str(p) + "," + str(h) + "," + str(w) + "," + str(r)

# Append the new row; "a" mode always writes at the end of the file,
# and the with-block closes the file even if the write fails.
with open("/home/paul/python/weather_scrape.csv", "a") as f:
    f.write(outfl + "\n")
 

After letting the program run for a couple of days, I realized I had an issue. The program looked for each value at a specific position on the page. For example, rainfall was always expected to be in the same position. I found that this was not always the case: if there was no rain on a particular day, the rainfall amount didn’t appear on the page at all, which threw off the position of everything else.
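
The effect is easy to reproduce without the website. In this small sketch the two lists stand in for what a `find_all()` call might return on a rainy day and on a dry day. The values, the page order, and the names `WIND_INDEX` and `wind_by_position` are all made up for illustration and do not reflect the real page layout.

```python
# Made-up values in page order; each list stands in for one find_all() result.
rainy_day = ["71.2", "0.12", "4.1", "9.8"]  # temp, rain, wind, gust
dry_day = ["71.2", "4.1", "9.8"]            # temp, wind, gust (rain row absent)

WIND_INDEX = 2  # position of wind when every row is present

def wind_by_position(values):
    """Return the value at the fixed wind position, or None if the list is too short."""
    return values[WIND_INDEX] if len(values) > WIND_INDEX else None

print(wind_by_position(rainy_day))  # the wind value
print(wind_by_position(dry_day))    # the gust value, silently recorded as wind
```

No error is raised on the dry day. The script simply records the wrong number, which is why the problem only became visible after a couple of days of data.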

I rewrote the program to be more flexible about where it finds each piece of information. It now uses the unit of measurement to determine what each item is, so if a particular item drops off the web page, I can still be sure that everything else is accurate. Below is the updated program:

 

import requests
from bs4 import BeautifulSoup

url = "https://paulbosker.com/2022/08/21/weather/"
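outfl = ""
# Default every reading to blank so that a value missing from the page
# (e.g. rain on a dry day) does not raise a NameError when the row is built.
t = w = r = h = p = ""
have_t = "N"
have_h = "N"
have_p = "N"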

result = requests.get(url)
soup = BeautifulSoup(result.text, "html.parser")
lists = soup.find_all('div',class_="lws-widget-row")
dt = soup.find_all(class_="elementor-shortcode")[0]
for d in dt:
    pass

# Large widgets: temperature, wind and rain rate, identified by their unit.
for list1 in lists:
    node = list1.find('div', class_="lws-widget-big-value")
    unit_node = list1.find('div', class_="lws-widget-big-unit")
    if node is not None and unit_node is not None:
        num = node.text
        unit = unit_node.text
        if unit == "°F":
            # Keep only the first °F value found on the page.
            if have_t == "N":
                t = num
                have_t = "Y"
        elif unit == "mph":
            w = num
        elif unit == "in/h":
            r = num
# Medium widgets: humidity and pressure, identified by their unit.
for list2 in lists:
    node = list2.find('div', class_="lws-widget-med-value")
    unit_node = list2.find('div', class_="lws-widget-med-unit")
    if node is not None and unit_node is not None:
        num = node.text
        unit = unit_node.text
        if unit == "%":
            # Keep only the first % value found on the page.
            if have_h == "N":
                h = num
                have_h = "Y"
        elif unit == "inHg":
            # Keep only the first inHg value found on the page.
            if have_p == "N":
                p = num
                have_p = "Y"

outfl = str(d) + "," + str(t) + "," + str(w) + "," + str(h) + "," + str(p) + "," + str(r)

# Append the new row; "a" mode always writes at the end of the file,
# and the with-block closes the file even if the write fails.
with open("/home/paul/python/weather_scrape.csv", "a") as f:
    f.write(outfl + "\n")
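
The unit-based lookup could also be driven by a small table, with the row written through Python's `csv` module so that a value containing a comma (a date such as "August 21, 2022", for instance) stays in one column. This is a sketch under assumptions, not the program I actually run. `UNIT_TO_FIELD`, `FIELDS`, `build_row` and `append_row` are hypothetical names, the readings are made up, and for simplicity it keeps the first value seen for every field, whereas the program above does so only for temperature, humidity and pressure.

```python
import csv
import io

# Map each unit string to the CSV field it identifies (units as used above).
UNIT_TO_FIELD = {"°F": "temp", "mph": "wind", "in/h": "rain",
                 "%": "humidity", "inHg": "pressure"}
# Same column order as the row built by the updated program.
FIELDS = ["timestamp", "temp", "wind", "humidity", "pressure", "rain"]

def build_row(timestamp, readings):
    """Turn (value, unit) pairs into a row dict; missing fields stay blank."""
    row = {field: "" for field in FIELDS}
    row["timestamp"] = timestamp
    for value, unit in readings:
        field = UNIT_TO_FIELD.get(unit)
        # Keep the first value seen for each field; ignore unknown units.
        if field is not None and row[field] == "":
            row[field] = value
    return row

def append_row(csv_file, row, write_header=False):
    """Write one row; the csv module quotes any value that contains a comma."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow(row)

# Usage with made-up readings on a dry day (no "in/h" entry on the page).
out = io.StringIO()
readings = [("71.2", "°F"), ("4.1", "mph"), ("68.0", "°F"),
            ("55", "%"), ("29.92", "inHg")]
append_row(out, build_row("August 21, 2022 10:00", readings), write_header=True)
print(out.getvalue())
```

On a dry day the rain column is simply left empty, so every other value still lands in its own column. When writing to a real file with `csv`, the file should be opened with `newline=""` as the module documentation recommends.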

 
