2025 | Data Prototyping and Scientific Computing

Webscraper

Date Tue 09 September 2025 Tags Worksheet

Simple web scraper


Note:

Markdown cheatsheet for Jupyter notebooks: http://datascience.ibm.com/blog/markdown-for-jupyter-notebooks-cheatsheet/
Introduction to Jupyter (video): https://www.youtube.com/watch?v=Rc4JQWowG5I

In [ ]:
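# Modules (libraries) are imported at the top of the notebook; Julia and R follow the same habit.
import re                      # regular expressions, used to search for the keyword
import requests                # downloads the web page over HTTP
from bs4 import BeautifulSoup  # parses the downloaded HTML
import time                    # timing utilities (not used in the cells below)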

Load a webpage into memory and parse it using Beautiful Soup

In [ ]:
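# lyriclink = 'https://www.reddit.com/'
lyriclink = 'https://www.cnn.com/'
# Download the page; webpg.content holds the raw HTML as bytes.
webpg = requests.get(lyriclink)

# Parse the HTML into a searchable tree using Python's built-in parser.
soup = BeautifulSoup(webpg.content, "html.parser")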
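
The cell above assumes the download always succeeds. As a minimal sketch of a more defensive version, the helper below adds a timeout and checks the HTTP status before handing back the page body. The name `fetch_html` and its `get` parameter are hypothetical, introduced only for illustration (`get` lets you swap in a stand-in for `requests.get` when you are offline); they are not part of the worksheet.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Return the page body as bytes, or None if the request fails.

    `fetch_html` and the `get` parameter are illustrative assumptions,
    not part of the original worksheet.
    """
    try:
        # Give up after `timeout` seconds instead of waiting indefinitely.
        response = get(url, timeout=timeout)
        # Raise an HTTPError for 4xx/5xx responses (e.g. 404 or 500).
        response.raise_for_status()
    except requests.RequestException as err:
        # Covers timeouts, connection failures, and bad status codes.
        print("Could not load", url, "->", err)
        return None
    return response.content
```

If this fits your setup, the worksheet's parsing step could then read `html = fetch_html(lyriclink)` followed by `soup = BeautifulSoup(html, "html.parser")`, skipping the parse when `html` is `None`.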
In [ ]:
#soup.find_all(string=re.compile('Irma'))
#print(webpg.content)
In [ ]:
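count = 0
keyword = 'AI'
# find_all(string=...) returns every text node that contains the pattern.
# A node that mentions the keyword twice still counts once, and the pattern
# also matches inside longer words such as 'AIR'.
for lyrics in soup.find_all(string=re.compile(keyword)):
    count += 1
print("The word " + keyword + " showed up", count, "times!!")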
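
The loop above counts text nodes that contain the pattern, not individual occurrences, and a bare pattern like 'AI' also matches inside longer words. As a rough sketch of one alternative, the function below counts whole-word occurrences in a plain string. The name `count_keyword` and the sample sentence are made up for illustration.

```python
import re


def count_keyword(text, keyword):
    """Count whole-word occurrences of `keyword` in `text` (case-sensitive).

    `count_keyword` is a hypothetical helper, not part of the worksheet.
    """
    # re.escape guards against keywords that contain regex characters;
    # \b marks a word boundary, so 'AI' does not match inside 'AIR' or 'MAIL'.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
    return len(pattern.findall(text))


# Made-up sample text, used instead of a live page so the result is repeatable.
sample = "AI is everywhere. AIR travel and MAIL are not AI, but AI-powered tools are."
print(count_keyword(sample, "AI"))      # whole-word matches: 3
print(len(re.findall("AI", sample)))    # plain substring matches: 5
```

To try this on the scraped page, one option is to pass it the page's visible text, for example `count_keyword(soup.get_text(), keyword)`.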

For homework, change the variables lyriclink and keyword to scrape a different website for a different word, then upload the modified code to the In-Class Assignment for Week01 on Brightspace.

In [ ]:
print(soup.prettify())

© 2025 Brice Loose
