Simple web scraper
Note:
For a Markdown cheat sheet for Jupyter notebooks, see http://datascience.ibm.com/blog/markdown-for-jupyter-notebooks-cheatsheet/
For an introduction to Jupyter, see https://www.youtube.com/watch?v=Rc4JQWowG5I
In [ ]:
# Import the modules (libraries) this notebook uses; Julia and R have similar import steps
import re
import requests
from bs4 import BeautifulSoup
import time
Load a webpage into memory and parse it using Beautiful Soup
In [ ]:
# lyriclink = 'https://www.reddit.com/'
lyriclink = 'https://www.cnn.com/'
# Download the page, then parse its HTML with Python's built-in parser
webpg = requests.get(lyriclink)
soup = BeautifulSoup(webpg.content, "html.parser")
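The cell above fetches a live page, so its result changes whenever the site does. As a minimal sketch of the same parsing step on a fixed input, the example below parses a small hand-written HTML string. The `sample_html` content and the `fetch_soup` helper are hypothetical names introduced for illustration; the helper only wraps `requests.get` with a timeout and a status check and is not called here.

```python
import requests
from bs4 import BeautifulSoup

def fetch_soup(url, timeout=10):
    """Hypothetical helper: download a page and parse it, failing loudly on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return BeautifulSoup(response.content, "html.parser")

# Made-up HTML, used only to show parsing without network access.
sample_html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p>Hello</p><p>World</p></body></html>"
)
sample_soup = BeautifulSoup(sample_html, "html.parser")

page_title = sample_soup.title.string  # text inside the <title> tag
paragraphs = [p.get_text() for p in sample_soup.find_all("p")]  # text of each <p> tag
print(page_title, paragraphs)
```

Calling `fetch_soup(lyriclink)` would return the same kind of object as `soup` above, assuming the site responds successfully.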
In [ ]:
#soup.find_all(string=re.compile('Irma'))
#print(webpg.content)
In [ ]:
count = 0
keyword = 'AI'
# Each text string on the page that contains the keyword counts once (case-sensitive)
for lyrics in soup.find_all(string=re.compile(keyword)):
    count += 1
print("The word " + keyword + " showed up", count, "times!")
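The loop above counts text strings that contain the keyword, so a string mentioning the keyword three times still adds one, and a short keyword also matches inside longer words. The sketch below, run on made-up HTML, contrasts that count with a whole-word count of every occurrence. The names `sample_html`, `loose`, and `strict` are assumptions introduced for this illustration.

```python
import re
from bs4 import BeautifulSoup

# Made-up HTML for illustration only.
sample_html = "<p>AI news: AI beats AI.</p><p>She SAID hello.</p><p>No match here.</p>"
sample_soup = BeautifulSoup(sample_html, "html.parser")
keyword = "AI"

# Substring pattern, like the loop above: it also matches the "AI" inside "SAID".
loose = re.compile(re.escape(keyword))
# Whole-word pattern: \b requires a word boundary on both sides of the keyword.
strict = re.compile(r"\b" + re.escape(keyword) + r"\b")

# Number of text strings containing the keyword anywhere.
matching_nodes = len(sample_soup.find_all(string=loose))
# Number of whole-word occurrences summed over every text string.
total_occurrences = sum(
    len(strict.findall(text)) for text in sample_soup.find_all(string=True)
)
print(matching_nodes, total_occurrences)
```

Here two strings match the loose pattern, while the whole-word count is three, all from the first paragraph. Which number is the right one depends on what "showed up" should mean for the assignment.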
For homework, modify the variables lyriclink and keyword to scrape a different website for a different word, then upload the modified code to the In-Class Assignment for Week01 on Brightspace.
In [ ]:
print(soup.prettify())
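On a full news page, `prettify()` can print thousands of lines. As a small sketch of what it does, the example below runs it on a made-up snippet: the output generally places each tag and each text string on its own line, indented by nesting depth. The `snippet` and `first_lines` names are introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only.
snippet = BeautifulSoup("<div><p>Hi <b>there</b></p></div>", "html.parser")
pretty = snippet.prettify()  # returns the indented HTML as a string
print(pretty)

# For a large page, slicing the lines keeps the output readable.
first_lines = pretty.splitlines()[:3]
print("\n".join(first_lines))
```

The same slicing could be applied to `soup.prettify()` to preview only the top of a real page.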