I'm trying to use the following code to scrape every tag from a web page whose href attribute matches the pattern /how-to-use/[a-zA-Z]+
Here is the code:
import requests
from bs4 import BeautifulSoup
import re
webpage = requests.get('https://www.talkenglish.com/vocabulary/top-1500-nouns.aspx').content
soup = BeautifulSoup(webpage, "html.parser")
def has_how_to_use(tag):
    pattern = re.compile('\/how-to-use\/[a-zA-Z]+')
    return bool(re.search(pattern, tag.attr('href')))
word_list = soup.find_all(has_how_to_use)
But I keep getting an error saying that a 'NoneType' object is not callable, and I'm not sure which part is evaluating to None.
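For what it's worth, the likely culprit is `tag.attr('href')`. A BeautifulSoup `Tag` has no `attr` method; accessing an unknown attribute name is treated as a search for a child tag with that name (here `<attr>`), which returns None, and calling None raises the error. Below is a minimal sketch of one possible fix, assuming `tag.get('href')` is what was intended. `SAMPLE_HTML` is made-up markup standing in for the real page so the example runs without a network request, and the names `PATTERN`, `hrefs`, and `same` are only for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<a href="/how-to-use/time">time</a>
<a href="/how-to-use/year">year</a>
<a href="/vocabulary/other.aspx">other</a>
<a name="no-href">anchor without an href</a>
<p>not a link</p>
"""

# Compile once; a raw string avoids the unnecessary "\/" escapes.
PATTERN = re.compile(r"/how-to-use/[a-zA-Z]+")

def has_how_to_use(tag):
    # tag.get("href") returns the attribute value, or None when the
    # tag has no href, so guard against None before searching.
    href = tag.get("href")
    return href is not None and PATTERN.search(href) is not None

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
word_list = soup.find_all(has_how_to_use)
hrefs = [tag["href"] for tag in word_list]

# Alternative without a callback: pass the compiled regex as the
# href filter and let BeautifulSoup do the matching.
same = soup.find_all("a", href=PATTERN)
```

With the real page, `SAMPLE_HTML` would be replaced by the `webpage` content fetched with `requests.get(...)`. The callback form is handy when the condition involves more than one attribute; for a single attribute the `href=PATTERN` form is shorter.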
source https://stackoverflow.com/questions/69744366/cant-get-beautiful-soup-to-work-with-my-callback-function-and-regex