Can't get Beautiful Soup to work with my callback function and regex

I'm trying to use the following code to scrape every tag on a web page whose href attribute matches the pattern /how-to-use/[a-zA-Z]+

The code is here:

import requests
from bs4 import BeautifulSoup
import re

webpage = requests.get('https://www.talkenglish.com/vocabulary/top-1500-nouns.aspx').content
soup = BeautifulSoup(webpage, "html.parser")

def has_how_to_use(tag):
    pattern = re.compile('\/how-to-use\/[a-zA-Z]+')
    return bool(re.search(pattern, tag.attr('href')))

word_list = soup.find_all(has_how_to_use)

But I keep getting an error about not being able to call a NoneType object, and I'm not sure which part is evaluating to None.
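A likely culprit is `tag.attr('href')`. Beautiful Soup treats an unknown attribute on a tag as a lookup for a child tag with that name, so `tag.attr` searches for an `<attr>` element, finds none, and returns None; calling that None is what raises the error. Below is a minimal sketch of one way around it, assuming the goal is to collect the matching links. The `SAMPLE_HTML` markup is a made-up stand-in for the real page (so the example runs without a network request), and the `HOW_TO_USE` name is introduced here for illustration only.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page would come from requests.get(...).content
SAMPLE_HTML = """
<div>
  <a href="/how-to-use/time">time</a>
  <a href="/how-to-use/year">year</a>
  <a href="/vocabulary/top-1500-nouns.aspx">nouns</a>
  <a name="no-href">anchor without an href</a>
  <p>plain paragraph</p>
</div>
"""

# Compile once, outside the callback; forward slashes need no escaping in Python.
HOW_TO_USE = re.compile(r"/how-to-use/[a-zA-Z]+")

def has_how_to_use(tag):
    # tag.get() returns None when the attribute is missing, so guard before searching.
    href = tag.get("href")
    return href is not None and bool(HOW_TO_USE.search(href))

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() calls the function once per tag and keeps the tags that return True.
word_list = soup.find_all(has_how_to_use)
hrefs = [tag["href"] for tag in word_list]
print(hrefs)

# The same filter without a callback: pass the compiled pattern as the href argument.
same_links = soup.find_all("a", href=HOW_TO_USE)
```

The `href is not None` guard matters because the callback runs on every tag in the document, including ones with no href at all; passing None to the regex search would fail with a different error.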



source https://stackoverflow.com/questions/69744366/cant-get-beautiful-soup-to-work-with-my-callback-function-and-regex
