I want to edit an HTML document and parse some text using BeautifulSoup. I'm interested in <span> tags, but only the ones that are NOT inside a <table> element. In other words, I want to skip all tables when finding the <span> elements.
I've tried finding all <span> elements first and then filtering out the ones that have a <table> at any parent level. Here is the code, but it is too slow:
for tag in soup.find_all('span'):
    ancestor_tables = [x for x in tag.find_all_previous(name='table')]
    if len(ancestor_tables) > 0:
        continue
    text = tag.text
Is there a more efficient alternative? Is it possible to 'hide' / skip tags while searching for <span> in the find_all method?
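One possible direction, offered as a minimal sketch rather than a definitive answer: find_all_previous returns every <table> that appears earlier in the document, not only ancestors, so it scans a lot of the tree for each <span> and also skips spans that merely come after a table. Checking ancestors with find_parent only walks up from each <span>, which should be cheaper and matches the stated intent. The helper name spans_outside_tables and the sample HTML below are made up for illustration.

```python
from bs4 import BeautifulSoup


def spans_outside_tables(soup):
    """Return all <span> tags that have no <table> ancestor.

    Hypothetical helper name; find_parent() walks only up the tree
    from each span and returns None when no <table> encloses it.
    """
    return [
        span
        for span in soup.find_all("span")
        if span.find_parent("table") is None
    ]


# Sample document (assumed for illustration only).
html = """
<div><span>keep 1</span></div>
<table><tr><td><span>skip</span></td></tr></table>
<p><span>keep 2</span></p>
"""

soup = BeautifulSoup(html, "html.parser")
texts = [span.get_text() for span in spans_outside_tables(soup)]
print(texts)
```

Note that "keep 2" comes after the table in this sample, so the find_all_previous version would skip it even though it is not inside the table.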
Source: https://stackoverflow.com/questions/74538402/how-to-skip-a-tag-when-using-beautifulsoup-find-all