I want to edit an HTML document and parse some text using BeautifulSoup. I'm interested in <span> tags, but only the ones that are NOT inside a <table> element. I want to skip all tables when finding the <span> elements.
I've tried finding all <span> elements first and then filtering out the ones that have a <table> at any ancestor level. Here is the code, but it is too slow:
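for tag in soup.find_all('span'):
    # find_all_previous() returns every matching element that appears earlier
    # in the document, not only the tag's ancestors
    ancestor_tables = tag.find_all_previous(name='table')
    if len(ancestor_tables) > 0:
        continue
    text = tag.text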
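A minimal sketch of the same ancestor filter, assuming `bs4` is installed and using a made-up HTML snippet. `find_parent('table')` climbs only the span's ancestors, whereas `find_all_previous` scans every element that precedes the span in document order, so it also matches tables that merely come before the span (in the snippet below it would wrongly skip "keep 2"). Passing a function to `find_all` applies the filter during the search itself; `is_span_outside_table` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document: one span inside a table, two outside
html = """
<div>
  <span>keep 1</span>
  <table><tr><td><span>skip</span></td></tr></table>
  <p><span>keep 2</span></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

def is_span_outside_table(tag):
    # find_parent only walks up this tag's ancestors, unlike find_all_previous
    return tag.name == "span" and tag.find_parent("table") is None

# find_all calls the function on every tag and keeps those returning True
texts = [tag.text for tag in soup.find_all(is_span_outside_table)]
print(texts)  # ['keep 1', 'keep 2']
```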
Is there a more efficient alternative? Is it possible to 'hide' / skip tags while searching for <span> in the find_all method?
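One way to genuinely skip tables during the search is a custom traversal that refuses to descend into any <table>, so nothing inside a table is ever visited. This is a sketch under assumptions: `spans_outside_tables` is a hypothetical helper, not part of BeautifulSoup's API, and because it is plain Python recursion it may hit the recursion limit on extremely deep documents. Whether it is faster than filtering `find_all` results depends on how much of the document sits inside tables.

```python
from bs4 import BeautifulSoup, Tag

def spans_outside_tables(node):
    """Yield <span> tags in document order without entering any <table>."""
    for child in node.children:
        if not isinstance(child, Tag):
            continue  # skip text nodes, comments and other strings
        if child.name == "table":
            continue  # prune: the table's subtree is never visited
        if child.name == "span":
            yield child
        # keep descending so spans nested in divs or other spans are found
        yield from spans_outside_tables(child)

# Hypothetical sample document
html = """
<body>
  <span>a</span>
  <table><tr><td><span>in table</span></td></tr></table>
  <div><span>b <span>c</span></span></div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")
found = [s.get_text() for s in spans_outside_tables(soup)]
print(found)  # ['a', 'b c', 'c']
```

Since the generator yields the actual `Tag` objects, they can still be modified in place when editing the document.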
source https://stackoverflow.com/questions/74538402/how-to-skip-a-tag-when-using-beautifulsoup-find-all