Sorry in advance for a potentially duplicate question; I'm somewhat new to re and can't find an answer.
I have a string something like this:
<foo><bar><biz><Baz><buz>Extract Me!</span><foo><bar>
I want to extract Extract Me!, which is between a > and the only </span> that appears in the string. I tried >(.*)</span>, but that extracts ><bar><biz><Baz><buz>Extract Me!</span>.
Edit:
1: This question got closed as a duplicate of another question, but making the regular expression >(.*)</span> "non-greedy" by turning it into >(.*?)</span> yields the same result. I had already attempted this before posting.
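For reference, here is a minimal sketch of why the greedy and non-greedy versions behave the same here. A regex match always starts at the leftmost position that can succeed, so both patterns begin at the first > in the string; laziness only affects where the match ends. Excluding angle brackets from the captured group is one possible way around that, assuming the wanted text itself never contains < or >. The variable names are only for illustration.

```python
import re

s = "<foo><bar><biz><Baz><buz>Extract Me!</span><foo><bar>"

# Both patterns start matching at the FIRST ">" in the string.
# Non-greedy only changes how far the match extends to the right,
# and there is just one "</span>" to stop at.
greedy = re.search(r">(.*)</span>", s).group(1)
lazy = re.search(r">(.*?)</span>", s).group(1)

# Assumption: the wanted text contains no "<" or ">".
# [^<>]* cannot cross a tag, so only the last ">" before "</span>" fits.
narrow = re.search(r">([^<>]*)</span>", s).group(1)

print(greedy)
print(lazy)
print(narrow)
```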
2: After some discussion, it was recommended that I just use BeautifulSoup, which makes sense. I've solved the issue with re.search and the pattern r'(?:>)(\b.*)(<\/span>)', but I'll provide a bit more code so further exploration can be done.
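As a rough illustration of why that pattern works on the sample string: \b requires a word boundary right after the >, and every earlier > is followed by <, where there is no boundary. The same property makes it fragile, since it seems to fail when the text starts with a non-word character. The second test string below is made up for the demonstration.

```python
import re

pattern = r"(?:>)(\b.*)(<\/span>)"

s = "<foo><bar><biz><Baz><buz>Extract Me!</span><foo><bar>"
match = re.search(pattern, s)
# Only the ">" directly before "Extract" is followed by a word character.
extracted = match.group(1)

# Hypothetical input: the text starts with "(", so no ">" is followed
# by a word boundary and the search finds nothing.
s2 = "<foo><bar>(14 cm)</span>"
no_match = re.search(pattern, s2)

print(extracted)
print(no_match)
```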
So, unveiling the curtain a bit, this is the pseudocode of what I'm working with:
from bs4 import BeautifulSoup

src = selenium_driver.page_source  # selenium_driver: an existing Selenium WebDriver
soup = BeautifulSoup(src, 'html.parser')
list_of_things = soup.find_all(True, {'class': ['list of classes']})
for thing in list_of_things:
    print(type(thing))
    print(thing)
    extract_extractMe()  # <- do stuff
The result of print(type(thing)) and print(thing) would be something like this:
#type
<class 'bs4.element.Tag'>
#thing
<li class="property-item"><div class="property-text"><span data-spm-anchor-id="a2g0o.detail.1000016.i2.11ab42d1npvRCb">14CM</span></div></li>
I'm trying to extract 14CM from each "thing".
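Since each "thing" is already a bs4 Tag, a sketch along these lines might avoid regular expressions entirely. It assumes the wanted text always sits in the first span inside the item; the helper name extract_span_text and the property-item class filter are made up for this example and would need adapting to the real page.

```python
from bs4 import BeautifulSoup

html = (
    '<li class="property-item"><div class="property-text">'
    '<span data-spm-anchor-id="a2g0o.detail.1000016.i2.11ab42d1npvRCb">14CM</span>'
    '</div></li>'
)

def extract_span_text(thing):
    """Return the stripped text of the first <span> in a Tag, or None."""
    span = thing.find("span")
    if span is None:
        return None
    return span.get_text(strip=True)

soup = BeautifulSoup(html, "html.parser")
# Hypothetical filter: the sample item carries the class "property-item".
results = [extract_span_text(thing)
           for thing in soup.find_all("li", class_="property-item")]
print(results)
```

If every item contains nothing but that one piece of text, thing.get_text(strip=True) on the item itself would probably give the same result.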
source https://stackoverflow.com/questions/74892989/python-re-extract-smallest-substring-between-two-strings