How would I use a regular expression to match a string with or without parentheses at the end in Python 3?
I've never used regular expressions, but I know a little of the syntax from reading the Python website.
I'm trying to write a regular expression that matches entries in a text document. Each entry is on its own line.
For example, here's a portion of the document I'm searching:
ABSOLVE AH0 B Z AA1 L V
ABSOLVE(1) AE0 B Z AA1 L V
ABSOLVED AH0 B Z AA1 L V D
ABSOLVED(1) AE0 B Z AA1 L V D
ABSOLVES AH0 B Z AA1 L V Z
ABSOLVES(1) AE0 B Z AA1 L V Z
If I search the document for the string 'ABSOLVE', I want to find 'ABSOLVE' and 'ABSOLVE(1)', but NOT 'ABSOLVED' or anything else listed. Each line in this text document has a word followed by 2 spaces, then some other data that should be collected after a match is found.
I'm just not sure whether I should use the $ or \b anchors in the regular expression, or something else. My idea was to use something like
re.sub(r'([A-Z])(.*?)( )', r'\1\3', text)
Then just search from there. But this seems really inefficient considering I have a huge file (>100K lines). Is there a better way to use regex in this case? Would I want to use re.sub() to strip the parentheses, or re.search()? Is there an expression I can use to collect the data after the match until it hits a newline? I'd really appreciate any help here, and any advice on using regular expressions. Thanks.
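One possible approach, sketched under a few assumptions: anchor the pattern at the start of each line with ^ and re.MULTILINE, allow an optional "(digits)" suffix right after the word, require at least one space, and capture the rest of the line. The function name find_entries and the sample text are hypothetical, made up for illustration. The separator is assumed to be one or more spaces, which covers the 2 spaces described above.

```python
import re

def find_entries(word, text):
    """Return the data part of every line whose entry is `word` or `word(N)`.

    Hypothetical helper: assumes each line is '<WORD><spaces><data>'.
    """
    pattern = re.compile(
        r"^" + re.escape(word)   # the word itself, at the start of a line
        + r"(?:\(\d+\))?"        # optional variant suffix such as (1)
        + r" +(.*)$",            # one or more spaces, then capture up to end of line
        re.MULTILINE,
    )
    return [m.group(1) for m in pattern.finditer(text)]

# Sample text modeled on the excerpt in the question (illustrative only).
sample = (
    "ABSOLVE  AH0 B Z AA1 L V\n"
    "ABSOLVE(1)  AE0 B Z AA1 L V\n"
    "ABSOLVED  AH0 B Z AA1 L V D\n"
    "ABSOLVED(1)  AE0 B Z AA1 L V D"
)

print(find_entries("ABSOLVE", sample))
# → ['AH0 B Z AA1 L V', 'AE0 B Z AA1 L V']
```

Because the character right after the word must be either "(" or a space, 'ABSOLVED' is rejected without needing \b. With this shape there is no need to call re.sub() first, since the optional group handles the parentheses during the search. If many different words have to be looked up in the same file, reading the file once into a dict keyed by word may well be faster than running a regex search per word, though that depends on the workload.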
source https://stackoverflow.com/questions/71164672/how-would-i-use-a-regular-expression-to-match-a-string-with-or-without-parenthes