How would I use a regular expression to match a string with or without parentheses at the end in Python 3?

I've never used regular expressions, but I know a little of the syntax from reading the Python website.

I'm trying to write a regular expression that matches entries in a text document. Each entry is on its own line.

For example, here's a portion of the document I'm searching:

ABSOLVE  AH0 B Z AA1 L V
ABSOLVE(1)  AE0 B Z AA1 L V
ABSOLVED  AH0 B Z AA1 L V D
ABSOLVED(1)  AE0 B Z AA1 L V D
ABSOLVES  AH0 B Z AA1 L V Z
ABSOLVES(1)  AE0 B Z AA1 L V Z

If I search the document for the string 'ABSOLVE', I want to find 'ABSOLVE' and 'ABSOLVE(1)', but NOT 'ABSOLVED' or any of the other entries listed. Each line in the document has a word followed by two spaces and then some other data, which should be collected once a match is found.
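To make the requirement concrete, here is a rough sketch of the kind of pattern that would behave this way: the word, an optional parenthesized number, then the two-space separator, with the rest of the line captured. The name `find_entries` is made up for illustration, and it assumes the suffix is always digits in parentheses, as in the sample.

```python
import re

SAMPLE = """\
ABSOLVE  AH0 B Z AA1 L V
ABSOLVE(1)  AE0 B Z AA1 L V
ABSOLVED  AH0 B Z AA1 L V D
ABSOLVED(1)  AE0 B Z AA1 L V D
ABSOLVES  AH0 B Z AA1 L V Z
ABSOLVES(1)  AE0 B Z AA1 L V Z
"""

def find_entries(word, text):
    """Return (headword, data) pairs for `word` and its numbered variants.

    Hypothetical helper: assumes each entry is 'WORD' or 'WORD(n)',
    then exactly two spaces, then the data up to the end of the line.
    """
    pattern = re.compile(
        # re.escape guards against regex metacharacters in the word itself.
        r"^(" + re.escape(word) + r"(?:\(\d+\))?)  (.*)$",
        re.MULTILINE,  # let ^ and $ match at every line boundary
    )
    return pattern.findall(text)

print(find_entries("ABSOLVE", SAMPLE))
# [('ABSOLVE', 'AH0 B Z AA1 L V'), ('ABSOLVE(1)', 'AE0 B Z AA1 L V')]
```

Requiring the two spaces immediately after the word (or after the optional suffix) is what rules out 'ABSOLVED', so neither `$` nor `\b` is strictly needed next to the word in this sketch.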

I'm just not sure whether I should use $ or \b in the pattern, or something else. My idea was to use something like

re.sub(r'([A-Z])(.*?)(  )', r'\1\3', text)

Then I'd just search from there. But this seems really inefficient, considering I have a huge file (>100K lines). Is there a better way to use regex in this case? Would I want re.sub() to strip the parentheses, or re.search()? Is there an expression I can use to collect the data after the match, up to the newline? I'd really appreciate any help here, and any general advice on using regular expressions. Thanks.
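On the efficiency concern, one possible approach (a sketch under assumptions, not necessarily the best fit) is to read the file once and group every entry under its base word, so that later lookups are plain dictionary accesses instead of repeated regex scans. The names `ENTRY` and `build_index` are hypothetical, and the sketch assumes every line follows the 'WORD' or 'WORD(n)', two spaces, data layout described above.

```python
import re
from collections import defaultdict

# Base word (lazy), optional "(n)" suffix, two spaces, then the data.
ENTRY = re.compile(r"^(\S+?)(?:\(\d+\))?  (.*)$")

def build_index(lines):
    """Map each base word to the list of data strings found for it."""
    index = defaultdict(list)
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if m:  # silently skip lines that don't fit the assumed layout
            index[m.group(1)].append(m.group(2))
    return index

lines = [
    "ABSOLVE  AH0 B Z AA1 L V\n",
    "ABSOLVE(1)  AE0 B Z AA1 L V\n",
    "ABSOLVED  AH0 B Z AA1 L V D\n",
    "ABSOLVED(1)  AE0 B Z AA1 L V D\n",
]
index = build_index(lines)
print(index["ABSOLVE"])
# ['AH0 B Z AA1 L V', 'AE0 B Z AA1 L V']
```

With a real file, passing the open file object (for example `build_index(f)` inside a `with open(...) as f:` block) would iterate line by line without loading everything into one string first.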



source https://stackoverflow.com/questions/71164672/how-would-i-use-a-regular-expression-to-match-a-string-with-or-without-parenthes
