How can I remove the "\n" character in a large file with Python?

I have a large .txt file containing numerous email addresses, but it also contains many unnecessary "\n" characters. I want to extract only the email addresses and remove any other characters.

To accomplish this, I have written a small script in Python.

import re

filename = "input.txt"
output_filename = "output.txt"
# Capture one email address; surrounding whitespace is matched but not captured.
email_regex = r'\s*([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})\s*'

with open(filename, "r") as f, open(output_filename, "w") as out:
    for line in f:
        emails = re.findall(email_regex, line)
        for email in emails:
            out.write(email + "\n")

The script extracts ordinary email addresses correctly, but it produces wrong output when an address is directly preceded by "\n".

For example, suppose a line of data reads "CC\nexample@example.com\n". When I run my code, the output is "nexample@example.com", which is not what I intended. I want the output to be "example@example.com", without the leading "n".
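One possible explanation, assuming the file stores "\n" as two literal characters (a backslash followed by the letter n) rather than as a real line break: the backslash is not part of the local-part character class, but the "n" is, so the match starts at the "n". The sketch below compares both cases; the names EMAIL_REGEX, literal_text, and newline_text are illustrative and not part of the original script.

```python
import re

EMAIL_REGEX = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

# Raw string: the backslash and the "n" are two ordinary characters,
# as they would be if the file literally contains \n as text.
literal_text = r"CC\nexample@example.com\n"

# Normal string literal: "\n" is a single newline character.
newline_text = "CC\nexample@example.com\n"

literal_match = re.findall(EMAIL_REGEX, literal_text)
newline_match = re.findall(EMAIL_REGEX, newline_text)

print(literal_match)   # the "n" is absorbed into the address
print(newline_match)   # the newline separates "CC" from the address
```

If the first list shows the leading "n" and the second does not, the file most likely contains the literal two-character sequence.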

Next, I tested a second small Python script on a single string containing one email address, and it returned the expected result.

import re

string = "CC\nexample@example.com\n"
email_regex = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

email = re.search(email_regex, string).group()

print(email)

I want to get the same result when processing a large file. Is there a way to do this?
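A minimal sketch of one way to approach this, under the assumption that the file contains a literal backslash followed by "n" as text: replace that two-character sequence with a space before matching, and keep reading the file line by line so memory use stays small. The helpers extract_emails and extract_file are hypothetical names introduced here for illustration.

```python
import re

EMAIL_REGEX = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')


def extract_emails(lines):
    """Yield email addresses found in an iterable of text lines."""
    for line in lines:
        # Assumption: "\n" appears as literal text (backslash + "n").
        # Turning it into a space stops the "n" from being absorbed
        # into the start of the next address.
        cleaned = line.replace("\\n", " ")
        yield from EMAIL_REGEX.findall(cleaned)


def extract_file(input_path, output_path):
    """Stream input_path line by line and write one address per line."""
    with open(input_path, "r") as src, open(output_path, "w") as out:
        for email in extract_emails(src):
            out.write(email + "\n")


# Small in-memory check; a raw string keeps the backslash literal.
sample = [r"CC\nexample@example.com\n", "other@example.org\n"]
result = list(extract_emails(sample))
print(result)
```

For the real file, a call such as extract_file("input.txt", "output.txt") would apply the same logic. Note that this replacement would also alter any text that legitimately contains a backslash followed by "n", so it is only appropriate if that sequence is always an unwanted artifact.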

Source: https://stackoverflow.com/questions/76095385/how-can-i-remove-the-n-character-in-a-large-file-with-python
