Goal:
- Compare 2 CSV files (Pandas DataFrames)
- If the user_id value matches between rows, add the values of the country and year_of_birth columns from one DataFrame into the corresponding row/columns of the second DataFrame
- Create a new CSV file from the resulting "full" (updated) DataFrame
The code below works, but it takes a LONG time when the CSV files are large. I have to imagine there is a better way to do this; I just haven't been able to figure it out.
Current code:
#!/usr/bin/env python
# Imports
import pandas as pd
# Variables / Arrays
import_optin_file_name = "full_data.csv"
import_extravars_file_name = "id-country-dob.csv"
export_file_name = "new_list.csv"
# Create DataFrames from 2 imported CSV files
optin_infile = pd.read_csv(import_optin_file_name)
extravars_infile = pd.read_csv(import_extravars_file_name)
# Create/Insert new columns to "optin_infile" dataframe (country,year_of_birth) with initial value of NULL
optin_infile.insert(loc=7, column='country', value='NULL')
optin_infile.insert(loc=8, column='year_of_birth', value='NULL')
# Iterate through rows, compare user_id - if match, add value to country, year_of_birth columns
for ev_index, ev_row in extravars_infile.iterrows():
    for optin_index, optin_row in optin_infile.iterrows():
        if ev_row['user_id'] == optin_row['user_id']:
            optin_infile.loc[optin_index, 'country'] = ev_row['country']
            optin_infile.loc[optin_index, 'year_of_birth'] = ev_row['year_of_birth']
# export to new CSV file
optin_infile.to_csv(export_file_name, header=True, index=False)
Any suggestions are greatly appreciated! Thanks!
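One possible direction, offered as a sketch rather than a definitive answer: the nested iterrows() loops above compare every row of one file against every row of the other, whereas a left join on user_id with pandas' DataFrame.merge does the same matching in a single vectorized step. The helper name add_extra_columns and the small in-memory sample data below are hypothetical stand-ins for the two CSV files. The sketch assumes that when user_id repeats in the extra-variables file the last occurrence should win (which is what the loop version ends up doing), and it appends the new columns at the end instead of inserting them at positions 7 and 8.

```python
import pandas as pd


def add_extra_columns(optin_df, extravars_df):
    """Left-join country and year_of_birth onto optin_df by user_id.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Keep only the join key and the two columns to copy. Drop repeated
    # user_id rows (keeping the last) so the merge cannot duplicate opt-in rows.
    lookup = extravars_df[["user_id", "country", "year_of_birth"]].drop_duplicates(
        subset="user_id", keep="last"
    )
    # how="left" keeps every opt-in row, whether or not a match exists.
    # Rows without a match get NaN in the two new columns.
    return optin_df.merge(lookup, on="user_id", how="left")


# Made-up sample data standing in for full_data.csv and id-country-dob.csv.
optin = pd.DataFrame(
    {"user_id": [1, 2, 3], "email": ["a@example.org", "b@example.org", "c@example.org"]}
)
extra = pd.DataFrame(
    {
        "user_id": [3, 1, 1],
        "country": ["DE", "CA", "US"],
        "year_of_birth": [1990, 1980, 1985],
    }
)

result = add_extra_columns(optin, extra)

# na_rep writes missing values as the text NULL, similar to the loop version's
# placeholder. Passing a file name as the first argument would write to disk.
csv_text = result.to_csv(index=False, na_rep="NULL")
print(csv_text)
```

One caveat with this approach: when some rows have no match, pandas stores an integer column such as year_of_birth as floats alongside NaN, so the values may need converting back before export if the exact number format matters.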
source https://stackoverflow.com/questions/71288187/python-better-way-to-compare-2-csv-files-and-add-2-new-columns-if-condition-me