Simplifying/collapsing blocks of df.iterrows() where another row is selected based on a value in the current row
I need help :) I've got a python script in which a snippet that repeats df.iterrows() a couple of times takes a good half hour to run. I've seen a lot of examples using lambda functions but this is where apparently I meet my limitations. It's too complex for me to figure out I guess.
Current code (simplified snippet):
for index, row in df.iterrows():
if row['series'] == -1 and row['sort'] == 1 and row['amount_t-2'] != 0:
index_match = (df[(df['sort'] == 4) & (df['year'] == (row['year']-1)) & (df['amount_t-1'] == (row['amount_t-2'])) & (df['cola'] == (row['cola'])) & (df_int['colb'] == (row['colb']))].index.tolist())
if len(index_match) == 1:
df.loc[index, 'series'] = df.loc[index_match[0],'series']
df.loc[index, 'colc'] = 'done'
df.loc[index, 'cold'] = ''
This snippet is repeated a number of times for different configurations/values. What's done here is looping through the entire dataset, creating a time series by setting the 'series' column to the same (index) number if certain conditions are met (i.e. if the values are linked)
Little background: the dataset consists of budgetary tables in which each row represents the state of a budget article for a certain budget phase/time (sort field: 1=budget, 2=first supplement, 3=second supplement, 4=actuals). In order to be able to trace a certain budget article through time, I want to link all relevant rows through a common value for the field 'series'.
When searching for a row to link to the current row, I filter the df by: 'previous phase' (sort-1 or in this case 4 if sort=1), 'year of previous phase' (same year for sort=2,3,4, previous if sort=1), matching value/amounts and matching article name etc.
So far, I've been meddling with something like the following code, but cannot get it to work. Am I at least close?
df['series'] = df.apply(lambda x: df.query("sort == 4 and year == @x['year']-1 and `amount_t-1` == @x['amount_t-2'] and cola == @x['cola'] and colb == @x['colb']").index[0] if x['series'] == -1 and x['sort'] == 1 and x['amount_t-2'] != 0, axis=1)
Thanks in advance,
Chris
source https://stackoverflow.com/questions/71071125/simplifying-collapsing-blocks-of-df-iterrows-where-another-row-is-selected-bas
Comments
Post a Comment