I am working with a large pandas DataFrame (over 1M rows) that I need to group by one column, and then apply a custom function to each group. The DataFrame df has the following structure:
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'B'],
    'Value': [10, 20, 30, 40, 50, 60, 70, 80]
}
df = pd.DataFrame(data)
print(df)
Output:

  Category  Value
0        A     10
1        B     20
2        A     30
3        B     40
4        A     50
5        A     60
6        B     70
7        B     80
My custom function calculates the sum of squares of the 'Value' for each 'Category'. Here's the function:
def sum_of_squares(group):
    # 'group' is the 'Value' Series for one category: square elementwise, then sum
    return (group**2).sum()
Currently, I'm applying this function to each group like so:
df.groupby('Category')['Value'].apply(sum_of_squares)
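For reference, on the small frame above this returns the per-category sums of squares (10**2 + 30**2 + 50**2 + 60**2 = 7100 for A; 20**2 + 40**2 + 70**2 + 80**2 = 13300 for B):

Category
A     7100
B    13300
Name: Value, dtype: int64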
But as the DataFrame is very large, this operation takes quite a while to complete. I'm wondering if there is a more efficient way to achieve this, especially when working with large DataFrames.
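One direction I'm considering, sketched below (untested at scale; the variable names are mine), is to square the entire column in a single vectorized step and then use the built-in groupby sum, so the aggregation stays in pandas' compiled code path instead of invoking sum_of_squares once per group in Python:

# Square the whole column at once, then aggregate with the built-in
# (Cython-backed) groupby sum rather than a per-group Python call.
squares = df['Value'].pow(2)                     # elementwise squares, vectorized
result = squares.groupby(df['Category']).sum()   # built-in aggregation
print(result)

On the example data this should produce the same Series as the apply version (A: 7100, B: 13300); whether it actually helps at 1M+ rows is something I would still need to benchmark. I've also seen GroupBy.agg with engine='numba' mentioned as an option for JIT-compiling a user-defined function, though that requires numba to be installed and a function written with a (values, index) signature.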
Would appreciate any guidance on this. Thank you!
Source: https://stackoverflow.com/questions/76392727/efficiently-applying-custom-function-to-pandas-dataframe-groups