I am trying to compare two data sets, each with a long list of categorical variables, using Pandas and Matplotlib. I want to compute and store the frequency of values for each variable in each data set using the value_counts() method, so that I can later test the two for statistically significant differences in those frequencies.
So far I only have a function that displays the values and counts for each column in a data frame as pie charts, given a list of columns (cat_columns) that is defined outside of the function:
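import matplotlib.pyplot as plt

def getCat(data):
    # Draw one pie chart per categorical column.
    # cat_columns and dataname are assumed to be defined outside the function.
    for column in cat_columns:
        plt.figure()
        # Use the data argument rather than the global df
        data[column].value_counts().plot(kind='pie', autopct='%1.1f%%')
        plt.title(f"Distribution of {column} Patients in {dataname}")
        plt.ylabel('')

getCat(df)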
Is it possible to append/store the values returned by value_counts() in a new DataFrame object for each original data set, so I can access and operate on those values later?
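For illustration, here is a minimal sketch of one way this could be done. It is not the only approach: it collects the value_counts() result for every column into a single long-format DataFrame per data set, then aligns the two on variable and value. The helper name collect_counts, the toy frames df_a and df_b, and their column names are all hypothetical stand-ins for the real data.

```python
import pandas as pd

def collect_counts(data, columns):
    """Return a long-format DataFrame with one row per (variable, value) pair."""
    frames = []
    for column in columns:
        counts = (
            data[column]
            .value_counts()
            .rename_axis("value")        # name the index so it becomes a column
            .reset_index(name="count")   # counts go into a "count" column
            .assign(variable=column)     # remember which column they came from
        )
        frames.append(counts[["variable", "value", "count"]])
    return pd.concat(frames, ignore_index=True)

# Hypothetical toy data standing in for the two real data sets
df_a = pd.DataFrame({"sex": ["F", "M", "F", "F"],
                     "smoker": ["yes", "no", "no", "no"]})
df_b = pd.DataFrame({"sex": ["M", "M", "F", "M"],
                     "smoker": ["yes", "yes", "no", "no"]})
cat_columns = ["sex", "smoker"]

counts_a = collect_counts(df_a, cat_columns)
counts_b = collect_counts(df_b, cat_columns)

# Align the two data sets; a value missing from one side gets a count of 0
comparison = (
    counts_a.merge(counts_b, on=["variable", "value"],
                   how="outer", suffixes=("_a", "_b"))
    .fillna({"count_a": 0, "count_b": 0})
    .astype({"count_a": int, "count_b": int})
    .sort_values(["variable", "value"], ignore_index=True)
)
print(comparison)
```

With the counts side by side, the rows for a single variable form a contingency table (values by data set). If SciPy is available, that table could be passed to scipy.stats.chi2_contingency to test whether the frequencies differ. Whether a chi-square test is appropriate depends on the sample sizes and is an assumption here.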
TIA!
source https://stackoverflow.com/questions/74174473/how-can-i-best-compare-the-frequency-of-categorical-values-from-two-datasets-wit