I have a dataframe for each of my online students shaped like this:
df=pd.DataFrame(columns=["Class","Student","Challenge","Time"])
I create a temporary df for each of the challenges like this:
for ch in challenges:
    df = main_df.loc[main_df['Challenge'] == ch]
I want to group the records that are within 5 minutes of each other. The idea is to build a df that shows groups of students working together and answering the questions at roughly the same time. I have other ways of catching cheaters, but I need to be able to show that two or more students answered the questions at roughly the same time.
I was thinking of using resampling or a grouper method, but I'm a bit confused about how to implement it. Could someone point me in the right direction?
Sample df:
df = pd.DataFrame(
    data={
        "Class": ["A"] * 4 + ["B"] * 2,
        "Student": ['Scooby', 'Daphne', 'Shaggy', 'Fred', 'Velma', 'Scrappy'],
        "Challenge": ["Challenge3"] * 6,
        "Time": ['07/10/2022 08:22:44', '07/10/2022 08:27:22', '07/10/2022 08:27:44', '07/10/2022 08:29:55', '07/10/2022 08:33:14', '07/10/2022 08:48:44'],
    }
)
EDIT:
For the output, I was thinking of having another df with an extra column called 'Grouping' that holds an incremental number for each group that was discovered. The final df would start out empty and then be appended to or concatenated with the grouping df.
new_df = pd.DataFrame(columns=['Class','Student','Challenge','Time','Grouping'])
new_df = pd.concat([new_df,df])
  Class Student   Challenge                 Time  Grouping
1     A  Daphne  Challenge3  07/10/2022 08:27:22         1
2     A  Shaggy  Challenge3  07/10/2022 08:27:44         1
3     A    Fred  Challenge3  07/10/2022 08:29:55         1
The purpose of this is so that I may have multiple samples to verify if the same two or more students are sharing answers.
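For reference, one way to read the table above is as fixed 5-minute bins, which is roughly what resampling or a grouper would give. The sketch below is only that, a sketch: it assumes the 'Time' strings are month/day/year, it floors each timestamp to a 5-minute boundary with `Series.dt.floor`, and the helper column name 'Bin' is made up for illustration. Bins that contain only one student are dropped.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        "Class": ["A"] * 4 + ["B"] * 2,
        "Student": ["Scooby", "Daphne", "Shaggy", "Fred", "Velma", "Scrappy"],
        "Challenge": ["Challenge3"] * 6,
        "Time": [
            "07/10/2022 08:22:44", "07/10/2022 08:27:22", "07/10/2022 08:27:44",
            "07/10/2022 08:29:55", "07/10/2022 08:33:14", "07/10/2022 08:48:44",
        ],
    }
)

# Assumption: the strings are month/day/year; adjust the format if they are day/month/year.
df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y %H:%M:%S")

# Hypothetical helper column: the start of the fixed 5-minute bin each row falls into.
df["Bin"] = df["Time"].dt.floor("5min")

# Count how many submissions share the same challenge and bin.
sizes = df.groupby(["Challenge", "Bin"])["Student"].transform("size")

# Keep only bins with two or more students and number them 1, 2, 3, ...
grouped = df[sizes >= 2].copy()
grouped["Grouping"] = grouped.groupby(["Challenge", "Bin"]).ngroup() + 1

print(grouped[["Class", "Student", "Challenge", "Time", "Grouping"]])
```

On the sample df this keeps Daphne, Shaggy and Fred, all in group 1. The catch with fixed bins is that two submissions a few seconds apart can land on opposite sides of a bin edge (08:29:55 and 08:30:05, say) and would not be grouped together.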
EDIT 2:
I was thinking that I could do a lambda-based operation. What if I created another column called "Challenge_Delta" that holds the difference in time between one answer and the previous one? Then I would need to figure out how to group the rows based on those deltas.
df['Time'] = pd.to_datetime(df['Time'])  # the sample 'Time' values are strings
df['Challenge_Delta'] = df['Time'] - df['Time'].shift()
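The delta idea can be taken one step further with a cumulative sum: start a new group whenever the gap to the previous submission exceeds the threshold. This is a minimal sketch under a few assumptions, not a definitive answer: the 'Time' strings are treated as month/day/year, the 5-minute `threshold` is the value from the question, and groups are formed by chaining, so a row joins a group if it is within 5 minutes of the previous row, not necessarily of every row in the group.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        "Class": ["A"] * 4 + ["B"] * 2,
        "Student": ["Scooby", "Daphne", "Shaggy", "Fred", "Velma", "Scrappy"],
        "Challenge": ["Challenge3"] * 6,
        "Time": [
            "07/10/2022 08:22:44", "07/10/2022 08:27:22", "07/10/2022 08:27:44",
            "07/10/2022 08:29:55", "07/10/2022 08:33:14", "07/10/2022 08:48:44",
        ],
    }
)

# Assumption: the strings are month/day/year; adjust the format if they are day/month/year.
df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y %H:%M:%S")

# Gaps are only meaningful between consecutive submissions of the same challenge.
df = df.sort_values(["Challenge", "Time"]).reset_index(drop=True)

threshold = pd.Timedelta(minutes=5)

# Time since the previous submission within the same challenge (NaT for the first row).
gap = df.groupby("Challenge")["Time"].diff()

# A new group starts at the first row of a challenge or after a gap above the threshold.
new_group = gap.isna() | (gap > threshold)

# The running count of group starts gives an incremental group number.
df["Grouping"] = new_group.cumsum()

# Keep only groups with two or more students.
sizes = df.groupby("Grouping")["Student"].transform("size")
suspects = df[sizes >= 2]

print(suspects[["Class", "Student", "Challenge", "Time", "Grouping"]])
```

On the sample df, chaining puts Scooby, Daphne, Shaggy, Fred and Velma in group 1 and leaves Scrappy alone in group 2, because every one of the first five is within 5 minutes of the previous submission even though Scooby and Velma are more than 10 minutes apart. That is a wider group than the three-row table in the first edit, so whether chaining is the right rule depends on what "5 minutes apart" should mean.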
source https://stackoverflow.com/questions/73081852/similar-time-submissions-for-tests