Similar time submissions for tests

I have a dataframe of my online students' submissions, shaped like this:

import pandas as pd

df = pd.DataFrame(columns=["Class", "Student", "Challenge", "Time"])

I create a temporary df for each of the challenges like this:

for ch in challenges:
    temp_df = main_df.loc[main_df['Challenge'] == ch]
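
(Alternatively, I could build each temporary frame by iterating over a groupby; a minimal sketch, assuming main_df holds all submissions:)

for ch, temp_df in main_df.groupby('Challenge'):
    # temp_df now contains only the rows where Challenge == ch
    ...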

I want to group the records that are within 5 minutes of each other. The idea is to create a df that shows a group of students working together and answering the questions at roughly the same time. I have other ways of catching cheaters, but I need to be able to show that two or more students have been answering the questions at roughly the same time.

I was thinking of using resampling or a Grouper, but I'm a bit confused about how to implement it. Could someone point me in the right direction for solving this?

Sample df:

df = pd.DataFrame(
    data={
        "Class": ["A"] * 4 + ["B"] * 2,
        "Student": ["Scooby", "Daphne", "Shaggy", "Fred", "Velma", "Scrappy"],
        "Challenge": ["Challenge3"] * 6,
        "Time": [
            "07/10/2022 08:22:44",
            "07/10/2022 08:27:22",
            "07/10/2022 08:27:44",
            "07/10/2022 08:29:55",
            "07/10/2022 08:33:14",
            "07/10/2022 08:48:44",
        ],
    }
)
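
In this sample the Time column holds strings, so it would have to be parsed before any time arithmetic; a minimal sketch of that step (assuming a month/day/year format):

df['Time'] = pd.to_datetime(df['Time'], format='%m/%d/%Y %H:%M:%S')
df = df.sort_values('Time').reset_index(drop=True)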

EDIT:

For the output I was thinking of having another df with an extra column called 'Grouping', holding an incremental number for each group that was discovered. The final df would start empty and then be appended to / concatenated with each grouping df.

new_df = pd.DataFrame(columns=['Class', 'Student', 'Challenge', 'Time', 'Grouping'])

new_df = pd.concat([new_df,df])

  Class  Student   Challenge                 Time  Grouping
1     A   Daphne  Challenge3  07/10/2022 08:27:22         1
2     A   Shaggy  Challenge3  07/10/2022 08:27:44         1
3     A     Fred  Challenge3  07/10/2022 08:29:55         1

The purpose of this is to have multiple samples with which to verify whether the same two or more students are sharing answers.

EDIT 2:

I was thinking that I could do a lambda-based operation. What if I created another column called "Threshold_Delta" that holds the difference in time from one answer to the next? Then I would need to figure out how to group those deltas.

df['Threshold_Delta'] = df['Time'] - df['Time'].shift()
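
A minimal sketch of the whole delta-then-group idea, with some assumptions: 'Time' has already been parsed to datetime, grouping is done per Class and Challenge (so the group numbers restart for each class/challenge), and a new group starts whenever the gap to the previous submission exceeds 5 minutes, which may not be exactly the rule I end up wanting:

grouped = []
for (cls, ch), temp in df.sort_values('Time').groupby(['Class', 'Challenge']):
    gap = temp['Time'].diff()  # time since the previous submission in this class/challenge
    # a new group starts when the gap to the previous submission exceeds 5 minutes
    temp = temp.assign(Grouping=(gap > pd.Timedelta(minutes=5)).cumsum() + 1)
    # keep only groups with at least two students (potential collaboration)
    temp = temp[temp.groupby('Grouping')['Student'].transform('size') >= 2]
    grouped.append(temp)

new_df = pd.concat(grouped, ignore_index=True)

On the sample data above this keeps Scooby, Daphne, Shaggy and Fred as one group and drops Velma and Scrappy, whose class-B submissions are more than 5 minutes apart. Note that this chained rule also keeps Scooby, even though Scooby's submission is more than 5 minutes before Fred's, so a stricter rolling-window rule would be needed to reproduce exactly the output shown above.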


source https://stackoverflow.com/questions/73081852/similar-time-submissions-for-tests
