Similar time submissions for tests

I have a dataframe of my online students' submissions, shaped like this:

import pandas as pd

df = pd.DataFrame(columns=["Class", "Student", "Challenge", "Time"])

I create a temporary df for each of the challenges like this:

for ch in challenges:
    temp_df = main_df.loc[main_df['Challenge'] == ch]
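
(Alternatively, I could build each temporary frame by iterating over a groupby; a minimal sketch, assuming main_df holds all submissions:)

for ch, temp_df in main_df.groupby('Challenge'):
    # temp_df now contains only the rows where Challenge == ch
    ...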

I want to group the records that are within 5 minutes of each other. The idea is to create a df that shows a group of students working together and answering the questions at roughly the same time. I have other ways of catching cheaters, but I need to be able to show that two or more students have been answering the questions at roughly the same time.

I was thinking of using resampling or a Grouper, but I'm a bit confused about how to implement it. Could someone point me in the right direction for solving this?

Sample df:

df = pd.DataFrame(
    data={
        "Class": ["A"] * 4 + ["B"] * 2,
        "Student": ["Scooby", "Daphne", "Shaggy", "Fred", "Velma", "Scrappy"],
        "Challenge": ["Challenge3"] * 6,
        "Time": [
            "07/10/2022 08:22:44",
            "07/10/2022 08:27:22",
            "07/10/2022 08:27:44",
            "07/10/2022 08:29:55",
            "07/10/2022 08:33:14",
            "07/10/2022 08:48:44",
        ],
    }
)
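
In this sample the Time column holds strings, so it would have to be parsed before any time arithmetic; a minimal sketch of that step (assuming a month/day/year format):

df['Time'] = pd.to_datetime(df['Time'], format='%m/%d/%Y %H:%M:%S')
df = df.sort_values('Time').reset_index(drop=True)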

EDIT:

For the output I was thinking of having another df with an extra column called 'Grouping', holding an incremental number for each group that was discovered. The final df would start empty and then be appended to / concatenated with each grouping df.

new_df = pd.DataFrame(columns=['Class', 'Student', 'Challenge', 'Time', 'Grouping'])

new_df = pd.concat([new_df,df])

  Class  Student   Challenge                 Time  Grouping
1     A   Daphne  Challenge3  07/10/2022 08:27:22         1
2     A   Shaggy  Challenge3  07/10/2022 08:27:44         1
3     A     Fred  Challenge3  07/10/2022 08:29:55         1

The purpose of this is to have multiple samples with which to verify whether the same two or more students are sharing answers.

EDIT 2:

I was thinking that I could do a lambda-based operation. What if I created another column called "Threshold_Delta" that holds the difference in time from one answer to the next? Then I would need to figure out how to group those deltas.

df['Threshold_Delta'] = df['Time'] - df['Time'].shift()
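
A minimal sketch of the whole delta-then-group idea, with some assumptions: 'Time' has already been parsed to datetime, grouping is done per Class and Challenge (so the group numbers restart for each class/challenge), and a new group starts whenever the gap to the previous submission exceeds 5 minutes, which may not be exactly the rule I end up wanting:

grouped = []
for (cls, ch), temp in df.sort_values('Time').groupby(['Class', 'Challenge']):
    gap = temp['Time'].diff()  # time since the previous submission in this class/challenge
    # a new group starts when the gap to the previous submission exceeds 5 minutes
    temp = temp.assign(Grouping=(gap > pd.Timedelta(minutes=5)).cumsum() + 1)
    # keep only groups with at least two students (potential collaboration)
    temp = temp[temp.groupby('Grouping')['Student'].transform('size') >= 2]
    grouped.append(temp)

new_df = pd.concat(grouped, ignore_index=True)

On the sample data above this keeps Scooby, Daphne, Shaggy and Fred as one group and drops Velma and Scrappy, whose class-B submissions are more than 5 minutes apart. Note that this chained rule also keeps Scooby, even though Scooby's submission is more than 5 minutes before Fred's, so a stricter rolling-window rule would be needed to reproduce exactly the output shown above.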


source https://stackoverflow.com/questions/73081852/similar-time-submissions-for-tests
