I am using pandarallel to parallelize the apply function of a DataFrame. However, the progress bar sometimes freezes, either turning red or staying its normal color.
I tried to catch the exception, but that does not always work. Sometimes the error ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')) is printed, but at other times execution never reaches the except block.
import time

import pandas as pd
import requests
from tqdm import tqdm
from pandarallel import pandarallel

# statusOk, getElements, getChunkSize and data_df are defined elsewhere (not shown).

def isDeletedOrAForkedRepository(repo_name, t):
    # Ask the GitHub API about the repository, authenticating with token t.
    response = requests.get('https://api.github.com/repos/{}'.format(repo_name), headers={'Authorization': 'token ' + t})
    if statusOk(response.status_code):
        return response.json()['fork']
    else:
        return "Deleted"

tqdm.pandas()
pandarallel.initialize(progress_bar=True)

filtered_df = pd.DataFrame(columns=['name', 'isPresent'])
total_to_cover = len(data_df)
total_covered = 0
start_index = 0

try:
    while total_covered != total_to_cover:
        # Process one chunk per token, sized by what that token still allows.
        for t in getElements():
            chunk_size = getChunkSize(t)
            if chunk_size == 0:
                continue
            end_index = min(start_index + chunk_size, total_to_cover)
            print("StartIndex: ", start_index, ", EndIndex: ", end_index)
            # Copy the slice so the column assignment does not target a view of data_df.
            chunk_df = data_df[start_index:end_index].copy()
            chunk_df["isPresent"] = chunk_df["repo_name"].parallel_apply(lambda x: isDeletedOrAForkedRepository(x, t))
            filtered_df = pd.concat([filtered_df, chunk_df], ignore_index=True)
            total_covered = total_covered + (end_index - start_index)
            start_index = end_index
            print("Total Covered: ", total_covered)
        print("Going for sleep....")
        time.sleep(120)
        print("Awake Now....")
except Exception as e:
    print(e)
finally:
    # Runs after success or failure, so the partial results are always saved.
    print("Done..")
    filtered_df.to_csv("test.csv", index=False)
What should I do to avoid the deadlock? I am running the code from a Jupyter notebook.
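
One thing that might be worth trying (a sketch, not a confirmed fix for the freeze) is to handle network errors inside the function passed to parallel_apply, so that every row returns a value even when the connection is reset, and to give each request a timeout so a worker cannot wait indefinitely. The sketch below assumes that statusOk means "HTTP 200". The name is_fork_or_deleted, the parameters get, retries, timeout and backoff, and the "Error" marker are all hypothetical and were chosen only for illustration.

```python
import time


def is_fork_or_deleted(repo_name, token, get=None, retries=3, timeout=10, backoff=1.0):
    """Return the repo's 'fork' flag, "Deleted" for a non-200 reply,
    or "Error" if every attempt failed with a network error."""
    if get is None:
        # Imported lazily so the function can be exercised with a stub.
        import requests
        get = requests.get

    url = "https://api.github.com/repos/{}".format(repo_name)
    headers = {"Authorization": "token " + token}

    for attempt in range(retries):
        try:
            # The timeout keeps a stalled connection from blocking the worker forever.
            response = get(url, headers=headers, timeout=timeout)
        except OSError:
            # requests' exceptions derive from IOError/OSError, as does
            # ConnectionResetError; wait a little longer before each retry.
            time.sleep(backoff * (2 ** attempt))
            continue
        if response.status_code == 200:  # assumption: this is what statusOk checks
            return response.json()["fork"]
        return "Deleted"

    # All attempts failed; return a marker instead of raising inside the worker.
    return "Error"
```

It would be called the same way as the original function, for example chunk_df["repo_name"].parallel_apply(lambda x: is_fork_or_deleted(x, t)), and rows marked "Error" could be retried later in a separate pass.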
source https://stackoverflow.com/questions/74362232/pandarallel-freezes-and-no-progressing
