I'm using Google Colab to fine-tune a pre-trained model.
I successfully preprocessed a dataset and created an instance of the Seq2SeqTrainer class:
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
The problem is resuming training from the last checkpoint after the session has ended.
If I run trainer.train(), it runs fine. Because training takes a long time, I come back to the Colab tab after a few hours. I know that if the session crashes, I can continue training from the last checkpoint like this: trainer.train("checkpoint-5500")
However, the checkpoint data no longer exists on Google Colab if I come back too late. So even though I know how far training got, do I have to start all over again?
Is there any way to solve this problem?
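One direction that might help, sketched here under assumptions rather than as a confirmed fix: if the trainer's output directory lived on storage that outlives the Colab session (for example a mounted Google Drive folder), the checkpoint-N folders would still be there later, and the newest one could be located and passed to trainer.train(). The helper name find_latest_checkpoint and the Drive path in the comments are hypothetical and not part of the original setup.

```python
import os
import re
from typing import Optional

# Matches the "checkpoint-<step>" folder names mentioned above, e.g. "checkpoint-5500".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the checkpoint-N subfolder with the highest N, or None.

    Hypothetical helper for illustration only.
    """
    if not os.path.isdir(output_dir):
        return None

    best_step = -1
    best_path = None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Only consider directories whose name follows the checkpoint pattern.
        if match and os.path.isdir(path):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path


# Possible usage in Colab (not executed here; the path is an assumption):
#   from google.colab import drive
#   drive.mount("/content/drive")
#   output_dir = "/content/drive/MyDrive/my_run"  # hypothetical, would also be args.output_dir
#   last = find_latest_checkpoint(output_dir)
#   trainer.train(resume_from_checkpoint=last)    # None means start from scratch
```

This only works if the training arguments were created with that persistent folder as the output directory before training started; checkpoints written to the session's local disk would still be lost.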
source https://stackoverflow.com/questions/75213102/cant-train-model-from-checkpoint-on-google-colab-because-those-all-deleted-afte