This repository was archived by the owner on Apr 23, 2025. It is now read-only.

Error while resuming training from saved checkpoint #83

@DeepakLabh

Description

I am trying to resume training from a saved checkpoint. Passing ckpt_path to PyTorch Lightning's .fit() method, as in trainer.fit(forecaster, datamodule=data_module, ckpt_path='best.ckpt.ckpt'), raises the error below.

Restoring states from the checkpoint path at best.ckpt.ckpt

==================================================================
  | Name            | Type            | Params
0 | spacetimeformer | Spacetimeformer | 4.5 M

4.5 M Trainable params
0 Non-trainable params
4.5 M Total params
18.191 Total estimated model params size (MB)
Restored all states from the checkpoint file at best.ckpt.ckpt
Epoch 0: 75%|███████▌ | 105/140 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train_vol.py", line 457, in <module>
trainer.fit(forecaster, datamodule=data_module, ckpt_path='best.ckpt.ckpt')
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._call_and_handle_interrupt(
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
results = self._run_stage()
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
return self._run_train()
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
self.fit_loop.run()
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 205, in run
self.on_advance_end()
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 297, in on_advance_end
self.trainer._call_callback_hooks("on_train_epoch_end")
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1637, in _call_callback_hooks
fn(self, self.lightning_module, *args, **kwargs)
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 179, in on_train_epoch_end
self._run_early_stopping_check(trainer)
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 190, in _run_early_stopping_check
if trainer.fast_dev_run or not self._validate_condition_metric( # disable early_stopping with fast_dev_run
File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 145, in _validate_condition_metric
raise RuntimeError(error_msg)
RuntimeError: Early stopping conditioned on metric val/loss which is not available. Pass in or modify your EarlyStopping callback to use any of the following: ``
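
A possible workaround, offered as a sketch and not as a confirmed fix. The list of available metrics in the error is empty, and the traceback shows the check running in on_train_epoch_end, so one plausible reading is that the resumed run reached the end of the training epoch before any validation pass had logged val/loss. EarlyStopping accepts check_on_train_epoch_end and strict arguments that change when the check runs and whether a missing metric raises. The helper names and the patience value below are hypothetical and do not come from train_vol.py.

```python
# Sketch only: helper names and the patience value are illustrative assumptions.


def early_stopping_kwargs(monitor="val/loss", patience=5):
    """Keyword arguments for pytorch_lightning.callbacks.EarlyStopping."""
    return {
        "monitor": monitor,
        "patience": patience,  # hypothetical value, use the one from your script
        # Run the check after validation instead of at the end of the training epoch.
        "check_on_train_epoch_end": False,
        # Warn/skip instead of raising RuntimeError when the metric is missing.
        "strict": False,
    }


def build_early_stopping(**overrides):
    """Create the callback; Lightning is imported lazily inside this function."""
    from pytorch_lightning.callbacks import EarlyStopping

    kwargs = early_stopping_kwargs()
    kwargs.update(overrides)
    return EarlyStopping(**kwargs)


if __name__ == "__main__":
    # Show the settings that would be passed to EarlyStopping.
    print(early_stopping_kwargs())
```

The callback would then be passed as callbacks=[build_early_stopping()] when constructing the Trainer, followed by the same trainer.fit(..., ckpt_path=...) call as above. Note that strict=False also silences the error for a genuinely misspelled metric name, so it may be safer to try check_on_train_epoch_end=False on its own first.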
