Description
I just started working with RF-DETR, and while training is going smoothly, validation crashes on almost every epoch (though not every one), not at a consistent point or with a consistent error; typically it's a torch.distributed.elastic.multiprocessing.errors.ChildFailedError (I am training on two GPUs).
Rather than trying to debug this, I'd like to disable validation entirely during training and then run validation on every checkpoint at the end. Is that possible? I don't see any equivalent of the "run_test" argument (e.g. "run_val" or "disable_val"). If there's no functionality for this, is there a recommended workaround, e.g. is an empty "valid" folder allowed? Or is having a single image in the "valid" folder the closest I can do?
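For context, here is a minimal sketch of the single-image workaround I have in mind, assuming the Roboflow-style COCO layout that RF-DETR expects (a "train" and "valid" folder, each with an "_annotations.coco.json"); the dataset path is a placeholder. It just copies one training image and its annotations into a tiny valid split so per-epoch validation has almost nothing to do:

```python
import json
import shutil
from pathlib import Path

DATASET_DIR = Path("dataset")        # placeholder path to the dataset root
TRAIN_DIR = DATASET_DIR / "train"
VALID_DIR = DATASET_DIR / "valid"
ANN_NAME = "_annotations.coco.json"  # Roboflow COCO export convention

VALID_DIR.mkdir(parents=True, exist_ok=True)

# Load the training annotations and pick a single image to reuse.
with open(TRAIN_DIR / ANN_NAME) as f:
    train_coco = json.load(f)

image = train_coco["images"][0]
image_anns = [a for a in train_coco["annotations"] if a["image_id"] == image["id"]]

# Write a minimal valid split containing only that one image.
valid_coco = {
    "info": train_coco.get("info", {}),
    "licenses": train_coco.get("licenses", []),
    "categories": train_coco["categories"],
    "images": [image],
    "annotations": image_anns,
}
with open(VALID_DIR / ANN_NAME, "w") as f:
    json.dump(valid_coco, f)

# Copy the image file itself into the valid folder.
shutil.copy(TRAIN_DIR / image["file_name"], VALID_DIR / image["file_name"])
```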
Thanks!