Description
In my model implementation, I would like to freeze the transformer (roberta-base, used in a spacy-transformers.Tok2VecTransformer.v1) for the first 2 epochs of training. From this spaCy documentation, it seems like it should be possible to set grad_factor to 0.0 in order to disable gradients from one of the listeners. According to the same documentation, changing this per epoch should then be possible by using a schedule. In my config, I have specified a constant_then.v1 schedule that falls back to a constant.v1 schedule, like this:
```ini
[components.seq2labels.model.tok2vec]
@architectures = "spacy-transformers.Tok2VecTransformer.v1"
name = "roberta-base"
tokenizer_config = {"use_fast": true}

[components.seq2labels.model.tok2vec.grad_factor]
@schedules = "constant_then.v1"
rate = 0.0
steps = 2000

[components.seq2labels.model.tok2vec.grad_factor.schedule]
@schedules = "constant.v1"
rate = 1.0
```
When initializing, I get the following error:
```
=========================== Initializing pipeline ===========================
✘ Config validation error
seq2labels.model.tok2vec -> grad_factor value is not a valid float
```
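For reference, the documented usage of grad_factor on this architecture is a plain float, which is exactly what the schedule above is meant to replace:

```ini
[components.seq2labels.model.tok2vec]
@architectures = "spacy-transformers.Tok2VecTransformer.v1"
name = "roberta-base"
tokenizer_config = {"use_fast": true}
grad_factor = 0.0
```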
It seems to me that the scheduler may be returning an iterator instead of a float that can be used as a value here. Have I overlooked some aspect that should still be implemented or amended?
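A quick check outside the config seems consistent with that. A minimal sketch, assuming thinc v8.0.x, where schedules are plain generators (the constant_then.v1 and constant.v1 registry entries map to thinc's constant_then and constant helpers):

```python
from thinc.api import constant, constant_then

# Rebuild the schedule described in the config: 0.0 for the first
# 2000 steps, then fall back to a constant 1.0.
schedule = constant_then(0.0, 2000, constant(1.0))

# Prints a generator type rather than float, which would explain why
# grad_factor's "value is not a valid float" validation rejects it.
print(type(schedule))
```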
Otherwise, if schedules do not work with grad_factor, is there another way to freeze the transformer for only the first 2 epochs of training?
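In case it helps frame the question, the only fallback I can think of is two separate training runs with grad_factor overridden per run. A rough sketch, assuming spaCy ≥ 3.2 (where spacy.cli.train.train is exposed) and a hypothetical second config, config_stage2.cfg, that sources the trained seq2labels component from the first run instead of creating it fresh:

```python
from spacy.cli.train import train

# Stage 1: the first 2 epochs with the transformer frozen
# (grad_factor = 0.0 blocks gradients from reaching it).
train("config.cfg", "stage1", overrides={
    "components.seq2labels.model.tok2vec.grad_factor": 0.0,
    "training.max_epochs": 2,
})

# Stage 2: resume with gradients enabled. Assumes config_stage2.cfg
# sources seq2labels from stage1/model-best.
train("config_stage2.cfg", "stage2", overrides={
    "components.seq2labels.model.tok2vec.grad_factor": 1.0,
})
```

This is clunky compared to a schedule, though, which is why I am hoping the config-level approach can work.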
Thanks for any help in advance :)