My issue is about differences in behavior between the `add_task` and `add_transformer` methods implemented in the `progressive_learner` class. Calling `add_task` versus `add_transformer` for different tasks leads to drastically different BTE (backward transfer efficiency) behaviors, even though `add_task` just calls `add_transformer`.

Figure from @srahul1222.