We trained our models on TPUs, so these instructions assume you have TPU access. See scripts/tpu-running/README.md for details on setting up your TPU environment and training a model.
This repo contains two branches: the main branch implements our mean-pooling approach, and the compression-tokens branch implements the compression-tokens approach.
Both branches are used in a similar way.