Hello!
I'm trying to add a new language, and I noticed the code that downloads the pre-trained models.
I created a new test tokenizer and just wanted to know: is there any rule I should follow in order to use the pre-trained models?
Specifically, my test tokenizer uses custom symbols that don't look like the existing ones. Will the pre-trained model still work with them? Also, would it work with any language I try to add?
I'm just not sure what the pre-trained models will do...
Thanks for this awesome work!