Csatv2 contribution #2627
base: main
b73a79c to dad1ca1
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@gusdlf93 hey, I used Claude to make some additional changes so the model fits timm conventions a bit better; this did require remapping the checkpoints. I verified that the 80.024% accuracy is retained. Unfortunately, the model diff is unreadable (you can't see what actually changed) because your commit mixed CRLF and LF line endings, and normalizing everything to LF touched every line. It's an interesting model for higher resolution.
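A minimal sketch of the kind of checkpoint key remapping mentioned above, written over a plain dict so it runs without PyTorch. The rename rules and the function name `remap_state_dict` are hypothetical and do not reflect the actual CSATv2 key changes in this PR; timm model files typically do this kind of renaming in a checkpoint filter function applied when loading the original weights.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical rename rules (regex pattern, replacement). Illustrative only;
# these are NOT the actual CSATv2 key changes in this PR.
RENAME_RULES: List[Tuple[str, str]] = [
    (r"^patch_embed\.", "stem."),
    (r"^blocks\.(\d+)\.", r"stages.\1."),
    (r"^norm\.", "head.norm."),
]


def remap_state_dict(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict whose keys are renamed by the first matching rule.

    Keys that match no rule are copied unchanged. Raises ValueError if two
    original keys would collide after renaming.
    """
    out: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, repl in RENAME_RULES:
            if re.search(pattern, key):
                new_key = re.sub(pattern, repl, key, count=1)
                break
        if new_key in out:
            raise ValueError(f"key collision after remap: {key!r} -> {new_key!r}")
        out[new_key] = value
    return out
```

With real weights the values would be tensors, and loading the remapped dict with `model.load_state_dict(..., strict=True)` is a quick way to confirm that no key was missed or left over.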
I may add a few more small things, like gradient checkpointing, and then I guess I'll push a remapped checkpoint to the timm org that references the original.
…inal norm is 2D so we can disable pooling if desired. Still inconsistent line endings.
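A small sketch, assuming the goal is simply to detect and normalize mixed CRLF/LF endings before committing, so that a later cleanup doesn't rewrite every line of the diff. The helper names are hypothetical; at the git level, a `.gitattributes` entry such as `* text=auto eol=lf` is the usual way to enforce consistent endings.

```python
from typing import Dict


def line_ending_counts(data: bytes) -> Dict[str, int]:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs not preceded by CR
    cr = data.count(b"\r") - crlf  # CRs not followed by LF
    return {"crlf": crlf, "lf": lf, "cr": cr}


def has_mixed_endings(data: bytes) -> bool:
    """True if more than one line-ending style appears in the data."""
    return sum(1 for n in line_ending_counts(data).values() if n > 0) > 1


def normalize_to_lf(data: bytes) -> bytes:
    """Convert CRLF first, then any remaining bare CR, to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Running the normalization as its own commit, separate from functional changes, keeps the actual model diff reviewable.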
Thanks a lot for taking over and polishing the implementation. For reproducibility and detailed training recipes, I've documented everything in the Hugging Face model card:
@gusdlf93 okay thanks, I'm probably not going to get a chance to merge this for a few more days. I feel it's in a good state, but I have a few days off and wanted to check a few more small things.
…dynamic for other network shapes, allow drop path option for transformer blocks.
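For reference, drop path (stochastic depth) zeroes a block's residual branch per sample during training and rescales the kept samples by `1 / (1 - drop_prob)`. This sketch uses nested Python lists instead of tensors so it runs standalone; the function names are hypothetical, and in timm itself the corresponding layer is `DropPath` from `timm.layers`.

```python
import random
from typing import Callable, List, Optional

Batch = List[List[float]]  # one inner list of features per sample


def drop_path(residual: Batch, drop_prob: float = 0.0, training: bool = False,
              rng: Optional[random.Random] = None) -> Batch:
    """Stochastic depth: drop each sample's whole residual branch with
    probability drop_prob, scaling kept samples by 1 / keep_prob so the
    expected output is unchanged. Identity at eval time or when drop_prob == 0.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if drop_prob == 0.0 or not training:
        return residual
    keep_prob = 1.0 - drop_prob
    rng = rng or random.Random()
    out: Batch = []
    for sample in residual:
        if rng.random() < keep_prob:  # keep this sample's branch, rescaled
            out.append([v / keep_prob for v in sample])
        else:  # drop this sample's branch entirely
            out.append([0.0] * len(sample))
    return out


def residual_block(x: Batch, branch: Callable[[Batch], Batch],
                   drop_prob: float, training: bool,
                   rng: Optional[random.Random] = None) -> Batch:
    """x + drop_path(branch(x)): the residual pattern drop path applies to."""
    y = drop_path(branch(x), drop_prob, training, rng)
    return [[a + b for a, b in zip(xs, ys)] for xs, ys in zip(x, y)]
```

In a transformer block this is commonly applied to both the attention and MLP branches, often with `drop_prob` ramped linearly across depth; that schedule is a general convention, not something stated in this PR.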
Continuation of work in #2624 by @gusdlf93