

@rwightman
Collaborator

Continuation of work in #2624 by @gusdlf93

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@rwightman
Collaborator Author

@gusdlf93 hey, I used Claude to make some additional changes so the model fits timm conventions a bit better. It did require remapping the checkpoints, though; I verified that the 80.024% accuracy is preserved.
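For readers unfamiliar with what "remapping checkpoints" involves: it usually means renaming state-dict keys so the original weights load into the refactored module layout. A minimal sketch under that assumption follows. The rename rules are hypothetical placeholders, not the actual CSATv2 key changes in this PR, and `remap_state_dict` is an illustrative helper name; in timm this kind of logic typically lives in a per-model filter function passed to `build_model_with_cfg`.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical rename rules (regex on old key -> replacement). These are
# illustrative placeholders, NOT the actual CSATv2 key changes in this PR.
REMAP_RULES: List[Tuple[str, str]] = [
    (r"^backbone\.", ""),                  # drop a wrapper prefix
    (r"^stages_(\d+)\.", r"stages.\1."),   # stages_0.x -> stages.0.x
    (r"\.fc_out\.", ".proj."),             # rename a projection layer
]


def remap_state_dict(
    state_dict: Dict[str, Any],
    rules: List[Tuple[str, str]] = REMAP_RULES,
) -> Dict[str, Any]:
    """Return a new state dict whose keys are rewritten by each rule in order.

    Values (tensors) are passed through unchanged; only their names move.
    """
    out: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, repl in rules:
            new_key = re.sub(pattern, repl, new_key)
        if new_key in out:
            # Two old keys mapping to one new key would silently drop weights.
            raise KeyError(f"remap collision on {new_key!r}")
        out[new_key] = value
    return out
```

For example, `remap_state_dict({"backbone.stages_0.fc_out.weight": w})` yields a single key, `"stages.0.proj.weight"`. Pure renames leave the tensors untouched, so evaluation accuracy should not change after remapping; re-checking it, as done here, catches mistakes such as missed or mismatched keys. Remaps that also reshape or split tensors need extra handling.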

Unfortunately, the diff of the model file got messed up (you can't see what was changed): your commit mixed CRLF and LF line endings, and cleaning it to LF only touched every line.
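Mixed line endings can be caught before committing. A minimal sketch in Python, with helper names of my own choosing (not part of timm or git), that counts line-ending styles in a file's raw bytes and normalizes CRLF to LF. When most lines of a file end in CRLF, this normalization changes the bytes of nearly every line, which is why the resulting diff shows the whole file as modified.

```python
from typing import Dict


def line_ending_counts(data: bytes) -> Dict[str, int]:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs not preceded by CR
    cr = data.count(b"\r") - crlf   # CRs not followed by LF
    return {"crlf": crlf, "lf": lf, "cr": cr}


def is_mixed(data: bytes) -> bool:
    """True if more than one line-ending style appears in the file."""
    return sum(1 for n in line_ending_counts(data).values() if n > 0) > 1


def normalize_to_lf(data: bytes) -> bytes:
    """Rewrite CRLF as LF; existing LF and bare CR are left alone."""
    return data.replace(b"\r\n", b"\n")
```

Usage: read a file with `open(path, "rb").read()` and pass the bytes to `is_mixed`. On the git side, a `.gitattributes` entry such as `* text=auto` asks git to normalize line endings for text files on commit, which helps avoid this kind of diff.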

An interesting model for higher resolution.

@rwightman
Collaborator Author

rwightman commented Dec 9, 2025

I may add a few more small things, like grad checkpointing, and then I'll probably push a remapped checkpoint to the timm org that references the original.

@gusdlf93

Thanks a lot for taking over and polishing the implementation.
Let me know if you need any additional details about the training setup or checkpoints.

For reproducibility and detailed training recipes, I’ve documented everything in the Hugging Face model card:
Link: https://huggingface.co/Hyunil/CSATv2

@rwightman
Collaborator Author

@gusdlf93 okay, thanks. I probably won't get a chance to merge this for a few more days. I feel it's in a good state, but I have a few days off and want to check a few more small things.

…dynamic for other network shapes, allow drop path option for transformer blocks.