-
Notifications
You must be signed in to change notification settings - Fork 610
Enhance loss_compare.py: Add Import/Export Options and Enable CI Comparison with Existing Losses #2063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance loss_compare.py: Add Import/Export Options and Enable CI Comparison with Existing Losses #2063
Changes from all commits
d7beeb3
25d1381
7274b81
296f93b
a699162
30bc03c
40b07a5
0a94bfc
49ce58c
0b8560b
3e3a19f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe put in https://github.com/pytorch/torchtitan/tree/main/tests/assets and just call it
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. n00b q:Is this ground truth loss come from a single GPU run? Or is it FSDP only? |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| 1 8.1376 | ||
| 2 7.841 | ||
| 3 7.1815 | ||
| 4 6.3509 | ||
| 5 5.5272 | ||
| 6 4.9244 | ||
| 7 4.5606 | ||
| 8 4.3724 | ||
| 9 4.347 | ||
| 10 4.2004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because you dump something to the folder, so that later runs will complain it's not empty? I think we should make this dump folder optional so that the first run doesn't use it at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why we need
artifacts-to-be-uploadedis that torchtitan will output something to the output folder and the default isoutputs. But creatingoutputswill fail because the file system is read-only. So, basically if we want to run a TorchTitan job, we will need to redirect the outputs toartifacts-to-be-uploaded.I feel it is too much to make
outputsto be optional because there will be several checks in the trainer. And all these are just for CI. I would rather say the integration tests shouldn't expect the folder to be empty.