decouple backward and step in accelerator/deepspeed #3819

naomili0924 · 2025-10-24T04:33:08Z

What does this PR do?

Addressing this issue: #2951

This PR lets users inspect model gradients after backward and gradient clipping, but before optimizer.step().

To achieve this, we move the gradient clipping logic out of the DeepSpeed engine (which lives in a separate public repo) and into the accelerator. As a result, the engine's own clipping must be turned off so that it is not applied a second time inside the engine: either disable gradient clipping in the user's config, or add an extra line, self.engine.set_gradient_clipping(0.0), in the step() function.
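
The call order described above (backward, then clipping, then a point where gradients can be inspected, then step) can be sketched with a framework-free toy. This is only an illustration under stated assumptions: ToyEngine and clip_grad_norm are hypothetical names invented for the sketch, not the Accelerate or DeepSpeed API, and the clipping rule is the usual global L2-norm rescaling.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale grads in place so their global L2 norm is at most max_norm.

    Returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    clip_coef = min(1.0, max_norm / (total_norm + eps))
    for i in range(len(grads)):
        grads[i] *= clip_coef
    return total_norm


class ToyEngine:
    """Hypothetical stand-in for a training engine; not the DeepSpeed API."""

    def __init__(self, params, lr=0.1):
        self.params = list(params)
        self.grads = [0.0] * len(self.params)
        self.lr = lr

    def backward(self):
        # Gradient of loss = sum(p**2) is 2*p for each parameter.
        self.grads = [2.0 * p for p in self.params]

    def step(self):
        # Plain SGD update, then clear the gradients.
        self.params = [p - self.lr * g for p, g in zip(self.params, self.grads)]
        self.grads = [0.0] * len(self.params)


engine = ToyEngine([3.0, 4.0])
engine.backward()                                          # 1) compute gradients only
norm_before = clip_grad_norm(engine.grads, max_norm=1.0)   # 2) clip outside the engine
inspected = list(engine.grads)                             # 3) clipped gradients are visible here
engine.step()                                              # 4) apply the update
```

Because step() no longer runs as part of backward(), the clipped gradients in `inspected` can be logged or checked before the parameters change.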

We did not replace the amp logic with the standard torch.cuda.amp for clipping fp16 gradients. As a result, you might need to manually clone the apex repo and install it to use amp:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir ./

Fixes #2951

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

SunMarc

Thanks for this PR. Just to be sure: this is only here for debugging and not meant to be merged, is it?

1) decouple backward and step in accelerator/deepspeed

75999e1

2) add decouple unit test

naomili0924 mentioned this pull request Oct 24, 2025

DeepSpeedEngineWrapper.backward() does a bit too much #2951

Open

SunMarc reviewed Nov 13, 2025
