Commit 83b786e
feat: implement proximal log-probability approximation for decoupled PPO (#600)
* feat: implement proximal log-probability approximation for decoupled PPO
Implement proximal log-probability approximation to eliminate expensive forward
passes in decoupled (off-policy) PPO training.
* docs: remove rollout from user-facing documentation
* feat: always log compute_logp metrics and add importance_weight tracking
1 parent 1f73719 commit 83b786e
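The commit message does not say how the proximal log-probabilities are approximated, so the following is only a minimal sketch under stated assumptions, not the code from this commit. It assumes a log-linear interpolation between the behavior-policy and current-policy log-probabilities, weighted by a hypothetical staleness factor `alpha`, and then plugs the result into a decoupled PPO token loss. The names `approx_prox_logp`, `decoupled_ppo_token_loss`, `alpha`, `eps_clip`, and `iw_cap` are illustrative and do not come from the repository.

```python
import math


def approx_prox_logp(logp_behav, logp_theta, alpha):
    """Approximate proximal log-probs without an extra forward pass.

    ASSUMPTION: log-linear interpolation between the behavior policy
    (alpha = 0) and the current policy (alpha = 1). The commit may use
    a different rule.
    """
    return [lb + alpha * (lt - lb) for lb, lt in zip(logp_behav, logp_theta)]


def decoupled_ppo_token_loss(logp_theta, logp_prox, logp_behav, adv,
                             eps_clip=0.2, iw_cap=5.0):
    """Per-token decoupled PPO loss; returns (loss, importance_weight).

    The clipping ratio is taken against the proximal policy, while the
    behavior policy enters only through a capped importance weight.
    """
    ratio = math.exp(logp_theta - logp_prox)
    clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip)
    surrogate = min(ratio * adv, clipped * adv)
    # Hypothetical cap to keep the off-policy correction bounded.
    importance_weight = min(math.exp(logp_prox - logp_behav), iw_cap)
    return -importance_weight * surrogate, importance_weight


if __name__ == "__main__":
    behav = [-1.0, -2.0]
    theta = [-0.5, -1.0]
    prox = approx_prox_logp(behav, theta, alpha=0.5)
    for lt, lp, lb in zip(theta, prox, behav):
        loss, iw = decoupled_ppo_token_loss(lt, lp, lb, adv=1.0)
        print(f"loss={loss:.4f} importance_weight={iw:.4f}")
```

In this sketch the importance weight is returned alongside the loss so that it can be logged, which mirrors the importance_weight tracking mentioned in the commit message.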
File tree
10 files changed, +1899 -63 lines changed
- areal
- api
- engine/ppo
- tests
- utils
- fsdp
- docs
- algorithms
- figures
- examples/experimental/prox_approx
[Diff table for one file; file name and line contents not captured. Hunk 1: 4 lines added at new lines 19-22. Hunk 2: 12 lines added at new lines 646-657.]
Large diffs are not rendered by default.