Commit 1de31a6: Support Over Rollout (#376)
1 parent a690509

13 files changed (+474, -139 lines)
docs/sphinx_doc/source/tutorial/develop_workflow.md
Lines changed: 1 addition & 1 deletion

@@ -513,7 +513,7 @@ Here, `<config_file_path>` is the path to a YAML configuration file, which shoul
 Once started, the model will keep running and wait for debug instructions; it will not exit automatically. You can then run the following command in another terminal to debug your workflow:

 ```bash
-trinity debug --config <config_file_path> --module workflow --output_file <output_file_path> --plugin_dir <plugin_dir>
+trinity debug --config <config_file_path> --module workflow --output-file <output_file_path> --plugin-dir <plugin_dir>
 ```

 - `<config_file_path>`: Path to the YAML configuration file, usually the same as used for starting the inference model.

docs/sphinx_doc/source/tutorial/faq.md
Lines changed: 2 additions & 2 deletions

@@ -94,7 +94,7 @@ ray start --head
 **A:** The following parameters may be helpful:

-- For trainer, adjust `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu` when `actor_rollout_ref.actor.use_dynamic_bsz=false`; adjust `actor_rollout_ref.actor.ppo_max_token_len_per_gpu` and `actor_rollout_ref.actor.ulysses_sequence_parallel_size` when `actor_rollout_ref.actor.use_dynamic_bsz=true`. Setting `actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` may also help.
+- For trainer, adjust `trainer.max_token_len_per_gpu` when `trainer.use_dynamic_bsz=false`; adjust `trainer.ppo_max_token_len_per_gpu` and `trainer.ulysses_sequence_parallel_size` when `trainer.use_dynamic_bsz=true`. Setting `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` may also help.
 - For explorer, adjust `explorer.rollout_model.tensor_parallel_size`.

@@ -113,7 +113,7 @@ To debug a new workflow, use Trinity-RFT's debug mode with the following steps:
 1. Launch the inference model via `trinity debug --config <config_file_path> --module inference_model`

-2. Debug the workflow in another terminal via `trinity debug --config <config_file_path> --module workflow --output_file <output_file_path> --plugin_dir <plugin_dir>`
+2. Debug the workflow in another terminal via `trinity debug --config <config_file_path> --module workflow --output-file <output_file_path> --plugin-dir <plugin_dir>`

 Please refer to {ref}`Workflow Development Guide <Workflows>` section for details.
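The renamed keys in the first hunk could be combined in a config fragment like the following. This is an illustrative sketch, not taken from the commit: the key paths are those named in the updated FAQ text, but all values are made-up examples.

```yaml
# Illustrative OOM-mitigation settings (values are examples, not recommendations)
trainer:
  use_dynamic_bsz: true
  ppo_max_token_len_per_gpu: 16384    # illustrative value
  ulysses_sequence_parallel_size: 2   # illustrative value
  trainer_config:
    actor_rollout_ref:
      actor:
        entropy_from_logits_with_chunking: true
explorer:
  rollout_model:
    tensor_parallel_size: 2           # illustrative value
```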

docs/sphinx_doc/source/tutorial/trinity_configs.md
Lines changed: 16 additions & 0 deletions

@@ -371,6 +371,12 @@ explorer:
     tensor_parallel_size: 1
   eval_interval: 100
   eval_on_startup: True
+  over_rollout:
+    ratio: 0.0
+    wait_after_min: 30.0
+  dynamic_timeout:
+    enable: false
+    ratio: 3.0
 ```

 - `name`: Name of the explorer. This name will be used as the Ray actor's name, so it must be unique.

@@ -385,6 +391,12 @@ explorer:
 - `auxiliary_models`: Additional models used for custom workflows.
 - `eval_interval`: Interval (in steps) for evaluating the model.
 - `eval_on_startup`: Whether to evaluate the model on startup. More precisely, at step 0 with the original model, so it will not be triggered when restarting.
+- `over_rollout`: [Experimental] Configuration for the over-rollout mechanism, which allows the explorer to proceed with fewer tasks than the full batch size. It effectively increases throughput in scenarios where some tasks take significantly longer to complete than others. Only applicable when dynamic synchronization (`synchronizer.sync_style` is not `fixed`) is used.
+  - `ratio`: The explorer waits for only `(1 - ratio) * batch_size` tasks at each step. Default is `0.0`, meaning it waits for all tasks.
+  - `wait_after_min`: After the minimum task threshold is reached, wait this many seconds before proceeding. Default is `30.0` seconds.
+- `dynamic_timeout`: [Experimental] Configuration for the dynamic timeout mechanism, which adjusts each task's timeout based on the average time taken by successful tasks.
+  - `enable`: Whether to enable dynamic timeout. Default is `false`.
+  - `ratio`: Each task's timeout is dynamically set to `average_time_per_success_task * ratio`. Default is `3.0`.
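The two added mechanisms reduce to simple arithmetic on the batch size and the average task time. A minimal Python sketch of that arithmetic (helper names are hypothetical; this is not Trinity-RFT's actual implementation, which may round or smooth differently):

```python
import math

def min_tasks_to_wait_for(batch_size: int, over_rollout_ratio: float) -> int:
    """With over-rollout, the explorer only waits for (1 - ratio) * batch_size
    tasks per step before proceeding (rounded up here)."""
    return math.ceil((1.0 - over_rollout_ratio) * batch_size)

def dynamic_task_timeout(avg_success_time: float, ratio: float = 3.0) -> float:
    """Dynamic timeout: each task gets average_time_per_success_task * ratio seconds."""
    return avg_success_time * ratio

print(min_tasks_to_wait_for(32, 0.0))   # ratio 0.0: wait for all 32 tasks
print(min_tasks_to_wait_for(32, 0.25))  # wait for only 24 of 32 tasks
print(dynamic_task_timeout(10.0))       # 30.0 second timeout per task
```

After the threshold is met, the explorer still waits `wait_after_min` seconds for stragglers before moving on.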

---

@@ -398,6 +410,7 @@ synchronizer:
   sync_interval: 10
   sync_offset: 0
   sync_timeout: 1200
+  sync_style: 'fixed'
 ```

 - `sync_method`: Method of synchronization. Options:

@@ -406,6 +419,9 @@ synchronizer:
 - `sync_interval`: Interval (in steps) of model weight synchronization between trainer and explorer.
 - `sync_offset`: Offset (in steps) of model weight synchronization between trainer and explorer. The explorer can run `sync_offset` steps before the trainer starts training.
 - `sync_timeout`: Timeout duration for synchronization.
+- `sync_style`: Style of synchronization. Options:
+  - `fixed`: The explorer and trainer synchronize weights every `sync_interval` steps.
+  - `dynamic_by_explorer`: The explorer notifies the trainer to synchronize weights after completing `sync_interval` steps, regardless of how many steps the trainer has completed at this point.
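Since over-rollout requires a non-`fixed` sync style, the two sections go together. A hypothetical fragment combining them (values are illustrative only; field names follow the options documented above):

```yaml
explorer:
  over_rollout:
    ratio: 0.25           # proceed once 75% of the batch has finished
    wait_after_min: 30.0  # then wait up to 30 s for stragglers
  dynamic_timeout:
    enable: true
    ratio: 3.0            # per-task timeout = 3x avg successful-task time
synchronizer:
  sync_style: 'dynamic_by_explorer'  # over_rollout needs a non-fixed style
  sync_interval: 10
```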

---

docs/sphinx_doc/source_zh/tutorial/develop_workflow.md
Lines changed: 1 addition & 1 deletion

@@ -509,7 +509,7 @@ trinity debug --config <config_file_path> --module inference_model
 Once started, the model keeps running and waits for debug instructions; it does not exit automatically. You can then run the following command in another terminal to debug the workflow:

 ```bash
-trinity debug --config <config_file_path> --module workflow --output_file <output_file_path> --plugin_dir <plugin_dir>
+trinity debug --config <config_file_path> --module workflow --output-file <output_file_path> --plugin-dir <plugin_dir>
 ```

 - `config_file_path`: Path to the YAML configuration file, usually the same one used to start the inference model.

docs/sphinx_doc/source_zh/tutorial/faq.md
Lines changed: 2 additions & 2 deletions

@@ -93,7 +93,7 @@ ray start --head
 **A:** The following parameters may be helpful:

-- For trainer: when `actor_rollout_ref.actor.use_dynamic_bsz=false`, adjust `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`; when `actor_rollout_ref.actor.use_dynamic_bsz=true`, adjust `actor_rollout_ref.actor.ppo_max_token_len_per_gpu` and `actor_rollout_ref.actor.ulysses_sequence_parallel_size`. Setting `actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` may also help.
+- For trainer: when `trainer.use_dynamic_bsz=false`, adjust `trainer.max_token_len_per_gpu`; when `trainer.use_dynamic_bsz=true`, adjust `trainer.ppo_max_token_len_per_gpu` and `trainer.ulysses_sequence_parallel_size`. Setting `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` may also help.
 - For explorer: adjust `explorer.rollout_model.tensor_parallel_size`.

 ## Part 3: Debugging Methods

@@ -113,7 +113,7 @@ trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log
 1. Launch the inference model: `trinity debug --config <config_file_path> --module inference_model`

-2. Debug the workflow in another terminal: `trinity debug --config <config_file_path> --module workflow --output_file <output_file_path> --plugin_dir <plugin_dir>`
+2. Debug the workflow in another terminal: `trinity debug --config <config_file_path> --module workflow --output-file <output_file_path> --plugin-dir <plugin_dir>`

 For more details, see the {ref}`Workflow Development Guide <Workflows>` section.

docs/sphinx_doc/source_zh/tutorial/trinity_configs.md
Lines changed: 10 additions & 0 deletions

@@ -382,6 +382,12 @@ explorer:
 - `auxiliary_models`: Additional models used for custom workflows.
 - `eval_interval`: Interval (in steps) for model evaluation.
 - `eval_on_startup`: Whether to evaluate the model on startup. More precisely, at step 0 with the original model, so it is not triggered on restart.
+- `over_rollout`: [Experimental] Configuration for the over-rollout mechanism, which allows the explorer to proceed with fewer tasks than the full batch size at each step. This effectively improves throughput in scenarios where some tasks take significantly longer than others. Only applicable with dynamic synchronization (`synchronizer.sync_style` is not `fixed`).
+  - `ratio`: The explorer waits for only `(1 - ratio) * batch_size` tasks at each step. Default is `0.0`, meaning it waits for all tasks.
+  - `wait_after_min`: After the minimum task threshold is reached, wait this many seconds before proceeding.
+- `dynamic_timeout`: [Experimental] Configuration for the dynamic timeout mechanism, which adjusts each task's timeout based on the average time taken by successful tasks.
+  - `enable`: Whether to enable dynamic timeout. Default is `false`.
+  - `ratio`: Each task's timeout is dynamically set to `average_time_per_success_task * ratio`. Default is `3.0`.

---

@@ -395,6 +401,7 @@ synchronizer:
   sync_interval: 10
   sync_offset: 0
   sync_timeout: 1200
+  sync_style: 'fixed'
 ```

 - `sync_method`: Method of synchronization. Options:

@@ -403,6 +410,9 @@ synchronizer:
 - `sync_interval`: Interval (in steps) of model weight synchronization between trainer and explorer.
 - `sync_offset`: Offset (in steps) of model weight synchronization between trainer and explorer. The explorer can run `sync_offset` steps before the trainer starts training.
 - `sync_timeout`: Synchronization timeout.
+- `sync_style`: Style of synchronization. Options:
+  - `fixed`: The explorer and trainer synchronize weights every `sync_interval` steps.
+  - `dynamic_by_explorer`: The explorer notifies the trainer to synchronize weights after completing `sync_interval` steps, regardless of how many steps the trainer has completed by then.

---
