# [Example] Frozen_Lake #375

Merged
## Commits (23, all by hiyuchang)

- 1764513 fix bugs
- beb8c45 fix prompt_len and response_len
- ee7d8ef fix render
- f21184c fix seed
- dc2153a remove debug lines and fix pre-commit error
- 3c6fb6f fix temp config
- 1e349f8 Merge branch 'main' into example/frozen_lake
- 9519ab0 tiny fix on workflow, fix on vllm
- f76f06b fix workflow
- 9a827d9 Merge branch 'main' into example/frozen_lake
- 55599d7 fix yaml to qwen25
- 34bac77 Merge branch 'main' into example/frozen_lake
- a715980 fix comment
- 902ce2e add results, enable_prompt_tokens and unittest
- 0a92e71 Merge branch 'main' into example/frozen_lake
- 315f1d0 fix unittest
- b13d424 fix import error
- d074344 fix logprob error and add env_steps
- a69ec19 add map_max_size
- 75908af add note
- 5d9fec8 Merge branch 'main' into example/frozen_lake
- 1525bb6 update results
- 2a4e458 add enable_prompt_truncation to tutorial
## Files changed

### README
# Frozen Lake

This example shows the usage of GRPO on the [Frozen Lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) task.

## Data and Environment Preparation

After setting up the basic environment following the [installation section of Quickstart](../../docs/sphinx_doc/source/tutorial/example_reasoning_basic.md#step-0-environment-preparation), install the additional dependencies:

```bash
pip install "gymnasium[toy_text]"
```

Then prepare the dataset:

```bash
cd examples/grpo_frozen_lake
python get_frozen_lake_data.py
```

This command saves the dataset to the local directory `{DATA_ROOT_DIR}/frozenlake` and prints the dataset path. Afterwards, make sure to set the environment variable `TRINITY_TASKSET_PATH` to that path:

```bash
export TRINITY_TASKSET_PATH={DATA_ROOT_DIR}/frozenlake
```
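Before launching training, it can help to sanity-check that the variable points at the expected layout. The helper below is hypothetical (not part of this PR); it only assumes the `train.parquet`/`test.parquet` split files that `get_frozen_lake_data.py` writes:

```python
import os

def check_taskset_path(env: dict) -> list[str]:
    """Return the expected parquet paths under TRINITY_TASKSET_PATH.

    Raises RuntimeError if the variable is missing from the given
    environment mapping (e.g. os.environ).
    """
    root = env.get("TRINITY_TASKSET_PATH")
    if root is None:
        raise RuntimeError("TRINITY_TASKSET_PATH is not set")
    # get_frozen_lake_data.py writes one parquet file per split
    return [os.path.join(root, f"{split}.parquet") for split in ("train", "test")]

print(check_taskset_path({"TRINITY_TASKSET_PATH": "/data/frozenlake"}))
```

In practice you would call it as `check_taskset_path(os.environ)` and then confirm each returned file exists.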
## Workflow Configuration and Training

We use a concatenated multi-turn workflow, `FrozenLakeWorkflow`, to solve the Frozen Lake task. For each rollout, the full multi-turn interaction between the agent and the environment feedback is stored in a single `Experience` object.
The specific configuration is located in [`frozen_lake.yaml`](frozen_lake.yaml).

To run this example:

```bash
trinity run --config examples/grpo_frozen_lake/frozen_lake.yaml
```

## Results

We show the result with a Qwen2.5-3B-Instruct model below. The figure shows the reward increasing over training steps.

![frozen_lake_result](../../docs/sphinx_doc/assets/frozen_lake_result.png)
### frozen_lake.yaml
```yaml
project: "FrozenLake"
name: "trinity-frozen-lake"
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
algorithm:
  algorithm_type: grpo
  repeat_times: 8
  optimizer:
    lr: 1e-6
  policy_loss_fn_args:
    loss_agg_mode: "seq-mean-token-sum"
    clip_range_low: 0.2
    clip_range_high: 0.28
  kl_loss_fn_args:
    kl_coef: 0.0
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-3B-Instruct}
  enable_prompt_truncation: false
  max_response_tokens: 10240
  max_model_len: 14436
  temperature: 0.7
cluster:
  node_num: 1
  gpu_per_node: 8
buffer:
  total_epochs: 1
  batch_size: 64
  explorer_input:
    taskset:
      name: frozenlake
      storage_type: file
      path: ${oc.env:TRINITY_TASKSET_PATH}
      split: train
      workflow_args:
        env_max_steps: 8
        agent_max_steps: 10
        is_slippery: false
    eval_tasksets:
      - name: frozenlake
        storage_type: file
        path: ${oc.env:TRINITY_TASKSET_PATH}
        split: test
        workflow_args:
          env_max_steps: 8
          agent_max_steps: 10
          is_slippery: false
        rollout_args:
          n: 4
          top_p: 0.8
          top_k: 20
    default_workflow_type: 'frozen_lake_workflow'
explorer:
  eval_on_startup: true
  eval_interval: 10
  runner_per_model: 8
  rollout_model:
    engine_num: 6
    tensor_parallel_size: 1
    enable_chunked_prefill: true
    enforce_eager: false
    dtype: bfloat16
    seed: 42
    gpu_memory_utilization: 0.85
trainer:
  trainer_type: 'verl'
  save_interval: 1000
  use_dynamic_bsz: true
  max_token_len_per_gpu: 16384
  ulysses_sequence_parallel_size: 1
  trainer_config:
    actor_rollout_ref:
      hybrid_engine: true
      model:
        use_remove_padding: true
        enable_gradient_checkpointing: true
      actor:
        fsdp_config:
          param_offload: true
          optimizer_offload: true
      ref:
        fsdp_config:
          param_offload: true
synchronizer:
  sync_method: nccl
  sync_interval: 1
  sync_timeout: 1200
```
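The `${oc.env:VAR,default}` values above are OmegaConf environment-variable interpolations: the variable's value is substituted if it is set, otherwise the default after the comma is used, and resolution fails when an unset variable has no default (as with `${oc.env:TRINITY_TASKSET_PATH}`). A minimal stdlib sketch of that behavior, not using OmegaConf itself (the regex is an illustrative approximation and does not handle defaults containing `}`):

```python
import os
import re

# Approximation of OmegaConf's ${oc.env:NAME,default} syntax.
_PATTERN = re.compile(r"\$\{oc\.env:([A-Z_][A-Z0-9_]*)(?:,([^}]*))?\}")

def resolve_env(value: str) -> str:
    """Replace each ${oc.env:NAME,default} with the env var or its default."""
    def _sub(match):
        name, default = match.group(1), match.group(2)
        val = os.environ.get(name)
        if val is None:
            if default is None:
                # mirrors OmegaConf failing on a missing var with no default
                raise KeyError(f"environment variable {name} is not set")
            return default
        return val
    return _PATTERN.sub(_sub, value)

os.environ.pop("TRINITY_MODEL_PATH", None)  # ensure the variable is unset
print(resolve_env("${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-3B-Instruct}"))
# → Qwen/Qwen2.5-3B-Instruct
```

So with no environment overrides, the config falls back to `Qwen/Qwen2.5-3B-Instruct` and `./checkpoints`, while `TRINITY_TASKSET_PATH` must always be exported.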
*(One binary file in this diff could not be rendered.)*

### get_frozen_lake_data.py
```python
"""
Modified from https://github.com/rllm-org/rllm/blob/main/examples/frozenlake/prepare_frozenlake_data.py
"""
import os

import numpy as np
import pandas as pd

from trinity.common.constants import TASKSET_PATH_ENV_VAR

path_from_env = os.environ.get(TASKSET_PATH_ENV_VAR)
if path_from_env is not None:
    DATA_ROOT_DIR = os.path.dirname(path_from_env)
else:
    DATA_ROOT_DIR = os.path.join(os.path.dirname(__file__), "data")


def save_dataset_to_local(name: str, data: list[dict], split: str = "default") -> str:
    """Save dataset directly to the local DATA_ROOT_DIR.

    Args:
        name: Name of the dataset
        data: List of dictionaries containing the dataset examples
        split: Split name (e.g., 'train', 'test', 'default')

    Returns:
        str: Path to the saved parquet file
    """
    dataset_dir = os.path.join(DATA_ROOT_DIR, name)
    os.makedirs(dataset_dir, exist_ok=True)

    # Convert to DataFrame and save
    data_df = pd.DataFrame(data)
    dataset_path = os.path.join(dataset_dir, f"{split}.parquet")
    data_df.to_parquet(dataset_path)

    print(
        f"Saved dataset '{name}' split '{split}' with {len(data)} examples at {dataset_path}. "
        f"Make sure to set the environment variable {TASKSET_PATH_ENV_VAR} to {DATA_ROOT_DIR}/{name}."
    )

    return dataset_path


def prepare_frozenlake_data(train_size=10000, test_size=100, map_max_size=6):
    """
    Prepare and save FrozenLake datasets for training and testing.

    Args:
        train_size (int): Number of training examples to generate
        test_size (int): Number of test examples to generate
        map_max_size (int): Exclusive upper bound on the sampled map side length

    Returns:
        tuple: (train_data, test_data) - Lists of data dictionaries
    """
    # Set random seed for reproducibility
    np.random.seed(42)

    # Generate random parameters for train and test sets
    train_seeds = np.random.randint(0, 100000, size=train_size)
    test_seeds = np.random.randint(0, 100000, size=test_size)
    train_sizes = np.random.randint(2, map_max_size, size=train_size)
    test_sizes = np.random.randint(2, map_max_size, size=test_size)
    train_ps = np.random.uniform(0.6, 0.85, size=train_size)
    test_ps = np.random.uniform(0.6, 0.85, size=test_size)

    def frozenlake_process_fn(seed, size, p, idx):
        """Process function to create FrozenLake task instances."""
        return {"seed": seed, "size": size, "p": p, "index": idx, "uid": f"{seed}_{size}_{p}"}

    # Create train and test data
    train_data = [
        frozenlake_process_fn(seed, train_sizes[idx], train_ps[idx], idx)
        for idx, seed in enumerate(train_seeds)
    ]
    test_data = [
        frozenlake_process_fn(seed, test_sizes[idx], test_ps[idx], idx)
        for idx, seed in enumerate(test_seeds)
    ]

    # Save datasets directly to the local DATA_ROOT_DIR
    save_dataset_to_local("frozenlake", train_data, "train")
    save_dataset_to_local("frozenlake", test_data, "test")

    return train_data, test_data


if __name__ == "__main__":
    train_data, test_data = prepare_frozenlake_data()
    print(f"Train dataset: {len(train_data)} examples")
    print(f"Test dataset: {len(test_data)} examples")
    print("Sample train example:", train_data[0])
    print("Sample test example:", test_data[0])
```
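Each generated row is just map parameters: `seed` seeds the map layout, `size` is the side length, and `p` is, per gymnasium's `generate_random_map`, the probability that a tile is frozen rather than a hole. The actual map is built later by the environment; the sketch below only illustrates what those three parameters control, using the stdlib instead of gymnasium and skipping the start-to-goal solvability guarantee that `generate_random_map` enforces:

```python
import random

def sketch_random_map(size: int, p: float, seed: int) -> list[str]:
    """Illustrative stand-in for gymnasium's generate_random_map.

    Tiles: F = frozen, H = hole, S = start, G = goal.
    """
    rng = random.Random(seed)  # the task's seed makes the layout reproducible
    grid = [
        ["F" if rng.random() < p else "H" for _ in range(size)]
        for _ in range(size)
    ]
    grid[0][0], grid[-1][-1] = "S", "G"  # fixed start and goal corners
    return ["".join(row) for row in grid]

# A task row such as {"seed": 7, "size": 4, "p": 0.8, ...} describes one map:
for row in sketch_random_map(size=4, p=0.8, seed=7):
    print(row)
```

With `map_max_size=6`, the script samples side lengths from 2 to 5, so training maps stay small enough for the 8-step `env_max_steps` budget in `frozen_lake.yaml`.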