[Example] Frozen_Lake #375
Merged

Commits (23):
- 1764513 fix bugs (hiyuchang)
- beb8c45 fix prompt_len and response_len (hiyuchang)
- ee7d8ef fix render (hiyuchang)
- f21184c fix seed (hiyuchang)
- dc2153a remove debug lines and fix pre-commit error (hiyuchang)
- 3c6fb6f fix temp config (hiyuchang)
- 1e349f8 Merge branch 'main' into example/frozen_lake (hiyuchang)
- 9519ab0 tiny fix on workflow, fix on vllm (hiyuchang)
- f76f06b fix workflow (hiyuchang)
- 9a827d9 Merge branch 'main' into example/frozen_lake (hiyuchang)
- 55599d7 fix yaml to qwen25 (hiyuchang)
- 34bac77 Merge branch 'main' into example/frozen_lake (hiyuchang)
- a715980 fix comment (hiyuchang)
- 902ce2e add results, enable_prompt_tokens and unittest (hiyuchang)
- 0a92e71 Merge branch 'main' into example/frozen_lake (hiyuchang)
- 315f1d0 fix unittest (hiyuchang)
- b13d424 fix import error (hiyuchang)
- d074344 fix logprob error and add env_steps (hiyuchang)
- a69ec19 add map_max_size (hiyuchang)
- 75908af add note (hiyuchang)
- 5d9fec8 Merge branch 'main' into example/frozen_lake (hiyuchang)
- 1525bb6 update results (hiyuchang)
- 2a4e458 add enable_prompt_truncation to tutorial (hiyuchang)

New file (diff @@ -0,0 +1,7 @@):

````markdown
# Frozen Lake

# Prepare the environment and data

```
pip install gymnasium[toy_text]
```
````
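
The `gymnasium[toy_text]` extra provides the FrozenLake environment this example trains on. For orientation, a minimal random-action rollout is sketched below; it is not part of this PR, and the PR's `frozen_lake_workflow` presumably drives the environment with the policy model instead of random actions.

```python
import gymnasium as gym

# Build a small, non-slippery FrozenLake and roll it out with random actions.
env = gym.make("FrozenLake-v1", is_slippery=False, render_mode="ansi")
obs, info = env.reset(seed=42)
done = False
while not done:
    action = env.action_space.sample()  # 0=left, 1=down, 2=right, 3=up
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
print(env.render())
print("final reward:", reward)
```

Reaching the goal tile yields reward 1.0; stepping into a hole ends the episode with reward 0.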

New file (diff @@ -0,0 +1,85 @@):

```yaml
# TODO: check this config again after all
project: "FrozenLake"
name: "test-trinity-0.6B"

checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
algorithm:
  algorithm_type: grpo
  repeat_times: 8
  optimizer:
    lr: 1e-6
  policy_loss_fn_args:
    loss_agg_mode: "seq-mean-token-sum"
    clip_range_low: 0.2
    clip_range_high: 0.28
  kl_loss_fn_args:
    kl_coef: 0.0
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-0.6B}
  max_prompt_tokens: 20480
  max_response_tokens: 4096
  temperature: 0.7
cluster:
  node_num: 1
  gpu_per_node: 8
buffer:
  total_epochs: 1
  batch_size: 32
  explorer_input:
    taskset:
      name: frozenlake
      storage_type: file
      path: ${oc.env:TRINITY_TASKSET_PATH}
      split: train
      workflow_args:
        max_steps: 10
        is_slippery: false

    eval_tasksets:
      - name: frozenlake
        storage_type: file
        path: ${oc.env:TRINITY_TASKSET_PATH}
        split: test
        workflow_args:
          max_steps: 10
          is_slippery: false
        rollout_args:
          n: 4
          top_p: 0.8
          top_k: 20
    default_workflow_type: 'frozen_lake_workflow'
explorer:
  eval_on_startup: false
  eval_interval: 10
  runner_per_model: 8
  rollout_model:
    engine_num: 2
    tensor_parallel_size: 2
    enable_thinking: true
    enable_chunked_prefill: true
    enforce_eager: false
    dtype: bfloat16
    seed: 42
    gpu_memory_utilization: 0.85
trainer:
  trainer_type: 'verl'
  save_interval: 40
  use_dynamic_bsz: true
  max_token_len_per_gpu: 16384
  ulysses_sequence_parallel_size: 2
  trainer_config:
    actor_rollout_ref:
      hybrid_engine: true
      model:
        use_remove_padding: true
        enable_gradient_checkpointing: true
      actor:
        clip_ratio_high: 0.28
        fsdp_config:
          param_offload: true
          optimizer_offload: true
      ref:
        fsdp_config:
          param_offload: true
synchronizer:
  sync_method: nccl
  sync_interval: 2
  sync_timeout: 1200
```
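
The `${oc.env:...}` entries above are standard OmegaConf resolvers: they read an environment variable and fall back to the value after the comma when it is unset. A minimal standalone sketch of that behaviour (plain OmegaConf, not Trinity's own config loader):

```python
import os

from omegaconf import OmegaConf

# Same resolver syntax as model_path / checkpoint_root_dir in the YAML above.
cfg = OmegaConf.create({"model_path": "${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-0.6B}"})

os.environ.pop("TRINITY_MODEL_PATH", None)
print(OmegaConf.to_container(cfg, resolve=True))   # falls back to Qwen/Qwen3-0.6B

os.environ["TRINITY_MODEL_PATH"] = "/models/Qwen3-0.6B"
print(OmegaConf.to_container(cfg, resolve=True))   # picks up the exported value
```

Note that `TRINITY_TASKSET_PATH` has no default in this config, so it must be exported before training; with it set, the example would then be launched through the repository's usual config-driven entry point (e.g. `trinity run --config <this yaml>`, assuming the standard Trinity-RFT CLI).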

New file (diff @@ -0,0 +1,91 @@):

```python
"""
Modified from https://github.com/rllm-org/rllm/blob/main/examples/frozenlake/prepare_frozenlake_data.py
"""
import os

import numpy as np
import pandas as pd

from trinity.common.constants import TASKSET_PATH_ENV_VAR

if os.environ.get(TASKSET_PATH_ENV_VAR) is not None:
    DATA_ROOT_DIR = os.path.dirname(os.environ.get(TASKSET_PATH_ENV_VAR))
else:
    DATA_ROOT_DIR = os.path.join(os.path.dirname(__file__), "data")


def save_dataset_to_local(name: str, data: list[dict], split: str = "default") -> str:
    """Save dataset directly to local DATA_PATH.

    Args:
        name: Name of the dataset
        data: List of dictionaries containing the dataset examples
        split: Split name (e.g., 'train', 'test', 'default')

    Returns:
        str: Path to the saved parquet file
    """
    dataset_dir = os.path.join(DATA_ROOT_DIR, name)
    os.makedirs(dataset_dir, exist_ok=True)

    # Convert to DataFrame and save
    data_df = pd.DataFrame(data)
    dataset_path = os.path.join(dataset_dir, f"{split}.parquet")
    data_df.to_parquet(dataset_path)

    print(
        f"Saved dataset '{name}' split '{split}' with {len(data)} examples at {dataset_path}. Make sure to set the environment variable {TASKSET_PATH_ENV_VAR} to {DATA_ROOT_DIR}/{name}."
    )

    return dataset_path


def prepare_frozenlake_data(train_size=10000, test_size=100):
    """
    Prepare and save FrozenLake datasets for training and testing.

    Args:
        train_size (int): Number of training examples to generate
        test_size (int): Number of test examples to generate

    Returns:
        tuple: (train_data, test_data) - Lists of data dictionaries
    """
    # Set random seed for reproducibility
    np.random.seed(42)

    # Generate random parameters for train and test sets
    train_seeds = np.random.randint(0, 100000, size=train_size)
    test_seeds = np.random.randint(0, 100000, size=test_size)
    train_sizes = np.random.randint(2, 10, size=train_size)
    test_sizes = np.random.randint(2, 10, size=test_size)
    train_ps = np.random.uniform(0.6, 0.85, size=train_size)
    test_ps = np.random.uniform(0.6, 0.85, size=test_size)

    def frozenlake_process_fn(seed, size, p, idx):
        """Process function to create FrozenLake task instances."""
        return {"seed": seed, "size": size, "p": p, "index": idx, "uid": f"{seed}_{size}_{p}"}

    # Create train and test data
    train_data = [
        frozenlake_process_fn(seed, train_sizes[idx], train_ps[idx], idx)
        for idx, seed in enumerate(train_seeds)
    ]
    test_data = [
        frozenlake_process_fn(seed, test_sizes[idx], test_ps[idx], idx)
        for idx, seed in enumerate(test_seeds)
    ]

    # Save datasets directly to local DATA_PATH
    save_dataset_to_local("frozenlake", train_data, "train")
    save_dataset_to_local("frozenlake", test_data, "test")

    return train_data, test_data


if __name__ == "__main__":
    train_data, test_data = prepare_frozenlake_data()
    print(f"Train dataset: {len(train_data)} examples")
    print(f"Test dataset: {len(test_data)} examples")
    print("Sample train example:", train_data[0])
    print("Sample test example:", test_data[0])
```