
Commit 9d283a0

Merge pull request #90 from automl/test-fix
readme + example check
2 parents: fd679b9 + dcd0247


5 files changed: +65 -69 lines changed


README.md

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ Before you start training, however, please follow the installation instructions
Then use the same command as before, but provide the CARL environment, in this example CARLCartPoleEnv,
and information about the context distribution as keywords:
```bash
-python mighty/run_mighty.py 'algorithm=dqn' 'env=CARLCartPole' 'num_envs=10' '+env_kwargs.num_contexts=10' '+env_kwargs.context_feature_args.gravity=[normal, 9.8, 1.0, -100.0, 100.0]' 'env_wrappers=[mighty.mighty_utils.wrappers.FlattenVecObs]'
+python mighty/run_mighty.py 'algorithm=ppo' 'env=CARLCartPole' '+env_kwargs.num_contexts=10' '+env_kwargs.context_feature_args.gravity=[normal, 9.8, 1.0, -100.0, 100.0]' 'env_wrappers=[mighty.mighty_utils.wrappers.FlattenVecObs]' 'algorithm_kwargs.rollout_buffer_kwargs.buffer_size=2048'
```

For more complex configurations like this, we recommend making an environment configuration file. Check out our [CARL Ant](mighty/configs/environment/carl_walkers/ant_goals.yaml) file to see how this simplifies the process of working with configurable environments.
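A minimal sketch of what such an environment config file could contain, mirroring the CARLCartPole CLI call in the hunk above. The key names (env, num_envs, env_kwargs, env_wrappers) appear in the CLI overrides and the example configs in this commit, but the exact schema of the CARL Ant file may differ:

```yaml
# Hypothetical environment config mirroring the CARLCartPole CLI example above.
# Key names are taken from the CLI overrides and the example configs in this commit;
# see mighty/configs/environment/carl_walkers/ant_goals.yaml for the real layout.
env: CARLCartPole
num_envs: 10
env_wrappers:
  - mighty.mighty_utils.wrappers.FlattenVecObs
env_kwargs:
  num_contexts: 10
  context_feature_args:
    gravity: [normal, 9.8, 1.0, -100.0, 100.0]
```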

examples/README.md

Lines changed: 5 additions & 5 deletions
@@ -73,7 +73,7 @@ python mighty/run_mighty.py 'env=CartPole-v1'
We can also be more specific, e.g. by adding our desired number of interaction steps and the number of parallel environments we want to run:

```bash
-python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=10'
+python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=16'
```
For some environments, including CartPole-v1, these details are pre-configured in the Mighty configs, meaning we can use the environment keyword to set them all at once:

@@ -98,7 +98,7 @@ python mighty/run_mighty.py 'environment=gymnasium/cartpole' 'algorithm=dqn' 'al
Or to use e.g. an ez-greedy exploration policy for DQN:

```bash
-python mighty/run_mighty.py 'environment=gymnasium/cartpole' 'algorithm=dqn' '+algorithm_kwargs.policy_class=mighty.mighty_exploration.EZGreedy'
+python mighty/run_mighty.py 'environment=gymnasium/cartpole' 'algorithm=dqn' 'algorithm_kwargs.policy_class=mighty.mighty_exploration.EZGreedy' 'algorithm_kwargs.policy_kwargs=null'
```
You can see that in this case, the value we pass to the script is a class name string which can take the value of any function you want, including custom ones as we'll see further down.
</details>
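The same override can also live in a config file rather than on the command line. A rough sketch, assuming the config keys follow the algorithm_kwargs.* paths used above (policy_class and policy_kwargs appear under algorithm_kwargs in the example configs changed in this commit); the defaults it would merge into are not shown here:

```yaml
# Hypothetical config-file version of the EZGreedy CLI override above.
algorithm_kwargs:
  policy_class: mighty.mighty_exploration.EZGreedy
  policy_kwargs: null  # fall back to the policy's own defaults, as in the CLI call
```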
@@ -109,7 +109,7 @@ You can see that in this case, the value we pass to the script is a class name s
The meta components are a bit more complex, since they are a list of class names and optional keyword arguments:

```bash
-python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=10' '+algorithm_kwargs.meta_methods=[mighty.mighty_meta.RND]'
+python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=16' '+algorithm_kwargs.meta_methods=[mighty.mighty_meta.RND]'
```
As this can become complex, we recommend configuring these in Hydra config files.
</details>
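In a Hydra config file, the meta-methods list from the command above might look roughly like this. The meta_methods key is taken directly from the CLI override; per-method keyword arguments are omitted because their exact key names are not shown in this diff:

```yaml
# Hypothetical algorithm_kwargs snippet equivalent to the +algorithm_kwargs.meta_methods CLI override.
algorithm_kwargs:
  meta_methods:
    - mighty.mighty_meta.RND
```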
@@ -121,7 +121,7 @@ Hydra has a multirun functionality with which you can specify a grid of argument
Its best use is probably for easily running multiple seeds at once like this:

```bash
-python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=10' 'seed=0,1,2,3,4' 'output_dir=examples/multiple_runs' -m
+python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=16' 'seed=0,1,2,3,4' 'output_dir=examples/multiple_runs' -m
```
</details>

@@ -196,7 +196,7 @@ Compare their structure: the custom policy has a fixed set of methods inherited

If you want to run these custom modules, you can do so by adding them by their import path:
```bash
-python mighty/run_mighty.py 'algorithm=dqn' '+algorithm_kwargs.policy_class=examples.custom_policy.QValueUCB' '+algorithm_kwargs.policy_kwargs={}'
+python mighty/run_mighty.py 'algorithm=dqn' 'algorithm_kwargs.policy_class=examples.custom_policy.QValueUCB' 'algorithm_kwargs.policy_kwargs=null'
```
For the meta-module, it works exactly the same way:
```bash

examples/hypersweeper_smac_example_config.yaml

Lines changed: 28 additions & 31 deletions
@@ -17,49 +17,46 @@ env_kwargs: {}
env_wrappers: []
num_envs: 64

-# @package _global_
algorithm: PPO

algorithm_kwargs:
-  # Hyperparameters
-  n_policy_units: 128
-  n_critic_units: 128
-  soft_update_weight: 0.01
+  rescale_action: False
+  tanh_squash: False

  rollout_buffer_class:
-    _target_: mighty.mighty_replay.MightyRolloutBuffer # Using rollout buffer
+    _target_: mighty.mighty_replay.MightyRolloutBuffer
+
  rollout_buffer_kwargs:
-    buffer_size: 4096 # Size of the rollout buffer.
-    gamma: 0.99 # Discount factor for future rewards.
-    gae_lambda: 0.95 # GAE lambda.
-    obs_shape: ??? # Placeholder for observation shape
-    act_dim: ??? # Placeholder for action dimension
+    buffer_size: 128 # (16 × 128 = 2048 total)
+    gamma: 0.99
+    gae_lambda: 0.95
+    obs_shape: ???
+    act_dim: ???
    n_envs: ???
-
+    discrete_action: ???

-  # Training
-  learning_rate: 3e-4
-  batch_size: 1024 # Batch size for training.
-  gamma: 0.99 # The amount by which to discount future rewards.
-  n_gradient_steps: 3 # Number of epochs for updating policy.
-  ppo_clip: 0.2 # Clipping parameter for PPO.
-  value_loss_coef: 0.5 # Coefficient for value loss.
-  entropy_coef: 0.01 # Coefficient for entropy loss.
-  max_grad_norm: 0.5 # Maximum value for gradient clipping.
-
+  # Optimiser and update settings
+  learning_rate: 3e-4
+  batch_size: 2048 # 16 environments × 128 steps = 2048 total samples
+  gamma: 0.99
+  ppo_clip: 0.2
+  value_loss_coef: 0.5
+  entropy_coef: 0.01
+  max_grad_norm: 0.5 # gradient clipping

-  hidden_sizes: [64, 64]
-  activation: 'tanh'
+  hidden_sizes: [256, 256]
+  activation: "tanh"

-  n_epochs: 10
-  minibatch_size: 64
-  kl_target: 0.01
-  use_value_clip: True
-  value_clip_eps: 0.2
+  n_gradient_steps: 1 # one gradient step per rollout
+  n_epochs: 10 # ten update epochs per rollout
+  minibatch_size: 128 # 2048 ÷ 128 = 16 minibatches per epoch
+  kl_target: null # disable KL-based early stopping
+  use_value_clip: true

-  policy_class: mighty.mighty_exploration.StochasticPolicy # Policy class for exploration
+  policy_class: mighty.mighty_exploration.StochasticPolicy
  policy_kwargs:
-    entropy_coefficient: 0.0 # Coefficient for entropy-based exploration.
+    entropy_coefficient: 0.0
+

# Training
eval_every_n_steps: 1e4 # After how many steps to evaluate.

examples/optuna_example_config.yaml

Lines changed: 28 additions & 31 deletions
@@ -18,49 +18,46 @@ env_kwargs: {}
env_wrappers: []
num_envs: 64

-# @package _global_
algorithm: PPO

algorithm_kwargs:
-  # Hyperparameters
-  n_policy_units: 128
-  n_critic_units: 128
-  soft_update_weight: 0.01
+  rescale_action: False
+  tanh_squash: False

  rollout_buffer_class:
-    _target_: mighty.mighty_replay.MightyRolloutBuffer # Using rollout buffer
+    _target_: mighty.mighty_replay.MightyRolloutBuffer
+
  rollout_buffer_kwargs:
-    buffer_size: 4096 # Size of the rollout buffer.
-    gamma: 0.99 # Discount factor for future rewards.
-    gae_lambda: 0.95 # GAE lambda.
-    obs_shape: ??? # Placeholder for observation shape
-    act_dim: ??? # Placeholder for action dimension
+    buffer_size: 128 # (16 × 128 = 2048 total)
+    gamma: 0.99
+    gae_lambda: 0.95
+    obs_shape: ???
+    act_dim: ???
    n_envs: ???
-
+    discrete_action: ???

-  # Training
-  learning_rate: 3e-4
-  batch_size: 1024 # Batch size for training.
-  gamma: 0.99 # The amount by which to discount future rewards.
-  n_gradient_steps: 3 # Number of epochs for updating policy.
-  ppo_clip: 0.2 # Clipping parameter for PPO.
-  value_loss_coef: 0.5 # Coefficient for value loss.
-  entropy_coef: 0.01 # Coefficient for entropy loss.
-  max_grad_norm: 0.5 # Maximum value for gradient clipping.
-
+  # Optimiser and update settings
+  learning_rate: 3e-4
+  batch_size: 2048 # 16 environments × 128 steps = 2048 total samples
+  gamma: 0.99
+  ppo_clip: 0.2
+  value_loss_coef: 0.5
+  entropy_coef: 0.01
+  max_grad_norm: 0.5 # gradient clipping

-  hidden_sizes: [64, 64]
-  activation: 'tanh'
+  hidden_sizes: [256, 256]
+  activation: "tanh"

-  n_epochs: 10
-  minibatch_size: 64
-  kl_target: 0.01
-  use_value_clip: True
-  value_clip_eps: 0.2
+  n_gradient_steps: 1 # one gradient step per rollout
+  n_epochs: 10 # ten update epochs per rollout
+  minibatch_size: 128 # 2048 ÷ 128 = 16 minibatches per epoch
+  kl_target: null # disable KL-based early stopping
+  use_value_clip: true

-  policy_class: mighty.mighty_exploration.StochasticPolicy # Policy class for exploration
+  policy_class: mighty.mighty_exploration.StochasticPolicy
  policy_kwargs:
-    entropy_coefficient: 0.0 # Coefficient for entropy-based exploration.
+    entropy_coefficient: 0.0
+

# Training
eval_every_n_steps: 1e4 # After how many steps to evaluate.

mighty/mighty_agents/dqn.py

Lines changed: 3 additions & 1 deletion
@@ -121,8 +121,10 @@ def __init__(

        # Policy Class
        policy_class = retrieve_class(cls=policy_class, default_cls=EpsilonGreedy) # type: ignore
-        if policy_kwargs is None:
+        if policy_kwargs is None and isinstance(policy_class, EpsilonGreedy):
            policy_kwargs = {"epsilon": 0.1} # type: ignore
+        elif policy_kwargs is None:
+            policy_kwargs = {}
        self.policy_class = policy_class
        self.policy_kwargs = policy_kwargs
