PPO Batch/Buffer Size Handling is confusing #93

PPO batch size handling is a bit hacky. The main problem is the interaction between batch size and buffer size. The buffer currently counts the number of added steps towards its size (len() is correct, but we compare against self.pos), which means there is a hidden dimension of n_parallel_envs. What happens:

buffer_size: 128
num_envs: 1

We add 128 times 1 step. This is smaller than the default batch size, but we get a buffer overflow error with no indication that it is related to batch size.

buffer_size: 128
num_envs: 64

We actually add too many steps, since each add contains 64 and the buffer/batch size is optimized for 16. Better errors, and maybe dynamic settings or warnings, would be nice here.
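As a rough illustration of what a clearer error could look like, here is a minimal sketch of an up-front size check. The function name `check_ppo_sizes` and the example `batch_size` values are hypothetical and not taken from the codebase; it also assumes that `buffer_size` counts vectorized steps, so one rollout holds `buffer_size * num_envs` transitions.

```python
import warnings


def check_ppo_sizes(buffer_size: int, num_envs: int, batch_size: int) -> int:
    """Validate PPO sizes and return the number of transitions per rollout.

    Hypothetical helper. Assumes buffer_size counts vectorized steps, so a
    full rollout contains buffer_size * num_envs transitions.
    """
    if buffer_size <= 0 or num_envs <= 0 or batch_size <= 0:
        raise ValueError("buffer_size, num_envs and batch_size must all be positive")

    # Make the hidden num_envs dimension explicit.
    total_transitions = buffer_size * num_envs

    if batch_size > total_transitions:
        # Name the batch size directly instead of failing later with an
        # unrelated-looking buffer error.
        raise ValueError(
            f"batch_size={batch_size} exceeds the {total_transitions} transitions "
            f"collected per rollout (buffer_size={buffer_size} x num_envs={num_envs})"
        )

    if total_transitions % batch_size != 0:
        warnings.warn(
            f"{total_transitions} transitions do not divide evenly into batches of "
            f"{batch_size}; the last minibatch will be smaller",
            stacklevel=2,
        )

    return total_transitions
```

With the two configurations above and an assumed batch size of 256, `check_ppo_sizes(128, 64, 256)` passes, while `check_ppo_sizes(128, 1, 256)` raises a `ValueError` that mentions the batch size explicitly.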
