Description
PPO batch size handling is a bit hacky. The main problem is the interaction between the batch size and the buffer size. The buffer counts each add() call towards its size: len() is correct, but the fullness check compares self.pos, which advances by one per add() even though every add() stores one step from each parallel environment. This means there is a hidden dimension of n_parallel_envs. What happens:
buffer_size: 128
num_envs: 1
We add 1 step 128 times. This is smaller than the default batch size, but we get a buffer overflow error with no hint that it is related to the batch size.
buffer_size: 128
num_envs: 64
We actually add too many steps, since each add() contains 64 transitions while the buffer/batch size was tuned for 16 environments.
Better error messages, and possibly dynamic settings or warnings, would be nice here.
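A minimal Python sketch of the problem and of the kind of check suggested above. Everything in it is hypothetical: SketchRolloutBuffer, check_rollout_config and buffer_size_for are illustrative names, not the project's actual buffer or PPO API. The buffer tracks add() calls in pos while each call stores n_envs transitions, so full() and len() disagree by a factor of n_envs. The check makes that factor explicit, raising a descriptive error when batch_size exceeds buffer_size * n_envs and warning when it does not divide evenly.

```python
import warnings

import numpy as np


class SketchRolloutBuffer:
    """Hypothetical buffer: each add() stores one step from every parallel env."""

    def __init__(self, buffer_size: int, n_envs: int, obs_dim: int) -> None:
        self.buffer_size = buffer_size  # capacity in add() calls, not in transitions
        self.n_envs = n_envs
        self.observations = np.zeros((buffer_size, n_envs, obs_dim), dtype=np.float32)
        self.pos = 0  # advances by 1 per add(), regardless of n_envs

    def add(self, obs: np.ndarray) -> None:
        # obs has shape (n_envs, obs_dim): one step from each parallel env
        if self.pos >= self.buffer_size:
            raise IndexError("buffer overflow")
        self.observations[self.pos] = obs
        self.pos += 1

    def full(self) -> bool:
        # Fullness is judged on pos (number of add() calls) ...
        return self.pos == self.buffer_size

    def __len__(self) -> int:
        # ... while the real number of stored transitions is pos * n_envs
        return self.pos * self.n_envs


def check_rollout_config(buffer_size: int, n_envs: int, batch_size: int) -> int:
    """Hypothetical up-front check; returns transitions collected per rollout."""
    total = buffer_size * n_envs  # make the hidden n_envs dimension explicit
    if batch_size > total:
        raise ValueError(
            f"batch_size={batch_size} is larger than buffer_size * n_envs = "
            f"{buffer_size} * {n_envs} = {total}; increase buffer_size or n_envs, "
            "or lower batch_size"
        )
    if total % batch_size != 0:
        warnings.warn(
            f"buffer_size * n_envs = {total} is not a multiple of "
            f"batch_size={batch_size}; the last minibatch would be partial"
        )
    return total


def buffer_size_for(target_transitions: int, n_envs: int) -> int:
    """Hypothetical 'dynamic setting': keep buffer_size * n_envs near a target."""
    return max(1, target_transitions // n_envs)
```

With buffer_size=128 and n_envs=64, full() becomes true after 128 add() calls, at which point len() reports 8192 transitions. For the second case, a configuration tuned for 128 × 16 = 2048 transitions would, under this sketch, get buffer_size_for(2048, 64) = 32 when run with 64 environments.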