Hello, I noticed that the UNREAL authors sample the reward prediction task so that zero and non-zero rewards are equally represented, as described in Section 3.2, but the code doesn't seem to do this. Is something wrong here?
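For reference, here is a minimal sketch of the balanced sampling I would expect. It assumes the replay buffer is a flat list of per-step rewards and that the task predicts the reward of the frame that follows a 3-frame context. `sample_rp_sequence` is a hypothetical name, not a function from this repository.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_rp_sequence(
    rewards: Sequence[float],
    seq_len: int = 3,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], float]:
    """Hypothetical helper: return (context frame indices, target reward).

    The target frame is drawn from the zero-reward pool or the
    non-zero-reward pool with probability 0.5 each, so both classes
    are equally represented regardless of how rare rewards are.
    """
    rng = rng or random.Random()
    # Only frames with at least seq_len predecessors can be targets.
    candidates = range(seq_len, len(rewards))
    zero = [t for t in candidates if rewards[t] == 0]
    nonzero = [t for t in candidates if rewards[t] != 0]
    if not zero and not nonzero:
        raise ValueError("buffer is too short for the requested seq_len")
    if zero and nonzero:
        # Skewed sampling: pick the reward class first, then a frame within it.
        pool = nonzero if rng.random() < 0.5 else zero
    else:
        # Fall back to whichever pool is non-empty.
        pool = zero or nonzero
    target = rng.choice(pool)
    return list(range(target - seq_len, target)), rewards[target]
```

Sampling uniformly over all frames instead would make non-zero targets as rare as they are in the environment, which is the imbalance the paper's sampling is meant to avoid.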
Thanks.