Thank you for your impressive work!
I tried to reproduce your results on 4 A40 GPUs (48 GB each). Because of the memory limit, I trained with DeepSpeed ZeRO-2. However, performance on APPS and LiveCodeBench dropped significantly relative to your reported results. Do you have any suggestions?