Description
I've been comparing the training efficiency of AReaL and verl on text generation tasks, and ran into a confusing result that I hope to get some insight into. Here's the detailed context:
Experiment Setup
- Task: Text generation training (comparing AReaL and verl)
- Reward Function: Simplified to always return 1 (no evaluation logic; see the sketch after this list)
- Dataset: Identical dataset used for both frameworks
- Text Length: All sequences (including prompt and response) are within 8k tokens
- Inference Acceleration: Both use SGLang for inference speedup
- Training Algorithm: GRPO
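
For reference, the reward used in both runs is essentially a no-op. A minimal sketch is below; the function name and signature are illustrative, since the actual reward interfaces of AReaL and verl take framework-specific arguments, but the point is that no per-sample evaluation logic runs:

```python
# Illustrative only: the constant reward used in both frameworks.
# Real AReaL/verl reward hooks receive framework-specific arguments;
# here the signature is simplified to show that no evaluation happens.
def constant_reward(prompt: str, response: str, **kwargs) -> float:
    return 1.0
```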
Observation
Under the above identical settings, the training duration of AReaL is significantly longer than that of verl.
After profiling, I found that the most time-consuming part of AReaL's model_worker is the loss computation and model update step, specifically the code around this line in ppo_interface.py.
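
Roughly how I measured it (a simplified sketch; the timing wrapper is mine, not part of AReaL, and `train_step` stands in for the actual loss/update call in ppo_interface.py):

```python
import time
import torch

def timed(label, fn, *args, **kwargs):
    # Synchronize so previously launched CUDA kernels are not attributed
    # to this phase, then measure wall-clock time of the call.
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return out

# Hypothetical usage around the suspected hotspot:
# stats = timed("loss + model update", interface.train_step, model, batch)
```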
Question
Could anyone help explain why AReaL takes longer in this scenario, particularly what drives the bottleneck in the loss computation and model update stage? Are there any configuration adjustments or optimizations I might be missing for AReaL in this setting?
Thanks in advance for your insights!