Problem reproducing stage 2 of Light-R1-32B #42

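Starting from Qwen2.5-32B-Instruct, my stage 1 model is within roughly 1 point of the paper on all three reported metrics, so I think stage 1 can be considered reproduced. In stage 2, however, the model overfits severely, and the AIME scores all stay below 69.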
![Image](https://github.com/user-attachments/assets/63d6687d-a4fd-4254-9962-d895a063f3f3)
Below is my stage 2 training configuration. The batch size is kept at 32 and the learning rate is 1e-5.
torchrun $DISTRIBUTED_ARGS src/train.py \
    --stage sft \
    --do_train \
    --max_steps -1 \
    --model_name_or_path output/Qwen-stage1-sft-32B-20k-759k/checkpoint-1028 \
    --dataset sft_stage2 \
    --template qwen \
    --finetuning_type full \
    --output_dir output/Qwen-stage2_sft-32B-ckp1028-759k_20k-bs32-lr1e5 \
    --preprocessing_num_workers 16 \
    --sequence_parallel_size 1 \
    --gradient_checkpointing True \
    --flash_attn fa2 \
    --cache_dir .cache \
    --overwrite_cache \
    --cutoff_len 20000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --save_strategy epoch \
    --logging_steps 1 \
    --adam_beta1 0.9 \
    --adam_beta2 0.95 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 0.1 \
    --warmup_ratio 0.01 \
    --save_total_limit 20 \
    --learning_rate 1e-5 \
    --save_only_model false \
    --num_train_epochs 10 \
    --bf16 true \
    --plot_loss \
    --seed 42 \
    --do_eval false \
    --deepspeed ./examples/deepspeed/ds_z3_config.json \
    --report_to tensorboard \
    --overwrite_output_dir \
    --ddp_timeout 288000000 \
    --enable_liger_kernel
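
As a sanity check on the stated batch size of 32, here is a minimal sketch of the arithmetic. It assumes the usual relation (global batch = per-device batch × gradient accumulation steps × data-parallel ranks) and that, with `--sequence_parallel_size 1`, every GPU is its own data-parallel rank. The GPU count is not stated in this post, so the value computed below is only what the flags imply, and the variable names are illustrative.

```shell
#!/bin/sh
# Values copied from the training command above.
PER_DEVICE_BATCH=1     # --per_device_train_batch_size
GRAD_ACCUM=2           # --gradient_accumulation_steps
TARGET_GLOBAL_BATCH=32 # the "BS 32" stated in the post

# Assumption: global batch = per-device batch * grad accumulation * data-parallel ranks.
SAMPLES_PER_RANK_STEP=$(( PER_DEVICE_BATCH * GRAD_ACCUM ))
DP_RANKS=$(( TARGET_GLOBAL_BATCH / SAMPLES_PER_RANK_STEP ))

# Recompute the global batch from the implied rank count as a consistency check.
GLOBAL_BATCH=$(( SAMPLES_PER_RANK_STEP * DP_RANKS ))

echo "samples per rank per optimizer step: ${SAMPLES_PER_RANK_STEP}"
echo "data-parallel ranks implied by BS ${TARGET_GLOBAL_BATCH}: ${DP_RANKS}"
echo "resulting global batch size: ${GLOBAL_BATCH}"
```

If the number of GPUs actually launched by `torchrun` differs from the implied rank count, the effective batch size is not 32, which would be worth ruling out before comparing against the paper's stage 2 setup.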

Could someone help me analyze what is causing this?
