Gemma 3 270M DPO Training Uses More Memory Than Expected #3181

@71cj34

Description

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

DPO training should begin after datasets are tokenized/processed.

Current behaviour

Training fails immediately with an out-of-memory (OOM) error.
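
As a diagnostic, here is a minimal sketch for capturing the peak allocation at the moment the OOM is raised. It is not part of axolotl; it assumes a single CUDA device, and train_step is a hypothetical callable wrapping one training step:

import torch

def report_peak_on_oom(train_step):
    # train_step is a hypothetical callable wrapping one training step.
    torch.cuda.reset_peak_memory_stats()
    try:
        train_step()
    except torch.cuda.OutOfMemoryError:
        peak_gib = torch.cuda.max_memory_allocated() / 2**30
        print(f"OOM with peak allocation of {peak_gib:.2f} GiB")
        raise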

Steps to reproduce

  1. Use a config with DPO and Gemma 3 270M (the config below is confirmed to reproduce the problem).
  2. Run axolotl train with that config.

Config yaml

base_model: unsloth/gemma-3-270m

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001

load_in_4bit: true
adapter: qlora

sequence_len: 4096

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

bf16: true
tf32: false

logging_steps: 1
eager_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

rl: dpo
datasets:
  - path: nbeerbower/gutenberg-moderne-dpo
    split: train
    type: chatml.prompt_pairs
dataset_prepared_path: last_run_prepared
val_set_size: 0.08
output_dir: ./outputs/lora-out
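
For scale, a back-of-envelope estimate of the footprint this config implies. Everything here is an assumption for illustration (the 0.5 bytes/parameter figure for 4-bit weights, the worst-case second reference model for DPO, and the adapter/optimizer guess); it is not measured axolotl behaviour:

# Rough estimate for unsloth/gemma-3-270m under QLoRA DPO (all figures approximate).
GiB = 2**30
params = 270e6

base_4bit = params * 0.5 / GiB   # 4-bit base weights, ~0.13 GiB
# DPO needs reference-policy log-probs; depending on the trainer this is
# either a frozen second copy of the base model or an adapter-disabled
# forward pass over the same weights. Assume the worst case here.
ref_model = base_4bit
# LoRA adapters at r=16 over the linear layers, plus 8-bit Adam states
# for those adapters only, are small; 0.05 GiB is a generous guess.
lora_and_optimizer = 0.05

static = base_4bit + ref_model + lora_and_optimizer
print(f"static footprint ~ {static:.2f} GiB")
# Activations dominate: DPO runs chosen and rejected sequences together,
# and eager attention at sequence_len 4096 materializes full attention
# matrices, so the transient footprint can be far larger than the static one.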

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.11

axolotl branch-commit

main/b3b9268

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.

Labels

  • bug (Something isn't working)
