Gemma 3 270M DPO Training Uses More Memory Than Expected #3181

@71cj34

Description

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

DPO training should begin after datasets are tokenized/processed.

Current behaviour

Training fails immediately with an out-of-memory (OOM) error.
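
As a diagnostic, here is a minimal sketch for capturing the peak allocation at the moment the OOM is raised. It is not part of axolotl; it assumes a single CUDA device, and train_step is a hypothetical callable wrapping one training step:

import torch

def report_peak_on_oom(train_step):
    # train_step is a hypothetical callable wrapping one training step.
    torch.cuda.reset_peak_memory_stats()
    try:
        train_step()
    except torch.cuda.OutOfMemoryError:
        peak_gib = torch.cuda.max_memory_allocated() / 2**30
        print(f"OOM with peak allocation of {peak_gib:.2f} GiB")
        raise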

Steps to reproduce

  1. Use a config with DPO and Gemma 3 270M (the config below is confirmed to reproduce the problem).
  2. Run axolotl train with that config.

Config yaml

base_model: unsloth/gemma-3-270m

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001

load_in_4bit: true
adapter: qlora

sequence_len: 4096

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

bf16: true
tf32: false

logging_steps: 1
eager_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

rl: dpo
datasets:
  - path: nbeerbower/gutenberg-moderne-dpo
    split: train
    type: chatml.prompt_pairs
dataset_prepared_path: last_run_prepared
val_set_size: 0.08
output_dir: ./outputs/lora-out
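
For scale, a back-of-envelope estimate of the footprint this config implies. Everything here is an assumption for illustration (the 0.5 bytes/parameter figure for 4-bit weights, the worst-case second reference model for DPO, and the adapter/optimizer guess); it is not measured axolotl behaviour:

# Rough estimate for unsloth/gemma-3-270m under QLoRA DPO (all figures approximate).
GiB = 2**30
params = 270e6

base_4bit = params * 0.5 / GiB   # 4-bit base weights, ~0.13 GiB
# DPO needs reference-policy log-probs; depending on the trainer this is
# either a frozen second copy of the base model or an adapter-disabled
# forward pass over the same weights. Assume the worst case here.
ref_model = base_4bit
# LoRA adapters at r=16 over the linear layers, plus 8-bit Adam states
# for those adapters only, are small; 0.05 GiB is a generous guess.
lora_and_optimizer = 0.05

static = base_4bit + ref_model + lora_and_optimizer
print(f"static footprint ~ {static:.2f} GiB")
# Activations dominate: DPO runs chosen and rejected sequences together,
# and eager attention at sequence_len 4096 materializes full attention
# matrices, so the transient footprint can be far larger than the static one.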

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.11

axolotl branch-commit

main/b3b9268

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.

Labels

  • bug (Something isn't working)
