Open
Labels: bug (Something isn't working)
Description
Please check that this issue hasn't been reported before.
- I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
DPO training should begin after datasets are tokenized/processed.
Current behaviour
Training immediately fails with an out of memory error.
Steps to reproduce
- Use a config with DPO and Gemma 3 270M (the config below is confirmed to trigger the problem).
- Run axolotl train with that config (see the command below).
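A minimal reproduction sketch, assuming the config below is saved as gemma3-dpo.yaml (the filename is only a placeholder):

# save the "Config yaml" section below as gemma3-dpo.yaml, then run:
axolotl train gemma3-dpo.yaml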
Config yaml
base_model: unsloth/gemma-3-270m
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
load_in_4bit: true
adapter: qlora
sequence_len: 4096
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
bf16: true
tf32: false
logging_steps: 1
eager_attention: true
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3
rl: dpo
datasets:
  - path: nbeerbower/gutenberg-moderne-dpo
    split: train
    type: chatml.prompt_pairs
dataset_prepared_path: last_run_prepared
val_set_size: 0.08
output_dir: ./outputs/lora-out
Possible solution
No response
Which Operating Systems are you using?
- Linux
- macOS
- Windows
Python Version
3.11
axolotl branch-commit
main/b3b9268
Acknowledgements
- My issue title is concise, descriptive, and in title casing.
- I have searched the existing issues to make sure this bug has not been reported yet.
- I am using the latest version of axolotl.
- I have provided enough information for the maintainers to reproduce and diagnose the issue.