Description
Describe the bug
Reported by Alan Braz.
Failures are seen when fine-tuning the Llama-2-7b model with the following set of parameters:
{
  "modelName": "test-llama2",
  "parameters": {
    "baseModel": "/data/base_models/models/meta-llama/Llama-2-7b",
    "trainStream": {
      "file": {
        "filename": "/data/base_models/input/train_rte_small.json"
      }
    },
    "torchDtype": "float32",
    "batchSize": "1",
    "numEpochs": "1",
    "accumulateSteps": "1",
    "lr": 0.1,
    "maxSourceLength": "128",
    "maxTargetLength": "64",
    "randomSeed": "1"
  }
}
Platform
- Library version: latest
Sample Code
Run the examples/run_fine_tuning.py script with any dataset and the config above.
Expected behavior
Fine tuning succeeds
Observed behavior
return inner_training_loop(
File "/dccstor/ssharma/caikit_nlp_env_new/lib/python3.9/site-packages/accelerate/utils/memory.py", line 134, in decorator
raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.
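The error in the traceback comes from accelerate's batch-size auto-finder (`accelerate.utils.memory`), which retries the training step with a halved batch size after each out-of-memory failure and gives up once the batch size reaches zero. A minimal sketch of that retry pattern (a stand-in, not the library's actual implementation; `MemoryError` stands in for a CUDA OOM) shows why a starting `batchSize` of 1, as in the config above, aborts after a single failed attempt:

```python
def find_executable_batch_size_sketch(function, starting_batch_size):
    """Retry `function`, halving the batch size after each OOM failure."""
    batch_size = starting_batch_size

    def decorated(*args, **kwargs):
        nonlocal batch_size
        while True:
            if batch_size == 0:
                raise RuntimeError("No executable batch size found, reached zero.")
            try:
                return function(batch_size, *args, **kwargs)
            except MemoryError:
                # accelerate catches CUDA OOM here; MemoryError stands in for it
                batch_size //= 2

    return decorated

attempts = []

def training_step(batch_size):
    attempts.append(batch_size)
    raise MemoryError  # simulate an OOM on every attempt

# With batch size 1, the first OOM halves it to 0 (1 // 2 == 0),
# so the very next retry raises the error seen in the traceback.
train = find_executable_batch_size_sketch(training_step, starting_batch_size=1)
try:
    train()
except RuntimeError as err:
    print(err)  # -> No executable batch size found, reached zero.
```

Under this reading, the failure means every attempted step OOMs (or otherwise raises the exception the decorator catches), and the batch size of 1 leaves no room to back off further.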