
Fine tuning with llama-7b model failure #256

@Ssukriti

Description

Describe the bug

Reported by Alan Braz.

Failures are seen when fine tuning the llama-7b model with the following set of parameters:

{
  "modelName": "test-llama2",
  "parameters": {
    "baseModel": "/data/base_models/models/meta-llama/Llama-2-7b",
    "trainStream": {
      "file": {
        "filename": "/data/base_models/input/train_rte_small.json"
      }
    },
    "torchDtype": "float32",
    "batchSize": "1",
    "numEpochs": "1",
    "accumulateSteps": "1",
    "lr": 0.1,
    "maxSourceLength": "128",
    "maxTargetLength": "64",
    "randomSeed": "1"
  }
}
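
For illustration, a minimal sketch of reading this payload in Python. The file name fine_tune_request.json and this loading snippet are assumptions for the example, not the actual caikit_nlp API; note that batchSize, numEpochs, and the length limits arrive as strings and need to be cast before use:

import json

# Hypothetical loading of the request payload above; the field names mirror
# the JSON shown, but the file name and this helper are assumptions.
with open("fine_tune_request.json") as f:
    request = json.load(f)

params = request["parameters"]
batch_size = int(params["batchSize"])                # "1" -> 1
num_epochs = int(params["numEpochs"])                # "1" -> 1
learning_rate = float(params["lr"])                  # 0.1
max_source_length = int(params["maxSourceLength"])   # "128" -> 128
max_target_length = int(params["maxTargetLength"])   # "64" -> 64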

Platform


  • Library version: latest

Sample Code

Run the examples/run_fine_tuning.py script with any dataset and the config above.

Expected behavior

Fine tuning succeeds

Observed behavior

return inner_training_loop(
   File "/dccstor/ssharma/caikit_nlp_env_new/lib/python3.9/site-packages/accelerate/utils/memory.py", line 134, in decorator
     raise RuntimeError("No executable batch size found, reached zero.")
 RuntimeError: No executable batch size found, reached zero.

Additional context

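The RuntimeError above comes from accelerate's find_executable_batch_size decorator (accelerate/utils/memory.py), which retries the wrapped training function with a halved batch size whenever it catches a CUDA out-of-memory error and gives up once the batch size reaches zero. Below is a minimal sketch of that mechanism, not the actual caikit_nlp training loop, which reproduces the same error when the starting batch size is 1 and every attempt runs out of memory:

from accelerate.utils import find_executable_batch_size

@find_executable_batch_size(starting_batch_size=1)
def train(batch_size):
    # Stand-in for the real training step: simulate the OOM that a 7B model
    # loaded in float32 can easily trigger. The message must contain
    # "CUDA out of memory." for the decorator to treat it as an OOM.
    raise RuntimeError("CUDA out of memory. (simulated for this sketch)")

# With starting_batch_size=1, the first OOM halves the batch size to 0 and
# the decorator raises: "No executable batch size found, reached zero."
train()

So with batchSize set to 1 in the config, any genuine OOM during the first step (plausible when fine tuning Llama-2-7b in float32, given weights plus optimizer state) surfaces as this RuntimeError rather than as the underlying CUDA out-of-memory message.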
