
Conversation

@Highsky7

Fixes meta tensor copy issue in factory.py:418.

What this does

This PR fixes a NotImplementedError: Cannot copy out of meta tensor that occurs when training the SmolVLA policy with load_vlm_weights=True.

The error occurs because passing device_map=device when loading the VLM in SmolVLMWithExpertModel lets accelerate place some parameters on the meta device. LeRobot's factory later moves the entire policy with .to(), which fails for meta tensors since they hold no data.
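For context, the failure mode is easy to reproduce in plain PyTorch (a minimal sketch, independent of LeRobot):

```python
import torch

# Meta tensors carry only shape/dtype metadata and no storage, so there is
# no data to copy when moving them to a real device.
t = torch.empty(2, 2, device="meta")
t.to("cuda")  # NotImplementedError: Cannot copy out of meta tensor; no data!
```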

Changes:

  • Remove device_map parameter from VLM model loading to prevent automatic meta device placement.
  • Change torch_dtype from the string "bfloat16" to the torch.bfloat16 object.
  • Add explicit .to(device) calls after initialization to ensure the model and lm_expert are correctly placed on the target device.

This resolves the NotImplementedError when training the SmolVLA policy; a sketch of the change follows below.
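A minimal sketch of the change, assuming the VLM is loaded through transformers' from_pretrained; the loader class, checkpoint id, and variable names below are illustrative, not the exact code in SmolVLMWithExpertModel:

```python
import torch
from transformers import AutoModelForImageTextToText  # assumed loader; adjust to the class lerobot actually uses

model_id = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"  # illustrative checkpoint id
device = "cuda"

# Before (problematic): device_map=device lets accelerate defer weight
# allocation, which can leave some parameters on the meta device:
#   vlm = AutoModelForImageTextToText.from_pretrained(
#       model_id, device_map=device, torch_dtype="bfloat16"
#   )

# After: drop device_map, pass a real dtype object, then place explicitly.
vlm = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype=torch.bfloat16)
vlm = vlm.to(device)
# The expert decoder (lm_expert in the PR) receives the same explicit .to(device).
```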

How it was tested

Tested locally with the LibERO dataset (libero_10 task).

  • Verified that training starts successfully without the meta tensor error.
  • Verified that the model loads to CUDA correctly.
  • Verified that loss computation works as expected for 10 steps.
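As an extra sanity check a reviewer could run after the policy is built (hypothetical helper, not part of this PR):

```python
import torch

def assert_no_meta_params(module: torch.nn.Module) -> None:
    # Any parameter left on the meta device would reproduce the original
    # NotImplementedError the moment .to(device) is called on the policy.
    meta = [name for name, p in module.named_parameters() if p.is_meta]
    assert not meta, f"parameters still on meta device: {meta[:5]}"
```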

How to check out & try (for the reviewer)

```bash
python3 src/lerobot/scripts/lerobot_train.py \
  --policy.type=smolvla \
  --policy.load_vlm_weights=true \
  --policy.device=cuda \
  --dataset.repo_id=HuggingFaceVLA/libero \
  --env.type=libero \
  --env.task=libero_10 \
  --output_dir=./outputs/train/test \
  --steps=10
```

Highsky7 changed the title from "Fix SmolVLA meta tensor error by removing device_map" to "fix: Solve SmolVLA meta tensor error by removing device_map" on Nov 21, 2025