v1.10.0: N-D Parallelism
N-D Parallelism
Training large models across multiple GPUs can be complex, especially when combining different parallelism strategies such as tensor parallelism (TP), context parallelism (CP) and data parallelism (DP). To simplify this process, we've collaborated with Axolotl to introduce an easy-to-use integration that lets you apply any combination of parallelism strategies directly in your training script. Just pass a ParallelismConfig specifying the size of each parallelism type; that's all it takes.
Learn more about how it works in our latest blog post.
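```python
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig
from transformers import AutoModelForCausalLM

# 2 x 2 x 2 x 2 = 16: the product of all sizes should match the number of launched processes
parallelism_config = ParallelismConfig(
    dp_shard_size=2,      # sharded data parallelism (FSDP)
    dp_replicate_size=2,  # replicated data parallelism (HSDP when combined with dp_shard)
    cp_size=2,            # context parallelism
    tp_size=2,            # tensor parallelism
)

accelerator = Accelerator(
    parallelism_config=parallelism_config,
    # ... other Accelerator arguments
)

# Load the model onto the device mesh built by the Accelerator, then prepare it
model = AutoModelForCausalLM.from_pretrained("your-model-name", device_mesh=accelerator.torch_device_mesh)
model = accelerator.prepare(model)
```

- Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) by @SalmanMohammadi in #3682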
- Feat: context parallel v2.0 by @S1ro1 in #3700
- set default submesh_tp_size to prevent unset local variable error by @winglian in #3687
- Add Parallelism getter property to Accelerator class by @WoosungMyung in #3703
- Fix: prepare works even if nothing except tp specified (rare) by @S1ro1 in #3707
- Set parallelism_config in constructor due to Trainer reset of State by @winglian in #3713
- Fix: tp size wouldn't read from env by @S1ro1 in #3716
- Remove ParallelismConfig from PartialState by @SunMarc in #3720
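Each size in a ParallelismConfig names one axis of a device mesh, and the sizes multiply to the total number of processes (16 in the example above). The plain-Python sketch below, which needs no GPUs or torch, illustrates how those 16 ranks could be arranged and which ranks would form each communication group. It assumes row-major ordering over the axes dp_replicate, dp_shard, cp, tp; the helpers build_mesh and group_of are hypothetical and are not Accelerate's implementation.

```python
from itertools import product

# Hypothetical helper: assign ranks to mesh coordinates in row-major order.
# Axis order (dp_replicate, dp_shard, cp, tp) is an assumption for illustration.
def build_mesh(sizes: dict) -> dict:
    names = list(sizes)
    coords = product(*(range(sizes[n]) for n in names))
    return {coord: rank for rank, coord in enumerate(coords)}


# Hypothetical helper: ranks that differ from `rank` only along axis `dim`,
# i.e. the group that would communicate for that kind of parallelism.
def group_of(mesh: dict, names: list, rank: int, dim: str) -> list:
    coord = next(c for c, r in mesh.items() if r == rank)
    i = names.index(dim)
    rest = coord[:i] + coord[i + 1:]
    return sorted(r for c, r in mesh.items() if c[:i] + c[i + 1:] == rest)


sizes = {"dp_replicate": 2, "dp_shard": 2, "cp": 2, "tp": 2}
names = list(sizes)
mesh = build_mesh(sizes)

print(len(mesh))                                 # 16 ranks in total
print(group_of(mesh, names, 0, "tp"))            # [0, 1]
print(group_of(mesh, names, 0, "cp"))            # [0, 2]
print(group_of(mesh, names, 0, "dp_shard"))      # [0, 4]
print(group_of(mesh, names, 0, "dp_replicate"))  # [0, 8]
```

Under this ordering, the innermost axis (tp) groups neighbouring ranks, which is why tensor parallelism is usually placed where communication is fastest, typically within a single node.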
FSDP improvements
We've fixed the FSDP ignored_modules attribute, which can now also be given as a regex string. With this fix, it is now possible to train a PEFT model whose MoE layers contain q_proj and v_proj parameters. This is especially important for fine-tuning the gpt-oss model.
- ENH: Allow FSDP ignored modules to be regex by @BenjaminBossan in #3698
- TST Add test for FSDP ignored_modules as str by @BenjaminBossan in #3719
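Passing ignored_modules as a string lets a single pattern select every matching submodule instead of listing each module object by hand. The exact matching rule Accelerate applies is not described in these notes, so the sketch below assumes a full regex match against module names such as those returned by named_modules(). The module names and the select_ignored helper are illustrative only.

```python
import re

# Illustrative module names, as nn.Module.named_modules() might report them
# for a small MoE-style model (hypothetical, not taken from gpt-oss).
module_names = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.experts",
    "model.layers.1.self_attn.q_proj",
    "model.layers.1.mlp.experts",
    "lm_head",
]


def select_ignored(names: list, pattern: str) -> list:
    """Return the names that fully match `pattern` (assumed matching rule)."""
    regex = re.compile(pattern)
    return [n for n in names if regex.fullmatch(n)]


# One pattern selects the experts block of every layer.
ignored = select_ignored(module_names, r"model\.layers\.\d+\.mlp\.experts")
print(ignored)
```

A full match is used here so that a pattern cannot accidentally select unrelated modules whose names merely contain the same substring.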
Minor improvements
- feature: CpuOffload pre_forward don't attempt to move if already on device by @JoeGaffney in #3695
- Fix: Ensure environment variable values are case-insensitive in Accelerate by @jp1924 in #3712
- remove use_ipex by @SunMarc in #3721
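Case-insensitive environment variables mean that values like TRUE, True and true are interpreted the same way. The following minimal sketch shows the idea, assuming a small set of accepted true/false spellings; the env_flag helper and the MY_FLAG variable name are hypothetical and are not Accelerate's own utilities.

```python
import os

# Assumed spellings; compared after lower-casing, so case does not matter.
TRUE_VALUES = {"1", "true", "yes", "y", "t", "on"}
FALSE_VALUES = {"0", "false", "no", "n", "f", "off"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean-like environment variable, ignoring case and surrounding whitespace."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognised boolean value")


os.environ["MY_FLAG"] = "TRUE"
print(env_flag("MY_FLAG"))  # True
```

Raising on unrecognised values, rather than silently returning the default, makes a typo in a launch script visible immediately.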
New Contributors
- @SalmanMohammadi made their first contribution in #3682
- @WoosungMyung made their first contribution in #3703
- @jp1924 made their first contribution in #3712
- @JoeGaffney made their first contribution in #3695
Full Changelog: v1.9.0...v1.10.0