
v1.11.0: TE MXFP8, FP16/BF16 with MPS, Python 3.10

Released by @SunMarc on 20 Oct 16:08

TE MXFP8 support

We've added support for MXFP8 in our TransformerEngine integration. To use it, set use_mxfp8_block_scaling in fp8_config. See the NVIDIA docs [here](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#MXFP8-and-block-scaling), and the sketch after the PR list below.

  • Add support for TE MXFP8 recipe in accelerate by @pstjohn in #3688
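
A minimal sketch of how this could be wired up in Python, assuming the `TERecipeKwargs` handler accepts the new `use_mxfp8_block_scaling` flag (the same option can be set under fp8_config in the accelerate config file). MXFP8 also requires NVIDIA Transformer Engine and compatible hardware:

```python
from accelerate import Accelerator
from accelerate.utils import TERecipeKwargs

# Sketch: enable the MXFP8 block-scaling recipe for the TransformerEngine backend.
# `use_mxfp8_block_scaling` is the flag added in this release (assumed keyword form).
te_kwargs = TERecipeKwargs(use_mxfp8_block_scaling=True)
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[te_kwargs])

# Model, optimizer, and dataloader are then prepared as usual:
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```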

FP16/BF16 Training for MPS devices

BF16 and FP16 support for MPS devices is finally here. You can now pass mixed_precision="fp16" or "bf16" when training on a Mac (fp16 requires PyTorch 2.8 and bf16 requires PyTorch 2.6). A short example follows the PR list below.

  • Add bf16/fp16 support for amp with mps device by @SunMarc in #3373
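
A minimal sketch, assuming an Apple Silicon machine with a recent enough PyTorch (see the version requirements above):

```python
import torch
from accelerate import Accelerator

# Sketch: bf16 mixed-precision training on an MPS device.
accelerator = Accelerator(mixed_precision="bf16")

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(4, 8, device=accelerator.device)
with accelerator.autocast():
    loss = model(x).pow(2).mean()
accelerator.backward(loss)
optimizer.step()
```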

FSDP updates

The following PRs add support for ignored_params and no_sync(), respectively, for FSDP2 (a usage sketch follows the list):

  • feat: add ignored_params support for fsdp2 by @kmehant in #3731
  • fix: model.set_requires_gradient_sync(False) should be called to turn off gradient synchronization in FSDP2 by @EquationWalker in #3762
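
A minimal sketch of no_sync()-based gradient accumulation under FSDP2, assuming a distributed launch (e.g. via accelerate launch); the plugin arguments shown are illustrative:

```python
import torch
from accelerate import Accelerator
from accelerate.utils import FullyShardedDataParallelPlugin

# Sketch: FSDP2 training where accelerator.no_sync() skips gradient
# synchronization on accumulation steps (now handled correctly via
# model.set_requires_gradient_sync(False) under the hood).
fsdp_plugin = FullyShardedDataParallelPlugin(fsdp_version=2)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(4):
    x = torch.randn(8, 16, device=accelerator.device)
    if step % 2 == 0:
        # Accumulation step: gradients stay local, no reduce-scatter.
        with accelerator.no_sync(model):
            accelerator.backward(model(x).pow(2).mean())
    else:
        # Sync step: gradients are reduced, then we update the weights.
        accelerator.backward(model(x).pow(2).mean())
        optimizer.step()
        optimizer.zero_grad()
```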

Mixed precision can now be passed as a dtype string via the accelerate CLI flags or the fsdp_config section of the accelerate config file (a sketch follows the list):

  • feat: allow mixed precision policy as dtype by @kmehant in #3751
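
A minimal sketch, assuming the FSDP2 plugin accepts a plain dtype string for its mixed-precision policy (the equivalent value can be set in fsdp_config or on the command line):

```python
from accelerate import Accelerator
from accelerate.utils import FullyShardedDataParallelPlugin

# Sketch: pass the FSDP2 mixed-precision policy as a dtype string instead of
# constructing a MixedPrecisionPolicy object by hand (assumed accepted form).
fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    mixed_precision_policy="bf16",  # assumption: plain dtype string
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```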

Nd-parallel updates

Some minor updates to N-d parallelism.

Bump to Python 3.10

We've dropped support for Python 3.9, which reached end of life in October.

Lots of minor fixes:

New Contributors

Full Changelog: v1.10.1...v1.11.0