
Ray bf16 availability: the check does not happen in the gpu worker, so it always says bf16 is not available #3179

@denadai2

Description


Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

This check is always false in settings where a Ray cluster has a GPU node: https://github.com/axolotl-ai-cloud/axolotl/blob/b3b92687c4ba8792d343b6b1a616f541840db8b3/src/axolotl/cli/config.py#L222C21-L222C48. Why? Because the transformers capability check is executed on the driver, not on a GPU worker, so it never sees the GPU and always reports that bf16 is unavailable.

This check should instead run inside the RayTrainer, on the GPU workers.
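A minimal sketch of the fix's shape: defer the capability probe so it executes wherever the GPUs actually live, instead of hard-coding a driver-side call. The names here (`resolve_bf16`, `probe`) are illustrative, not axolotl's actual API.

```python
# Hypothetical sketch: the validator takes a `probe` callable that runs on
# the training hardware, rather than checking CUDA on the driver directly.
from typing import Callable


def resolve_bf16(requested: bool, probe: Callable[[], bool]) -> bool:
    """Honor a bf16 request only if the hardware-side probe confirms support.

    `probe` is expected to execute where the GPUs are visible -- on a plain
    single-node run that can be torch.cuda.is_bf16_supported itself, but
    under Ray it should be a GPU-scheduled remote task.
    """
    if not requested:
        return False
    return probe()


# Under Ray, the probe would look roughly like (assumption, untested):
#
#     @ray.remote(num_gpus=1)
#     def _probe() -> bool:
#         import torch
#         return torch.cuda.is_bf16_supported()
#
#     probe = lambda: ray.get(_probe.remote())
```

The point of the indirection is that validation no longer assumes the process running `validate_config` has a GPU attached, which is exactly the assumption a Ray driver violates.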

Current behaviour

CUDA for now.

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 121, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 63, in do_cli
    parsed_cfg = load_cfg(config, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/cli/config.py", line 219, in load_cfg
    cfg = validate_config(
          ^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/config/__init__.py", line 295, in validate_config
    AxolotlConfigWCapabilities(
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for AxolotlConfigWCapabilities
  Value error, bf16 requested, but AMP is not supported on this GPU. Requires Ampere series or above. [type=value_error, input_value={'base_model': '/mnt/prol...r_prefetch_factor': 256}, input_type=dict]
```

Steps to reproduce

See above.

Config yaml

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10

axolotl branch-commit

master

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
