### Please check that this issue hasn't been reported before.
- I searched previous Bug Reports and didn't find any similar reports.
### Expected Behavior
This check is always false in setups where a Ray cluster has a GPU node: https://github.com/axolotl-ai-cloud/axolotl/blob/b3b92687c4ba8792d343b6b1a616f541840db8b3/src/axolotl/cli/config.py#L222C21-L222C48. Why? Because the transformers code behind it does not get executed on a GPU worker; it runs on the driver process instead.
The check should instead be run inside the RayTrainer, as sketched below.
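For illustration, here is a minimal sketch of what I mean, assuming the check boils down to a bf16/Ampere capability probe and that a Ray cluster is already running; the helper name `check_bf16_on_worker` is hypothetical, not an existing axolotl function:

```python
import ray
import torch


@ray.remote(num_gpus=1)
def check_bf16_on_worker() -> bool:
    """Hypothetical helper: probe bf16 support on a GPU worker, not on the driver."""
    if not torch.cuda.is_available():
        return False
    # bf16 AMP requires Ampere (compute capability 8.x) or newer.
    major, _minor = torch.cuda.get_device_capability()
    return major >= 8


ray.init(address="auto")  # attach to the existing Ray cluster
print("bf16 supported on a GPU worker:", ray.get(check_bf16_on_worker.remote()))
```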
### Current behaviour
Config validation fails on the driver with the following error:
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 121, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 63, in do_cli
    parsed_cfg = load_cfg(config, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/cli/config.py", line 219, in load_cfg
    cfg = validate_config(
          ^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/config/__init__.py", line 295, in validate_config
    AxolotlConfigWCapabilities(
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for AxolotlConfigWCapabilities
  Value error, bf16 requested, but AMP is not supported on this GPU. Requires Ampere series or above. [type=value_error, input_value={'base_model': '/mnt/prol...r_prefetch_factor': 256}, input_type=dict]
```
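For context, here is a minimal sketch of why the probe reports no bf16/AMP support (assuming the validation ultimately relies on torch.cuda queries, which I haven't verified): on the driver process no GPU is visible, so every capability query returns False even though the GPU workers themselves support bf16.

```python
import torch

# Run on the Ray head/driver node, which has no visible GPU.
print("cuda available:", torch.cuda.is_available())  # -> False on the driver

if torch.cuda.is_available():
    major, _minor = torch.cuda.get_device_capability()
    print("bf16 capable (Ampere or newer):", major >= 8)
else:
    # Presumably what the validator effectively sees here, hence the error above.
    print("bf16 capable (Ampere or newer): False")
```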
### Steps to reproduce
See above.
### Config yaml
```yaml
```
### Possible solution
No response
### Which Operating Systems are you using?
- Linux
- macOS
- Windows
### Python Version
3.10
### axolotl branch-commit
master
### Acknowledgements
- My issue title is concise, descriptive, and in title casing.
- I have searched the existing issues to make sure this bug has not been reported yet.
- I am using the latest version of axolotl.
- I have provided enough information for the maintainers to reproduce and diagnose the issue.