GRPO training on structured data

### Please check that this issue hasn't been reported before.

- [x] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) didn't find any similar reports.

### Expected Behavior

I am trying to run GRPO training on tool calls data set.
The configuation looks like :
```
base_model: /opt/ml/model/gpt-oss-20b

use_kernels: false
model_quantization_config: Mxfp4Config
model_quantization_config_kwargs:
  dequantize: true

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

experimental_skip_move_to_device: true  # prevent OOM by NOT putting model to GPU before sharding

chat_template: tokenizer_default

datasets:
  - path: /opt/ml/model/grpo_rows.jsonl
    type: chat_template
    field_messages: prompt
    message_field_role: role
    message_field_content: content
    field_tools: tools

rl: grpo

trl:
  use_vllm: true
  vllm_server_host: 127.0.0.1
  vllm_server_port: 8000
  vllm_server_timeout: 300
  num_generations: 4
  max_completion_length: 6000
  rollout_func: ged.runtime.rollout.tool_rollout
  reward_funcs:
    - ged.runtime.rewards.r_step_keywords
    - ged.runtime.rewards.r_mock_success
    - ged.runtime.rewards.r_json
  reward_weights: [0.4, 0.4, 0.2]

dataset_prepared_path: last_run_prepared
val_set_size: 0.03
eval_steps: 10

```

When running the training, I am getting error :

```
Mapping RL Dataset (num_proc=30):   0%|          | 0/30 [00:10<?, ? examples/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/envs/training-env/lib/python3.12/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/training-env/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 586, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/training-env/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3664, in _map_single
    for i, example in iter_outputs(shard_iterable):
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/training-env/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3638, in iter_outputs
    yield i, apply_function(example, i, offset=offset)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/training-env/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3561, in apply_function
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/prompt_strategies/dpo/chat_template.py", line 63, in transform_fn
    chosen_raw = sample[field_chosen]
                 ~~~~~~^^^^^^^^^^^^^^
  File "/opt/conda/envs/training-env/lib/python3.12/site-packages/datasets/formatting/formatting.py", line 283, in _getitem_
    value = self.data[key]
            ~~~~~~~~~^^^^^
KeyError: 'chosen'
```

Should the data set for GRPO include 
field_chosen: chosen
field_rejected: rejected  # Required for DPO-style
? 

As I undersatand chosen/rejected fields are relevant for DPO, not GRPO.

Thank you

### Current behaviour

Error raised when trying to train GRPO on tools data set.

### Steps to reproduce

1. Preparing data set with function calls
2. Creating Axolotl configuation as specified with GRPO training.
3. Launching training

### Config yaml

```yaml

```

### Possible solution

_No response_

### Which Operating Systems are you using?

- [x] Linux
- [ ] macOS
- [ ] Windows

### Python Version

3.12

### axolotl branch-commit

dd78f2e0

### Acknowledgements

- [x] My issue title is concise, descriptive, and in title casing.
- [x] I have searched the existing issues to make sure this bug has not been reported yet.
- [x] I am using the latest version of axolotl.
- [x] I have provided enough information for the maintainers to reproduce and diagnose the issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

GRPO training on structured data #3269

Please check that this issue hasn't been reported before.

Expected Behavior

Current behaviour

Steps to reproduce

Config yaml

Possible solution

Which Operating Systems are you using?

Python Version

axolotl branch-commit

Acknowledgements

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

GRPO training on structured data #3269

Description

Please check that this issue hasn't been reported before.

Expected Behavior

Current behaviour

Steps to reproduce

Config yaml

Possible solution

Which Operating Systems are you using?

Python Version

axolotl branch-commit

Acknowledgements

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions