@ShaikAbdulHafeez03 ShaikAbdulHafeez03 commented Nov 22, 2025

Purpose
This PR fixes critical issues with tool call extraction in the OpenAIToolParser (Harmony/gpt-oss), specifically when using Custom Tools and MCP (Model Context Protocol).

Previously, the parser failed to correctly capture tool calls that resided in the active stream buffer (current_content) but hadn't yet been finalized into a message object. This led to two major issues:

Tool Call Failure: Valid tool calls were dropped or ignored.

Channel Leakage/Hallucination: The model appeared to hallucinate or leak tool call structures into the final (user-facing) message channel instead of processing them as structured tool calls.

Changes Implemented:

Active Buffer Parsing: Added logic to inspect parser.current_content and parser.current_channel. If the active channel is commentary (used for functions in Harmony), the content is now correctly extracted as a tool call.

Tag Sanitization: Added a sanitization step to strip internal tags (e.g., <|channel|>) from function names.

Robust JSON Handling: Improved JSON parsing for tool arguments to handle partial or slightly malformed buffers.
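
For illustration, here is a minimal sketch of the sanitization and JSON-normalization steps described above; the helper names and the exact regex are illustrative, not the PR's verbatim code:

import json
import re

def _sanitize_function_name(name: str) -> str:
    # Strip leaked Harmony control tags (e.g. "<|channel|>commentary") so
    # "get_customer_name<|channel|>commentary" becomes "get_customer_name".
    return re.sub(r"<\|[^|]*\|>.*$", "", name).strip()

def _normalize_tool_args(raw: str) -> str:
    # Round-trip valid JSON to drop stray newlines and odd formatting;
    # fall back to the raw buffer if it is partial or malformed.
    try:
        return json.dumps(json.loads(raw))
    except json.JSONDecodeError:
        return raw

print(_sanitize_function_name("get_customer_name<|channel|>commentary"))  # get_customer_name
print(_normalize_tool_args('{"query": "weather"}\n'))  # {"query": "weather"}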

Test Plan
Scenario: Run a request using a Custom Tool or MCP tool definition (e.g., web_search or a custom Python function).

Send a request that forces a custom tool call, or try calling a custom MCP server (it fails when calling multiple tools because of the parsing issue). An example request is sketched below.
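
A minimal request against a local vLLM server might look like this (the model name, port, and web_search schema are illustrative placeholders, not part of this PR):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Find the current weather in Paris."}],
    tools=tools,
    tool_choice="auto",
)
print(resp.choices[0].message.tool_calls)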

Test Result
Before: When using custom tools/MCP, the tool calling would frequently fail. The model output would often "hallucinate" or leak the raw tool call JSON into the final user message string, failing to trigger the actual tool execution on the client side.

After: Tool calls are cleanly extracted from the commentary channel. The final message content remains clean, and custom tools/MCP function correctly without leakage.

Essential Elements of an Effective PR Description Checklist

[x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".

[x] The test plan, such as providing test command.

[x] The test results, such as pasting the results comparison before and after, or e2e results

[ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

[ ] (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces important fixes for tool call parsing with gpt-oss models, addressing issues with dropped tool calls and malformed outputs. The changes involve more robust parsing of the active stream buffer, sanitizing function names, and improving JSON argument handling. The review identifies two high-severity issues in vllm/entrypoints/harmony_utils.py: one related to potential data loss in parse_chat_output due to content overwriting, and another concerning silent error handling in parse_output_into_messages which could lead to truncated outputs. Addressing these will further improve the robustness and debuggability of the parsing logic.

Comment on lines +519 to +520
except Exception:
    break
Contributor
high

The use of a broad except Exception: that silently breaks can hide underlying bugs in the parser or issues with the token stream. If an exception occurs, token processing stops, and the output is silently truncated. This can lead to incomplete results without any warning, making debugging difficult. This is a high-risk pattern that can cause silent data loss. It is highly recommended to log the exception before breaking the loop to provide visibility into why parsing was stopped.

Suggested change
except Exception:
    break
except Exception:
    # TODO: Add logging here to avoid silent failures. For example:
    # logger.warning("Error processing token %d. Stopping parsing.",
    #                token_id, exc_info=True)
    break
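
For reference, a self-contained version of that suggestion might look like the following; the surrounding loop is inferred from the quoted lines, and the logger setup is an assumption rather than the file's verbatim code:

from vllm.logger import init_logger

logger = init_logger(__name__)  # assumed; adjust to the module's existing logger

def parse_tokens(parser, token_ids):
    # Same loop shape as the quoted code, but the failure is logged
    # instead of silently truncating the parsed output.
    for token_id in token_ids:
        try:
            parser.process(token_id)
        except Exception:
            logger.warning(
                "Error processing token %d; stopping Harmony parsing early.",
                token_id,
                exc_info=True,
            )
            break
    return parser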

Comment on lines 531 to 560
    reasoning_parts = []
    final_content = None
    is_tool_call = False

    for msg in output_msgs:
        if msg.channel == "analysis":
            for content in msg.content:
                reasoning_parts.append(content.text)
        elif msg.channel == "final":
            for content in msg.content:
                final_content = content.text
        elif msg.channel == "commentary" and msg.recipient and msg.recipient.startswith("functions."):
            is_tool_call = True
            if not final_content:
                final_content = ""
            for content in msg.content:
                final_content = content.text

    if parser.current_content:
        if parser.current_channel == "analysis":
            reasoning_parts.append(parser.current_content)
        elif parser.current_channel == "final":
            final_content = parser.current_content
        elif parser.current_channel == "commentary" and parser.current_recipient and parser.current_recipient.startswith("functions."):
            is_tool_call = True
            final_content = parser.current_content

    reasoning = "\n".join(reasoning_parts) if reasoning_parts else None

    return reasoning, final_content, is_tool_call
Contributor
high

The implementation for final_content handling has several issues that can lead to data loss by overwriting content. Specifically:

  • In the loop over output_msgs, final_content is overwritten for each message in the final channel and for each tool call in the commentary channel. If there are multiple such messages, only the content of the last one will be preserved.
  • When processing parser.current_content, final_content is again overwritten, discarding any content that might have been parsed from output_msgs.

This is particularly problematic for tool calls, as it means only the arguments of the last tool call are returned in final_content, while others are dropped. To prevent this data loss, final_content should be aggregated from all relevant messages and the current buffer, rather than being reassigned.

    reasoning_parts = []
    final_content = None
    is_tool_call = False
    
    for msg in output_msgs:
        if msg.channel == "analysis":
            for content in msg.content:
                reasoning_parts.append(content.text)
        elif msg.channel == "final":
            if final_content is None:
                final_content = ""
            for content in msg.content:
                final_content += content.text
        elif msg.channel == "commentary" and msg.recipient and msg.recipient.startswith("functions."):
            is_tool_call = True
            current_call_args = "".join(c.text for c in msg.content)
            if current_call_args:
                if final_content is None:
                    final_content = current_call_args
                else:
                    final_content += "\n" + current_call_args

    if parser.current_content:
        if parser.current_channel == "analysis":
            reasoning_parts.append(parser.current_content)
        elif parser.current_channel == "final":
            if final_content is None:
                final_content = ""
            final_content += parser.current_content
        elif parser.current_channel == "commentary" and parser.current_recipient and parser.current_recipient.startswith("functions."):
            is_tool_call = True
            if parser.current_content:
                if final_content is None:
                    final_content = parser.current_content
                else:
                    final_content += "\n" + parser.current_content

    reasoning = "\n".join(reasoning_parts) if reasoning_parts else None

    return reasoning, final_content, is_tool_call

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 52 to +74
if msg.recipient and msg.recipient.startswith("functions."):
    # If no content-type is given assume JSON, as that's the
    # most common case with gpt-oss models.
    if not msg.content_type or "json" in msg.content_type:
        # load and dump the JSON text to check validity and
        # remove any extra newlines or other odd formatting
        try:
            tool_args = json.dumps(json.loads(msg_text))
        except json.JSONDecodeError:
            logger.exception(
                "Error decoding JSON tool call from response."
            )
            tool_args = msg_text
    else:
        tool_args = msg_text
    tool_calls.append(
        ToolCall(
            type="function",
            function=FunctionCall(
                name=msg.recipient.split("functions.")[1],
                arguments=tool_args,
            ),
        )
    )
    func_name = msg.recipient.split("functions.")[1]
    tool_calls.append(_create_tool_call(func_name, msg_text))


P1: Non-JSON tool calls are dropped

extract_tool_calls now only appends a tool call when msg.content_type is absent or contains "json"; the prior fallback that forwarded other content types has been removed. Any tool invocation emitted with a non-JSON content type (e.g., plain-text arguments or other MIME types) will now leave tools_called false and omit the corresponding tool_calls entry even though the model targeted functions.*, so the client will never execute that tool.
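
One way to keep that fallback while still using the new helper is sketched below; the names are illustrative, and _create_tool_call is assumed to take a function name and an arguments string as it does in this PR:

import json

def _tool_args_for(content_type: str | None, msg_text: str) -> str:
    # If no content-type is given, assume JSON (the common gpt-oss case) and
    # round-trip it to validate/normalize; otherwise, or if parsing fails,
    # forward the raw argument text verbatim, as the prior code did.
    if not content_type or "json" in content_type:
        try:
            return json.dumps(json.loads(msg_text))
        except json.JSONDecodeError:
            return msg_text
    return msg_text

# In extract_tool_calls, the append could then run for every functions.*
# recipient regardless of content type, e.g.:
#   tool_calls.append(_create_tool_call(func_name, _tool_args_for(msg.content_type, msg_text)))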


@ShaikAbdulHafeez03 ShaikAbdulHafeez03 force-pushed the fix-gpt-oss-tool-parser-v2 branch 2 times, most recently from cd557ce to 251113d on November 22, 2025 at 11:37
@DarkLight1337
Member

Can you merge from main to fix pre-commit?

@ShaikAbdulHafeez03 ShaikAbdulHafeez03 force-pushed the fix-gpt-oss-tool-parser-v2 branch from 32a6d03 to 8d94e30 on November 22, 2025 at 15:19
Signed-off-by: ShaikAbdulHafeez03 <[email protected]>
@ShaikAbdulHafeez03
Author

Can you merge from main to fix pre-commit?

Done, I have fixed the pre-commit issue.

@bbrowning
Contributor

bbrowning commented Nov 22, 2025

Previously, the parser failed to correctly capture tool calls that resided in the active stream buffer (current_content) but hadn't yet been finalized into a message object. This led to two major issues:

If there is content in current_content and not a finalized message, that means the model stopped mid-generation for some reason. Proper tool calls all end with a <|call|> token, which is considered a stop token and the tool call will end up in a message and not current_content of the parser.

The changes in #28729 will help many of the issues this proposes to fix in the parser. The root cause of many of these is we are prompting gpt-oss models wrong in many multi-turn scenarios, especially ones involving tool calling. So, what the changes in that PR do is fix how we're prompting the models which helps many of the situations where <|channel|> may leak into function recipients or we see other issues with models following their harmony format.

Some level of fuzzy parsing could be useful, but the changes here are large enough that I'd want to see what kind of impact this has on model accuracy and overall parsing success. Do you have any examples of requests that did not work before that this fixes? Have you run this against any tool calling benchmark suites to ensure it doesn't regress in other areas?

I'm pushing on the accuracy point here as I've spent a large amount of time benchmarking tool call accuracy in the gpt-oss models over the last few weeks to come up with the changes in #28729 and want to be careful about potentially regressing on any of that.

@ShaikAbdulHafeez03
Author

ShaikAbdulHafeez03 commented Nov 23, 2025

Previously, the parser failed to correctly capture tool calls that resided in the active stream buffer (current_content) but hadn't yet been finalized into a message object. This led to two major issues:

If there is content in current_content and not a finalized message, that means the model stopped mid-generation for some reason. Proper tool calls all end with a <|call|> token, which is considered a stop token and the tool call will end up in a message and not current_content of the parser.

The changes in #28729 will help many of the issues this proposes to fix in the parser. The root cause of many of these is we are prompting gpt-oss models wrong in many multi-turn scenarios, especially ones involving tool calling. So, what the changes in that PR do is fix how we're prompting the models which helps many of the situations where <|channel|> may leak into function recipients or we see other issues with models following their harmony format.

Some level of fuzzy parsing could be useful, but the changes here are large enough that I'd want to see what kind of impact this has on model accuracy and overall parsing success. Do you have any examples of requests that did not work before that this fixes? Have you run this against any tool calling benchmark suites to ensure it doesn't regress in other areas?

I'm pushing on the accuracy point here as I've spent a large amount of time benchmarking tool call accuracy in the gpt-oss models over the last few weeks to come up with the changes in #28729 and want to be careful about potentially regressing on any of that.

Hi @bbrowning,

Thanks for the feedback and for the context regarding #28729.

To address your question about previous failures: I have been testing specifically with the latest version (vllm 0.11.2) which includes the changes from #28729. While those prompting improvements definitely helped, I am still encountering significant parsing failures in scenarios where the model is provided with a large number of tools.

The issue: when the context is heavy, the model occasionally leaks the analysis/commentary tags directly into the function name string. Because the current parser expects a strict format, it treats the leaked tags as part of the tool name, causing the call to fail.

Here is a specific trace from my testing with v0.11.2:

[3900697] [Local→Remote] tools/list
[3900697] [Remote→Local] 3
⚠ Tool error attempt 1: Tool get_customer_name<|channel|>commentary not found in agent MCP Agent
[3900697] [Remote→Local] 4
⚠ Tool error attempt 2: Tool get_customer_name<|channel|>commentary not found in agent MCP Agent
As seen above, the parser is interpreting get_customer_name<|channel|>commentary as the tool name, rather than separating the function name from the leaked channel tag.

Here is another issue, where the analysis content leaked into the tool call header:

Tool error attempt 3: Error code: 500 - {'error': {'message': 'unexpected tokens remaining in message header: ["need", "to", "create", "Jira", "issue.", "The", "tool", "name", "is", "createJiraIssue.", "The", "error", "says", "tool", "not", "found", "in", "agent", "MCP", "Jira", "Agent.", "Maybe", "we", "need", "to", "call", "createJiraIssue", "with", "correct", "JSON.", "The", "tool", "signature:", "createJiraIssue({cloudId,", "projectKey,", "issueTypeName,", "summary,", "description?,", "parent?,", "assignee_account_id?,", "additional_fields?}).",

As seen above, the model's analysis text ("need to create Jira issue...") leaked into the message header, so the parser rejected the request with a 500 error instead of extracting the createJiraIssue call.

Observed impact and improvements: this PR introduces a sanitization layer ("fuzzy parsing") to handle these edge cases. The impact on reliability in my tests has been significant:

Before the fix: approximately 2 out of 3 tool calls failed due to this parsing error.

After the fix: tool calling succeeds 100% of the time in the same scenario.

Scope: These improvements were observed specifically in ChatCompletion mode, ensuring the parser doesn't hang waiting for a format the model failed to provide perfectly.

@ApostaC
Collaborator

ApostaC commented Nov 24, 2025

cc @aarnphm @chanh

@bbrowning
Contributor

To address your question about previous failures: I have been testing specifically with the latest version (vllm 0.11.2) which includes the changes from #28729. While those prompting improvements definitely helped, I am still encountering significant parsing failures in scenarios where the model is provided with a large number of tools.

Note that #28729 is not merged yet, which means 0.11.2 does NOT include those changes. It's expected you'll continue to have prompting and parsing issues until that merges.
