Fix gpt oss tool parser v2 #29236
Conversation
Code Review
This pull request introduces important fixes for tool call parsing with gpt-oss models, addressing issues with dropped tool calls and malformed outputs. The changes involve more robust parsing of the active stream buffer, sanitizing function names, and improving JSON argument handling. The review identifies two high-severity issues in vllm/entrypoints/harmony_utils.py: one related to potential data loss in parse_chat_output due to content overwriting, and another concerning silent error handling in parse_output_into_messages which could lead to truncated outputs. Addressing these will further improve the robustness and debuggability of the parsing logic.
    except Exception:
        break
The use of a broad except Exception: that silently breaks can hide underlying bugs in the parser or issues with the token stream. If an exception occurs, token processing stops, and the output is silently truncated. This can lead to incomplete results without any warning, making debugging difficult. This is a high-risk pattern that can cause silent data loss. It is highly recommended to log the exception before breaking the loop to provide visibility into why parsing was stopped.
Suggested change:

    except Exception:
        # TODO: Add logging here to avoid silent failures. For example:
        # logger.warning("Error processing token %d. Stopping parsing.",
        #                token_id, exc_info=True)
        break
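As a concrete illustration of that recommendation, here is a minimal sketch of what the logging could look like, assuming a module-level logger exists in this file and that the surrounding loop iterates over token IDs; the parser.process call is a stand-in for whatever the loop actually invokes, not the real API:

    for token_id in token_ids:
        try:
            parser.process(token_id)  # stand-in for the real per-token call
        except Exception:
            # Surface the failure instead of silently truncating the output.
            logger.warning(
                "Error processing token %d; stopping Harmony parsing early.",
                token_id,
                exc_info=True,
            )
            break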
vllm/entrypoints/harmony_utils.py
Outdated
    reasoning_parts = []
    final_content = None
    is_tool_call = False

    for msg in output_msgs:
        if msg.channel == "analysis":
            for content in msg.content:
                reasoning_parts.append(content.text)
        elif msg.channel == "final":
            for content in msg.content:
                final_content = content.text
        elif msg.channel == "commentary" and msg.recipient and msg.recipient.startswith("functions."):
            is_tool_call = True
            if not final_content:
                final_content = ""
            for content in msg.content:
                final_content = content.text

    if parser.current_content:
        if parser.current_channel == "analysis":
            reasoning_parts.append(parser.current_content)
        elif parser.current_channel == "final":
            final_content = parser.current_content
        elif parser.current_channel == "commentary" and parser.current_recipient and parser.current_recipient.startswith("functions."):
            is_tool_call = True
            final_content = parser.current_content

    reasoning = "\n".join(reasoning_parts) if reasoning_parts else None

    return reasoning, final_content, is_tool_call
The implementation for final_content handling has several issues that can lead to data loss by overwriting content. Specifically:
- In the loop over output_msgs, final_content is overwritten for each message in the final channel and for each tool call in the commentary channel. If there are multiple such messages, only the content of the last one will be preserved.
- When processing parser.current_content, final_content is again overwritten, discarding any content that might have been parsed from output_msgs.

This is particularly problematic for tool calls, as it means only the arguments of the last tool call are returned in final_content, while others are dropped. To prevent this data loss, final_content should be aggregated from all relevant messages and the current buffer, rather than being reassigned.
    reasoning_parts = []
    final_content = None
    is_tool_call = False
    for msg in output_msgs:
        if msg.channel == "analysis":
            for content in msg.content:
                reasoning_parts.append(content.text)
        elif msg.channel == "final":
            if final_content is None:
                final_content = ""
            for content in msg.content:
                final_content += content.text
        elif msg.channel == "commentary" and msg.recipient and msg.recipient.startswith("functions."):
            is_tool_call = True
            current_call_args = "".join(c.text for c in msg.content)
            if current_call_args:
                if final_content is None:
                    final_content = current_call_args
                else:
                    final_content += "\n" + current_call_args
    if parser.current_content:
        if parser.current_channel == "analysis":
            reasoning_parts.append(parser.current_content)
        elif parser.current_channel == "final":
            if final_content is None:
                final_content = ""
            final_content += parser.current_content
        elif parser.current_channel == "commentary" and parser.current_recipient and parser.current_recipient.startswith("functions."):
            is_tool_call = True
            if parser.current_content:
                if final_content is None:
                    final_content = parser.current_content
                else:
                    final_content += "\n" + parser.current_content
    reasoning = "\n".join(reasoning_parts) if reasoning_parts else None
    return reasoning, final_content, is_tool_call
💡 Codex Review
Here are some automated review suggestions for this pull request.
    if msg.recipient and msg.recipient.startswith("functions."):
        # If no content-type is given assume JSON, as that's the
        # most common case with gpt-oss models.
        if not msg.content_type or "json" in msg.content_type:
            # load and dump the JSON text to check validity and
            # remove any extra newlines or other odd formatting
            try:
                tool_args = json.dumps(json.loads(msg_text))
            except json.JSONDecodeError:
                logger.exception(
                    "Error decoding JSON tool call from response."
                )
                tool_args = msg_text
        else:
            tool_args = msg_text
        tool_calls.append(
            ToolCall(
                type="function",
                function=FunctionCall(
                    name=msg.recipient.split("functions.")[1],
                    arguments=tool_args,
                ),
            )
        )
        func_name = msg.recipient.split("functions.")[1]
        tool_calls.append(_create_tool_call(func_name, msg_text))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-JSON tool calls are dropped
extract_tool_calls now only appends a tool call when msg.content_type is absent or contains "json"; the prior fallback that forwarded other content types has been removed. Any tool invocation emitted with a non-JSON content type (e.g., plain-text arguments or other MIME types) will now leave tools_called false and omit the corresponding tool_calls entry even though the model targeted functions.*, so the client will never execute that tool.
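If forwarding non-JSON arguments should be preserved, one possible shape of that fallback, assuming the same msg, msg_text, tool_calls, and _create_tool_call names that appear in the diff above (a sketch, not the actual fix), would be:

    func_name = msg.recipient.split("functions.")[1]
    if not msg.content_type or "json" in msg.content_type:
        # Round-trip through json to validate and normalize the arguments,
        # falling back to the raw text if they do not parse.
        try:
            tool_args = json.dumps(json.loads(msg_text))
        except json.JSONDecodeError:
            tool_args = msg_text
    else:
        # Non-JSON content types: forward the arguments verbatim rather
        # than dropping the tool call.
        tool_args = msg_text
    tool_calls.append(_create_tool_call(func_name, tool_args))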
Can you merge from main to fix pre-commit?
Done, fixed the pre-commit issue.
If there is content in current_content and not a finalized message, that means the model stopped mid-generation for some reason. Proper tool calls all end with a

The changes in #28729 will help with many of the issues this proposes to fix in the parser. The root cause of many of these is that we are prompting gpt-oss models wrong in many multi-turn scenarios, especially ones involving tool calling. So, what the changes in that PR do is fix how we're prompting the models, which helps many of the situations where

Some level of fuzzy parsing could be useful, but the changes here are large enough that I'd want to see what kind of impact this has on model accuracy and overall parsing success. Do you have any examples of requests that did not work before that this fixes? Have you run this against any tool calling benchmark suites to ensure it doesn't regress in other areas?

I'm pushing on the accuracy point here as I've spent a large amount of time benchmarking tool call accuracy in the gpt-oss models over the last few weeks to come up with the changes in #28729, and I want to be careful about potentially regressing on any of that.
Hi @bbrowning, thanks for the feedback and for the context regarding #28729.

To address your question about previous failures: I have been testing specifically with the latest version (vLLM 0.11.2), which includes the changes from #28729. While those prompting improvements definitely helped, I am still encountering significant parsing failures in scenarios where the model is provided with a large number of tools.

The Issue

When the context is heavy, the model occasionally leaks the analysis/commentary tags directly into the function name string. Because the current parser expects a strict format, it treats the leakage as part of the tool name, causing the call to fail.

Here is a specific trace from my testing with v0.11.2:

[3900697] [Local→Remote] tools/list

Here is another issue, where the analysis content leaked into the tool call:

Tool error attempt 3: Error code: 500 - {'error': {'message': 'unexpected tokens remaining in message header: ["need", "to", "create", "Jira", "issue.", "The", "tool", "name", "is", "createJiraIssue.", "The", "error", "says", "tool", "not", "found", "in", "agent", "MCP", "Jira", "Agent.", "Maybe", "we", "need", "to", "call", "createJiraIssue", "with", "correct", "JSON.", "The", "tool", "signature:", "createJiraIssue({cloudId,", "projectKey,", "issueTypeName,", "summary,", "description?,", "parent?,", "assignee_account_id?,", "additional_fields?}).",

As seen above, the parser is interpreting get_customer_name<|channel|>commentary as the tool name, rather than separating the tool call from the commentary tag.

Observed Impact & Improvements

This PR therefore introduces a sanitization layer ("fuzzy parsing") to handle these edge cases. The impact on reliability in my tests has been significant:

Before fix: approximately 2 out of 3 tool calls failed due to this parsing error.
After fix: tool calling works 100% of the time in the same scenario.
Scope: these improvements were observed specifically in ChatCompletion mode, ensuring the parser doesn't hang waiting for a format the model failed to provide perfectly.
Note that #28729 is not merged yet, which means 0.11.2 does NOT include those changes. It's expected you'll continue to have prompting and parsing issues until that merges.
Purpose
This PR fixes critical issues with tool call extraction in the OpenAIToolParser (Harmony/gpt-oss), specifically when using Custom Tools and MCP (Model Context Protocol).
Previously, the parser failed to correctly capture tool calls that resided in the active stream buffer (current_content) but hadn't yet been finalized into a message object. This led to two major issues:
Tool Call Failure: valid tool calls were dropped or ignored.
Channel Leakage/Hallucination: The model appeared to hallucinate or leak tool call structures into the final (user-facing) message channel instead of processing them as structured tool calls.
Changes Implemented:
Active Buffer Parsing: Added logic to inspect parser.current_content and parser.current_channel. If the active channel is commentary (used for functions in Harmony), the content is now correctly extracted as a tool call.
Tag Sanitization: Added a sanitization step to strip internal tags (e.g., <|channel|>) from function names; a rough sketch of this idea follows the list.
Robust JSON Handling: Improved JSON parsing for tool arguments to handle partial or slightly malformed buffers.
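For illustration, here is a rough sketch of the tag-sanitization idea; the helper name _sanitize_function_name and the regex are assumptions for this sketch, not the code in this PR:

    import re

    def _sanitize_function_name(raw_name: str) -> str:
        # Strip any leaked Harmony control tags such as <|channel|> and
        # whatever follows them, keeping only the bare function name.
        return re.split(r"<\|[^|]*\|>", raw_name)[0].strip()

    # e.g. "get_customer_name<|channel|>commentary" -> "get_customer_name"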
Test Plan
Scenario: Run a request using a Custom Tool or MCP tool definition (e.g., web_search or a custom python function).
Send a request that forces a custom tool call, or try calling a custom MCP server (it fails when calling multiple tools because of the parsing issue); an example request is sketched below.
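For reference, such a request can be sent with the OpenAI Python client against a locally served gpt-oss model; the model name, URL, and tool definition below are placeholders for this sketch:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    # Placeholder custom tool definition used to force a tool call.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_customer_name",
            "description": "Look up a customer's name by id.",
            "parameters": {
                "type": "object",
                "properties": {"customer_id": {"type": "string"}},
                "required": ["customer_id"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="openai/gpt-oss-20b",  # placeholder model name
        messages=[{"role": "user", "content": "What is the name of customer 42?"}],
        tools=tools,
        tool_choice="auto",
    )
    print(resp.choices[0].message.tool_calls)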
Test Result
Before: When using custom tools/MCP, the tool calling would frequently fail. The model output would often "hallucinate" or leak the raw tool call JSON into the final user message string, failing to trigger the actual tool execution on the client side.
After: Tool calls are cleanly extracted from the commentary channel. The final message content remains clean, and custom tools/MCP function correctly without leakage.
Essential Elements of an Effective PR Description Checklist
[x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
[x] The test plan, such as providing test command.
[x] The test results, such as pasting the results comparison before and after, or e2e results
[ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
[ ] (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.