Description
After digging into a user report in vLLM, I discovered that when the gpt-oss-120b model refuses to do something, it often does not follow its Harmony template properly: it emits the refusal directly in the response, without the expected channel or message tokens beforehand.

Here's a simple Python script reproducing what I'm seeing; the `generated_tokens` below are the tokens vLLM generated in response to the user's message.
```python
from openai_harmony import (
    HarmonyEncodingName,
    Role,
    StreamableParser,
    load_harmony_encoding,
)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
parser = StreamableParser(encoding, role=Role.ASSISTANT)

missing_tokens = [200008]  # <|message|>
generated_tokens = [2161, 976, 1825, 382, 57246, 15543, 316, 2338, 261, 4232, 13790, 350, 72, 2560, 4213, 187928, 11, 23802, 8110, 741, 1328, 382, 829, 32510, 3100, 25, 15543, 97471, 198937, 13, 20953, 41897, 13, 16344, 316, 7562, 11, 581, 2804, 41897, 13, 200007, 200006, 173781, 200005, 35644, 200008, 2167, 2804, 41897, 13, 200007, 200006, 173781, 200005, 17196, 200008, 40, 4572, 23045, 11, 889, 357, 665, 1573, 1652, 483, 484, 13, 200002]

parse_failed = False
for i, token in enumerate(generated_tokens):
    decoded = tokenizer.decode(generated_tokens[:i + 1])
    try:
        parser.process(token)
    except Exception as e:
        print(f"Error processing token {token}: {e}\n")
        print(f"Content up to and including the failed token:\n{decoded}\n")
        parse_failed = True
        break

if parse_failed:
    print("\n====================")
    print(f"Initial parse failed. Prepending tokens {missing_tokens}\n")
    parser = StreamableParser(encoding, role=Role.ASSISTANT)
    tokens = missing_tokens + generated_tokens
    for i, token in enumerate(tokens):
        decoded = tokenizer.decode(tokens[:i + 1])
        parser.process(token)
    print(f"Successfully parsed content:\n{decoded}\n")
    print(f"Parsed messages:\n{parser.messages}")
```

Here's the actual text and token ids generated by the model from vLLM:
```
CompletionOutput(index=0, text='...\n\nThe user is requesting instructions to break a food truck (i.e., sabotage, illegal activity). This is disallowed content: instructions facilitating wrongdoing. Must refuse. According to policy, we must refuse.assistantanalysisWe must refuse.assistantfinalI’m sorry, but I can’t help with that.', token_ids=[2161, 976, 1825, 382, 57246, 15543, 316, 2338, 261, 4232, 13790, 350, 72, 2560, 4213, 187928, 11, 23802, 8110, 741, 1328, 382, 829, 32510, 3100, 25, 15543, 97471, 198937, 13, 20953, 41897, 13, 16344, 316, 7562, 11, 581, 2804, 41897, 13, 200007, 200006, 173781, 200005, 35644, 200008, 2167, 2804, 41897, 13, 200007, 200006, 173781, 200005, 17196, 200008, 40, 4572, 23045, 11, 889, 357, 665, 1573, 1652, 483, 484, 13, 200002], cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)
```
You'll see that the model fails to emit a `<|message|>` token before starting its refusal chain of thought. It also doesn't emit a `<|channel|>` token, but the Harmony library treats that as non-fatal here, unlike the missing `<|message|>`.
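This can be checked mechanically by scanning the generated token ids for the first special token. The id-to-token mapping below is an assumption inferred from the decoded text shown above (illustrative, not the full special-token vocabulary):

```python
# Assumed mapping of the relevant special token ids, inferred from the
# decoded vLLM output above:
SPECIAL = {
    200002: "<|return|>",
    200005: "<|channel|>",
    200006: "<|start|>",
    200007: "<|end|>",
    200008: "<|message|>",
}

# Token ids copied verbatim from the vLLM CompletionOutput above.
generated_tokens = [
    2161, 976, 1825, 382, 57246, 15543, 316, 2338, 261, 4232, 13790, 350,
    72, 2560, 4213, 187928, 11, 23802, 8110, 741, 1328, 382, 829, 32510,
    3100, 25, 15543, 97471, 198937, 13, 20953, 41897, 13, 16344, 316,
    7562, 11, 581, 2804, 41897, 13, 200007, 200006, 173781, 200005,
    35644, 200008, 2167, 2804, 41897, 13, 200007, 200006, 173781,
    200005, 17196, 200008, 40, 4572, 23045, 11, 889, 357, 665, 1573,
    1652, 483, 484, 13, 200002,
]

# The first special token in the stream is <|end|>: the model produced
# plain content tokens with no <|message|> (or <|channel|>) before them.
first_special = next(t for t in generated_tokens if t in SPECIAL)
print(SPECIAL[first_special])  # <|end|>
```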
I considered working around this in vLLM, but it feels more like a model-output and/or Harmony-library issue. The script I gave does show one way it can be worked around: explicitly prepending token 200008 (`<|message|>`) before parsing the content with the Harmony library.
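As a minimal, library-agnostic sketch of that retry strategy (`parse_fn` here is a hypothetical callable standing in for a loop over `StreamableParser.process`; only the 200008 id comes from the output above):

```python
MESSAGE_TOKEN = 200008  # id of <|message|>, per the token stream above

def parse_with_fallback(parse_fn, tokens, missing_tokens=(MESSAGE_TOKEN,)):
    """Attempt to parse `tokens`; if the parser raises, retry once with
    `missing_tokens` prepended.

    `parse_fn` is any callable that consumes a full token list and raises
    on malformed Harmony output (e.g. a thin wrapper that feeds each token
    to a fresh StreamableParser and returns its messages).
    """
    try:
        return parse_fn(list(tokens))
    except Exception:
        # Assume only the leading <|message|> is missing and retry once.
        return parse_fn(list(missing_tokens) + list(tokens))
```

In vLLM this could wrap the existing Harmony parsing path, though fixing the model output or relaxing the parser upstream seems preferable to patching every caller.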