Harmony library often cannot parse refusals from gpt-oss-120b model #80

@bbrowning

Description

After digging into a user report in vLLM, I discovered that when the gpt-oss-120b model refuses to do something, it often does not follow its Harmony template properly and emits the refusal directly in the response, without the expected channel or message tokens beforehand.

Here's a simple Python script reproducing what I'm seeing; the generated_tokens below are taken from the tokens vLLM generated in response to the user's message.

from openai_harmony import (
    HarmonyEncodingName,
    Role,
    StreamableParser,
    load_harmony_encoding
)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
parser = StreamableParser(encoding, role=Role.ASSISTANT)

missing_tokens = [200008]  # 200008 is the <|message|> special token

# Tokens generated by vLLM for the refusal (see the CompletionOutput below)
generated_tokens = [2161, 976, 1825, 382, 57246, 15543, 316, 2338, 261, 4232, 13790, 350, 72, 2560, 4213, 187928, 11, 23802, 8110, 741, 1328, 382, 829, 32510, 3100, 25, 15543, 97471, 198937, 13, 20953, 41897, 13, 16344, 316, 7562, 11, 581, 2804, 41897, 13, 200007, 200006, 173781, 200005, 35644, 200008, 2167, 2804, 41897, 13, 200007, 200006, 173781, 200005, 17196, 200008, 40, 4572, 23045, 11, 889, 357, 665, 1573, 1652, 483, 484, 13, 200002]

# Stream tokens into the parser one at a time so we can see exactly where parsing fails
parse_failed = False
for i, token in enumerate(generated_tokens):
    decoded = tokenizer.decode(generated_tokens[:i+1])
    try:
        parser.process(token)
    except Exception as e:
        print(f"Error processing token {token}: {e}\n")
        print(f"Content up to and including the failed token:\n{decoded}\n")
        parse_failed = True
        break

if parse_failed:
    print("\n====================")
    print(f"Initial parse failed. Prepending tokens {missing_tokens}\n")
    parser = StreamableParser(encoding, role=Role.ASSISTANT)  # fresh parser for the retry
    tokens = missing_tokens + generated_tokens
    for i, token in enumerate(tokens):
        decoded = tokenizer.decode(tokens[:i+1])
        parser.process(token)

print(f"Successfully parsed content:\n{decoded}\n")

print(f"Parsed messages:\n{parser.messages}")

Here's the actual text and token IDs generated by the model, as returned by vLLM:

CompletionOutput(index=0, text='...\n\nThe user is requesting instructions to break a food truck (i.e., sabotage, illegal activity). This is disallowed content: instructions facilitating wrongdoing. Must refuse. According to policy, we must refuse.assistantanalysisWe must refuse.assistantfinalI’m sorry, but I can’t help with that.', token_ids=[2161, 976, 1825, 382, 57246, 15543, 316, 2338, 261, 4232, 13790, 350, 72, 2560, 4213, 187928, 11, 23802, 8110, 741, 1328, 382, 829, 32510, 3100, 25, 15543, 97471, 198937, 13, 20953, 41897, 13, 16344, 316, 7562, 11, 581, 2804, 41897, 13, 200007, 200006, 173781, 200005, 35644, 200008, 2167, 2804, 41897, 13, 200007, 200006, 173781, 200005, 17196, 200008, 40, 4572, 23045, 11, 889, 357, 665, 1573, 1652, 483, 484, 13, 200002], cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)

You'll see that the model fails to generate a <|message|> token before starting into the refusal chain of thought. It also doesn't generate an initial <|channel|>, but in the Harmony library that's not a fatal error the way the missing <|message|> is.
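
For reference, a quick way to see where the structure breaks is to decode the special tokens in the dump above. This is a minimal sketch reusing the tokenizer from the repro script; the comment reflects my reading of the Harmony format docs for the o200k_harmony vocabulary:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")

# Decode each special token that appears in the token_ids above; per the
# Harmony format these should come out as <|return|>, <|channel|>, <|start|>,
# <|end|>, and <|message|> respectively.
for tid in (200002, 200005, 200006, 200007, 200008):
    print(tid, tokenizer.decode([tid]))

In the token list above, the first special token to appear is 200007 (<|end|>); everything before it is plain content with no <|channel|> or <|message|> ahead of it, which is exactly what the parser trips over.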

I considered attempting to work around this in vLLM, but it feels more like a model output and/or Harmony library issue. The script above shows one way it can be worked around: explicitly prepending token 200008 (<|message|>) before parsing the content with the Harmony library.
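
For illustration, here's a minimal sketch of that workaround as a reusable helper. parse_with_fallback is a hypothetical name, not an existing vLLM or Harmony API, and the retry simply mirrors what the repro script does by hand:

from openai_harmony import (
    HarmonyEncodingName,
    Role,
    StreamableParser,
    load_harmony_encoding
)

MESSAGE_TOKEN = 200008  # <|message|>

def parse_with_fallback(encoding, generated_tokens):
    """Parse tokens as-is; if that fails, retry once with <|message|> prepended."""
    for prefix in ([], [MESSAGE_TOKEN]):
        parser = StreamableParser(encoding, role=Role.ASSISTANT)
        try:
            for token in prefix + generated_tokens:
                parser.process(token)
            return parser.messages
        except Exception:
            continue  # fall through to the retry with the prepended token
    raise ValueError("tokens could not be parsed, even with <|message|> prepended")

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
messages = parse_with_fallback(encoding, generated_tokens)  # generated_tokens from the script above

A proper fix would live in the model's template or in the Harmony library itself, but a fallback like this would at least let callers recover the refusal text in the meantime.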
