
Conversation

@MadelineAu (Collaborator)
No description provided.

@vercel vercel bot commented Nov 28, 2025

eigencloud-docs deployment preview ready (updated Dec 10, 2025 3:39am UTC).

enum: [system, user, assistant, tool]
content:
  type: string

Contributor
@mpjunior92 can you comment on how the disable_auto_reasoning_format parameter works / should be documented here?

Also, how is this parameter specified by a non-curl client?

@mpjunior92 mpjunior92 commented Dec 9, 2025
disable_auto_reasoning_format is used to control response parsing. Here is how it is used:

            let reasoning_format = if opts.disable_auto_reasoning_format {
                0
            } else {
                1
            };

If disable_auto_reasoning_format is true, reasoning_format is set to 0 (COMMON_REASONING_FORMAT_NONE); if it is false, reasoning_format is set to 1 (COMMON_REASONING_FORMAT_AUTO). reasoning_format is a llama.cpp enum:

// reasoning API response format (not to be confused as chat template's reasoning format)
enum common_reasoning_format {
    COMMON_REASONING_FORMAT_NONE,
    COMMON_REASONING_FORMAT_AUTO,            // Same as deepseek, using `message.reasoning_content`
    COMMON_REASONING_FORMAT_DEEPSEEK_LEGACY, // Extract thinking tag contents and return as `message.reasoning_content`, or leave inline in <think> tags in stream mode
    COMMON_REASONING_FORMAT_DEEPSEEK,        // Extract thinking tag contents and return as `message.reasoning_content`, including in streaming deltas.
    // do not extend this enum unless you absolutely have to
    // in most cases, use COMMON_REASONING_FORMAT_AUTO
    // see: https://github.com/ggml-org/llama.cpp/pull/15408
};

TL;DR: this parameter enables/disables reasoning parsing at the llama.cpp level.
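To make the effect concrete, here is a hedged sketch of how the same assistant message might look under each setting. The payloads below are illustrative only, not captured from a real server; the exact text is made up, but the shapes follow the enum comments above (`reasoning_content` when extraction is on, inline `<think>` tags when it is off):

```python
# Illustrative only: example shapes of the assistant message under each setting.

# disable_auto_reasoning_format = false -> reasoning_format = AUTO:
# llama.cpp extracts the thinking-tag contents into `reasoning_content`.
parsed_message = {
    "role": "assistant",
    "reasoning_content": "The user wants a high-level explanation...",
    "content": "An LLM predicts the next token...",
}

# disable_auto_reasoning_format = true -> reasoning_format = NONE:
# no extraction; the raw <think> block stays inline in `content`.
raw_message = {
    "role": "assistant",
    "content": "<think>The user wants a high-level explanation...</think>"
               "An LLM predicts the next token...",
}

def reasoning_of(message):
    """Return the reasoning trace regardless of which format the server used."""
    if "reasoning_content" in message:
        return message["reasoning_content"]
    content = message.get("content", "")
    if "<think>" in content and "</think>" in content:
        return content.split("<think>", 1)[1].split("</think>", 1)[0]
    return ""

print(reasoning_of(parsed_message) == reasoning_of(raw_message))  # True
```

A client that wants the reasoning trace either way can fall back to tag extraction when `reasoning_content` is absent, as the helper above does.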

@NimaVaziri curl is, among other things, an HTTP client, so setting this field works the same way in any HTTP client: set the field in the request body using your language's syntax and tooling:

// go
	// Request body as a Go map
	body := map[string]interface{}{
		"model":                        "gpt-oss-20b-f16",
		"max_tokens":                   500,
		"messages": []map[string]string{
			{"role": "user", "content": "Explain how LLM works"},
		},
		"disable_auto_reasoning_format": false,
	}
// rust
    let body = json!({
        "model": "gpt-oss-20b-f16",
        "max_tokens": 500,
        "messages": [
            {"role": "user", "content": "Explain how LLM works"}
        ],
        "disable_auto_reasoning_format": false
    });

Note: this is a custom parameter.
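For completeness, the same request body in Python using only the standard library. This is a sketch: the endpoint path is assumed from the `base_url` used in the OpenAI SDK example below, and the request is built but not sent.

```python
import json
import urllib.request

# Same request body as the Go and Rust examples above.
body = {
    "model": "gpt-oss-20b-f16",
    "max_tokens": 500,
    "messages": [
        {"role": "user", "content": "Explain how LLM works"},
    ],
    "disable_auto_reasoning_format": False,
}

# Build (but do not send) the HTTP request; any HTTP client works the same way.
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed endpoint path
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(json.loads(req.data)["disable_auto_reasoning_format"])  # False
```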

Contributor

@mpjunior92 Right, I was asking more about the implications of this parameter for other clients like the OpenAI client, the AI SDK client, etc.: how can it be set (if it can be set)?


Each SDK may have its own way of setting this value, so users should check the docs of their respective SDK. Here is an example for the OpenAI SDK using extra_body:

from openai import OpenAI

client = OpenAI(
    api_key="unused-but-required",   # still required by SDK, but ignored by your server
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="gpt-oss-20b-f16",
    max_tokens=500,
    messages=[
        {"role": "user", "content": "Explain how LLM works"}
    ],
    extra_body={
        "disable_auto_reasoning_format": True
    },
    extra_headers={
        "x-eigenai-api-key": "sk-dummy-key"
    }
)

print(response.choices[0].message.content)

Contributor

@MadelineAu let's make sure we include this as part of the docs

Collaborator Author

I'm taking a look at this and have a couple of questions, @NimaVaziri @mpjunior92:

The disable_auto_reasoning_format parameter isn't currently included for the EigenAI API - was it missed? Or has it been added recently?

We don't currently have any concept material that contextualizes what 'parsing at llama.cpp level' means - given the target audience, can I assume they would understand this statement? Or find an explanation to link out to?

This is a custom parameter - custom to us? Or custom as in not part of the OpenAI spec?

Contributor
@NimaVaziri NimaVaziri commented Dec 10, 2025

Added recently.

"parsing at llama.cpp level" we don't need to include this phrasing. All we need to say is that "this parameter is used to control response parsing and separating out the reasoning from the content of the response".

It's not a custom parameter per se, it's a parameter at the llama.cpp level which we expose higher up. But in the context of a client call, you could say it's a custom parameter. Hopefully soon we can have a migration path where the parsed output becomes the default behavior and we can deprecate the parameter entirely.

disable_auto_reasoning_format:
  type: boolean
  description: >
    Controls response parsing and separating out of the reasoning from the content of the response. For client calls, this is a custom parameter. Refer to the relevant client SDK documentation for information on how to set this parameter.

Contributor

Suggested change
Controls response parsing and separating out of the reasoning from the content of the response. For client calls, this is a custom parameter. Refer to the relevant client SDK documentation for information on how to set this parameter.
Controls response parsing and separating out the reasoning trace from the content of the response. For client calls, this is a custom parameter. For example, in the OpenAI client, it can be set in the `extra_body` field. Refer to the relevant client SDK documentation for information on how to set this parameter.

@MadelineAu MadelineAu merged commit 8230d70 into main Dec 10, 2025
3 checks passed
@MadelineAu MadelineAu deleted the swaggerExperiment branch December 10, 2025 03:40