Adding swagger for EigenAI API #216
Conversation
enum: [system, user, assistant, tool]
content:
  type: string
@mpjunior92 can you comment on how the disable_auto_reasoning_format parameter works / should be documented here?
Also, how is this parameter specified by a non-curl client?
disable_auto_reasoning_format is used to control response parsing. Here is how it is used:

let reasoning_format = if opts.disable_auto_reasoning_format {
    0
} else {
    1
};

If disable_auto_reasoning_format is true, then reasoning_format is set to 0; if it is false, then reasoning_format is set to 1. reasoning_format is a llama.cpp enum:
// reasoning API response format (not to be confused as chat template's reasoning format)
enum common_reasoning_format {
    COMMON_REASONING_FORMAT_NONE,
    COMMON_REASONING_FORMAT_AUTO,            // Same as deepseek, using `message.reasoning_content`
    COMMON_REASONING_FORMAT_DEEPSEEK_LEGACY, // Extract thinking tag contents and return as `message.reasoning_content`, or leave inline in <think> tags in stream mode
    COMMON_REASONING_FORMAT_DEEPSEEK,        // Extract thinking tag contents and return as `message.reasoning_content`, including in streaming deltas.
    // do not extend this enum unless you absolutely have to
    // in most cases, use COMMON_REASONING_FORMAT_AUTO
    // see: https://github.com/ggml-org/llama.cpp/pull/15408
};

TL;DR: this parameter enables/disables reasoning parsing at the llama.cpp level.
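To make the two settings concrete, here is a hypothetical illustration (not actual llama.cpp code or output) of what they mean for the message a client gets back. The `message.reasoning_content` field name and the inline `<think>` tags come from the enum comments above; the example strings and exact response shape are assumptions for the sketch.

```python
import re

# reasoning_format = 1 (COMMON_REASONING_FORMAT_AUTO): the server extracts the
# thinking-tag contents and returns them in a separate `reasoning_content` field.
parsed = {
    "reasoning_content": "First, recall that a transformer predicts tokens ...",
    "content": "An LLM works by predicting the next token ...",
}

# reasoning_format = 0 (COMMON_REASONING_FORMAT_NONE): no extraction; the
# reasoning stays inline in the content, wrapped in <think> tags.
raw = {
    "content": "<think>First, recall that a transformer predicts tokens ...</think>"
               "An LLM works by predicting the next token ...",
}

def visible_text(message):
    """Strip an inline <think> block, if present, leaving only the answer text."""
    return re.sub(r"<think>.*?</think>", "", message["content"], flags=re.DOTALL)

# Both shapes carry the same answer; only where the reasoning lives differs.
assert visible_text(raw) == parsed["content"]
```

With parsing disabled, a client that wants only the answer has to strip the `<think>` block itself, as `visible_text` does here.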
@NimaVaziri curl is, among other things, an HTTP client, so setting this field works the same way across any HTTP client: set the field in the request body using your language's syntax and tooling:
// go
// Request body as a Go struct
body := map[string]interface{}{
	"model": "gpt-oss-20b-f16",
	"max_tokens": 500,
	"messages": []map[string]string{
		{"role": "user", "content": "Explain how LLM works"},
	},
	"disable_auto_reasoning_format": false,
}

// rust
let body = json!({
    "model": "gpt-oss-20b-f16",
    "max_tokens": 500,
    "messages": [
        {"role": "user", "content": "Explain how LLM works"}
    ],
    "disable_auto_reasoning_format": false
});

Note: this is a custom parameter.
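A Python variant of the same idea, for completeness. The endpoint path and `x-eigenai-api-key` header mirror other examples in this thread; the `requests` call is left commented out since it needs a running server, and the URL is an assumption rather than a documented endpoint.

```python
import json

# Request body as a plain dict; the custom field serializes like any other
# top-level key in the JSON body.
body = {
    "model": "gpt-oss-20b-f16",
    "max_tokens": 500,
    "messages": [
        {"role": "user", "content": "Explain how LLM works"},
    ],
    "disable_auto_reasoning_format": False,
}

payload = json.dumps(body)
assert "disable_auto_reasoning_format" in payload

# Sending it (uncomment with a server running locally):
# import requests
# resp = requests.post(
#     "http://localhost:8000/v1/chat/completions",
#     json=body,
#     headers={"x-eigenai-api-key": "sk-dummy-key"},
# )
```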
@mpjunior92 Right, I was more so asking what the implication of this parameter is for other clients like the OpenAI client, the AI SDK client, etc - how can it be set (if it can be set)?
Each SDK may have its own way of setting this value, so users should check the docs for their respective SDK. Here is an example for the OpenAI SDK using extra_body:
from openai import OpenAI
client = OpenAI(
api_key="unused-but-required", # still required by SDK, but ignored by your server
base_url="http://localhost:8000/v1"
)
response = client.chat.completions.create(
model="gpt-oss-20b-f16",
max_tokens=500,
messages=[
{"role": "user", "content": "Explain how LLM works"}
],
extra_body={
"disable_auto_reasoning_format": True
},
extra_headers={
"x-eigenai-api-key": "sk-dummy-key"
}
)
print(response.choices[0].message.content)
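Note that the example above sets disable_auto_reasoning_format=True, which leaves the reasoning inline. When it is left False (the AUTO path), the parsed-out reasoning should come back on the message's `reasoning_content` field, per the llama.cpp enum comments quoted earlier. A hedged sketch of reading it; the attribute-access pattern is an assumption about the SDK's response object, and the SimpleNamespace is a stand-in so the sketch runs without a live server.

```python
from types import SimpleNamespace

# Stand-in for response.choices[0].message from the SDK call above.
message = SimpleNamespace(
    reasoning_content="First, recall that a transformer predicts tokens ...",
    content="An LLM works by predicting the next token ...",
)

# reasoning_content is only present when the server parsed the reasoning out,
# so read it defensively.
reasoning = getattr(message, "reasoning_content", None)
if reasoning is not None:
    print("reasoning:", reasoning)
print("answer:", message.content)
```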
@MadelineAu let's make sure we include this as part of the docs
Taking a look at this and have a couple of questions @NimaVaziri @mpjunior92
The disable_auto_reasoning_format parameter isn't currently included for the EigenAI API - was it missed? Or has it been added recently?
We don't currently have any concept material that contextualizes what 'parsing at llama.cpp level' means - given the target audience, can I assume they would understand this statement? Or find an explanation to link out to?
This is a custom parameter - custom to us? Or custom as in not part of the OpenAI spec?
Added recently.
"parsing at llama.cpp level" we don't need to include this phrasing. All we need to say is that "this parameter is used to control response parsing and separating out the reasoning from the content of the response".
It's not a custom parameter per se; it's a parameter at the llama.cpp level which we expose higher up. But in the context of a client call, you could call it a custom parameter. Hopefully soon we can have a migration path where the parsed output becomes the default behavior and we can deprecate the parameter entirely.
static/openapi.yaml
Outdated
disable_auto_reasoning_format:
  type: boolean
  description: >
    Controls response parsing and separating out of the reasoning from the content of the response. For client calls, this is a custom parameter. Refer to the relevant client SDK documentation for information on how to set this parameter.
Suggested change:

Controls response parsing and separating out the reasoning trace from the content of the response. For client calls, this is a custom parameter. For example, in the OpenAI client, it can be set in the `extra_body` field. Refer to the relevant client SDK documentation for information on how to set this parameter.