36 changes: 5 additions & 31 deletions docs/eigenai/howto/use-eigenai.mdx
@@ -12,7 +12,11 @@ See [Try EigenAI](../try-eigenai.md#get-started-for-free) for information on obt

We're starting off with supporting the `gpt-oss-120b-f16` and `qwen3-32b-128k-bf16` models based on initial demand and expanding from there. To get started or request another model, visit our [onboarding page](https://onboarding.eigencloud.xyz/).

## Chat Completions API
## Chat Completions API Reference

Refer to the [swagger documentation for the EigenAI API](https://docs.eigencloud.xyz/api).

## Chat Completions API Examples

<Tabs>
<TabItem value="testnet" label="Testnet Request">
@@ -233,33 +237,3 @@ We're starting off with supporting the `gpt-oss-120b-f16` and `qwen3-32b-128k-bf
</TabItem>
</Tabs>

## Supported parameters

This list will expand to cover the full parameter set of the Chat Completions API. A sketch combining several of these parameters follows the list.

- `messages: array`
  - A list of messages comprising the conversation so far
- `model: string`
  - Model ID used to generate the response, like `gpt-oss-120b-f16`
- `max_tokens: (optional) integer`
  - The maximum number of [tokens](https://platform.openai.com/tokenizer) that can be generated in the chat completion. This value can be used to control [costs](https://openai.com/api/pricing/) for text generated via API.
- `seed: (optional) integer`
  - If specified, our system will run the inference deterministically, such that repeated requests with the same `seed` and parameters should return the same result.
- `stream: (optional) bool`
  - If set to true, the model response data will be streamed to the client as it is generated using Server-Sent Events (SSE).
- `temperature: (optional) number`
  - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- `top_p: (optional) number`
  - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
- `logprobs: (optional) bool`
  - Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`.
- `frequency_penalty: (optional) number`
  - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
- `presence_penalty: (optional) number`
  - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
- `tools: array`
  - A list of tools ([function tools](https://platform.openai.com/docs/guides/function-calling)) the model may call.
- `tool_choice: (optional) string`
  - `auto`, `required`, `none`
  - Controls which (if any) tool is called by the model. `none` means the model will not call any tool and instead generates a message. `auto` means the model can pick between generating a message or calling one or more tools. `required` means the model must call one or more tools. Specifying a particular tool via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool.
  - `none` is the default when no tools are present. `auto` is the default if tools are present.
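
As a minimal sketch, here is a request exercising the `tools` and `tool_choice` parameters above. The `https://api.eigencloud.xyz/chat` endpoint and `x-eigenai-api-key` header are taken from elsewhere in this PR (the OpenAPI spec and review thread), and `get_weather` is a hypothetical function, so adjust to your setup:

```python
import requests

# Force the model to call a specific (hypothetical) function via tool_choice.
response = requests.post(
    "https://api.eigencloud.xyz/chat",
    headers={"x-eigenai-api-key": "sk-dummy-key"},
    json={
        "model": "gpt-oss-120b-f16",
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        # Forces a call to get_weather instead of a plain text answer.
        "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
    },
)
response.raise_for_status()
print(response.json())
```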
6 changes: 6 additions & 0 deletions docs/eigenai/reference/eigenai-api.md
@@ -0,0 +1,6 @@
---
title: EigenAI API
sidebar_position: 1
---

Refer to the [swagger documentation for the EigenAI API](https://docs.eigencloud.xyz/api).
4 changes: 0 additions & 4 deletions docusaurus.config.js
@@ -402,10 +402,6 @@ const redirects = [
},

//External references
{
from: '/api',
to: '/eigenlayer/reference/apis-and-dashboards'
},
{
from: '/developers/slashing-background',
to: '/eigenlayer/developers/concepts/slashing/slashing-concept-developers'
19 changes: 18 additions & 1 deletion package-lock.json

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion package.json
@@ -35,7 +35,8 @@
"rehype-katex": "^7.0.0",
"remark-gfm": "^4.0.0",
"remark-math": "^6.0.0",
"repomix": "^0.3.6"
"repomix": "^0.3.6",
"swagger-ui-dist": "^5.30.3"
},
"devDependencies": {
"@docusaurus/eslint-plugin": "^3.8.1",
19 changes: 19 additions & 0 deletions src/pages/api.jsx
@@ -0,0 +1,19 @@
import React, { useEffect } from "react";
import SwaggerUI from "swagger-ui-dist/swagger-ui-es-bundle";
import "swagger-ui-dist/swagger-ui.css";

export default function ApiDocs() {
  useEffect(() => {
    // Render Swagger UI on the client only (useEffect does not run during
    // SSR), pointing it at the spec served from static/openapi.yaml.
    SwaggerUI({
      dom_id: "#swagger-container",
      url: "/openapi.yaml",
      deepLinking: true,
    });
  }, []);

  return (
    <div style={{ height: "100%" }}>
      <div id="swagger-container" />
    </div>
  );
}
178 changes: 178 additions & 0 deletions static/openapi.yaml
@@ -0,0 +1,178 @@
openapi: 3.1.0
info:
  title: EigenAI Chat API
  version: 0.1.0
  description: Chat completion API for EigenAI.

servers:
  - url: https://api.eigencloud.xyz

paths:
  /chat:
    post:
      summary: Create a chat completion
      operationId: createChatCompletion
      description: Generates a model response for a given chat conversation.

      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:

                model:
                  type: string
                  description: >
                    Model ID used to generate the response, e.g. `gpt-oss-120b-f16`.

                messages:
                  type: array
                  description: A list of messages representing the conversation so far.
                  items:
                    type: object
                    properties:
                      role:
                        type: string
                        enum: [system, user, assistant, tool]
                      content:
                        type: string
Contributor
@mpjunior92 can you comment on how the `disable_auto_reasoning_format` parameter works / should be documented here?

Also, how is this parameter specified by a non-curl client?

mpjunior92 (Dec 9, 2025)
`disable_auto_reasoning_format` is used to control response parsing. Here is how it is used:

            let reasoning_format = if opts.disable_auto_reasoning_format {
                0
            } else {
                1
            };

If `disable_auto_reasoning_format` is true, then `reasoning_format` is set to 0; if it is false, `reasoning_format` is set to 1. `reasoning_format` is a llama.cpp enum:

// reasoning API response format (not to be confused as chat template's reasoning format)
enum common_reasoning_format {
    COMMON_REASONING_FORMAT_NONE,
    COMMON_REASONING_FORMAT_AUTO,            // Same as deepseek, using `message.reasoning_content`
    COMMON_REASONING_FORMAT_DEEPSEEK_LEGACY, // Extract thinking tag contents and return as `message.reasoning_content`, or leave inline in <think> tags in stream mode
    COMMON_REASONING_FORMAT_DEEPSEEK,        // Extract thinking tag contents and return as `message.reasoning_content`, including in streaming deltas.
    // do not extend this enum unless you absolutely have to
    // in most cases, use COMMON_REASONING_FORMAT_AUTO
    // see: https://github.com/ggml-org/llama.cpp/pull/15408
};

TL;DR: this parameter enables/disables reasoning parsing at the llama.cpp level.

@NimaVaziri curl is, among other things, an HTTP client, so setting this field is the same across any HTTP client: set the field in the request body according to your language's syntax / tools:

// go
	// Request body as a Go struct
	body := map[string]interface{}{
		"model":                        "gpt-oss-20b-f16",
		"max_tokens":                   500,
		"messages": []map[string]string{
			{"role": "user", "content": "Explain how LLM works"},
		},
		"disable_auto_reasoning_format": false,
	}
// rust
    let body = json!({
        "model": "gpt-oss-20b-f16",
        "max_tokens": 500,
        "messages": [
            {"role": "user", "content": "Explain how LLM works"}
        ],
        "disable_auto_reasoning_format": false
    });

Note: this is a custom parameter.

Contributor

@mpjunior92 Right, I was more so asking what the implication of this parameter is for other clients like the OpenAI client, the AI SDK client, etc - how can it be set (if it can be set)?

mpjunior92

Each SDK may have its own way of setting this value, so users must look at the docs for their respective SDK. Here is an example for the OpenAI SDK using `extra_body`:

from openai import OpenAI

client = OpenAI(
    api_key="unused-but-required",   # still required by SDK, but ignored by your server
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="gpt-oss-20b-f16",
    max_tokens=500,
    messages=[
        {"role": "user", "content": "Explain how LLM works"}
    ],
    extra_body={
        "disable_auto_reasoning_format": True
    },
    extra_headers={
        "x-eigenai-api-key": "sk-dummy-key"
    }
)

print(response.choices[0].message.content)
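
For illustration, a sketch of what the parsed output looks like when auto reasoning format is left enabled (`disable_auto_reasoning_format` omitted or false). The `reasoning_content` field name comes from the llama.cpp enum comments quoted above; whether the server passes it through unchanged is an assumption here:

```python
# Continuing with the `client` constructed in the example above.
# With auto reasoning format enabled, llama.cpp separates the reasoning
# trace from the final answer.
response = client.chat.completions.create(
    model="gpt-oss-20b-f16",
    max_tokens=500,
    messages=[
        {"role": "user", "content": "Explain how LLM works"}
    ],
    extra_headers={"x-eigenai-api-key": "sk-dummy-key"},
)

message = response.choices[0].message
# Assumption: the server forwards llama.cpp's `message.reasoning_content`.
print(getattr(message, "reasoning_content", None))  # reasoning trace
print(message.content)                              # final answer only
```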

Contributor

@MadelineAu let's make sure we include this as part of the docs

Collaborator (Author)

Taking a look at this and have a couple of questions @NimaVaziri @mpjunior92

The `disable_auto_reasoning_format` parameter isn't currently included for the EigenAI API - was it missed? Or has it been added recently?

We don't currently have any concept material that contextualizes what 'parsing at llama.cpp level' means - given the target audience, can I assume they would understand this statement? Or find an explanation to link out to?

This is a custom parameter - custom to us? Or custom as in not part of the OpenAI spec?

NimaVaziri (Contributor, Dec 10, 2025)

Added recently.

"parsing at llama.cpp level" we don't need to include this phrasing. All we need to say is that "this parameter is used to control response parsing and separating out the reasoning from the content of the response".

It's not a custom parameter per se; it's a parameter at the llama.cpp level which we expose higher up. But in the context of a client call, you could say it's a custom parameter. Hopefully soon we can have a migration path where the parsed output becomes the default behavior and we can deprecate the parameter entirely.

                disable_auto_reasoning_format:
                  type: boolean
                  description: >
                    Controls response parsing and separating out the reasoning trace
                    from the content of the response. For client calls, this is a
                    custom parameter. For example, in the OpenAI client, it's set in
                    the `extra_body` field. Refer to the relevant client SDK
                    documentation for information on how to set this parameter.

                max_tokens:
                  type: [integer, "null"]
                  description: >
                    Optional. Maximum number of tokens to generate.

                seed:
                  type: [integer, "null"]
                  description: >
                    Optional. If provided, inference becomes deterministic for
                    repeated (seed + params).

                stream:
                  type: [boolean, "null"]
                  description: >
                    Optional. If true, the response is streamed using Server-Sent Events (SSE).

                temperature:
                  type: [number, "null"]
                  format: float
                  description: >
                    Optional. Sampling temperature between 0 and 2.

                top_p:
                  type: [number, "null"]
                  format: float
                  description: >
                    Optional. Nucleus sampling threshold (top-p).

                logprobs:
                  type: [boolean, "null"]
                  description: >
                    Optional. If true, includes token-level log probabilities in the response.

                frequency_penalty:
                  type: [number, "null"]
                  format: float
                  description: >
                    Optional. Number between -2.0 and 2.0 penalizing token repetition frequency.

                presence_penalty:
                  type: [number, "null"]
                  format: float
                  description: >
                    Optional. Number between -2.0 and 2.0 penalizing previously seen tokens.

                tools:
                  type: array
                  description: A list of tools the model may call.
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                        enum: [function]
                      function:
                        type: object
                        properties:
                          name:
                            type: string
                            description: Name of the function.
                          description:
                            type: string
                          parameters:
                            type: object
                            description: JSON schema of function parameters.

                tool_choice:
                  description: |
                    Optional. Controls how the model uses tools.
                    - `none`: never call tools
                    - `auto`: model decides (default if tools exist)
                    - `required`: must call tools
                    - or specify a particular function
                  oneOf:
                    - type: string
                      enum: [none, auto, required]
                    - type: object
                      properties:
                        type:
                          type: string
                          enum: [function]
                        function:
                          type: object
                          properties:
                            name:
                              type: string

              required:
                - model
                - messages

      responses:
        "200":
          description: >
            Successful completion response. The response includes a cryptographic
            signature field that proves the response was generated by the EigenAI
            Operator (see
            [Verify Signature](https://docs.eigencloud.xyz/eigenai/howto/verify-signature)
            for more information).
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: string
                  object:
                    type: string
                  created:
                    type: integer
                  model:
                    type: string
                  choices:
                    type: array
                    items:
                      type: object
                      properties:
                        index:
                          type: integer
                        message:
                          type: object
                          properties:
                            role:
                              type: string
                            content:
                              type: string
                        finish_reason:
                          type: string
                  signature:
                    type: string
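
Tying the spec together, a minimal sketch of a call against this schema that reads both the completion and the top-level `signature` field. The `x-eigenai-api-key` header comes from the review thread rather than the spec itself, so treat it as an assumption:

```python
import requests

# POST {server}/chat per static/openapi.yaml; only the required
# `model` and `messages` fields plus max_tokens are set here.
resp = requests.post(
    "https://api.eigencloud.xyz/chat",
    headers={"x-eigenai-api-key": "sk-dummy-key"},
    json={
        "model": "gpt-oss-120b-f16",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 100,
    },
)
resp.raise_for_status()
data = resp.json()

print(data["choices"][0]["message"]["content"])
# Cryptographic proof that the response came from the EigenAI Operator;
# see the Verify Signature guide linked in the 200 response description.
print(data["signature"])
```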