Conversation

@Sameerlite
Collaborator

Title

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

Sameerlite and others added 30 commits November 19, 2025 11:55
Fixes #16810

## Problem

When using completion() with models that have mode: "responses" (like o3-pro,
gpt-5-codex), the response_format parameter with JSON schemas was being ignored
or incorrectly handled, causing:
- Large schemas (>512 chars) to fail with "metadata.schema_dict_json: string too long" error
- Structured outputs to be silently dropped
- Users' code to break unexpectedly

## Root Cause

The completion -> responses bridge in
litellm/completion_extras/litellm_responses_transformation/transformation.py
was missing the conversion of response_format (Chat Completion format) to
text.format (Responses API format).

The inverse bridge (responses -> completion) already had this conversion
implemented in commit 29f0ed2, but the completion -> responses direction
was incomplete.

## Solution

Added _transform_response_format_to_text_format() method that converts:
- response_format with json_schema → text.format with json_schema
- response_format with json_object → text.format with json_object
- response_format with text → text.format with text

Updated transform_request() to detect and convert response_format parameter
before sending to litellm.responses().
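
The mapping above can be sketched as follows. This is a minimal illustration, not LiteLLM's actual implementation: the helper name and the exact Responses API field layout (flattening name/schema/strict onto the format object) are assumptions here, and the real conversion lives in _transform_response_format_to_text_format().

```python
# Hypothetical sketch of the Chat Completions -> Responses API format conversion.
from typing import Any, Dict


def response_format_to_text_format(response_format: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a Chat Completions `response_format` dict into a Responses API
    `text.format` dict. Field names assumed from the OpenAI Responses API."""
    fmt_type = response_format.get("type", "text")
    if fmt_type == "json_schema":
        json_schema = response_format.get("json_schema", {})
        # The Responses API flattens name/schema/strict onto the format object,
        # so the schema is no longer squeezed into the 512-char metadata field.
        return {
            "type": "json_schema",
            "name": json_schema.get("name", "response"),
            "schema": json_schema.get("schema", {}),
            "strict": json_schema.get("strict", True),
        }
    if fmt_type == "json_object":
        return {"type": "json_object"}
    return {"type": "text"}
```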

## Changes

- Added _transform_response_format_to_text_format() method (lines 592-647)
- Modified transform_request() to handle response_format (lines 199-203)
- Added comprehensive tests to validate the conversion

## Testing

- 5 new unit tests covering all conversion scenarios
- Real API test with OpenAI confirming large schemas (>512 chars) work
- No more metadata.schema_dict_json errors

## Impact

Users can now use completion() with models that have mode: "responses" and:
- Use large JSON schemas without hitting metadata 512 char limit
- Get proper structured outputs
- Have their existing code continue working
* add AWS fields for KeyManagementSettings

* docs IAM roles

* use aws iam auth on secret manager v2

* fix: load_aws_secret_manager

* test_secret_manager_with_iam_role_settings
* feat: mcp prompts support

* feat: mcp resources support
Updated pydantic version to 2.11.0 for compatibility.
…16898)

* TestPromptRequest

* add prompts/test endpoint for testing prompt

* TestPromptTestEndpoint

* feat: working v1 of this ui

* working prompt endpoints

* add chat ui for prompts

* add conversation panel

* add init chat ui
* TestPromptRequest

* add prompts/test endpoint for testing prompt

* TestPromptTestEndpoint

* feat: working v1 of this ui

* working prompt endpoints

* add chat ui for prompts

* add conversation panel

* add init chat ui

* allow clicking edit prompt

* fix use get_base_prompt_id

* add endpoints for viewing prompt versions

* TestPromptVersioning

* add getPromptVersions

* add VersionHistorySidePanel

* allow viewing version history

* add version history
* fix images being dropped from tool results for bedrock

* type fixes
* thought signature tool call id

* [stripe] refactor and tests

* [stripe] remove md and move to factory

* [stripe] remove redundant test

* [stripe] ran black formatting

* [stripe] add thought signature docs

* [stripe] remove unused import
* Attempt CI/CD Fix

* Adding test for coverage

* Adding max depth to copilot and vertex

* Fixing mypy lint and docker database

* Fixing UI build issues

* Update playwright test
…16929)

* add _get_prompt_data_from_dotprompt_content

* fix pre call hook for prompt template

* fix: get_latest_version_prompt_id

* fix get_latest_version_prompt_id

* test_get_latest_version_prompt_id
#16932)

* add _get_prompt_data_from_dotprompt_content

* fix pre call hook for prompt template

* fix: get_latest_version_prompt_id

* fix get_latest_version_prompt_id

* test_get_latest_version_prompt_id

* fix info and delete lookup for prompts

* refactor prompt table
…ish of showing version history (#16941)

* add _get_prompt_data_from_dotprompt_content

* fix pre call hook for prompt template

* fix: get_latest_version_prompt_id

* fix get_latest_version_prompt_id

* test_get_latest_version_prompt_id

* fix info and delete lookup for prompts

* refactor prompt table

* rename to prompt studio

* fix get_prompt_info

* fix endpoints

* add PromptCodeSnippets

* prompt info view

* add prompt info view

* show correct version for prompts

* fix version selector

* fix endpoints and version

* fix get_prompt_info

* fix version display
Change model identifier from cerebras/openai/gpt-oss-120b to
cerebras/gpt-oss-120b to match Cerebras API requirements.

The Cerebras API only accepts 'gpt-oss-120b' as the model ID, not
'openai/gpt-oss-120b'. The previous name was causing "Model does not
exist" errors when users tried to use it.

Tested with real API calls to confirm:
- cerebras/gpt-oss-120b → sends 'gpt-oss-120b' → ✅ works
- cerebras/openai/gpt-oss-120b → sends 'openai/gpt-oss-120b' → ❌ fails

Fixes #16924
* new model - add together_ai/zai-org/GLM-4.6

* together_ai/zai-org/GLM-4.6
…ody" (#16943)

* add search_tool_name in litellm params

* test_search_tool_name_in_all_litellm_params

* bump config
* docs: fix mcp url format

* fix: update Cursor MCP example to use url instead of server_url
Add gemini-3-pro-image-preview model configuration for Google's new
image generation model (aka "Nano Banana Pro 🍌").

Model details:
- Input: $2.00/1M tokens (text), $0.0011/image
- Output: $12.00/1M tokens (text), $0.134/image (1K/2K)
- Context: 65k input / 32k output tokens
- Capabilities: structured outputs, web search, caching, thinking
- No function calling support
- Available on both Gemini API and Vertex AI

Added variants:
- gemini-3-pro-image-preview (base, uses Vertex AI)
- gemini/gemini-3-pro-image-preview (Gemini API)
- vertex_ai/gemini-3-pro-image-preview (Vertex AI)

Source: https://ai.google.dev/gemini-api/docs/pricing
Fixes: #16925
* feat: Add support for Grok 4.1 Fast models

Add new xAI Grok 4.1 Fast models optimized for high-performance agentic tool calling:

- xai/grok-4-1-fast (alias for grok-4-1-fast-reasoning)
- xai/grok-4-1-fast-reasoning (with reasoning capabilities)
- xai/grok-4-1-fast-reasoning-latest
- xai/grok-4-1-fast-non-reasoning (without reasoning for faster responses)
- xai/grok-4-1-fast-non-reasoning-latest

Features:
- Context window: 2,000,000 tokens
- Pricing: $0.20/1M input, $0.50/1M output tokens
- Cached tokens: $0.05/1M tokens
- Supports: Function calling, Structured outputs, Vision, Audio input, Web search, Reasoning

Fixes #16927

* docs: Add comprehensive Grok models documentation

- Add 'Supported Models' section highlighting new Grok 4.1 Fast models
- Include comparison guide for reasoning vs non-reasoning models
- Add complete model family table (Grok 4.1, 4, 3, Code, 2)
- Add features legend explaining capabilities
- Remove pricing details (link to xAI docs instead for current rates)
- Improve documentation clarity and consistency

Related to #16927

* docs: Minor corrections to xai.md
…sponse (#16875)

This fix addresses the same issue that was resolved for OpenAI video in PR #16708.

The GeminiVideoConfig class was importing BaseVideoConfig only within TYPE_CHECKING,
causing it to be 'Any' at runtime. This prevented the async_transform_video_content_response
method from being available during video content downloads.

Changes:
- Moved BaseVideoConfig import from TYPE_CHECKING to top-level imports
- Added test_gemini_video_config_has_async_transform() to verify the fix
- Ensures GeminiVideoConfig properly inherits BaseVideoConfig at runtime

Fixes video generation errors for Gemini Veo models:
'GeminiVideoConfig' object has no attribute 'async_transform_video_content_response'
* add DOCKER_MODEL_RUNNER

* add DockerModelRunnerChatConfig Transform

* add docker_model_runner

* add docker_model_runner

* docs docker model runner

* add DockerModelRunnerChatConfig

* add docker_model_runner to providers

* test_completion_hits_correct_url_and_body

* fix sidebar

* TestDockerModelRunnerIntegration

* test_completion_with_custom_engine_and_host

* docs docker model runner

* docs fix
ishaan-jaff and others added 25 commits November 25, 2025 12:20
)

* test_bedrock_openai_imported_model

* AmazonBedrockOpenAIConfig

* add openai route for bedrock

* docs fix

* fix code qa check
* include server_tool_use in streaming usage

* add test
* fix transcription exception handling

* reraise the exception
* init RAG api types

* add RAG endpoints

* init main.py for RAG ingest API

* init RecursiveCharacterTextSplitter

* add BaseRAGIngestion

* fix OpenAIRAGIngestion

* fix img handler

* init OpenAIRAGIngestion

* init BedrockRAGIngestion

* init BedrockRAGIngestion

* init rag tests

* init BedrockVectorStoreOptions

* implement BedrockRAGIngestion

* add BaseRAGAPI

* add endpoint for RAG ingest

* add ingest RAG endpoints

* add test doc

* add parse_rag_ingest_request

* update endpoints

* docs add docs for new RAG API

* fix qa check

* fix linting

* docs fix

* docs

* add max depth checks

* docs anthropic
…rmat-bridge-conversion

fix: Support response_format parameter in completion -> responses bridge
- Automatically pass LiteLLM virtual key context as X-LiteLLM-* headers
- Includes key_alias, user_id, team_id, org_id, and user_email
- No configuration required - always enabled for application/user tracking
- Excludes sensitive data (metadata, API tokens) for security
- Add comprehensive tests (30 tests, all passing)
- Update documentation with header details
This should allow Postgres to perform a more efficient index scan instead of a sequential table scan.

These two queries are consistently among the longest-running in our instance, and are a major latency source for the usage page on the admin UI.
…meters (#17019)

- Add model identifier to FLASH_IMAGE_PREVIEW_MODEL_IDENTIFIERS
- Add imageSize parameter support (1K, 2K, 4K) with GeminiImageSize type
- Add tests for imageSize parameter transformation
- Update documentation with new model
[Feature] UI - Disable edit, delete, info, for dynamically generated spend tags
[Feature] UI - Org Admin Team Permissions Fix
@vercel

vercel bot commented Nov 26, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Ready | Preview | Comment | Updated (UTC): Nov 26, 2025 4:18pm

@Sameerlite Sameerlite marked this pull request as ready for review November 26, 2025 16:18
f"MCP list_resource_templates - MCP servers from context: {mcp_servers}"
)
verbose_logger.debug(
f"MCP list_resource_templates - MCP server auth headers: {list(mcp_server_auth_headers.keys()) if mcp_server_auth_headers else None}"

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

To fix this problem, the code should avoid logging any part of the authentication header dictionary which may contain sensitive information, even its keys (since they may contain identifying data or can be used to enumerate the types of auth mechanisms present).
The best way to fix is to either remove the log statement at line 466 entirely, or replace it with a generic message that does not output any data from mcp_server_auth_headers. If visibility is required for debugging, log only that auth headers are present, or the count (number of headers), without listing names or values.
Edit only line 466 of litellm/proxy/_experimental/mcp_server/server.py to remove the risky log statement or make it generic, ensuring you do not log sensitive data.

No new imports or methods are needed.


Suggested changeset 1
litellm/proxy/_experimental/mcp_server/server.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/proxy/_experimental/mcp_server/server.py b/litellm/proxy/_experimental/mcp_server/server.py
--- a/litellm/proxy/_experimental/mcp_server/server.py
+++ b/litellm/proxy/_experimental/mcp_server/server.py
@@ -463,7 +463,7 @@
                 f"MCP list_resource_templates - MCP servers from context: {mcp_servers}"
             )
             verbose_logger.debug(
-                f"MCP list_resource_templates - MCP server auth headers: {list(mcp_server_auth_headers.keys()) if mcp_server_auth_headers else None}"
+                "MCP list_resource_templates - MCP server auth headers present: %s", bool(mcp_server_auth_headers)
             )
 
             resource_templates = await _list_mcp_resource_templates(
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
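
The redaction pattern these autofixes converge on (log only presence and count, never header names or values) can be sketched with a small helper. The function name is hypothetical and not part of LiteLLM's codebase; it only illustrates the log-safe summary idea:

```python
# Hypothetical log-safe summary helper for auth headers.
from typing import Mapping, Optional


def describe_auth_headers(headers: Optional[Mapping[str, str]]) -> str:
    """Report only whether auth headers exist and how many there are,
    never their keys or values, so nothing sensitive reaches the logs."""
    count = len(headers) if headers else 0
    return f"auth headers present: {count > 0} (count={count})"


# Usage sketch (names assumed from the snippets above):
# verbose_logger.debug("MCP list_resources - %s", describe_auth_headers(mcp_server_auth_headers))
```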
f"MCP list_resource_templates - User API Key Auth from context: {user_api_key_auth}"
)
verbose_logger.debug(
f"MCP list_resource_templates - MCP servers from context: {mcp_servers}"

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

The best way to fix the issue is to avoid logging the full contents of mcp_servers directly. Instead, log only non-sensitive, high-level information, such as the count of items, or specific non-sensitive attributes if required (e.g., server "aliases", but never credentials, keys, or hostnames). This can be done by changing the affected verbose_logger.debug call on line 463 in litellm/proxy/_experimental/mcp_server/server.py to redact sensitive details, logging only the type of value (if present) or count/summary information rather than the full data structure.

Specific steps:

  • Edit the affected debug line (verbose_logger.debug(...)) in the list_resource_templates endpoint.
  • Replace it with a log statement that does NOT include the full content of mcp_servers, but instead logs, e.g., the number of servers (len(mcp_servers) if a list), or "present/absent" if None.
  • No changes to imports or additional methods are necessary.
  • This change ensures sensitive details are never written to logs, following the project logging convention.

Suggested changeset 1
litellm/proxy/_experimental/mcp_server/server.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/proxy/_experimental/mcp_server/server.py b/litellm/proxy/_experimental/mcp_server/server.py
--- a/litellm/proxy/_experimental/mcp_server/server.py
+++ b/litellm/proxy/_experimental/mcp_server/server.py
@@ -460,7 +460,7 @@
                 f"MCP list_resource_templates - User API Key Auth from context: {user_api_key_auth}"
             )
             verbose_logger.debug(
-                f"MCP list_resource_templates - MCP servers from context: {mcp_servers}"
+                f"MCP list_resource_templates - MCP servers present: {'Yes' if mcp_servers else 'No'}, number of servers: {len(mcp_servers) if mcp_servers else 0}"
             )
             verbose_logger.debug(
                 f"MCP list_resource_templates - MCP server auth headers: {list(mcp_server_auth_headers.keys()) if mcp_server_auth_headers else None}"
EOF
raw_headers,
) = get_auth_context()
verbose_logger.debug(
f"MCP list_resource_templates - User API Key Auth from context: {user_api_key_auth}"

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

To fix clear-text logging of sensitive information, you should never log raw authentication objects or credentials. Instead:

  1. Identify the region to fix:
    In litellm/proxy/_experimental/mcp_server/server.py, at line 460, the code logs the entire user_api_key_auth object.
  2. Best way:
    Only log non-sensitive, high-level information. For example, if user_api_key_auth contains or has an identifier (such as a username, safe account id, or just that the key exists), log only that non-sensitive part—or omit the log altogether if it is not necessary.
  3. Implement:
    Change the logging statement at line 460 to either:
    • Not log the user_api_key_auth at all, or
    • Log a sanitized/minimal/safe detail (e.g., type, presence, or a masked identifier).
    • This does not require new library imports since you already use logging and string formatting.
    • If the class has something like .user_id or .username, use that; otherwise log only that key is present.
Suggested changeset 1
litellm/proxy/_experimental/mcp_server/server.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/proxy/_experimental/mcp_server/server.py b/litellm/proxy/_experimental/mcp_server/server.py
--- a/litellm/proxy/_experimental/mcp_server/server.py
+++ b/litellm/proxy/_experimental/mcp_server/server.py
@@ -457,7 +457,7 @@
                 raw_headers,
             ) = get_auth_context()
             verbose_logger.debug(
-                f"MCP list_resource_templates - User API Key Auth from context: {user_api_key_auth}"
+                f"MCP list_resource_templates - User API Key Auth present: {user_api_key_auth is not None}"
             )
             verbose_logger.debug(
                 f"MCP list_resource_templates - MCP servers from context: {mcp_servers}"
EOF
f"MCP list_resources - MCP servers from context: {mcp_servers}"
)
verbose_logger.debug(
f"MCP list_resources - MCP server auth headers: {list(mcp_server_auth_headers.keys()) if mcp_server_auth_headers else None}"

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

To fix this problem, we should avoid logging any part of the mcp_server_auth_headers dictionary that could contain sensitive information. The best approach is to remove this debug log line entirely, or to ensure only sanitized, non-sensitive, static data is logged instead (such as indicating presence of headers, how many keys, or listing only known-safe key names). Specifically, replace or remove line 428 in list_resources() to ensure no sensitive data—header keys or values—are ever written to logs.

Implementation:

  • Remove or heavily sanitize line 428 of list_resources in litellm/proxy/_experimental/mcp_server/server.py.
  • If logging presence of headers is helpful, you may log the number of headers or a static string instead.
  • No additional imports or dependencies are required.
Suggested changeset 1
litellm/proxy/_experimental/mcp_server/server.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/proxy/_experimental/mcp_server/server.py b/litellm/proxy/_experimental/mcp_server/server.py
--- a/litellm/proxy/_experimental/mcp_server/server.py
+++ b/litellm/proxy/_experimental/mcp_server/server.py
@@ -424,8 +424,9 @@
             verbose_logger.debug(
                 f"MCP list_resources - MCP servers from context: {mcp_servers}"
             )
+            # Avoid logging potentially sensitive auth header keys.
             verbose_logger.debug(
-                f"MCP list_resources - MCP server auth headers: {list(mcp_server_auth_headers.keys()) if mcp_server_auth_headers else None}"
+                f"MCP list_resources - MCP server auth headers present: {len(mcp_server_auth_headers) if mcp_server_auth_headers else 0}"
             )
 
             resources = await _list_mcp_resources(
EOF
f"MCP list_resources - User API Key Auth from context: {user_api_key_auth}"
)
verbose_logger.debug(
f"MCP list_resources - MCP servers from context: {mcp_servers}"

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

To address the clear-text logging of potentially sensitive data, we should avoid printing the full contents of mcp_servers directly. Instead, we should:

  1. Log only the count (number of elements) or a generic placeholder if mcp_servers is present, rather than printing full values.
  2. Optionally, if needed for diagnostics, ensure that only non-sensitive attributes (e.g., server aliases, not auth data) are logged.
  3. The fix is specific to line 425, where the log statement should be updated.
  4. No additional imports or methods are needed, only an edit to the log statement.

Suggested changeset 1
litellm/proxy/_experimental/mcp_server/server.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/proxy/_experimental/mcp_server/server.py b/litellm/proxy/_experimental/mcp_server/server.py
--- a/litellm/proxy/_experimental/mcp_server/server.py
+++ b/litellm/proxy/_experimental/mcp_server/server.py
@@ -422,7 +422,7 @@
                 f"MCP list_resources - User API Key Auth from context: {user_api_key_auth}"
             )
             verbose_logger.debug(
-                f"MCP list_resources - MCP servers from context: {mcp_servers}"
+                f"MCP list_resources - MCP servers from context: ({len(mcp_servers) if mcp_servers else 0} servers)"
             )
             verbose_logger.debug(
                 f"MCP list_resources - MCP server auth headers: {list(mcp_server_auth_headers.keys()) if mcp_server_auth_headers else None}"
EOF
raw_headers,
) = get_auth_context()
verbose_logger.debug(
f"MCP list_prompts - User API Key Auth from context: {user_api_key_auth}"

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

The best way to fix this problem is to avoid logging sensitive details. Instead, log only non-sensitive metadata such as existence, type, or user id—never the actual key or credential values.

  • In file litellm/proxy/_experimental/mcp_server/server.py, in the list_prompts async function, change the log statement at line 342 to redact the sensitive details, e.g., log only whether user_api_key_auth exists or (if it has a user_id or similar attribute) log that instead.
  • Do not log the string representation of the full object.
  • No changes elsewhere or additional imports are needed.

Suggested changeset 1
litellm/proxy/_experimental/mcp_server/server.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/proxy/_experimental/mcp_server/server.py b/litellm/proxy/_experimental/mcp_server/server.py
--- a/litellm/proxy/_experimental/mcp_server/server.py
+++ b/litellm/proxy/_experimental/mcp_server/server.py
@@ -339,7 +339,7 @@
                 raw_headers,
             ) = get_auth_context()
             verbose_logger.debug(
-                f"MCP list_prompts - User API Key Auth from context: {user_api_key_auth}"
+                f"MCP list_prompts - User API Key Auth from context: {'present' if user_api_key_auth is not None else 'missing'}"
             )
             verbose_logger.debug(
                 f"MCP list_prompts - MCP servers from context: {mcp_servers}"
EOF
f"Authorization={urllib.parse.quote(f'Bearer {api_key}')}"
if api_key is not None:
otlp_auth_headers = f"Authorization=Bearer {api_key}"
elif "app.phoenix.arize.com" in endpoint:

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string app.phoenix.arize.com may be at an arbitrary position in the sanitized URL.

Copilot Autofix

To fix this, we should parse the endpoint URL using Python's urllib.parse.urlparse, extract its hostname, and check whether it matches the trusted host (app.phoenix.arize.com) or one of its allowed subdomains (using explicit matching). The substring check ("app.phoenix.arize.com" in endpoint) should be replaced with code that parses endpoint and performs the check on the hostname field. The only lines to change are those that check for "app.phoenix.arize.com" in the endpoint string, which are lines 61 and 86 in your code. We'll import urllib.parse.urlparse at the top and replace both usages with parsed hostname checks.


Suggested changeset 1
litellm/integrations/arize/arize_phoenix.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/integrations/arize/arize_phoenix.py b/litellm/integrations/arize/arize_phoenix.py
--- a/litellm/integrations/arize/arize_phoenix.py
+++ b/litellm/integrations/arize/arize_phoenix.py
@@ -1,5 +1,6 @@
 import os
 from typing import TYPE_CHECKING, Any, Union
+from urllib.parse import urlparse
 
 from litellm._logging import verbose_logger
 from litellm.integrations.arize import _utils
@@ -58,7 +59,11 @@
                 protocol = "otlp_grpc"
             else:
                 # Phoenix Cloud endpoints (app.phoenix.arize.com) include the space in the URL
-                if "app.phoenix.arize.com" in collector_endpoint:
+                collector_hostname = urlparse(collector_endpoint).hostname
+                if collector_hostname is not None and (
+                    collector_hostname == "app.phoenix.arize.com"
+                    or collector_hostname.endswith(".app.phoenix.arize.com")
+                ):
                     endpoint = collector_endpoint
                     protocol = "otlp_http"
                 # For other HTTP endpoints, ensure they have the correct path
@@ -83,11 +88,16 @@
         otlp_auth_headers = None
         if api_key is not None:
             otlp_auth_headers = f"Authorization=Bearer {api_key}"
-        elif "app.phoenix.arize.com" in endpoint:
-            # Phoenix Cloud requires an API key
-            raise ValueError(
-                "PHOENIX_API_KEY must be set when using Phoenix Cloud (app.phoenix.arize.com)."
-            )
+        else:
+            endpoint_hostname = urlparse(endpoint).hostname
+            if endpoint_hostname is not None and (
+                endpoint_hostname == "app.phoenix.arize.com"
+                or endpoint_hostname.endswith(".app.phoenix.arize.com")
+            ):
+                # Phoenix Cloud requires an API key
+                raise ValueError(
+                    "PHOENIX_API_KEY must be set when using Phoenix Cloud (app.phoenix.arize.com)."
+                )
 
         project_name = os.environ.get("PHOENIX_PROJECT_NAME", "litellm-project")
 
EOF
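
The hostname-based check recommended in the patch above can be sketched in isolation. The function name is illustrative (not the integration's actual API), but it shows why substring matching is unsafe while exact-or-suffix hostname comparison is not:

```python
# Hypothetical standalone version of the hostname check from the autofix above.
from urllib.parse import urlparse


def is_phoenix_cloud(endpoint: str, trusted_host: str = "app.phoenix.arize.com") -> bool:
    """True only if the URL's hostname is the trusted host or an allowed
    subdomain of it. A substring check would wrongly accept URLs like
    https://evil.example/app.phoenix.arize.com (host in the path)."""
    hostname = urlparse(endpoint).hostname
    if hostname is None:
        return False
    return hostname == trusted_host or hostname.endswith("." + trusted_host)
```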
protocol = "otlp_grpc"
else:
# Phoenix Cloud endpoints (app.phoenix.arize.com) include the space in the URL
if "app.phoenix.arize.com" in collector_endpoint:

Check failure (Code scanning / CodeQL): Incomplete URL substring sanitization (High)

The string "app.phoenix.arize.com" may be at an arbitrary position in the sanitized URL.

Copilot Autofix (AI, 8 days ago)
To fix this issue, we need to parse collector_endpoint as a URL, extract its hostname, and then check if it matches or ends with the expected domain (e.g., "app.phoenix.arize.com"). We should do this instead of using the string containment (in) check. We'll use Python's standard urllib.parse library for this purpose, as it provides a safe way to extract the hostname. The change applies to line 61 in litellm/integrations/arize/arize_phoenix.py and any other "app.phoenix.arize.com" in ... checks, specifically line 86 for the value in endpoint. We'll need to import urlparse at the top. Only these spots are in scope for the change.
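To illustrate the class of bug flagged here, a minimal sketch (not the actual LiteLLM code): a substring check accepts any URL that merely contains the trusted domain anywhere, while comparing the parsed hostname (with an `endswith` check for subdomains, mirroring the merged fix) does not.

```python
from urllib.parse import urlparse

TRUSTED_HOST = "app.phoenix.arize.com"


def is_phoenix_cloud_substring(endpoint: str) -> bool:
    # Unsafe: the trusted string may appear anywhere in the URL.
    return TRUSTED_HOST in endpoint


def is_phoenix_cloud_hostname(endpoint: str) -> bool:
    # Safe: compare the parsed hostname exactly, allowing subdomains.
    host = urlparse(endpoint).hostname
    return host is not None and (
        host == TRUSTED_HOST or host.endswith("." + TRUSTED_HOST)
    )


# A hostile URL can embed the trusted domain in its path:
hostile = "https://evil.example/app.phoenix.arize.com"
print(is_phoenix_cloud_substring(hostile))  # True  (incorrectly trusted)
print(is_phoenix_cloud_hostname(hostile))   # False (correctly rejected)
```

`urlparse` extracts only the network location's host, so path or query tricks cannot spoof the check.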

Suggested changeset 1
litellm/integrations/arize/arize_phoenix.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/integrations/arize/arize_phoenix.py b/litellm/integrations/arize/arize_phoenix.py
--- a/litellm/integrations/arize/arize_phoenix.py
+++ b/litellm/integrations/arize/arize_phoenix.py
@@ -1,6 +1,6 @@
 import os
 from typing import TYPE_CHECKING, Any, Union
-
+from urllib.parse import urlparse
 from litellm._logging import verbose_logger
 from litellm.integrations.arize import _utils
 from litellm.integrations.arize._utils import ArizeOTELAttributes
@@ -58,7 +58,7 @@
                 protocol = "otlp_grpc"
             else:
                 # Phoenix Cloud endpoints (app.phoenix.arize.com) include the space in the URL
-                if "app.phoenix.arize.com" in collector_endpoint:
+                if urlparse(collector_endpoint).hostname == "app.phoenix.arize.com":
                     endpoint = collector_endpoint
                     protocol = "otlp_http"
                 # For other HTTP endpoints, ensure they have the correct path
@@ -83,7 +83,7 @@
         otlp_auth_headers = None
         if api_key is not None:
             otlp_auth_headers = f"Authorization=Bearer {api_key}"
-        elif "app.phoenix.arize.com" in endpoint:
+        elif urlparse(endpoint).hostname == "app.phoenix.arize.com":
             # Phoenix Cloud requires an API key
             raise ValueError(
                 "PHOENIX_API_KEY must be set when using Phoenix Cloud (app.phoenix.arize.com)."
EOF
api_base=api_base, endpoint="skills", skill_id=skill_id
)

verbose_logger.debug("Delete skill request - URL: %s", url)

Check failure (Code scanning / CodeQL): Clear-text logging of sensitive information (High)

This expression logs sensitive data (secret) as clear text. (Reported at 21 locations.)

Copilot Autofix (AI, 8 days ago)

To fix the problem, we should prevent logging of API URLs that may contain sensitive or secret information, especially those derived from environment variables or secret managers. Concretely, in transform_delete_skill_request (and similar places), where the full URL including api_base is logged, we need to mask or avoid logging the actual value. A good approach is to redact the sensitive part:

  • Never log the api_base directly.
  • If logging is essential for debugging, log only the endpoint path or use a string like [REDACTED] in place of secrets.
  • Update the logger line on line 195 and line 166 to either remove the variable or mask it appropriately (e.g., "Delete skill request - URL: [REDACTED]").
  • Similarly update any other debug logs which output sensitive URLs, such as line 166 (Get skill request - URL: %s) if it uses the same mechanism.

No additional methods or imports are needed, just a change in the arguments to the logger.
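As a minimal sketch of the redaction idea (the helper name is hypothetical, not LiteLLM's actual API), the host portion can be masked while the path is kept for debugging:

```python
from urllib.parse import urlparse


def redact_url(url: str) -> str:
    # Keep only the path component; the scheme, host, credentials, and
    # query string (where secrets tend to live) are masked.
    return "[REDACTED]" + urlparse(url).path


# Hypothetical api_base; in practice it may come from an environment
# variable or a secret manager.
url = "https://secret-gateway.internal/v1/skills/skill_123"
print(redact_url(url))  # [REDACTED]/v1/skills/skill_123
```

This preserves the endpoint shape for debugging while guaranteeing the sensitive base URL never reaches the log stream.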

Suggested changeset 1
litellm/llms/anthropic/skills/transformation.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/llms/anthropic/skills/transformation.py b/litellm/llms/anthropic/skills/transformation.py
--- a/litellm/llms/anthropic/skills/transformation.py
+++ b/litellm/llms/anthropic/skills/transformation.py
@@ -163,7 +163,7 @@
             api_base=api_base, endpoint="skills", skill_id=skill_id
         )
         
-        verbose_logger.debug("Get skill request - URL: %s", url)
+        verbose_logger.debug("Get skill request - URL: [REDACTED]")
         
         return url, headers
 
@@ -192,7 +192,7 @@
             api_base=api_base, endpoint="skills", skill_id=skill_id
         )
         
-        verbose_logger.debug("Delete skill request - URL: %s", url)
+        verbose_logger.debug("Delete skill request - URL: [REDACTED]")
         
         return url, headers
 
EOF
api_base=api_base, endpoint="skills", skill_id=skill_id
)

verbose_logger.debug("Get skill request - URL: %s", url)

Check failure (Code scanning / CodeQL): Clear-text logging of sensitive information (High)

This expression logs sensitive data (secret) as clear text. (Reported at 21 locations.)

Copilot Autofix (AI, 8 days ago)

To fix the problem, avoid logging the entire request URL with potential secrets in logs. Instead, log only non-sensitive components (such as endpoint names, IDs, or operation types), or redact the sensitive api_base portion. Specifically, in transform_get_skill_request, on line 166, replace the logging of url with either a redacted version or log only the endpoint/skill ID. Other places in the code (shown here) do not log sensitive endpoints directly.

No additional dependencies are required; only update or replace the logging statement on line 166 to ensure no clear-text sensitive values are written to logs. For example, log "Get skill request" with just the endpoint and skill ID, and replace any logging of the full URL.
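A small regression check (hypothetical names, not LiteLLM's test harness) can capture the log stream and assert that the sensitive base URL never appears while the useful skill ID still does:

```python
import logging
from io import StringIO

# Capture log output in memory for inspection.
stream = StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("skills_redaction_demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

secret_base = "https://secret-gateway.internal"  # hypothetical api_base
skill_id = "skill_123"

# The fixed code logs only non-sensitive identifiers, never the full URL.
logger.debug("Get skill request for endpoint 'skills', skill_id: %s", skill_id)

output = stream.getvalue()
assert secret_base not in output  # the secret host must not leak
assert skill_id in output         # the identifier remains for debugging
```

Asserting on the captured stream guards against a future change reintroducing the full URL into the debug message.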


Suggested changeset 1
litellm/llms/anthropic/skills/transformation.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/llms/anthropic/skills/transformation.py b/litellm/llms/anthropic/skills/transformation.py
--- a/litellm/llms/anthropic/skills/transformation.py
+++ b/litellm/llms/anthropic/skills/transformation.py
@@ -163,7 +163,9 @@
             api_base=api_base, endpoint="skills", skill_id=skill_id
         )
         
-        verbose_logger.debug("Get skill request - URL: %s", url)
+        verbose_logger.debug(
+            "Get skill request for endpoint 'skills', skill_id: %s", skill_id
+        )
         
         return url, headers
 
EOF
@Sameerlite merged commit 1c317ac into litellm_vertex_ai_anthopic_cost_tracking on Nov 26, 2025
48 of 72 checks passed
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
11 out of 12 committers have signed the CLA.

✅ Sameerlite
✅ krrishdholakia
✅ ishaan-jaff
✅ uc4w6c
✅ reflection
✅ CAFxX
✅ AlexsanderHamir
✅ otaviofbrito
✅ eagle-p
✅ choigawoon
✅ yuneng-jiang
❌ KeremTurgutlu
You have signed the CLA already but the status is still pending? Let us recheck it.
