Checked other resources
- I added a very descriptive title to this issue.
- I searched the LangGraph.js documentation with the integrated search.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangGraph.js rather than my code.
- The bug is not resolved by updating to the latest stable version of LangGraph (or the specific integration package).
Example Code
import { z } from "zod";
import { ChatOpenAI } from "@langchain/openai";
import { createAgent, summarizationMiddleware } from "langchain";
import { MemorySaver, MessagesZodState } from "@langchain/langgraph";

// --- Main agent model (intentional streaming for UI) ---
const mainModel = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: true,
  temperature: 0,
});

// --- Internal summarization model (should NOT stream) ---
const summaryModel = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: false, // <-- disabled
  callbacks: [],    // <-- none
  temperature: 0,
});

// --- Summarization middleware with low thresholds so it triggers instantly ---
const summarizeMiddleware = summarizationMiddleware({
  model: summaryModel, // <-- internal model
  trigger: { tokens: 200, messages: 2 },
  keep: { messages: 1 },
  trimTokensToSummarize: 128,
  summaryPrefix: "Here is a summary of the conversation to date:",
});

// --- Minimal agent state ---
const agentState = z.object({
  messages: MessagesZodState.shape.messages,
});

// --- Agent with only summarization middleware ---
const agent = createAgent({
  model: mainModel,
  tools: [],
  middleware: [summarizeMiddleware],
  stateSchema: agentState,
  checkpointer: new MemorySaver(),
});

// --- Helper: produce enough text to trigger summarization ---
function longPrompt() {
  const base =
    "Explain in detail how to build a production-grade Node.js REST API with authentication, " +
    "queues, monitoring, error handling, and scaling considerations.";
  return Array(10).fill(base).join("\n\n"); // trigger summarization threshold
}

async function main() {
  console.log("Starting stream… Watch for a leaked summary BEFORE final answer.\n");

  const stream = await agent.stream(
    {
      messages: [{ role: "user", content: longPrompt() }],
    },
    {
      streamMode: ["messages"], // <-- required to observe the bug
      configurable: { thread_id: "summarization-repro" },
    }
  );

  for await (const [mode, chunk] of stream) {
    if (mode !== "messages") continue;

    const [msg] = chunk;
    const role = (msg as any).role ?? "unknown";
    const name = (msg as any).name ?? "";
    const content =
      typeof msg.content === "string"
        ? msg.content
        : JSON.stringify(msg.content);

    console.log(`\n[STREAMED MESSAGE] role=${role} name=${name}`);
    console.log(content.slice(0, 200) + (content.length > 200 ? "..." : ""));

    // Bug behavior:
    //
    // 1. You will see a "summary" message appear FIRST
    //    (coming from model.invoke inside summarizationMiddleware)
    //
    // 2. Then you will see the real agent reply
    //
    // The summary message should NEVER appear because it is an internal
    // compression step and summaryModel.streaming = false.
  }

  console.log("\nStream complete.");
}

main().catch((err) => console.error("Repro error:", err));

Error Message and Stack Trace (if applicable)
There is no thrown exception.
The bug is incorrect behavior: unwanted streamed assistant output.
Description
What I am doing
Using the official JS summarizationMiddleware with an agent that streams events (stream() with streamMode: ["messages"]).
What I expect
Internal summarization LLM calls should behave like LLM calls inside tools:
- completely isolated
- not streamed
- invisible to the user
- not producing assistant messages (a conceptual sketch of what I mean follows this list)
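Concretely, this is the kind of isolation I mean. This is a conceptual sketch only (runName is just a label I made up, and I am not claiming this call shape currently avoids the leak), reusing summaryModel and formattedPrompt from the repro above:

// Conceptual sketch of the expected isolation (not a confirmed workaround):
// the internal summarization call should inherit nothing from the agent run,
// so none of its output ever reaches the outer "messages" stream.
const summary = await summaryModel.invoke(formattedPrompt, {
  callbacks: [],                     // no parent callbacks
  tags: [],                          // nothing for stream consumers to pick up
  runName: "internal-summarization", // hypothetical label, readability only
});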
What happens instead
The internal line:
await model.invoke(formattedPrompt);
inside the middleware is always streamed to the UI as if it were a normal assistant response.
This causes:
- an extra assistant message to appear
- summary tokens showing up before the real response
- impossible to hide the summary model call
- impossible to fix with custom filtering (see the attempted filter sketched right after this list)
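As an example, this is the kind of consumer-side filter I tried. It is a sketch: it assumes the messages stream yields [message, metadata] pairs, and the "summarization" tag is a hypothetical marker that the leaked chunks never actually carry, so nothing gets filtered out:

// Attempted consumer-side filter (sketch). It cannot work: the leaked summary
// chunks look like ordinary assistant output, so there is no tag or field
// here that reliably distinguishes them from the real reply.
for await (const [mode, chunk] of stream) {
  if (mode !== "messages") continue;
  const [msg, metadata] = chunk as [any, Record<string, any> | undefined];
  const tags: string[] = metadata?.tags ?? [];
  if (tags.includes("summarization")) continue; // hypothetical tag, never present
  const text =
    typeof msg.content === "string" ? msg.content : JSON.stringify(msg.content);
  console.log(text);
}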
Key insight
Even if I remove:
return { messages: [...] }
from beforeModel, the streamed events continue because:
❗ The leak does NOT come from the middleware return value.
❗ It comes directly from the internal model.invoke() call.
LangGraph intercepts all LLM invocations inside the main execution context, which is where middleware runs.
Tools run in isolated contexts — middleware does not.
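Roughly, this is the shape of what happens inside the middleware's summarization step. This is my own simplified sketch (not the library source), reusing summaryModel from the repro:

import { SystemMessage, type BaseMessage } from "@langchain/core/messages";

// Sketch of the middleware's summarization step (not the real implementation).
async function summarizeHistory(messages: BaseMessage[]) {
  const history = messages
    .map((m) => `${m._getType()}: ${typeof m.content === "string" ? m.content : ""}`)
    .join("\n");
  const formattedPrompt = `Summarize the conversation so far:\n${history}`;

  // THIS is where the leak happens: the invoke itself emits LLM events inside
  // the agent's main execution context, so "messages" stream consumers see
  // the summary tokens...
  const summary = await summaryModel.invoke(formattedPrompt);

  // ...regardless of whether this result is ever returned to the graph state.
  return new SystemMessage(
    `Here is a summary of the conversation to date: ${summary.content}`
  );
}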
Why this is a problem
Summarization is supposed to be an internal housekeeping step.
Users should never see:
- model starts
- streamed tokens
- model ends
from that step.
Currently, JS summarization middleware behaves like a visible extra assistant turn.
Workaround
Only one thing works:
await fetch("https://api.openai.com/v1/chat/completions", ...)
This bypasses LangChain entirely and produces zero streamed events.
But this breaks consistency and prevents using LCEL or BaseChatModel.
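For completeness, the shape of that workaround (a sketch; it assumes OPENAI_API_KEY is set in the environment and reuses formattedPrompt from above):

// Workaround sketch: call the OpenAI API directly so no LangChain callbacks
// fire and nothing reaches the agent's "messages" stream.
const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: formattedPrompt }],
    stream: false,
  }),
});
const data = await res.json();
const summaryText = data.choices[0].message.content as string;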
System Info
- Platform: Linux
- Package Manager: npm (10.9.2)
- langchain-js: 1.0.4
- langgraph-js: 1.0.1
- Node: 22.17.1

