
JS Summarization Middleware: internal model.invoke() is streamed to UI (cannot be suppressed, unlike tools) #9455

@evanedreo

Description

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangGraph.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangGraph.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangGraph (or the specific integration package).

Example Code

import { z } from "zod";
import { ChatOpenAI } from "@langchain/openai";
import { createAgent, summarizationMiddleware } from "langchain";
import { MemorySaver, MessagesZodState } from "@langchain/langgraph";

// --- Main agent model (intentional streaming for UI) ---
const mainModel = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: true,
  temperature: 0,
});

// --- Internal summarization model (should NOT stream) ---
const summaryModel = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: false, // <-- disabled
  callbacks: [],    // <-- none
  temperature: 0,
});

// --- Summarization middleware with low thresholds so it triggers instantly ---
const summarizeMiddleware = summarizationMiddleware({
  model: summaryModel,        // <-- internal model
  trigger: { tokens: 200, messages: 2 },
  keep: { messages: 1 },
  trimTokensToSummarize: 128,
  summaryPrefix: "Here is a summary of the conversation to date:",
});

// --- Minimal agent state ---
const agentState = z.object({
  messages: MessagesZodState.shape.messages,
});

// --- Agent with only summarization middleware ---
const agent = createAgent({
  model: mainModel,
  tools: [],
  middleware: [summarizeMiddleware],
  stateSchema: agentState,
  checkpointer: new MemorySaver(),
});

// --- Helper: produce enough text to trigger summarization ---
function longPrompt() {
  const base =
    "Explain in detail how to build a production-grade Node.js REST API with authentication, " +
    "queues, monitoring, error handling, and scaling considerations.";
  return Array(10).fill(base).join("\n\n"); // trigger summarization threshold
}

async function main() {
  console.log("Starting stream… Watch for a leaked summary BEFORE final answer.\n");

  const stream = await agent.stream(
    {
      messages: [{ role: "user", content: longPrompt() }],
    },
    {
      streamMode: ["messages"], // <-- required to observe the bug
      configurable: { thread_id: "summarization-repro" },
    }
  );

  for await (const [mode, chunk] of stream) {
    if (mode !== "messages") continue;

    const [msg] = chunk;
    const role = (msg as any).role ?? "unknown";
    const name = (msg as any).name ?? "";
    const content =
      typeof msg.content === "string"
        ? msg.content
        : JSON.stringify(msg.content);

    console.log(`\n[STREAMED MESSAGE] role=${role} name=${name}`);
    console.log(content.slice(0, 200) + (content.length > 200 ? "..." : ""));

    // Bug behavior:
    //
    // 1. You will see a "summary" message appear FIRST
    //    (coming from model.invoke inside summarizationMiddleware)
    //
    // 2. Then you will see the real agent reply
    //
    // The summary message should NEVER appear because it is an internal
    // compression step and summaryModel.streaming = false.
  }

  console.log("\nStream complete.");
}

main().catch((err) => console.error("Repro error:", err));

Error Message and Stack Trace (if applicable)

There is no thrown exception.
The bug is incorrect behavior: unwanted streamed assistant output.

Before: (screenshot)

After I fixed it: (screenshot)

Description

What I am doing

Using the official JS summarizationMiddleware with an agent that streams events via stream() with streamMode: ["messages"].

What I expect

Internal summarization LLM calls should behave like LLM calls inside tools (see the sketch after this list):

  • completely isolated
  • not streamed
  • invisible to the user
  • not producing assistant messages
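
For contrast, here is a minimal sketch of the tool-side behavior I am comparing against. The tool name, prompt, and schema are made up for illustration, and summaryModel is the non-streaming model from the repro above.

import { tool } from "@langchain/core/tools";
import { z } from "zod";

// A tool whose handler makes its own internal LLM call. This is the isolation
// I expect from the middleware: the call happens inside the tool and is not
// surfaced as an assistant message in the agent's "messages" stream.
const compressNotes = tool(
  async ({ notes }: { notes: string }) => {
    const result = await summaryModel.invoke(
      `Compress the following notes into one short paragraph:\n\n${notes}`
    );
    return typeof result.content === "string"
      ? result.content
      : JSON.stringify(result.content);
  },
  {
    name: "compress_notes",
    description: "Compress long notes into a short paragraph.",
    schema: z.object({ notes: z.string() }),
  }
);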

What happens instead

The internal line:

await model.invoke(formattedPrompt);

inside the middleware is always streamed to the UI as if it were a normal assistant response.

This causes:

  • an extra assistant message appearing in the UI
  • summary tokens showing up before the real response
  • no way to hide the summary-model call
  • no way to suppress it with custom filtering (see the sketch after this list)
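
For reference, this is the kind of stream-side filtering I mean. It is a sketch only: the metadata fields langgraph_node and tags, and the tag value being checked, are assumptions about the messages-mode payload, and it does not give a reliable way to suppress the summary chunks.

// Using the same stream as in the repro's main() above.
for await (const [mode, chunk] of stream) {
  if (mode !== "messages") continue;
  const [msg, metadata] = chunk as [any, Record<string, any>];
  // Attempted filter: skip chunks that look like they come from the
  // summarization step (the tag name checked here is a guess).
  const tags: string[] = metadata?.tags ?? [];
  if (tags.includes("summarization")) continue;
  console.log(metadata?.langgraph_node, String(msg.content).slice(0, 80));
}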

Key insight

Even if I remove:

return { messages: [...] }

from beforeModel, the streamed events continue because:

The leak does NOT come from the middleware return value.
It comes directly from the internal model.invoke() call.

LangGraph intercepts all LLM invocations inside the main execution context, which is where middleware runs.

Tools run in isolated contexts — middleware does not.
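
To confirm this, a stripped-down custom middleware reproduces the leak on its own. This is a sketch only: it assumes the createMiddleware helper and the beforeModel hook shape from the v1 custom-middleware API, and the prompt is made up.

import { createMiddleware } from "langchain";

// Minimal leak repro: one internal LLM call, no state update returned.
// The tokens from this invoke() still show up in the "messages" stream.
const leakReproMiddleware = createMiddleware({
  name: "LeakRepro",
  beforeModel: async () => {
    // Internal housekeeping call, conceptually the same as the summary step.
    await summaryModel.invoke("Reply with the single word: internal");
    return undefined; // nothing is added to the conversation state
  },
});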

Why this is a problem

Summarization is supposed to be an internal housekeeping step.
Users should never see:

  • a model-start event
  • streamed tokens
  • a model-end event

from that step.

Currently, JS summarization middleware behaves like a visible extra assistant turn.

Workaround

Only one thing works:

await fetch("https://api.openai.com/v1/chat/completions", ...)

This bypasses LangChain entirely and produces zero streamed events.

But this breaks consistency and prevents using LCEL or BaseChatModel.
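
For completeness, a sketch of that workaround (the model name and prompts are placeholders, and it assumes OPENAI_API_KEY is set in the environment):

// Direct HTTP call to the OpenAI Chat Completions API. No LangChain
// callbacks are involved, so nothing reaches the agent's "messages" stream.
async function summarizeOutOfBand(conversationText: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      temperature: 0,
      messages: [
        { role: "system", content: "Summarize the conversation below." },
        { role: "user", content: conversationText },
      ],
    }),
  });
  if (!res.ok) throw new Error(`OpenAI request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content as string;
}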

System Info

Platform: Linux
Package Manager: npm (10.9.2)

  • langchain-js: 1.0.4
  • langgraph-js: 1.0.1
  • Node: 22.17.1
