Skip to main content
Alpha Notice: These docs cover the v1-alpha release. Content is incomplete and subject to change.For the latest stable version, see the v0 LangChain Python or LangChain JavaScript docs.
Middleware provides a way to more tightly control what happens inside the agent. The core agent loop involves calling a model, letting it choose tools to execute, and then finishing when it calls no more tools:
Core agent loop diagram
Middleware exposes hooks before and after each of those steps:
Middleware flow diagram

What can middleware do?

Monitor

Track agent behavior with logging, analytics, and debugging

Modify

Transform prompts, tool selection, and output formatting

Control

Add retries, fallbacks, and early termination logic

Enforce

Apply rate limits, guardrails, and PII detection
Add middleware by passing it to create_agent:
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware, HumanInTheLoopMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[SummarizationMiddleware(), HumanInTheLoopMiddleware()],
)

Built-in middleware

LangChain provides prebuilt middleware for common use cases:

Summarization

Automatically summarize conversation history when approaching token limits.
Perfect for:
  • Long-running conversations that exceed context windows
  • Multi-turn dialogues with extensive history
  • Applications where preserving full conversation context matters
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="openai:gpt-4o-mini",
            max_tokens_before_summary=4000,  # Trigger summarization at 4000 tokens
            messages_to_keep=20,  # Keep last 20 messages after summary
            summary_prompt="Custom prompt for summarization...",  # Optional
        ),
    ],
)
ParameterDescriptionDefault
modelModel for generating summariesRequired
max_tokens_before_summaryToken threshold for triggering summarization-
messages_to_keepRecent messages to preserve20
token_counterCustom token counting functionCharacter-based
summary_promptCustom prompt templateBuilt-in
summary_prefixPrefix for summary messages"## Previous conversation summary:"

Human-in-the-loop

Pause agent execution for human approval, editing, or rejection of tool calls before they execute.
Perfect for:
  • High-stakes operations requiring human approval (database writes, financial transactions)
  • Compliance workflows where human oversight is mandatory
  • Long running conversations where human feedback is used to guide the agent
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[read_email_tool, send_email_tool],
    middleware=[
        HumanInTheLoopMiddleware(
            interrupt_on={
                # Require approval, editing, or rejection for sending emails
                "send_email": {
                    "allowed_decisions": ["approve", "edit", "reject"],
                },
                # Auto-approve reading emails
                "read_email": False,
            }
        ),
    ],
)
ParameterDescriptionDefault
interrupt_onMapping of tool names to approval configs (True, False, or InterruptOnConfig)Required
description_prefixPrefix for action request descriptions"Tool execution requires approval"
InterruptOnConfig options:
  • allowed_decisions: List of allowed decisions ("approve", "edit", "reject")
  • description: Static string or callable for custom description
Important: Human-in-the-loop middleware requires a checkpointer to maintain state across interruptions.See the human-in-the-loop documentation for complete examples and integration patterns.

Anthropic prompt caching

Reduce costs by caching repetitive prompt prefixes with Anthropic models.
Perfect for:
  • Applications with long, repeated system prompts
  • Agents that reuse the same context across invocations
  • Reducing API costs for high-volume deployments
Learn more about Anthropic Prompt Caching strategies and limitations.
from langchain_anthropic import ChatAnthropic
from langchain.agents.middleware.prompt_caching import AnthropicPromptCachingMiddleware
from langchain.agents import create_agent

LONG_PROMPT = """
Please be a helpful assistant.

<Lots more context ...>
"""

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-latest"),
    system_prompt=LONG_PROMPT,
    middleware=[AnthropicPromptCachingMiddleware(ttl="5m")],
)

# cache store
agent.invoke({"messages": [HumanMessage("Hi, my name is Bob")]})

# cache hit, system prompt is cached
agent.invoke({"messages": [HumanMessage("What's my name?")]})
ParameterDescriptionDefault
typeCache type (only "ephemeral" supported)"ephemeral"
ttlTime to live ("5m" or "1h")"5m"
min_messages_to_cacheMinimum messages before caching starts0
unsupported_model_behaviorBehavior for non-Anthropic models ("ignore", "warn", "raise")"warn"

Model call limit

Limit the number of model calls to prevent infinite loops or excessive costs.
Perfect for:
  • Preventing runaway agents from making too many API calls
  • Enforcing cost controls on production deployments
  • Testing agent behavior within specific call budgets
from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[
        ModelCallLimitMiddleware(
            thread_limit=10,  # Max 10 calls per thread (across runs)
            run_limit=5,  # Max 5 calls per run (single invocation)
            exit_behavior="end",  # Or "error" to raise exception
        ),
    ],
)
ParameterDescriptionDefault
thread_limitMax calls across all runs in threadNone (no limit)
run_limitMax calls per single invocationNone (no limit)
exit_behavior"end" (graceful) or "error" (exception)"end"

Tool call limit

Limit the number of tool calls to specific tools or all tools.
Perfect for:
  • Preventing excessive calls to expensive external APIs
  • Limiting web searches or database queries
  • Enforcing rate limits on specific tool usage
from langchain.agents import create_agent
from langchain.agents.middleware import ToolCallLimitMiddleware

# Limit all tool calls
global_limiter = ToolCallLimitMiddleware(thread_limit=20, run_limit=10)

# Limit specific tool
search_limiter = ToolCallLimitMiddleware(
    tool_name="search",
    thread_limit=5,
    run_limit=3,
)

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[global_limiter, search_limiter],
)
ParameterDescriptionDefault
tool_nameSpecific tool to limit (None = all tools)None
thread_limitMax calls across all runs in threadNone (no limit)
run_limitMax calls per single invocationNone (no limit)
exit_behavior"end" (graceful) or "error" (exception)"end"

Model fallback

Automatically fallback to alternative models when the primary model fails.
Perfect for:
  • Building resilient agents that handle model outages
  • Cost optimization by falling back to cheaper models
  • Provider redundancy across OpenAI, Anthropic, etc.
from langchain.agents import create_agent
from langchain.agents.middleware import ModelFallbackMiddleware

agent = create_agent(
    model="openai:gpt-4o",  # Primary model
    tools=[...],
    middleware=[
        ModelFallbackMiddleware(
            "openai:gpt-4o-mini",  # Try first on error
            "anthropic:claude-3-5-sonnet-20241022",  # Then this
        ),
    ],
)
ParameterDescriptionDefault
first_modelFirst fallback model (string or BaseChatModel instance)Required
*additional_modelsAdditional fallback models in order-

PII detection

Detect and handle Personally Identifiable Information in conversations.
Perfect for:
  • Healthcare and financial applications with compliance requirements
  • Customer service agents that need to sanitize logs
  • Any application handling sensitive user data
from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[
        # Redact emails in user input
        PIIMiddleware("email", strategy="redact", apply_to_input=True),
        # Mask credit cards (show last 4 digits)
        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
        # Custom PII type with regex
        PIIMiddleware(
            "api_key",
            detector=r"sk-[a-zA-Z0-9]{32}",
            strategy="block",  # Raise error if detected
        ),
    ],
)
ParameterDescriptionDefault
pii_typeType of PII to detect (built-in or custom)Required
strategyHow to handle detected PII ("block", "redact", "mask", "hash")"redact"
detectorCustom detector function or regex patternNone (uses built-in)
apply_to_inputCheck user messages before model callTrue
apply_to_outputCheck AI messages after model callFalse
apply_to_tool_resultsCheck tool result messages after executionFalse
Built-in PII types:
  • email - Email addresses
  • credit_card - Credit card numbers (Luhn validated)
  • ip - IP addresses
  • mac_address - MAC addresses
  • url - URLs
Strategies:
  • block - Raise exception when detected
  • redact - Replace with [REDACTED_TYPE]
  • mask - Partially mask (e.g., ****-****-****-1234)
  • hash - Replace with deterministic hash

Planning

Add todo list management capabilities for complex multi-step tasks.
This middleware automatically provides agents with a write_todos tool and system prompts to guide effective task planning.
from langchain.agents import create_agent
from langchain.agents.middleware import PlanningMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[PlanningMiddleware()],
)

result = agent.invoke({"messages": [HumanMessage("Help me refactor my codebase")]})
print(result["todos"])  # Array of todo items with status tracking
ParameterDescriptionDefault
system_promptCustom system prompt for guiding todo usageBuilt-in prompt
tool_descriptionCustom description for the write_todos toolBuilt-in description

LLM tool selector

Use an LLM to intelligently select relevant tools before calling the main model.
Perfect for:
  • Agents with many tools (10+) where most aren’t relevant per query
  • Reducing token usage by filtering irrelevant tools
  • Improving model focus and accuracy
from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolSelectorMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[tool1, tool2, tool3, tool4, tool5, ...],  # Many tools
    middleware=[
        LLMToolSelectorMiddleware(
            model="openai:gpt-4o-mini",  # Use cheaper model for selection
            max_tools=3,  # Limit to 3 most relevant tools
            always_include=["search"],  # Always include certain tools
        ),
    ],
)
ParameterDescriptionDefault
modelModel for tool selection (string or BaseChatModel instance)Uses agent’s main model
system_promptInstructions for the selection modelBuilt-in prompt
max_toolsMaximum number of tools to selectNone (no limit)
always_includeTool names to always includeNone

Context editing

Manage conversation context by trimming, summarizing, or clearing tool uses.
Perfect for:
  • Long conversations that need periodic context cleanup
  • Removing failed tool attempts from context
  • Custom context management strategies
from langchain.agents import create_agent
from langchain.agents.middleware import ContextEditingMiddleware, ClearToolUsesEdit

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[
        ContextEditingMiddleware(
            edits=[
                ClearToolUsesEdit(max_tokens=1000),  # Clear old tool uses
            ],
        ),
    ],
)
ParameterDescriptionDefault
editsList of ContextEdit strategies to apply[ClearToolUsesEdit()]
token_count_methodToken counting method ("approximate" or "model")"approximate"
ClearToolUsesEdit options:
  • trigger: Token count that triggers the edit (default: 100000)
  • clear_at_least: Minimum tokens to reclaim (default: 0)
  • keep: Number of recent tool results to preserve (default: 3)
  • clear_tool_inputs: Whether to clear tool call parameters (default: False)
  • exclude_tools: List of tool names to exclude from clearing (default: ())
  • placeholder: Placeholder text for cleared outputs (default: "[cleared]")

Custom middleware

Build custom middleware by implementing hooks that run at specific points in the agent execution flow. You can create middleware in two ways:
  1. Decorator-based - Quick and simple for single-hook middleware
  2. Class-based - More powerful for complex middleware with multiple hooks

Decorator-based middleware

For simple middleware that only needs a single hook, decorators provide the quickest way to add functionality:
from langchain.agents.middleware import before_model, after_model, wrap_model_call
from langchain.agents.middleware import AgentState, ModelRequest, ModelResponse
from langgraph.runtime import Runtime
from typing import Any, Callable

# Node-style: logging before model calls
@before_model
def log_before_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    print(f"About to call model with {len(state['messages'])} messages")
    return None

# Node-style: validation after model calls
@after_model(can_jump_to=["end"])
def validate_output(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    last_message = state["messages"][-1]
    if "BLOCKED" in last_message.content:
        return {
            "messages": [AIMessage("I cannot respond to that request.")],
            "jump_to": "end"
        }
    return None

# Wrap-style: retry logic
@wrap_model_call
def retry_model(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    for attempt in range(3):
        try:
            return handler(request)
        except Exception as e:
            if attempt == 2:
                raise
            print(f"Retry {attempt + 1}/3 after error: {e}")

# Wrap-style: dynamic prompts
@dynamic_prompt
def personalized_prompt(request: ModelRequest) -> str:
    user_id = request.runtime.context.get("user_id", "guest")
    return f"You are a helpful assistant for user {user_id}. Be concise and friendly."

# Use decorators in agent
agent = create_agent(
    model="openai:gpt-4o",
    middleware=[log_before_model, validate_output, retry_model, personalized_prompt],
    tools=[...],
)

Available decorators

Node-style (run at specific execution points):
  • @before_agent - Before agent starts (once per invocation)
  • @before_model - Before each model call
  • @after_model - After each model response
  • @after_agent - After agent completes (once per invocation)
Wrap-style (intercept and control execution):
  • @wrap_model_call - Around each model call
  • @wrap_tool_call - Around each tool call
Convenience decorators:
  • @dynamic_prompt - Generates dynamic system prompts (equivalent to @wrap_model_call that modifies the prompt)

When to use decorators

Use decorators when

  • You need a single hook
  • No complex configuration

Use classes when

  • Multiple hooks needed
  • Complex configuration
  • Reusable across projects (config on init)

Class-based middleware

Two hook styles

Node-style hooks

Run sequentially at specific execution points. Use for logging, validation, and state updates.

Wrap-style hooks

Intercept execution with full control over handler calls. Use for retries, caching, and transformation.

Node-style hooks

Run at specific points in the execution flow:
  • before_agent - Before agent starts (once per invocation)
  • before_model - Before each model call
  • after_model - After each model response
  • after_agent - After agent completes (up to once per invocation)
Example: Logging middleware
from langchain.agents.middleware import AgentMiddleware, AgentState
from langgraph.runtime import Runtime
from typing import Any

class LoggingMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        print(f"About to call model with {len(state['messages'])} messages")
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        print(f"Model returned: {state['messages'][-1].content}")
        return None
Example: Conversation length limit
from langchain.agents.middleware import AgentMiddleware, AgentState
from langchain.messages import AIMessage
from langgraph.runtime import Runtime
from typing import Any

class MessageLimitMiddleware(AgentMiddleware):
    def __init__(self, max_messages: int = 50):
        super().__init__()
        self.max_messages = max_messages

    def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        if len(state["messages"]) == self.max_messages:
            return {
                "messages": [AIMessage("Conversation limit reached.")],
                "jump_to": "end"
            }
        return None

Wrap-style hooks

Intercept execution and control when the handler is called:
  • wrap_model_call - Around each model call
  • wrap_tool_call - Around each tool call
You decide if the handler is called zero times (short-circuit), once (normal flow), or multiple times (retry logic). Example: Model retry middleware
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from typing import Callable

class RetryMiddleware(AgentMiddleware):
    def __init__(self, max_retries: int = 3):
        super().__init__()
        self.max_retries = max_retries

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        for attempt in range(self.max_retries):
            try:
                return handler(request)
            except Exception as e:
                if attempt == self.max_retries - 1:
                    raise
                print(f"Retry {attempt + 1}/{self.max_retries} after error: {e}")
Example: Dynamic model selection
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from langchain.chat_models import init_chat_model
from typing import Callable

class DynamicModelMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        # Use different model based on conversation length
        if len(request.messages) > 10:
            request.model = init_chat_model("openai:gpt-4o")
        else:
            request.model = init_chat_model("openai:gpt-4o-mini")

        return handler(request)
Example: Tool call monitoring
from langchain.tools.tool_node import ToolCallRequest
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import ToolMessage
from langgraph.types import Command
from typing import Callable

class ToolMonitoringMiddleware(AgentMiddleware):
    def wrap_tool_call(
        self,
        request: ToolCallRequest,
        handler: Callable[[ToolCallRequest], ToolMessage | Command],
    ) -> ToolMessage | Command:
        print(f"Executing tool: {request.tool_call['name']}")
        print(f"Arguments: {request.tool_call['args']}")

        try:
            result = handler(request)
            print(f"Tool completed successfully")
            return result
        except Exception as e:
            print(f"Tool failed: {e}")
            raise

Custom state schema

Middleware can extend the agent’s state with custom properties. Define a custom state type and set it as the state_schema:
from langchain.agents.middleware import AgentState, AgentMiddleware
from typing_extensions import NotRequired
from typing import Any

class CustomState(AgentState):
    model_call_count: NotRequired[int]
    user_id: NotRequired[str]

class CallCounterMiddleware(AgentMiddleware[CustomState]):
    state_schema = CustomState

    def before_model(self, state: CustomState, runtime) -> dict[str, Any] | None:
        # Access custom state properties
        count = state.get("model_call_count", 0)

        if count > 10:
            return {"jump_to": "end"}

        return None

    def after_model(self, state: CustomState, runtime) -> dict[str, Any] | None:
        # Update custom state
        return {"model_call_count": state.get("model_call_count", 0) + 1}
agent = create_agent(
    model="openai:gpt-4o",
    middleware=[CallCounterMiddleware()],
    tools=[...],
)

# Invoke with custom state
result = agent.invoke({
    "messages": [HumanMessage("Hello")],
    "model_call_count": 0,
    "user_id": "user-123",
})

Execution order

When using multiple middleware, understanding execution order is important:
agent = create_agent(
    model="openai:gpt-4o",
    middleware=[middleware1, middleware2, middleware3],
    tools=[...],
)
Before hooks run in order:
  1. middleware1.before_agent()
  2. middleware2.before_agent()
  3. middleware3.before_agent()
Agent loop starts
  1. middleware1.before_model()
  2. middleware2.before_model()
  3. middleware3.before_model()
Wrap hooks nest like function calls:
  1. middleware1.wrap_model_call()middleware2.wrap_model_call()middleware3.wrap_model_call() → model
After hooks run in reverse order:
  1. middleware3.after_model()
  2. middleware2.after_model()
  3. middleware1.after_model()
Agent loop ends
  1. middleware3.after_agent()
  2. middleware2.after_agent()
  3. middleware1.after_agent()
Key rules:
  • before_* hooks: First to last
  • after_* hooks: Last to first (reverse)
  • wrap_* hooks: Nested (first middleware wraps all others)

Agent jumps

To exit early from middleware, return a dictionary with jump_to:
class EarlyExitMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState, runtime) -> dict[str, Any] | None:
        # Check some condition
        if should_exit(state):
            return {
                "messages": [AIMessage("Exiting early due to condition.")],
                "jump_to": "end"
            }
        return None
Available jump targets:
  • "end": Jump to the end of the agent execution
  • "tools": Jump to the tools node
  • "model": Jump to the model node (or the first before_model hook)
Important: When jumping from before_model or after_model, jumping to "model" will cause all before_model middleware to run again. To enable jumping, decorate your hook with @hook_config(can_jump_to=[...]):
from langchain.agents.middleware import AgentMiddleware, hook_config
from typing import Any

class ConditionalMiddleware(AgentMiddleware):
    @hook_config(can_jump_to=["end", "tools"])
    def after_model(self, state: AgentState, runtime) -> dict[str, Any] | None:
        if some_condition(state):
            return {"jump_to": "end"}
        return None

Best practices

  1. Keep middleware focused - Each middleware should do one thing well
  2. Handle errors gracefully - Don’t let middleware errors crash the agent
  3. Use appropriate hook types:
    • Node-style for sequential logic (logging, validation)
    • Wrap-style for control flow (retry, fallback, caching)
  4. Document state requirements - Clearly document any custom state properties
  5. Test middleware independently - Unit test middleware before integrating
  6. Consider execution order - Place critical middleware first in the list
  7. Use built-in middleware when possible - Don’t reinvent the wheel

Examples

Dynamically selecting tools

Select relevant tools at runtime to improve performance and accuracy.
Benefits:
  • Shorter prompts - Reduce complexity by exposing only relevant tools
  • Better accuracy - Models choose correctly from fewer options
  • Permission control - Dynamically filter tools based on user access
from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, ModelRequest
from typing import Callable

class ToolSelectorMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        """Middleware to select relevant tools based on state/context."""
        # Select a small, relevant subset of tools based on state/context
        relevant_tools = select_relevant_tools(request.state, request.runtime)
        request.tools = relevant_tools
        return handler(request)

agent = create_agent(
    model="openai:gpt-4o",
    tools=all_tools,  # All available tools need to be registered upfront
    # Middleware can be used to select a smaller subset that's relevant for the given run.
    middleware=[ToolSelectorMiddleware()],
)

I