Alpha Notice: These docs cover the v1-alpha release. Content is incomplete and subject to change. For the latest stable version, see the v0 LangChain Python or LangChain JavaScript docs.


What can middleware do?
- Monitor - Track agent behavior with logging, analytics, and debugging
- Modify - Transform prompts, tool selection, and output formatting
- Control - Add retries, fallbacks, and early termination logic
- Enforce - Apply rate limits, guardrails, and PII detection
Middleware is passed to create_agent via its middleware parameter.
Built-in middleware
LangChain provides prebuilt middleware for common use cases.

Summarization
Automatically summarize conversation history when approaching token limits. Perfect for:
- Long-running conversations that exceed context windows
- Multi-turn dialogues with extensive history
- Applications where preserving full conversation context matters
Configuration options
Parameter | Description | Default |
---|---|---|
model | Model for generating summaries | Required |
max_tokens_before_summary | Token threshold for triggering summarization | - |
messages_to_keep | Recent messages to preserve | 20 |
token_counter | Custom token counting function | Character-based |
summary_prompt | Custom prompt template | Built-in |
summary_prefix | Prefix for summary messages | "## Previous conversation summary:" |
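The trigger logic can be sketched in plain Python (an illustrative emulation, not the middleware's actual implementation; the function name and signature are hypothetical, though token_counter=len mirrors the character-based default):

```python
def summarize_history(messages, max_tokens_before_summary, messages_to_keep=20,
                      token_counter=len, summarize=None):
    """Collapse older messages into a summary once the token budget is exceeded.

    token_counter defaults to len(), i.e. character-based counting, matching
    the middleware's default.
    """
    summarize = summarize or (lambda msgs: f"{len(msgs)} earlier messages omitted")
    total = sum(token_counter(m) for m in messages)
    if total <= max_tokens_before_summary or len(messages) <= messages_to_keep:
        return messages
    older, recent = messages[:-messages_to_keep], messages[-messages_to_keep:]
    summary = "## Previous conversation summary:\n" + summarize(older)
    return [summary] + recent
```

Below the threshold the history passes through untouched; above it, everything but the most recent messages_to_keep messages is folded into a single summary message.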
Human-in-the-loop
Pause agent execution for human approval, editing, or rejection of tool calls before they execute. Perfect for:
- High-stakes operations requiring human approval (database writes, financial transactions)
- Compliance workflows where human oversight is mandatory
- Long-running conversations where human feedback guides the agent
Configuration options
Parameter | Description | Default |
---|---|---|
interrupt_on | Mapping of tool names to approval configs (True, False, or InterruptOnConfig) | Required |
description_prefix | Prefix for action request descriptions | "Tool execution requires approval" |
InterruptOnConfig options:
- allowed_decisions: List of allowed decisions ("approve", "edit", "reject")
- description: Static string or callable for custom description
Important: Human-in-the-loop middleware requires a checkpointer to maintain state across interruptions. See the human-in-the-loop documentation for complete examples and integration patterns.
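The approval flow can be emulated in plain Python to show how an interrupt_on mapping and the allowed decisions fit together (names and shapes here are illustrative stand-ins, not the real InterruptOnConfig API):

```python
def needs_approval(tool_name, interrupt_on):
    """Check the interrupt_on mapping: True/False or a per-tool config dict
    (a stand-in for InterruptOnConfig) decides whether to pause."""
    return bool(interrupt_on.get(tool_name, False))

def apply_decision(decision, tool_call,
                   allowed_decisions=("approve", "edit", "reject")):
    """Resolve a human decision for an interrupted tool call."""
    kind = decision["type"]
    if kind not in allowed_decisions:
        raise ValueError(f"decision {kind!r} is not allowed")
    if kind == "approve":
        return tool_call                                 # run unchanged
    if kind == "edit":
        return {**tool_call, "args": decision["args"]}   # run with edited args
    return None                                          # "reject": drop the call
```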
Anthropic prompt caching
Reduce costs by caching repetitive prompt prefixes with Anthropic models. Perfect for:
- Applications with long, repeated system prompts
- Agents that reuse the same context across invocations
- Reducing API costs for high-volume deployments
Learn more about Anthropic Prompt Caching strategies and limitations.
Configuration options
Parameter | Description | Default |
---|---|---|
type | Cache type (only "ephemeral" supported) | "ephemeral" |
ttl | Time to live ("5m" or "1h") | "5m" |
min_messages_to_cache | Minimum messages before caching starts | 0 |
unsupported_model_behavior | Behavior for non-Anthropic models ("ignore", "warn", "raise") | "warn" |
Model call limit
Limit the number of model calls to prevent infinite loops or excessive costs. Perfect for:
- Preventing runaway agents from making too many API calls
- Enforcing cost controls on production deployments
- Testing agent behavior within specific call budgets
Configuration options
Parameter | Description | Default |
---|---|---|
thread_limit | Max calls across all runs in thread | None (no limit) |
run_limit | Max calls per single invocation | None (no limit) |
exit_behavior | "end" (graceful) or "error" (exception) | "end" |
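The counting logic behind these parameters can be sketched as plain Python (an illustrative stand-in, not the middleware's implementation):

```python
class ModelCallLimiter:
    """Sketch of the call-limit logic: counts calls per thread and per run;
    exit_behavior 'end' signals a graceful stop, 'error' raises."""

    def __init__(self, thread_limit=None, run_limit=None, exit_behavior="end"):
        self.thread_limit = thread_limit
        self.run_limit = run_limit
        self.exit_behavior = exit_behavior
        self.thread_calls = 0
        self.run_calls = 0

    def start_run(self):
        self.run_calls = 0  # the thread-wide counter persists across runs

    def allow_call(self):
        at_limit = (
            (self.thread_limit is not None and self.thread_calls >= self.thread_limit)
            or (self.run_limit is not None and self.run_calls >= self.run_limit)
        )
        if at_limit:
            if self.exit_behavior == "error":
                raise RuntimeError("model call limit exceeded")
            return False  # "end": stop gracefully
        self.thread_calls += 1
        self.run_calls += 1
        return True
```

Note how run_limit resets on each invocation while thread_limit accumulates across every run in the thread.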
Tool call limit
Limit the number of tool calls to specific tools or all tools. Perfect for:
- Preventing excessive calls to expensive external APIs
- Limiting web searches or database queries
- Enforcing rate limits on specific tool usage
Configuration options
Parameter | Description | Default |
---|---|---|
tool_name | Specific tool to limit (None = all tools) | None |
thread_limit | Max calls across all runs in thread | None (no limit) |
run_limit | Max calls per single invocation | None (no limit) |
exit_behavior | "end" (graceful) or "error" (exception) | "end" |
Model fallback
Automatically fall back to alternative models when the primary model fails. Perfect for:
- Building resilient agents that handle model outages
- Cost optimization by falling back to cheaper models
- Provider redundancy across OpenAI, Anthropic, etc.
Configuration options
Parameter | Description | Default |
---|---|---|
first_model | First fallback model (string or BaseChatModel instance) | Required |
*additional_models | Additional fallback models in order | - |
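The fallback chain amounts to trying models in order until one succeeds; a plain-Python sketch (model callables stand in for real chat models):

```python
def call_with_fallback(models, prompt):
    """Try each model callable in order; return the first success.

    Each entry stands in for a chat model; the real middleware passes the
    same request to first_model, then *additional_models."""
    errors = []
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # a real implementation may filter error types
            errors.append(exc)
    raise RuntimeError(f"all {len(models)} models failed: {errors!r}")
```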
PII detection
Detect and handle personally identifiable information (PII) in conversations. Perfect for:
- Healthcare and financial applications with compliance requirements
- Customer service agents that need to sanitize logs
- Any application handling sensitive user data
Configuration options
Parameter | Description | Default |
---|---|---|
pii_type | Type of PII to detect (built-in or custom) | Required |
strategy | How to handle detected PII ("block", "redact", "mask", "hash") | "redact" |
detector | Custom detector function or regex pattern | None (uses built-in) |
apply_to_input | Check user messages before model call | True |
apply_to_output | Check AI messages after model call | False |
apply_to_tool_results | Check tool result messages after execution | False |
Built-in PII types:
- email - Email addresses
- credit_card - Credit card numbers (Luhn validated)
- ip - IP addresses
- mac_address - MAC addresses
- url - URLs

Strategies:
- block - Raise an exception when detected
- redact - Replace with [REDACTED_TYPE]
- mask - Partially mask (e.g., ****-****-****-1234)
- hash - Replace with a deterministic hash
Planning
Add todo list management capabilities for complex multi-step tasks. This middleware automatically provides agents with a write_todos tool and system prompts to guide effective task planning.

Configuration options
Parameter | Description | Default |
---|---|---|
system_prompt | Custom system prompt for guiding todo usage | Built-in prompt |
tool_description | Custom description for the write_todos tool | Built-in description |
LLM tool selector
Use an LLM to intelligently select relevant tools before calling the main model. Perfect for:
- Agents with many tools (10+) where most aren’t relevant per query
- Reducing token usage by filtering irrelevant tools
- Improving model focus and accuracy
Configuration options
Parameter | Description | Default |
---|---|---|
model | Model for tool selection (string or BaseChatModel instance) | Uses agent’s main model |
system_prompt | Instructions for the selection model | Built-in prompt |
max_tools | Maximum number of tools to select | None (no limit) |
always_include | Tool names to always include | None |
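The filtering step can be sketched in plain Python, with `relevant` standing in for the names the selection model returned (all names here are illustrative, not the middleware's API):

```python
def select_tools(all_tools, relevant, max_tools=None, always_include=None):
    """Filter a tool list the way the selector middleware might after the
    selection model ranks relevance: keep relevant tools, cap at max_tools,
    and force-include the always_include names."""
    always = set(always_include or ())
    chosen = [t for t in all_tools if t in relevant or t in always]
    if max_tools is not None:
        forced = [t for t in chosen if t in always]
        optional = [t for t in chosen if t not in always]
        chosen = forced + optional[: max(0, max_tools - len(forced))]
    return chosen
```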
Context editing
Manage conversation context by trimming, summarizing, or clearing tool uses. Perfect for:
- Long conversations that need periodic context cleanup
- Removing failed tool attempts from context
- Custom context management strategies
Configuration options
Parameter | Description | Default |
---|---|---|
edits | List of ContextEdit strategies to apply | [ClearToolUsesEdit()] |
token_count_method | Token counting method ("approximate" or "model") | "approximate" |
ClearToolUsesEdit options:
- trigger: Token count that triggers the edit (default: 100000)
- clear_at_least: Minimum tokens to reclaim (default: 0)
- keep: Number of recent tool results to preserve (default: 3)
- clear_tool_inputs: Whether to clear tool call parameters (default: False)
- exclude_tools: List of tool names to exclude from clearing (default: ())
- placeholder: Placeholder text for cleared outputs (default: "[cleared]")
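These options combine roughly like this plain-Python sketch (illustrative, not the actual ClearToolUsesEdit implementation; messages are simplified to role/name/content dicts):

```python
def clear_tool_uses(messages, trigger=100_000, keep=3,
                    placeholder="[cleared]", exclude_tools=(),
                    token_counter=len):
    """Once the conversation exceeds `trigger` tokens, replace older tool
    outputs with `placeholder`, keeping the `keep` most recent results."""
    total = sum(token_counter(m["content"]) for m in messages)
    if total <= trigger:
        return messages
    tool_idx = [i for i, m in enumerate(messages)
                if m["role"] == "tool" and m.get("name") not in exclude_tools]
    to_clear = set(tool_idx[:-keep] if keep else tool_idx)
    return [dict(m, content=placeholder) if i in to_clear else m
            for i, m in enumerate(messages)]
```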
Custom middleware
Build custom middleware by implementing hooks that run at specific points in the agent execution flow. You can create middleware in two ways:
- Decorator-based - Quick and simple for single-hook middleware
- Class-based - More powerful for complex middleware with multiple hooks
Decorator-based middleware
For simple middleware that only needs a single hook, decorators provide the quickest way to add functionality.

Available decorators

Node-style (run at specific execution points):
- @before_agent - Before agent starts (once per invocation)
- @before_model - Before each model call
- @after_model - After each model response
- @after_agent - After agent completes (once per invocation)

Wrap-style (intercept execution):
- @wrap_model_call - Around each model call
- @wrap_tool_call - Around each tool call

Convenience:
- @dynamic_prompt - Generates a dynamic system prompt (equivalent to @wrap_model_call that modifies the prompt)
When to use decorators
Use decorators when:
- You need a single hook
- No complex configuration is required

Use classes when:
- Multiple hooks are needed
- Configuration is complex
- The middleware is reused across projects (configured on init)
Class-based middleware
Two hook styles
Node-style hooks
Run sequentially at specific execution points. Use for logging, validation, and state updates.
Wrap-style hooks
Intercept execution with full control over handler calls. Use for retries, caching, and transformation.
Node-style hooks
Run at specific points in the execution flow:
- before_agent - Before agent starts (once per invocation)
- before_model - Before each model call
- after_model - After each model response
- after_agent - After agent completes (up to once per invocation)
Wrap-style hooks
Intercept execution and control when the handler is called:
- wrap_model_call - Around each model call
- wrap_tool_call - Around each tool call
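A retry wrapper shows the wrap-style shape in miniature (plain Python, not the LangChain API: the middleware receives the handler and decides when, or whether, to call it):

```python
def retry_on_timeout(handler, retries=2):
    """A wrap-style hook in miniature: the middleware owns the call to
    `handler`, so it can invoke it repeatedly (retry), skip it entirely
    (serve from cache), or substitute the response."""
    def wrapped(request):
        last_error = None
        for _ in range(retries + 1):
            try:
                return handler(request)
            except TimeoutError as exc:
                last_error = exc
        raise last_error
    return wrapped
```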
Custom state schema
Middleware can extend the agent’s state with custom properties. Define a custom state type and set it as the state_schema.
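A plain-Python sketch of the idea using typing.TypedDict (in LangChain the base state class and the state_schema wiring come from the framework; AgentState here is a simplified stand-in):

```python
from typing import TypedDict

class AgentState(TypedDict):
    # baseline state every agent carries (simplified stand-in)
    messages: list

class CustomState(AgentState):
    # extra property this middleware reads and writes
    model_call_count: int

def before_model(state: CustomState) -> dict:
    """Node-style hook returning a partial state update."""
    return {"model_call_count": state.get("model_call_count", 0) + 1}
```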
Execution order
When using multiple middleware, understanding execution order is important.

Execution flow
For middleware = [middleware1, middleware2, middleware3], hooks fire in this order:

1. middleware1.before_agent()
2. middleware2.before_agent()
3. middleware3.before_agent()
4. middleware1.before_model()
5. middleware2.before_model()
6. middleware3.before_model()
7. middleware1.wrap_model_call() → middleware2.wrap_model_call() → middleware3.wrap_model_call() → model
8. middleware3.after_model()
9. middleware2.after_model()
10. middleware1.after_model()
11. middleware3.after_agent()
12. middleware2.after_agent()
13. middleware1.after_agent()
- before_* hooks: first to last
- after_* hooks: last to first (reverse)
- wrap_* hooks: nested (first middleware wraps all others)
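The nesting rule can be demonstrated with ordinary functions (illustrative, not the framework's composition code):

```python
def compose(wrappers, model):
    """Nest wrap-style hooks so the first wrapper is outermost:
    compose([m1, m2, m3], model) behaves like m1(m2(m3(model)))."""
    handler = model
    for wrap in reversed(wrappers):
        handler = wrap(handler)
    return handler

trace = []

def make_wrapper(name):
    def wrap(handler):
        def wrapped(request):
            trace.append(f"{name}:enter")
            result = handler(request)
            trace.append(f"{name}:exit")
            return result
        return wrapped
    return wrap

stack = compose([make_wrapper("m1"), make_wrapper("m2"), make_wrapper("m3")],
                lambda request: "model-response")
```

Calling stack("prompt") records enter events first to last and exit events last to first, matching the order above.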
Agent jumps
To exit early from middleware, return a dictionary with jump_to:
- "end": Jump to the end of the agent execution
- "tools": Jump to the tools node
- "model": Jump to the model node (or the first before_model hook)
Note: when jumping to "model" from a before_model or after_model hook, all before_model middleware will run again.
To enable jumping, decorate your hook with @hook_config(can_jump_to=[...]).
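The short-circuit behavior can be sketched in plain Python (illustrative; the real agent routes jumps through its graph):

```python
def run_before_model_hooks(hooks, state):
    """Apply node-style hooks in order; a hook returning a dict containing
    'jump_to' short-circuits the remaining hooks."""
    for hook in hooks:
        update = hook(state) or {}
        if "jump_to" in update:
            return update["jump_to"], state
        state = {**state, **update}
    return None, state
```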
Best practices
- Keep middleware focused - Each middleware should do one thing well
- Handle errors gracefully - Don’t let middleware errors crash the agent
- Use appropriate hook types:
- Node-style for sequential logic (logging, validation)
- Wrap-style for control flow (retry, fallback, caching)
- Document state requirements - Clearly document any custom state properties
- Test middleware independently - Unit test middleware before integrating
- Consider execution order - Place critical middleware first in the list
- Use built-in middleware when possible - Don’t reinvent the wheel
Examples
Dynamically selecting tools
Select relevant tools at runtime to improve performance and accuracy. Benefits:
- Shorter prompts - Reduce complexity by exposing only relevant tools
- Better accuracy - Models choose correctly from fewer options
- Permission control - Dynamically filter tools based on user access