Add automatic compaction of historical messages for agents #338


Closed
bhouston opened this issue Mar 21, 2025 · 5 comments · Fixed by #339

@bhouston
Member

Problem

When agents run for extended periods, they accumulate a large history of messages that eventually fills up the LLM's context window, causing errors when the token limit is exceeded.

Proposed Solution

Implement automatic compaction of historical messages to prevent context window overflow:

  1. Enhance the LLM abstraction to track and return:

    • Total tokens used in the last completion request
    • Maximum allowed tokens for the model/provider
  2. Monitor token usage and trigger compaction when it approaches a configurable threshold (e.g., 50% of maximum)

  3. When triggered, compact older messages (excluding the most recent ones, perhaps the last 10) into a single summarized message

  4. Use a prompt like: "Provide a detailed but concise summary of our conversation above. Focus on information that would be helpful for continuing the conversation, including what we did, what we're doing, which files we're working on, and what we're going to do next."

Benefits

  • Prevents context window overflow errors
  • Maintains important context for agent operation
  • Enables longer-running agent sessions
  • Makes the system more robust for complex tasks

Questions

  • Should the compaction threshold be configurable?
  • How many recent messages should be preserved from compaction?
  • Should we implement different compaction strategies for different agent types?
@bhouston
Member Author

Implementation Plan for Message Compaction

After analyzing the codebase, here's a detailed plan for implementing automatic compaction of historical messages for agents:

1. Enhance LLM Abstraction to Track Token Limits

A. Modify LLMResponse in packages/agent/src/core/llm/types.ts:

export interface LLMResponse {
  text: string;
  toolCalls: ToolCall[];
  tokenUsage: TokenUsage;
  // Add new fields
  totalTokens: number;  // Total tokens used in this request
  maxTokens: number;    // Maximum allowed tokens for this model
}

B. Update Provider Implementations:

  • Update each provider (Anthropic, OpenAI, etc.) to return these additional metrics
  • For each provider, determine how to get the model's maximum token limit
  • Example for Anthropic in packages/agent/src/core/llm/providers/anthropic.ts:
// Add a map of model context window sizes
// (typed as Record<string, number> so indexing by model name type-checks)
const ANTHROPIC_MODEL_LIMITS: Record<string, number> = {
  'claude-3-opus-20240229': 200000,
  'claude-3-sonnet-20240229': 200000,
  'claude-3-haiku-20240307': 200000,
  'claude-3-7-sonnet-20250219': 200000,
  // Add other models
};

// Update tokenUsageFromMessage function
function tokenUsageFromMessage(message: Anthropic.Message, model: string) {
  const usage = new TokenUsage();
  usage.input = message.usage.input_tokens;
  usage.cacheWrites = message.usage.cache_creation_input_tokens ?? 0;
  usage.cacheReads = message.usage.cache_read_input_tokens ?? 0;
  usage.output = message.usage.output_tokens;
  
  return {
    usage,
    totalTokens: usage.input + usage.output,
    maxTokens: ANTHROPIC_MODEL_LIMITS[model] || 100000, // Default fallback
  };
}

// Update generateText method
async generateText(options: GenerateOptions): Promise<LLMResponse> {
  // ...existing code...

  const { usage, totalTokens, maxTokens } = tokenUsageFromMessage(message, model);

  return {
    text: content,
    toolCalls: toolCalls,
    tokenUsage: usage,
    totalTokens,
    maxTokens,
  };
}

2. Add Compaction Configuration to AgentConfig

A. Update AgentConfig in packages/agent/src/core/toolAgent/config.ts:

export type AgentConfig = {
  maxIterations: number;
  getSystemPrompt: (toolContext: ToolContext) => string;
  // Add message compaction configuration
  messageCompaction: {
    enabled: boolean;
    thresholdPercentage: number; // e.g., 50 (%) 
    preserveRecentMessages: number; // e.g., 10 messages
    compactionPrompt: string; // Prompt for summarizing messages
  };
};

B. Update DEFAULT_CONFIG with reasonable defaults:

export const DEFAULT_CONFIG: AgentConfig = {
  maxIterations: 200,
  getSystemPrompt: getDefaultSystemPrompt,
  messageCompaction: {
    enabled: true,
    thresholdPercentage: 50,
    preserveRecentMessages: 10,
    compactionPrompt: "Provide a detailed but concise summary of our conversation above. Focus on information that would be helpful for continuing the conversation, including what we did, what we're doing, which files we're working on, and what we're going to do next.",
  },
};

3. Implement Message Compaction Logic

A. Create a new file packages/agent/src/core/toolAgent/messageCompaction.ts:

import { Message } from '../llm/types.js';
import { generateText } from '../llm/core.js';
import { LLMProvider } from '../llm/provider.js';
import { TokenTracker } from '../tokens.js';

/**
 * Determines if message compaction is needed based on current usage
 */
export function shouldCompactMessages(
  totalTokens: number,
  maxTokens: number,
  thresholdPercentage: number
): boolean {
  return totalTokens >= (maxTokens * thresholdPercentage) / 100;
}

/**
 * Compacts messages by summarizing older messages
 */
export async function compactMessages(
  messages: Message[],
  provider: LLMProvider,
  preserveRecentMessages: number,
  compactionPrompt: string,
  tokenTracker: TokenTracker
): Promise<Message[]> {
  if (messages.length <= preserveRecentMessages) {
    return messages; // Not enough messages to compact
  }

  // Split messages into those to compact and those to preserve
  const messagesToCompact = messages.slice(0, messages.length - preserveRecentMessages);
  const messagesToPreserve = messages.slice(messages.length - preserveRecentMessages);
  
  // Create a system message with instructions for summarization
  const systemMessage: Message = {
    role: 'system',
    content: 'You are an AI assistant tasked with summarizing a conversation. Provide a concise but informative summary that captures the key points, decisions, and context needed to continue the conversation effectively.',
  };
  
  // Create a user message with the compaction prompt
  const userMessage: Message = {
    role: 'user',
    content: `${compactionPrompt}\n\nHere's the conversation to summarize:\n${messagesToCompact.map(m => `${m.role}: ${m.content}`).join('\n')}`,
  };
  
  // Generate the summary
  const { text, tokenUsage } = await generateText(provider, {
    messages: [systemMessage, userMessage],
    temperature: 0.3, // Lower temperature for more consistent summaries
  });
  
  // Add token usage to tracker
  tokenTracker.tokenUsage.add(tokenUsage);
  
  // Create a new message with the summary
  const summaryMessage: Message = {
    role: 'system',
    content: `[COMPACTED MESSAGE HISTORY]: ${text}`,
  };
  
  // Return the compacted messages (summary + recent messages)
  return [summaryMessage, ...messagesToPreserve];
}

B. Modify toolAgentCore.ts to integrate message compaction:

Update the main loop in the toolAgent function to check whether compaction is needed before each LLM call:

// Import the new functions
import { shouldCompactMessages, compactMessages } from './messageCompaction.js';

// In the toolAgent function:
// Track the token metrics from the previous response
let lastResponseTotalTokens: number | undefined;
let lastResponseMaxTokens: number | undefined;

for (let i = 0; i < config.maxIterations; i++) {
  // ...existing code...

  // Check if message compaction is needed
  if (
    config.messageCompaction.enabled &&
    i > 0 && // Don't compact on the first iteration
    lastResponseTotalTokens && 
    lastResponseMaxTokens
  ) {
    const shouldCompact = shouldCompactMessages(
      lastResponseTotalTokens,
      lastResponseMaxTokens,
      config.messageCompaction.thresholdPercentage
    );
    
    if (shouldCompact) {
      logger.info('Compacting message history to reduce token usage');
      messages = await compactMessages(
        messages,
        provider,
        config.messageCompaction.preserveRecentMessages,
        config.messageCompaction.compactionPrompt,
        tokenTracker
      );
      logger.info(`Message history compacted: ${messages.length} messages remaining`);
    }
  }

  // Generate text using our LLM abstraction
  const generateOptions = {
    messages: messagesWithSystem,
    functions: functionDefinitions,
    temperature: localContext.temperature,
    maxTokens: localContext.maxTokens,
  };

  const { text, toolCalls, tokenUsage, totalTokens, maxTokens } = await generateText(
    provider,
    generateOptions,
  );
  
  // Store token information for next iteration
  lastResponseTotalTokens = totalTokens;
  lastResponseMaxTokens = maxTokens;
  
  // ...rest of existing code...
}

4. Test Cases

Create tests to verify the message compaction functionality:

  1. Unit tests for the compaction threshold calculation (see the sketch after this list)
  2. Unit tests for the message compaction logic
  3. Integration tests to ensure the agent works correctly with compaction enabled
  4. Performance tests to verify token usage reduction
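
For example, item 1 could start with a minimal test like this (a sketch assuming a vitest-style runner; adapt to whatever test framework the repo actually uses):

import { describe, expect, it } from 'vitest';

import { shouldCompactMessages } from './messageCompaction.js';

describe('shouldCompactMessages', () => {
  it('returns false below the threshold', () => {
    // 40,000 of 100,000 tokens is 40%, under a 50% threshold
    expect(shouldCompactMessages(40000, 100000, 50)).toBe(false);
  });

  it('returns true at or above the threshold', () => {
    expect(shouldCompactMessages(50000, 100000, 50)).toBe(true);
    expect(shouldCompactMessages(75000, 100000, 50)).toBe(true);
  });
});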

5. Documentation Updates

  1. Update the README.md to mention the new message compaction feature
  2. Add documentation about the configuration options for message compaction
  3. Add examples of how to customize the compaction behavior

Implementation Phases

  1. Phase 1: Enhance the LLM abstraction to track token limits
  2. Phase 2: Implement the message compaction logic
  3. Phase 3: Integrate compaction into the toolAgentCore
  4. Phase 4: Add configuration options
  5. Phase 5: Testing and documentation

Questions for Discussion

  1. Should we implement different compaction strategies for different types of agents?
  2. Should we provide a way to disable compaction for specific use cases?
  3. Should we consider more sophisticated compaction strategies (e.g., semantic clustering)?
  4. How should we handle tool messages during compaction?

@bhouston
Member Author

Revised Implementation Plan: Agent Self-Managed Message Compaction

Based on feedback, we're revising the approach to give the agent more self-awareness and control over its context window usage. Instead of automatic compaction, we'll:

  1. Implement a status update mechanism to inform the agent about resource usage
  2. Create a compaction tool that the agent can call when needed
  3. Make the system more transparent about available resources

1. Enhance LLM Abstraction to Track Token Limits

As in the previous plan, we need to enhance the LLM abstraction to track and return the following (a sketch for a second provider follows the list):

  • Total tokens used in the current completion request
  • Maximum allowed tokens for the model/provider
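
The pattern shown earlier for Anthropic applies to the other providers as well. A sketch for the OpenAI provider (the model names and limits below are assumptions; verify them against the provider's current documentation):

// packages/agent/src/core/llm/providers/openai.ts (sketch)
const OPENAI_MODEL_LIMITS: Record<string, number> = {
  'gpt-4o': 128000,
  'gpt-4-turbo': 128000,
  'gpt-3.5-turbo': 16385,
};

// Fall back to a conservative default for unknown models.
function maxTokensForModel(model: string): number {
  return OPENAI_MODEL_LIMITS[model] ?? 16000;
}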

2. Create a Status Update Mechanism

A. Create a new file packages/agent/src/core/toolAgent/statusUpdates.ts:

import { Message } from '../llm/types.js';
import { TokenTracker } from '../tokens.js';
import { ToolContext } from '../types.js';

/**
 * Generate a status update message for the agent
 */
export function generateStatusUpdate(
  totalTokens: number,
  maxTokens: number,
  tokenTracker: TokenTracker,
  context: ToolContext
): Message {
  // Calculate token usage percentage
  const usagePercentage = Math.round((totalTokens / maxTokens) * 100);
  
  // Get active sub-agents
  const activeAgents = context.agentTracker 
    ? context.agentTracker.getActiveAgents() 
    : [];
  
  // Get active shell processes (if available)
  const activeShells = context.shellTracker 
    ? context.shellTracker.getActiveShells() 
    : [];
  
  // Get active browser sessions (if available)
  const activeSessions = context.sessionTracker 
    ? context.sessionTracker.getActiveSessions() 
    : [];
  
  // Format the status message
  const statusContent = [
    `--- STATUS UPDATE ---`,
    `Token Usage: ${totalTokens}/${maxTokens} (${usagePercentage}%)`,
    `Cost So Far: ${tokenTracker.getTotalCost()}`,
    ``,
    `Active Sub-Agents: ${activeAgents.length}`,
    ...activeAgents.map(a => `- ${a.id}: ${a.description}`),
    ``,
    `Active Shell Processes: ${activeShells.length}`,
    ...activeShells.map(s => `- ${s.id}: ${s.description}`),
    ``,
    `Active Browser Sessions: ${activeSessions.length}`,
    ...activeSessions.map(s => `- ${s.id}: ${s.description}`),
    ``,
    `If token usage is high (>70%), consider using the 'compactHistory' tool to reduce context size.`,
    `--- END STATUS ---`,
  ].join('\n');
  
  return {
    role: 'system',
    content: statusContent,
  };
}

B. Modify toolAgentCore.ts to send periodic status updates:

// Import the new function
import { generateStatusUpdate } from './statusUpdates.js';

// In the toolAgent function:
let statusUpdateCounter = 0;
const STATUS_UPDATE_FREQUENCY = 5; // Send status every 5 iterations
// Track the token metrics from the previous response
let lastResponseTotalTokens: number | undefined;
let lastResponseMaxTokens: number | undefined;

for (let i = 0; i < config.maxIterations; i++) {
  // ...existing code...
  
  // Generate text using our LLM abstraction
  const { text, toolCalls, tokenUsage, totalTokens, maxTokens } = await generateText(
    provider,
    generateOptions,
  );
  
  // Store token information
  lastResponseTotalTokens = totalTokens;
  lastResponseMaxTokens = maxTokens;
  
  // Send periodic status updates
  statusUpdateCounter++;
  if (statusUpdateCounter >= STATUS_UPDATE_FREQUENCY && totalTokens && maxTokens) {
    statusUpdateCounter = 0;
    
    const statusMessage = generateStatusUpdate(
      totalTokens,
      maxTokens,
      tokenTracker,
      localContext
    );
    
    messages.push(statusMessage);
    logger.debug('Sent status update to agent');
  }
  
  // ...rest of existing code...
}

3. Implement a Message Compaction Tool

A. Create a new file packages/agent/src/tools/utility/compactHistory.ts:

import { z } from 'zod';

import { generateText } from '../../core/llm/core.js';
import { Message } from '../../core/llm/types.js';
import { Tool, ToolContext, ToolFunction } from '../../core/types.js';

/**
 * Schema for the compactHistory tool parameters
 */
export const CompactHistorySchema = z.object({
  preserveRecentMessages: z
    .number()
    .min(1)
    .max(50)
    .default(10)
    .describe('Number of recent messages to preserve unchanged'),
  customPrompt: z
    .string()
    .optional()
    .describe('Optional custom prompt for the summarization'),
});

/**
 * Default compaction prompt
 */
const DEFAULT_COMPACTION_PROMPT = 
  "Provide a detailed but concise summary of our conversation above. Focus on information that would be helpful for continuing the conversation, including what we did, what we're doing, which files we're working on, and what we're going to do next.";

/**
 * Implementation of the compactHistory tool
 */
export const compactHistory: ToolFunction<typeof CompactHistorySchema> = async (
  params,
  context
) => {
  const { preserveRecentMessages, customPrompt } = params;
  const { messages, provider, tokenTracker, logger } = context;
  
  // Need at least preserveRecentMessages + 1 to do any compaction
  if (!messages || messages.length <= preserveRecentMessages) {
    return "Not enough messages to compact. No changes made.";
  }
  
  logger.info(`Compacting message history, preserving ${preserveRecentMessages} recent messages`);
  
  // Split messages into those to compact and those to preserve
  const messagesToCompact = messages.slice(0, messages.length - preserveRecentMessages);
  const messagesToPreserve = messages.slice(messages.length - preserveRecentMessages);
  
  // Create a system message with instructions for summarization
  const systemMessage: Message = {
    role: 'system',
    content: 'You are an AI assistant tasked with summarizing a conversation. Provide a concise but informative summary that captures the key points, decisions, and context needed to continue the conversation effectively.',
  };
  
  // Create a user message with the compaction prompt
  const userMessage: Message = {
    role: 'user',
    content: `${customPrompt || DEFAULT_COMPACTION_PROMPT}\n\nHere's the conversation to summarize:\n${messagesToCompact.map(m => `${m.role}: ${m.content}`).join('\n')}`,
  };
  
  // Generate the summary
  const { text, tokenUsage } = await generateText(provider, {
    messages: [systemMessage, userMessage],
    temperature: 0.3, // Lower temperature for more consistent summaries
  });
  
  // Add token usage to tracker
  tokenTracker.tokenUsage.add(tokenUsage);
  
  // Create a new message with the summary
  const summaryMessage: Message = {
    role: 'system',
    content: `[COMPACTED MESSAGE HISTORY]: ${text}`,
  };
  
  // Replace the original messages array with compacted version
  // This modifies the array in-place
  messages.splice(0, messages.length, summaryMessage, ...messagesToPreserve);
  
  // Calculate token reduction (approximate)
  const originalLength = messagesToCompact.reduce((sum, m) => sum + m.content.length, 0);
  const newLength = summaryMessage.content.length;
  const reductionPercentage = Math.round(((originalLength - newLength) / originalLength) * 100);
  
  return `Successfully compacted ${messagesToCompact.length} messages into a summary, preserving the ${preserveRecentMessages} most recent messages. Reduced message history size by approximately ${reductionPercentage}%.`;
};

/**
 * CompactHistory tool definition
 */
export const CompactHistoryTool: Tool = {
  name: 'compactHistory',
  description: 'Compacts the message history by summarizing older messages to reduce token usage',
  parameters: CompactHistorySchema,
  execute: compactHistory,
};

B. Register the tool in packages/agent/src/tools/index.ts:

import { CompactHistoryTool } from './utility/compactHistory.js';

// Add to the tool registry
export const tools: Tool[] = [
  // ...existing tools
  CompactHistoryTool,
];
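
Assuming the tool runner validates arguments with the Zod schema above, a hypothetical call might look like this (the customPrompt value is invented for the example):

// Hypothetical arguments as the schema would validate them.
const exampleArgs = CompactHistorySchema.parse({
  preserveRecentMessages: 10,
  customPrompt: 'Summarize the refactoring discussion so far.',
});
// Omitting preserveRecentMessages would apply the schema default of 10.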

4. Enhance Agent Trackers for Status Updates

We need to expose methods to get active agents, shells, and sessions for the status updates:

A. Update AgentTracker interface:

export interface AgentTracker {
  // Existing methods
  
  /**
   * Get list of active agents
   */
  getActiveAgents(): Array<{
    id: string;
    description: string;
    status: 'running' | 'completed' | 'error';
  }>;
}

B. Similarly update the ShellTracker and SessionTracker interfaces; a sketch follows.
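
A possible shape for ShellTracker, mirroring AgentTracker above (the field names are illustrative assumptions; SessionTracker would follow the same pattern with session URLs in place of commands):

export interface ShellTracker {
  // Existing methods

  /**
   * Get list of active shell processes
   */
  getActiveShells(): Array<{
    id: string;
    description: string;
    status: 'running' | 'idle' | 'completed' | 'error';
  }>;
}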

5. Update Agent Documentation

Update the agent documentation to inform it about the new status updates and compaction tool:

// In the default system prompt in config.ts, add:
'You will receive periodic status updates showing your token usage and active background tasks.',
'If your token usage approaches 70% of the maximum, consider using the compactHistory tool to reduce context size.',
'The compactHistory tool will summarize older messages while preserving recent context.',

6. Test Cases

  1. Unit tests for the status update generation (see the sketch after this list)
  2. Unit tests for the compactHistory tool
  3. Integration tests to ensure the agent properly responds to status updates
  4. End-to-end tests with long-running agents to verify they can manage their context effectively
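
As a sketch for item 1, a unit test for the status update generation (again assuming a vitest-style runner; the stand-in tracker and context objects are test shorthand, not the real types):

import { describe, expect, it } from 'vitest';

import { generateStatusUpdate } from './statusUpdates.js';

describe('generateStatusUpdate', () => {
  it('reports token usage as a percentage and points to compactHistory', () => {
    // Minimal stand-ins; the real TokenTracker and ToolContext carry more state.
    const tokenTracker = { getTotalCost: () => '$0.00' } as any;
    const context = {} as any;

    const message = generateStatusUpdate(70000, 100000, tokenTracker, context);

    expect(message.role).toBe('system');
    expect(message.content).toContain('Token Usage: 70000/100000 (70%)');
    expect(message.content).toContain('compactHistory');
  });
});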

Implementation Phases

  1. Phase 1: Enhance the LLM abstraction to track token limits
  2. Phase 2: Implement the status update mechanism
  3. Phase 3: Implement the compactHistory tool
  4. Phase 4: Update agent trackers to expose active task information
  5. Phase 5: Update agent documentation
  6. Phase 6: Testing and validation

This approach gives the agent more self-awareness and control over its context window, while also providing useful information about background tasks that can help it make better decisions about resource management.

@bhouston
Member Author

Example Status Update

Here's an example of what the status update would look like for the agent:

--- STATUS UPDATE ---
Token Usage: 45,235/100,000 (45%)
Cost So Far: $0.23

Active Sub-Agents: 2
- sa_12345: Analyzing project structure and dependencies [Running, 3 unread messages]
- sa_67890: Implementing unit tests for compactHistory tool [Running, 0 unread messages]

Active Shell Processes: 3
- sh_abcde: Running npm test [Running, 152 unread lines]
  Command: npm test -- --watch packages/agent/src/tools/utility
- sh_fghij: Watching file changes [Running, 0 unread lines]
  Command: npm run watch
- sh_klmno: Git operations [Idle, 0 unread lines]
  Command: git status

Active Browser Sessions: 1
- bs_12345: TypeScript documentation [Active]
  URL: https://www.typescriptlang.org/docs/handbook/utility-types.html

Memory Usage: 45% of context window used
If token usage is high (>70%), consider using the 'compactHistory' tool to reduce context size.
--- END STATUS ---

This status update provides:

  1. Overall token usage and cost information
  2. List of active sub-agents with their IDs, descriptions, status, and unread message count
  3. List of active shell processes with their IDs, descriptions, status, unread line count, and the actual commands
  4. List of active browser sessions with their IDs, descriptions, status, and current URLs
  5. A reminder about the compactHistory tool when token usage gets high

The status update will be sent periodically (e.g., every 5 agent interactions) to keep the agent informed about its resource usage and background tasks.

bhouston added a commit that referenced this issue Mar 21, 2025
Implements #338 - Agent self-managed message compaction:

1. Enhanced LLM abstraction to track token limits for all providers
2. Added status update mechanism to inform agents about resource usage
3. Created compactHistory tool for summarizing older messages
4. Updated agent documentation and system prompt
5. Added tests for the new functionality
6. Created documentation for the message compaction feature

This feature helps prevent context window overflow errors by
giving agents awareness of their token usage and tools to
manage their context window.
@drinkredwine

drinkredwine commented Mar 21, 2025

Would you consider a sliding window or some kind of dynamic context trimming? Or maybe the approach from the recent Cursor update: making regular checkpoints that you can return to, which can also contain smaller summaries of previous steps. Or some kind of memory, so you can recover from crashes easily.

https://cline.bot/blog/understanding-the-new-context-window-progress-bar-in-cline

github-actions bot pushed a commit that referenced this issue Mar 21, 2025
# [mycoder-agent-v1.7.0](mycoder-agent-v1.6.0...mycoder-agent-v1.7.0) (2025-03-21)

### Bug Fixes

* Fix TypeScript errors and tests for message compaction feature ([d4f1fb5](d4f1fb5))

### Features

* Add automatic compaction of historical messages for agents ([a5caf46](a5caf46)), closes [#338](#338)
* Improve message compaction with proactive suggestions ([6276bc0](6276bc0))

🎉 This issue has been resolved in version mycoder-agent-v1.7.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
