Add automatic compaction of historical messages for agents #338


Closed
bhouston opened this issue Mar 21, 2025 · 5 comments · Fixed by #339

@bhouston
Member

Problem

When agents run for extended periods, they accumulate a large history of messages that eventually fills up the LLM's context window, causing errors when the token limit is exceeded.

Proposed Solution

Implement automatic compaction of historical messages to prevent context window overflow:

  1. Enhance the LLM abstraction to track and return:

    • Total tokens used in the last completion request
    • Maximum allowed tokens for the model/provider
  2. Monitor token usage and trigger compaction when it approaches a configurable threshold (e.g., 50% of maximum)

  3. When triggered, compact older messages (excluding the most recent ones, perhaps the last 10) into a single summarized message

  4. Use a prompt like: "Provide a detailed but concise summary of our conversation above. Focus on information that would be helpful for continuing the conversation, including what we did, what we're doing, which files we're working on, and what we're going to do next."

Benefits

  • Prevents context window overflow errors
  • Maintains important context for agent operation
  • Enables longer-running agent sessions
  • Makes the system more robust for complex tasks

Questions

  • Should the compaction threshold be configurable?
  • How many recent messages should be preserved from compaction?
  • Should we implement different compaction strategies for different agent types?
@bhouston
Member Author

Implementation Plan for Message Compaction

After analyzing the codebase, here's a detailed plan for implementing automatic compaction of historical messages for agents:

1. Enhance LLM Abstraction to Track Token Limits

A. Modify LLMResponse in packages/agent/src/core/llm/types.ts:

export interface LLMResponse {
  text: string;
  toolCalls: ToolCall[];
  tokenUsage: TokenUsage;
  // Add new fields
  totalTokens: number;  // Total tokens used in this request
  maxTokens: number;    // Maximum allowed tokens for this model
}

B. Update Provider Implementations:

  • Update each provider (Anthropic, OpenAI, etc.) to return these additional metrics
  • For each provider, determine how to get the model's maximum token limit
  • Example for Anthropic in packages/agent/src/core/llm/providers/anthropic.ts:
// Add a map of model context window sizes
// (typed as Record<string, number> so indexing by model name type-checks)
const ANTHROPIC_MODEL_LIMITS: Record<string, number> = {
  'claude-3-opus-20240229': 200000,
  'claude-3-sonnet-20240229': 200000,
  'claude-3-haiku-20240307': 200000,
  'claude-3-7-sonnet-20250219': 200000,
  // Add other models
};

// Update tokenUsageFromMessage function
function tokenUsageFromMessage(message: Anthropic.Message, model: string) {
  const usage = new TokenUsage();
  usage.input = message.usage.input_tokens;
  usage.cacheWrites = message.usage.cache_creation_input_tokens ?? 0;
  usage.cacheReads = message.usage.cache_read_input_tokens ?? 0;
  usage.output = message.usage.output_tokens;
  
  return {
    usage,
    totalTokens: usage.input + usage.output,
    maxTokens: ANTHROPIC_MODEL_LIMITS[model] || 100000, // Default fallback
  };
}

// Update generateText method
async generateText(options: GenerateOptions): Promise<LLMResponse> {
  // ...existing code...

  const { usage, totalTokens, maxTokens } = tokenUsageFromMessage(message, model);

  return {
    text: content,
    toolCalls: toolCalls,
    tokenUsage: usage,
    totalTokens,
    maxTokens,
  };
}

2. Add Compaction Configuration to AgentConfig

A. Update AgentConfig in packages/agent/src/core/toolAgent/config.ts:

export type AgentConfig = {
  maxIterations: number;
  getSystemPrompt: (toolContext: ToolContext) => string;
  // Add message compaction configuration
  messageCompaction: {
    enabled: boolean;
    thresholdPercentage: number; // e.g., 50 (%) 
    preserveRecentMessages: number; // e.g., 10 messages
    compactionPrompt: string; // Prompt for summarizing messages
  };
};

B. Update DEFAULT_CONFIG with reasonable defaults:

export const DEFAULT_CONFIG: AgentConfig = {
  maxIterations: 200,
  getSystemPrompt: getDefaultSystemPrompt,
  messageCompaction: {
    enabled: true,
    thresholdPercentage: 50,
    preserveRecentMessages: 10,
    compactionPrompt: "Provide a detailed but concise summary of our conversation above. Focus on information that would be helpful for continuing the conversation, including what we did, what we're doing, which files we're working on, and what we're going to do next.",
  },
};

3. Implement Message Compaction Logic

A. Create a new file packages/agent/src/core/toolAgent/messageCompaction.ts:

import { Message } from '../llm/types.js';
import { generateText } from '../llm/core.js';
import { LLMProvider } from '../llm/provider.js';
import { TokenTracker } from '../tokens.js';

/**
 * Determines if message compaction is needed based on current usage
 */
export function shouldCompactMessages(
  totalTokens: number,
  maxTokens: number,
  thresholdPercentage: number
): boolean {
  return totalTokens >= (maxTokens * thresholdPercentage) / 100;
}

/**
 * Compacts messages by summarizing older messages
 */
export async function compactMessages(
  messages: Message[],
  provider: LLMProvider,
  preserveRecentMessages: number,
  compactionPrompt: string,
  tokenTracker: TokenTracker
): Promise<Message[]> {
  if (messages.length <= preserveRecentMessages) {
    return messages; // Not enough messages to compact
  }

  // Split messages into those to compact and those to preserve
  const messagesToCompact = messages.slice(0, messages.length - preserveRecentMessages);
  const messagesToPreserve = messages.slice(messages.length - preserveRecentMessages);
  
  // Create a system message with instructions for summarization
  const systemMessage: Message = {
    role: 'system',
    content: 'You are an AI assistant tasked with summarizing a conversation. Provide a concise but informative summary that captures the key points, decisions, and context needed to continue the conversation effectively.',
  };
  
  // Create a user message with the compaction prompt
  const userMessage: Message = {
    role: 'user',
    content: `${compactionPrompt}\n\nHere's the conversation to summarize:\n${messagesToCompact.map(m => `${m.role}: ${m.content}`).join('\n')}`,
  };
  
  // Generate the summary
  const { text, tokenUsage } = await generateText(provider, {
    messages: [systemMessage, userMessage],
    temperature: 0.3, // Lower temperature for more consistent summaries
  });
  
  // Add token usage to tracker
  tokenTracker.tokenUsage.add(tokenUsage);
  
  // Create a new message with the summary
  const summaryMessage: Message = {
    role: 'system',
    content: `[COMPACTED MESSAGE HISTORY]: ${text}`,
  };
  
  // Return the compacted messages (summary + recent messages)
  return [summaryMessage, ...messagesToPreserve];
}

B. Modify toolAgentCore.ts to integrate message compaction:

Update the main loop in the toolAgent function to check whether compaction is needed before each LLM call:

// Import the new functions
import { shouldCompactMessages, compactMessages } from './messageCompaction.js';

// In the toolAgent function:
// Track the token metrics from the previous response
let lastResponseTotalTokens: number | undefined;
let lastResponseMaxTokens: number | undefined;

for (let i = 0; i < config.maxIterations; i++) {
  // ...existing code...

  // Check if message compaction is needed
  if (
    config.messageCompaction.enabled &&
    i > 0 && // Don't compact on the first iteration
    lastResponseTotalTokens && 
    lastResponseMaxTokens
  ) {
    const shouldCompact = shouldCompactMessages(
      lastResponseTotalTokens,
      lastResponseMaxTokens,
      config.messageCompaction.thresholdPercentage
    );
    
    if (shouldCompact) {
      logger.info('Compacting message history to reduce token usage');
      messages = await compactMessages(
        messages,
        provider,
        config.messageCompaction.preserveRecentMessages,
        config.messageCompaction.compactionPrompt,
        tokenTracker
      );
      logger.info(`Message history compacted: ${messages.length} messages remaining`);
    }
  }

  // Generate text using our LLM abstraction
  const generateOptions = {
    messages: messagesWithSystem,
    functions: functionDefinitions,
    temperature: localContext.temperature,
    maxTokens: localContext.maxTokens,
  };

  const { text, toolCalls, tokenUsage, totalTokens, maxTokens } = await generateText(
    provider,
    generateOptions,
  );
  
  // Store token information for next iteration
  lastResponseTotalTokens = totalTokens;
  lastResponseMaxTokens = maxTokens;
  
  // ...rest of existing code...
}

4. Test Cases

Create tests to verify the message compaction functionality:

  1. Unit tests for the compaction threshold calculation (see the sketch after this list)
  2. Unit tests for the message compaction logic
  3. Integration tests to ensure the agent works correctly with compaction enabled
  4. Performance tests to verify token usage reduction
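
For example, item 1 could start with a minimal test like this (a sketch assuming a vitest-style runner; adapt to whatever test framework the repo actually uses):

import { describe, expect, it } from 'vitest';

import { shouldCompactMessages } from './messageCompaction.js';

describe('shouldCompactMessages', () => {
  it('returns false below the threshold', () => {
    // 40,000 of 100,000 tokens is 40%, under a 50% threshold
    expect(shouldCompactMessages(40000, 100000, 50)).toBe(false);
  });

  it('returns true at or above the threshold', () => {
    expect(shouldCompactMessages(50000, 100000, 50)).toBe(true);
    expect(shouldCompactMessages(75000, 100000, 50)).toBe(true);
  });
});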

5. Documentation Updates

  1. Update the README.md to mention the new message compaction feature
  2. Add documentation about the configuration options for message compaction
  3. Add examples of how to customize the compaction behavior

Implementation Phases

  1. Phase 1: Enhance the LLM abstraction to track token limits
  2. Phase 2: Implement the message compaction logic
  3. Phase 3: Integrate compaction into the toolAgentCore
  4. Phase 4: Add configuration options
  5. Phase 5: Testing and documentation

Questions for Discussion

  1. Should we implement different compaction strategies for different types of agents?
  2. Should we provide a way to disable compaction for specific use cases?
  3. Should we consider more sophisticated compaction strategies (e.g., semantic clustering)?
  4. How should we handle tool messages during compaction?

@bhouston
Member Author

Revised Implementation Plan: Agent Self-Managed Message Compaction

Based on feedback, we're revising the approach to give the agent more self-awareness and control over its context window usage. Instead of automatic compaction, we'll:

  1. Implement a status update mechanism to inform the agent about resource usage
  2. Create a compaction tool that the agent can call when needed
  3. Make the system more transparent about available resources

1. Enhance LLM Abstraction to Track Token Limits

As in the previous plan, we need to enhance the LLM abstraction to track and return the following (a sketch for a second provider follows the list):

  • Total tokens used in the current completion request
  • Maximum allowed tokens for the model/provider
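
The pattern shown earlier for Anthropic applies to the other providers as well. A sketch for the OpenAI provider (the model names and limits below are assumptions; verify them against the provider's current documentation):

// packages/agent/src/core/llm/providers/openai.ts (sketch)
const OPENAI_MODEL_LIMITS: Record<string, number> = {
  'gpt-4o': 128000,
  'gpt-4-turbo': 128000,
  'gpt-3.5-turbo': 16385,
};

// Fall back to a conservative default for unknown models.
function maxTokensForModel(model: string): number {
  return OPENAI_MODEL_LIMITS[model] ?? 16000;
}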

2. Create a Status Update Mechanism

A. Create a new file packages/agent/src/core/toolAgent/statusUpdates.ts:

import { Message } from '../llm/types.js';
import { TokenTracker } from '../tokens.js';
import { ToolContext } from '../types.js';

/**
 * Generate a status update message for the agent
 */
export function generateStatusUpdate(
  totalTokens: number,
  maxTokens: number,
  tokenTracker: TokenTracker,
  context: ToolContext
): Message {
  // Calculate token usage percentage
  const usagePercentage = Math.round((totalTokens / maxTokens) * 100);
  
  // Get active sub-agents
  const activeAgents = context.agentTracker 
    ? context.agentTracker.getActiveAgents() 
    : [];
  
  // Get active shell processes (if available)
  const activeShells = context.shellTracker 
    ? context.shellTracker.getActiveShells() 
    : [];
  
  // Get active browser sessions (if available)
  const activeSessions = context.sessionTracker 
    ? context.sessionTracker.getActiveSessions() 
    : [];
  
  // Format the status message
  const statusContent = [
    `--- STATUS UPDATE ---`,
    `Token Usage: ${totalTokens}/${maxTokens} (${usagePercentage}%)`,
    `Cost So Far: ${tokenTracker.getTotalCost()}`,
    ``,
    `Active Sub-Agents: ${activeAgents.length}`,
    ...activeAgents.map(a => `- ${a.id}: ${a.description}`),
    ``,
    `Active Shell Processes: ${activeShells.length}`,
    ...activeShells.map(s => `- ${s.id}: ${s.description}`),
    ``,
    `Active Browser Sessions: ${activeSessions.length}`,
    ...activeSessions.map(s => `- ${s.id}: ${s.description}`),
    ``,
    `If token usage is high (>70%), consider using the 'compactHistory' tool to reduce context size.`,
    `--- END STATUS ---`,
  ].join('\n');
  
  return {
    role: 'system',
    content: statusContent,
  };
}

B. Modify toolAgentCore.ts to send periodic status updates:

// Import the new function
import { generateStatusUpdate } from './statusUpdates.js';

// In the toolAgent function:
let statusUpdateCounter = 0;
const STATUS_UPDATE_FREQUENCY = 5; // Send status every 5 iterations
// Track the token metrics from the previous response
let lastResponseTotalTokens: number | undefined;
let lastResponseMaxTokens: number | undefined;

for (let i = 0; i < config.maxIterations; i++) {
  // ...existing code...
  
  // Generate text using our LLM abstraction
  const { text, toolCalls, tokenUsage, totalTokens, maxTokens } = await generateText(
    provider,
    generateOptions,
  );
  
  // Store token information
  lastResponseTotalTokens = totalTokens;
  lastResponseMaxTokens = maxTokens;
  
  // Send periodic status updates
  statusUpdateCounter++;
  if (statusUpdateCounter >= STATUS_UPDATE_FREQUENCY && totalTokens && maxTokens) {
    statusUpdateCounter = 0;
    
    const statusMessage = generateStatusUpdate(
      totalTokens,
      maxTokens,
      tokenTracker,
      localContext
    );
    
    messages.push(statusMessage);
    logger.debug('Sent status update to agent');
  }
  
  // ...rest of existing code...
}

3. Implement a Message Compaction Tool

A. Create a new file packages/agent/src/tools/utility/compactHistory.ts:

import { z } from 'zod';

import { generateText } from '../../core/llm/core.js';
import { Message } from '../../core/llm/types.js';
import { Tool, ToolContext, ToolFunction } from '../../core/types.js';

/**
 * Schema for the compactHistory tool parameters
 */
export const CompactHistorySchema = z.object({
  preserveRecentMessages: z
    .number()
    .min(1)
    .max(50)
    .default(10)
    .describe('Number of recent messages to preserve unchanged'),
  customPrompt: z
    .string()
    .optional()
    .describe('Optional custom prompt for the summarization'),
});

/**
 * Default compaction prompt
 */
const DEFAULT_COMPACTION_PROMPT = 
  "Provide a detailed but concise summary of our conversation above. Focus on information that would be helpful for continuing the conversation, including what we did, what we're doing, which files we're working on, and what we're going to do next.";

/**
 * Implementation of the compactHistory tool
 */
export const compactHistory: ToolFunction<typeof CompactHistorySchema> = async (
  params,
  context
) => {
  const { preserveRecentMessages, customPrompt } = params;
  const { messages, provider, tokenTracker, logger } = context;
  
  // Need at least preserveRecentMessages + 1 to do any compaction
  if (!messages || messages.length <= preserveRecentMessages) {
    return "Not enough messages to compact. No changes made.";
  }
  
  logger.info(`Compacting message history, preserving ${preserveRecentMessages} recent messages`);
  
  // Split messages into those to compact and those to preserve
  const messagesToCompact = messages.slice(0, messages.length - preserveRecentMessages);
  const messagesToPreserve = messages.slice(messages.length - preserveRecentMessages);
  
  // Create a system message with instructions for summarization
  const systemMessage: Message = {
    role: 'system',
    content: 'You are an AI assistant tasked with summarizing a conversation. Provide a concise but informative summary that captures the key points, decisions, and context needed to continue the conversation effectively.',
  };
  
  // Create a user message with the compaction prompt
  const userMessage: Message = {
    role: 'user',
    content: `${customPrompt || DEFAULT_COMPACTION_PROMPT}\n\nHere's the conversation to summarize:\n${messagesToCompact.map(m => `${m.role}: ${m.content}`).join('\n')}`,
  };
  
  // Generate the summary
  const { text, tokenUsage } = await generateText(provider, {
    messages: [systemMessage, userMessage],
    temperature: 0.3, // Lower temperature for more consistent summaries
  });
  
  // Add token usage to tracker
  tokenTracker.tokenUsage.add(tokenUsage);
  
  // Create a new message with the summary
  const summaryMessage: Message = {
    role: 'system',
    content: `[COMPACTED MESSAGE HISTORY]: ${text}`,
  };
  
  // Replace the original messages array with compacted version
  // This modifies the array in-place
  messages.splice(0, messages.length, summaryMessage, ...messagesToPreserve);
  
  // Calculate token reduction (approximate)
  const originalLength = messagesToCompact.reduce((sum, m) => sum + m.content.length, 0);
  const newLength = summaryMessage.content.length;
  const reductionPercentage = Math.round(((originalLength - newLength) / originalLength) * 100);
  
  return `Successfully compacted ${messagesToCompact.length} messages into a summary, preserving the ${preserveRecentMessages} most recent messages. Reduced message history size by approximately ${reductionPercentage}%.`;
};

/**
 * CompactHistory tool definition
 */
export const CompactHistoryTool: Tool = {
  name: 'compactHistory',
  description: 'Compacts the message history by summarizing older messages to reduce token usage',
  parameters: CompactHistorySchema,
  execute: compactHistory,
};

B. Register the tool in packages/agent/src/tools/index.ts:

import { CompactHistoryTool } from './utility/compactHistory.js';

// Add to the tool registry
export const tools: Tool[] = [
  // ...existing tools
  CompactHistoryTool,
];
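
Assuming the tool runner validates arguments with the Zod schema above, a hypothetical call might look like this (the customPrompt value is invented for the example):

// Hypothetical arguments as the schema would validate them.
const exampleArgs = CompactHistorySchema.parse({
  preserveRecentMessages: 10,
  customPrompt: 'Summarize the refactoring discussion so far.',
});
// Omitting preserveRecentMessages would apply the schema default of 10.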

4. Enhance Agent Trackers for Status Updates

We need to expose methods to get active agents, shells, and sessions for the status updates:

A. Update AgentTracker interface:

export interface AgentTracker {
  // Existing methods
  
  /**
   * Get list of active agents
   */
  getActiveAgents(): Array<{
    id: string;
    description: string;
    status: 'running' | 'completed' | 'error';
  }>;
}

B. Similarly update the ShellTracker and SessionTracker interfaces; a sketch follows.
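
A possible shape for ShellTracker, mirroring AgentTracker above (the field names are illustrative assumptions; SessionTracker would follow the same pattern with session URLs in place of commands):

export interface ShellTracker {
  // Existing methods

  /**
   * Get list of active shell processes
   */
  getActiveShells(): Array<{
    id: string;
    description: string;
    status: 'running' | 'idle' | 'completed' | 'error';
  }>;
}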

5. Update Agent Documentation

Update the agent documentation to inform it about the new status updates and compaction tool:

// In the default system prompt in config.ts, add:
'You will receive periodic status updates showing your token usage and active background tasks.',
'If your token usage approaches 70% of the maximum, consider using the compactHistory tool to reduce context size.',
'The compactHistory tool will summarize older messages while preserving recent context.',

6. Test Cases

  1. Unit tests for the status update generation (see the sketch after this list)
  2. Unit tests for the compactHistory tool
  3. Integration tests to ensure the agent properly responds to status updates
  4. End-to-end tests with long-running agents to verify they can manage their context effectively
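
As a sketch for item 1, a unit test for the status update generation (again assuming a vitest-style runner; the stand-in tracker and context objects are test shorthand, not the real types):

import { describe, expect, it } from 'vitest';

import { generateStatusUpdate } from './statusUpdates.js';

describe('generateStatusUpdate', () => {
  it('reports token usage as a percentage and points to compactHistory', () => {
    // Minimal stand-ins; the real TokenTracker and ToolContext carry more state.
    const tokenTracker = { getTotalCost: () => '$0.00' } as any;
    const context = {} as any;

    const message = generateStatusUpdate(70000, 100000, tokenTracker, context);

    expect(message.role).toBe('system');
    expect(message.content).toContain('Token Usage: 70000/100000 (70%)');
    expect(message.content).toContain('compactHistory');
  });
});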

Implementation Phases

  1. Phase 1: Enhance the LLM abstraction to track token limits
  2. Phase 2: Implement the status update mechanism
  3. Phase 3: Implement the compactHistory tool
  4. Phase 4: Update agent trackers to expose active task information
  5. Phase 5: Update agent documentation
  6. Phase 6: Testing and validation

This approach gives the agent more self-awareness and control over its context window, while also providing useful information about background tasks that can help it make better decisions about resource management.

@bhouston
Member Author

Example Status Update

Here's an example of what the status update would look like for the agent:

--- STATUS UPDATE ---
Token Usage: 45,235/100,000 (45%)
Cost So Far: $0.23

Active Sub-Agents: 2
- sa_12345: Analyzing project structure and dependencies [Running, 3 unread messages]
- sa_67890: Implementing unit tests for compactHistory tool [Running, 0 unread messages]

Active Shell Processes: 3
- sh_abcde: Running npm test [Running, 152 unread lines]
  Command: npm test -- --watch packages/agent/src/tools/utility
- sh_fghij: Watching file changes [Running, 0 unread lines]
  Command: npm run watch
- sh_klmno: Git operations [Idle, 0 unread lines]
  Command: git status

Active Browser Sessions: 1
- bs_12345: TypeScript documentation [Active]
  URL: https://www.typescriptlang.org/docs/handbook/utility-types.html

Memory Usage: 45% of context window used
If token usage is high (>70%), consider using the 'compactHistory' tool to reduce context size.
--- END STATUS ---

This status update provides:

  1. Overall token usage and cost information
  2. List of active sub-agents with their IDs, descriptions, status, and unread message count
  3. List of active shell processes with their IDs, descriptions, status, unread line count, and the actual commands
  4. List of active browser sessions with their IDs, descriptions, status, and current URLs
  5. A reminder about the compactHistory tool when token usage gets high

The status update will be sent periodically (e.g., every 5 agent interactions) to keep the agent informed about its resource usage and background tasks.

bhouston added a commit that referenced this issue Mar 21, 2025
Implements #338 - Agent self-managed message compaction:

1. Enhanced LLM abstraction to track token limits for all providers
2. Added status update mechanism to inform agents about resource usage
3. Created compactHistory tool for summarizing older messages
4. Updated agent documentation and system prompt
5. Added tests for the new functionality
6. Created documentation for the message compaction feature

This feature helps prevent context window overflow errors by
giving agents awareness of their token usage and tools to
manage their context window.
@drinkredwine

drinkredwine commented Mar 21, 2025

Would you consider a sliding window or some kind of dynamic context trimming? Or maybe the approach from the recent Cursor update: making regular checkpoints that you can return to, which can also contain smaller summaries of previous steps. Or some kind of memory, so you can recover from crashes easily.

https://cline.bot/blog/understanding-the-new-context-window-progress-bar-in-cline

github-actions bot pushed a commit that referenced this issue Mar 21, 2025
# [mycoder-agent-v1.7.0](mycoder-agent-v1.6.0...mycoder-agent-v1.7.0) (2025-03-21)

### Bug Fixes

* Fix TypeScript errors and tests for message compaction feature ([d4f1fb5](d4f1fb5))

### Features

* Add automatic compaction of historical messages for agents ([a5caf46](a5caf46)), closes [#338](#338)
* Improve message compaction with proactive suggestions ([6276bc0](6276bc0))

🎉 This issue has been resolved in version mycoder-agent-v1.7.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
