
Get started with Computer Use

With the release of Computer Use in Claude 3.5 Sonnet, you can now direct AI models to interact with computers the way humans do: moving cursors, clicking buttons, and typing text. This capability enables automation of complex tasks while leveraging Claude's advanced reasoning abilities.

The AI SDK is a powerful TypeScript toolkit for building AI applications with large language models (LLMs) like Anthropic's Claude alongside popular frameworks like React, Next.js, Vue, Svelte, Node.js, and more. In this guide, you will learn how to integrate Computer Use into your AI SDK applications.

Computer Use is currently in beta with some limitations. The feature may be error-prone at times. Anthropic recommends starting with low-risk tasks and implementing appropriate safety measures.

Computer Use

Anthropic recently released a new version of the Claude 3.5 Sonnet model that is capable of 'Computer Use'. This allows the model to interact with computer interfaces through basic actions like:

Moving the cursor
Clicking buttons
Typing text
Taking screenshots
Reading screen content

How It Works

Computer Use enables the model to read and interact with on-screen content through a series of coordinated steps. Here's how the process works:

Start with a prompt and tools

Add Anthropic-defined Computer Use tools to your request and provide a task (prompt) for the model. For example: "save an image to your downloads folder."

Select the right tool

The model evaluates which computer tools can help accomplish the task. It then sends a formatted tool_call to use the appropriate tool.

Execute the action and return results

The AI SDK processes Claude's request by running the selected tool. The results can then be sent back to Claude through a tool_result message.

Complete the task through iterations

Claude analyzes each result to determine if more actions are needed. It continues requesting tool use and processing results until it completes your task or requires additional input.
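The four steps above amount to a loop: call the model, run any tools it requests, feed the results back, and stop when no more tools are requested. The following is a minimal sketch of that loop and not the AI SDK's actual implementation. The types, runAgentLoop, and the scripted stand-in model are all hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes for illustration; not the AI SDK's or Anthropic's actual types.
type ToolCall = { toolName: string; input: Record<string, unknown> };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type ToolResult = { toolName: string; output: string };
type Model = (task: string, history: ToolResult[]) => Promise<ModelTurn>;
type ToolExecutor = (input: Record<string, unknown>) => Promise<string>;

async function runAgentLoop(
  model: Model,
  tools: Record<string, ToolExecutor>,
  task: string,
  maxSteps: number,
): Promise<{ text: string; steps: number; history: ToolResult[] }> {
  const history: ToolResult[] = [];
  let text = '';
  let steps = 0;

  while (steps < maxSteps) {
    steps++;
    // Steps 1 and 2: send the task plus prior results; the model may request tools.
    const turn = await model(task, history);
    text = turn.text;

    // Step 4: no tool calls means the model considers the task finished.
    if (turn.toolCalls.length === 0) break;

    // Step 3: run each requested tool and record its result for the next turn.
    for (const call of turn.toolCalls) {
      const execute = tools[call.toolName];
      const output = execute
        ? await execute(call.input)
        : `Unknown tool: ${call.toolName}`;
      history.push({ toolName: call.toolName, output });
    }
  }

  return { text, steps, history };
}

// Scripted stand-in for a real model: asks for two screenshots, then finishes.
const scriptedModel: Model = async (_task, history) =>
  history.length < 2
    ? { text: '', toolCalls: [{ toolName: 'computer', input: { action: 'screenshot' } }] }
    : { text: 'Done.', toolCalls: [] };

const demoTools: Record<string, ToolExecutor> = {
  computer: async (input) => `ran ${String(input.action)}`,
};
```

The step cap matters because a model that keeps requesting tools would otherwise loop indefinitely.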

Available Tools

There are three main tools available in the Computer Use API:

Computer Tool: Enables basic computer control like mouse movement, clicking, and keyboard input
Text Editor Tool: Provides functionality for viewing and editing text files
Bash Tool: Allows execution of bash commands

Implementation Considerations

Computer Use tools in the AI SDK are predefined interfaces that require your own implementation of the execution layer. While the SDK provides the type definitions and structure for these tools, you need to:

Set up a controlled environment for Computer Use execution
Implement core functionality like mouse control and keyboard input
Handle screenshot capture and processing
Set up rules and limits for how Claude can interact with your system
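One way to approach the "rules and limits" item is to validate every request before it reaches the operating system. The sketch below is an illustration under stated assumptions: validateRequest, the allowed-action list, and the text-length cap are hypothetical, and the action names should be checked against the tool version you use.

```typescript
// Action names assumed for illustration; confirm them against the tool version you use.
const ALLOWED_ACTIONS: ReadonlySet<string> = new Set([
  'key',
  'type',
  'mouse_move',
  'left_click',
  'right_click',
  'double_click',
  'screenshot',
  'cursor_position',
]);

type ComputerRequest = {
  action: string;
  coordinate?: [number, number];
  text?: string;
};
type Display = { widthPx: number; heightPx: number };
type Verdict = { ok: true } | { ok: false; reason: string };

const MAX_TEXT_LENGTH = 2000; // arbitrary cap chosen for this sketch

function validateRequest(req: ComputerRequest, display: Display): Verdict {
  // Rule 1: only explicitly allowed actions may run.
  if (!ALLOWED_ACTIONS.has(req.action)) {
    return { ok: false, reason: `action "${req.action}" is not allowed` };
  }

  // Rule 2: coordinates must be integers that fall inside the display.
  if (req.coordinate !== undefined) {
    const [x, y] = req.coordinate;
    const inBounds =
      Number.isInteger(x) &&
      Number.isInteger(y) &&
      x >= 0 &&
      x < display.widthPx &&
      y >= 0 &&
      y < display.heightPx;
    if (!inBounds) {
      return { ok: false, reason: `coordinate (${x}, ${y}) is outside the display` };
    }
  }

  // Rule 3: typed text is capped in length.
  if (req.text !== undefined && req.text.length > MAX_TEXT_LENGTH) {
    return { ok: false, reason: 'text input is too long' };
  }

  return { ok: true };
}
```

Returning the rejection reason as a value (instead of throwing) lets you pass it back to the model as a tool result so it can try a different action.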

The recommended approach is to start with Anthropic's reference implementation, which provides:

A containerized environment configured for safe Computer Use
Ready-to-use (Python) implementations of Computer Use tools
An agent loop for API interaction and tool execution
A web interface for monitoring and control

This reference implementation serves as a foundation to understand the requirements before building your own custom solution.

Getting Started with the AI SDK

If you have never used the AI SDK before, start by following the Getting Started guide.

First, ensure you have the AI SDK and Anthropic AI SDK provider installed:

pnpm add ai@beta @ai-sdk/anthropic@beta

You can add Computer Use to your AI SDK applications using provider-defined-client tools. These tools accept various input parameters (like display height and width in the case of the computer tool) and then require that you define an execute function.

Here's how you could set up the Computer Tool with the AI SDK:

import { anthropic } from '@ai-sdk/anthropic';
import { getScreenshot, executeComputerAction } from '@/utils/computer-use';

const computerTool = anthropic.tools.computer_20241022({
  displayWidthPx: 1920,
  displayHeightPx: 1080,
  execute: async ({ action, coordinate, text }) => {
    switch (action) {
      case 'screenshot': {
        return {
          type: 'image',
          data: getScreenshot(),
        };
      }
      default: {
        return executeComputerAction(action, coordinate, text);
      }
    }
  },
  experimental_toToolResultContent(result) {
    return typeof result === 'string'
      ? [{ type: 'text', text: result }]
      : [{ type: 'image', data: result.data, mediaType: 'image/png' }];
  },
});

The computerTool handles two kinds of action: taking screenshots via getScreenshot() and executing computer actions such as mouse movements and clicks through executeComputerAction(). You have to implement this execution logic yourself (e.g., the getScreenshot and executeComputerAction functions); the execute function is responsible for all low-level interaction with the operating system.
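As a rough illustration of what an executeComputerAction implementation might look like, the sketch below dispatches each action to an injected driver. The InputDriver interface, createExecuteComputerAction, and createRecordingDriver are hypothetical names, and the action names (including passing the key combination through text) are assumptions to check against the tool version you use. A real driver would wrap whatever automation layer runs inside your sandbox.

```typescript
// Hypothetical driver interface; a real one would wrap your sandbox's automation layer.
interface InputDriver {
  moveMouse(x: number, y: number): Promise<void>;
  click(button: 'left' | 'right'): Promise<void>;
  typeText(text: string): Promise<void>;
  pressKey(keys: string): Promise<void>;
}

type ExecuteComputerAction = (
  action: string,
  coordinate?: [number, number],
  text?: string,
) => Promise<string>;

// Builds a function with the (action, coordinate, text) call shape used in the guide.
function createExecuteComputerAction(driver: InputDriver): ExecuteComputerAction {
  return async (action, coordinate, text) => {
    switch (action) {
      case 'mouse_move': {
        if (!coordinate) return 'error: mouse_move requires a coordinate';
        const [x, y] = coordinate;
        await driver.moveMouse(x, y);
        return `moved cursor to (${x}, ${y})`;
      }
      case 'left_click':
        await driver.click('left');
        return 'clicked left button';
      case 'right_click':
        await driver.click('right');
        return 'clicked right button';
      case 'type':
        if (text === undefined) return 'error: type requires text';
        await driver.typeText(text);
        return `typed ${text.length} characters`;
      case 'key':
        if (text === undefined) return 'error: key requires text';
        await driver.pressKey(text);
        return `pressed ${text}`;
      default:
        // Report unsupported actions as text so the model can adapt.
        return `error: unsupported action "${action}"`;
    }
  };
}

// Recording driver for dry runs: logs calls instead of touching the OS.
function createRecordingDriver(): { driver: InputDriver; log: string[] } {
  const log: string[] = [];
  const driver: InputDriver = {
    moveMouse: async (x, y) => { log.push(`move ${x},${y}`); },
    click: async (button) => { log.push(`click ${button}`); },
    typeText: async (text) => { log.push(`type ${text}`); },
    pressKey: async (keys) => { log.push(`key ${keys}`); },
  };
  return { driver, log };
}
```

Injecting the driver keeps the dispatch logic testable without a display, and lets you swap the recording driver for a real one once the sandbox is in place.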

Finally, to send tool results back to the model, use the experimental_toToolResultContent() function to convert text and image responses into a format the model can process. The AI SDK includes experimental support for these multi-modal tool results when using Anthropic's models.

Computer Use requires appropriate safety measures like using virtual machines, limiting access to sensitive data, and implementing human oversight for critical actions.

Using Computer Tools with Text Generation

Once your tool is defined, you can use it with both the generateText and streamText functions.

For one-shot text generation, use generateText:

import { generateText } from 'ai';

const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Move the cursor to the center of the screen and take a screenshot',
  tools: { computer: computerTool },
});

console.log(result.text);

For streaming responses, use streamText to receive updates in real-time:

import { streamText } from 'ai';

const result = streamText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Open the browser and navigate to vercel.com',
  tools: { computer: computerTool },
});

for await (const chunk of result.textStream) {
  console.log(chunk);
}

Configure Multi-Step (Agentic) Generations

To allow the model to perform multiple steps without user intervention, specify a maxSteps value. This will automatically send any tool results back to the model to trigger a subsequent generation:

const stream = streamText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Open the browser and navigate to vercel.com',
  tools: { computer: computerTool },
  maxSteps: 10, // experiment with this value based on your use case
});

Combine Multiple Tools

You can combine multiple tools in a single request to enable more complex workflows. The AI SDK supports all three of Claude's Computer Use tools:

import { execSync } from 'node:child_process';

const computerTool = anthropic.tools.computer_20241022({
  ...
});

const bashTool = anthropic.tools.bash_20241022({
  execute: async ({ command, restart }) => execSync(command).toString()
});

const textEditorTool = anthropic.tools.textEditor_20241022({
  execute: async ({
    command,
    path,
    file_text,
    insert_line,
    new_str,
    old_str,
    view_range
  }) => {
    // Delegate to your own helper, which handles file operations based on command
    return executeTextEditorFunction({
      command,
      path,
      fileText: file_text,
      insertLine: insert_line,
      newStr: new_str,
      oldStr: old_str,
      viewRange: view_range,
    });
  }
});


const response = await generateText({
  model: anthropic("claude-3-5-sonnet-20241022"),
  prompt: "Create a new file called example.txt, write 'Hello World' to it, and run 'cat example.txt' in the terminal",
  tools: {
    computer: computerTool,
    bash: bashTool,
    str_replace_editor: textEditorTool,
  },
});
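The example above leaves executeTextEditorFunction for you to implement. The following is a minimal sketch of one possible shape, backed by an in-memory map instead of a real file system so it can be tried safely. The command names (view, create, str_replace, insert) and their semantics are assumptions to verify against Anthropic's documentation for the tool version you use, and the files map is a hypothetical stand-in for sandboxed storage.

```typescript
type TextEditorArgs = {
  command: string;
  path: string;
  fileText?: string;
  insertLine?: number;
  newStr?: string;
  oldStr?: string;
  viewRange?: number[];
};

// In-memory stand-in for a sandboxed file system (hypothetical).
const files = new Map<string, string>();

function executeTextEditorFunction(args: TextEditorArgs): string {
  const { command, path } = args;

  if (command === 'create') {
    files.set(path, args.fileText ?? '');
    return `created ${path}`;
  }

  const content = files.get(path);
  if (content === undefined) return `error: ${path} does not exist`;

  switch (command) {
    case 'view': {
      // viewRange is treated as a 1-based, inclusive [start, end] pair.
      const lines = content.split('\n');
      const start = args.viewRange?.[0] ?? 1;
      const end = args.viewRange?.[1] ?? lines.length;
      return lines.slice(start - 1, end).join('\n');
    }
    case 'str_replace': {
      const oldStr = args.oldStr ?? '';
      if (oldStr === '') return 'error: old_str must not be empty';
      // Require exactly one match so an edit never lands in the wrong place.
      const parts = content.split(oldStr);
      const occurrences = parts.length - 1;
      if (occurrences !== 1) {
        return `error: expected exactly one match, found ${occurrences}`;
      }
      files.set(path, parts.join(args.newStr ?? ''));
      return `edited ${path}`;
    }
    case 'insert': {
      // Inserts after the given 1-based line number; 0 inserts at the top.
      const lines = content.split('\n');
      const at = args.insertLine ?? lines.length;
      lines.splice(at, 0, args.newStr ?? '');
      files.set(path, lines.join('\n'));
      return `inserted into ${path}`;
    }
    default:
      return `error: unsupported command "${command}"`;
  }
}
```

Errors are returned as strings so they can be sent back to the model as a tool result, which gives it a chance to correct the request.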

Always implement appropriate security measures and obtain user consent before enabling Computer Use in production applications.

Best Practices for Computer Use

To get the best results when using Computer Use:

Specify simple, well-defined tasks with explicit instructions for each step
Prompt Claude to verify outcomes through screenshots
Use keyboard shortcuts when UI elements are difficult to manipulate
Include example screenshots for repeatable tasks
Provide explicit tips in system prompts for known tasks

Security Measures

Computer Use is a beta feature, and it poses unique risks that are distinct from those of standard API features or chat interfaces. These risks are heightened when Computer Use interacts with the internet. To minimize them, consider taking precautions such as:

Use a dedicated virtual machine or container with minimal privileges to prevent direct system attacks or accidents.
Avoid giving the model access to sensitive data, such as account login information, to prevent information theft.
Limit internet access to an allowlist of domains to reduce exposure to malicious content.
Ask a human to confirm decisions that may result in meaningful real-world consequences as well as any tasks requiring affirmative consent, such as accepting cookies, executing financial transactions, or agreeing to terms of service.
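Two of the precautions above, the domain allowlist and human confirmation, can be sketched as small guards placed in front of your tool execution. This is an illustration only: isUrlAllowed, runIfConfirmed, and the Confirm callback are hypothetical names, and how you actually prompt a human (CLI, web UI, and so on) is left open.

```typescript
// Returns true only for HTTPS URLs whose host is an allowlisted domain or a subdomain of one.
function isUrlAllowed(rawUrl: string, allowlist: string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'https:') return false;
  const host = url.hostname.toLowerCase();
  // The leading dot prevents look-alikes such as "vercel.com.evil.test" from matching.
  return allowlist.some((domain) => host === domain || host.endsWith(`.${domain}`));
}

// Hypothetical callback that asks a human and resolves to their decision.
type Confirm = (description: string) => Promise<boolean>;

// Runs the action only if the human approves; otherwise reports that it did not run.
async function runIfConfirmed<T>(
  description: string,
  confirm: Confirm,
  action: () => Promise<T>,
): Promise<{ ran: true; value: T } | { ran: false }> {
  const approved = await confirm(description);
  if (!approved) return { ran: false };
  return { ran: true, value: await action() };
}
```

A guard like isUrlAllowed is only a first line of defense inside your own code; network-level restrictions on the virtual machine or container remain the stronger control.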