With the release of Computer Use in Claude 3.5 Sonnet, you can now direct AI models to interact with computers the way humans do: moving cursors, clicking buttons, and typing text. This capability enables automation of complex tasks while leveraging Claude's advanced reasoning abilities.
The AI SDK is a powerful TypeScript toolkit for building AI applications with large language models (LLMs) like Anthropic's Claude alongside popular frameworks like React, Next.js, Vue, Svelte, Node.js, and more. In this guide, you will learn how to integrate Computer Use into your AI SDK applications.
Computer Use is currently in beta with some limitations. The feature may be error-prone at times. Anthropic recommends starting with low-risk tasks and implementing appropriate safety measures.
Anthropic recently released a new version of the Claude 3.5 Sonnet model that is capable of Computer Use. This allows the model to interact with computer interfaces through basic actions such as moving the mouse cursor, clicking buttons, typing text, and taking screenshots.
Computer Use enables the model to read and interact with on-screen content through a series of coordinated steps. Here's how the process works:
Start with a prompt and tools
Add Anthropic-defined Computer Use tools to your request and provide a task (prompt) for the model. For example: "save an image to your downloads folder."
Select the right tool
The model evaluates which computer tools can help accomplish the task. It then sends a formatted tool_call to use the appropriate tool.
Execute the action and return results
The AI SDK processes Claude's request by running the selected tool. The results can then be sent back to Claude through a tool_result message.
Complete the task through iterations
Claude analyzes each result to determine if more actions are needed. It continues requesting tool use and processing results until it completes your task or requires additional input.
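The four steps above form a loop that can be sketched in plain TypeScript. This is a minimal, self-contained illustration with a scripted stand-in for Claude, not how the AI SDK implements the loop internally; the names ToolCall, ModelTurn, ChatMessage, runLoop, and scriptedModel are hypothetical.

```typescript
// Minimal sketch of the Computer Use request/response loop.
// All names here are hypothetical illustrations, not AI SDK or Anthropic APIs.

type ToolCall = { toolName: string; args: Record<string, unknown> };
type ModelTurn = { text: string; toolCall?: ToolCall };
type ChatMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; turn: ModelTurn }
  | { role: "tool"; toolName: string; result: string };

type Model = (messages: ChatMessage[]) => ModelTurn;
type ToolExecutor = (call: ToolCall) => string;

// Ask the model for its next turn; whenever it requests a tool (tool_call),
// run it and append the output (tool_result) so the model sees it next time.
// A production loop should also cap the number of iterations.
function runLoop(model: Model, execute: ToolExecutor, prompt: string): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  for (;;) {
    const turn = model(messages);
    messages.push({ role: "assistant", turn });
    if (!turn.toolCall) return messages; // no tool requested: the task is complete
    const result = execute(turn.toolCall);
    messages.push({ role: "tool", toolName: turn.toolCall.toolName, result });
  }
}

// A scripted "model" that takes a screenshot, clicks, then reports completion.
const script: ModelTurn[] = [
  { text: "Let me look at the screen.", toolCall: { toolName: "computer", args: { action: "screenshot" } } },
  { text: "Clicking the button.", toolCall: { toolName: "computer", args: { action: "left_click" } } },
  { text: "Done." },
];
const scriptedModel: Model = (messages) =>
  script[messages.filter((m) => m.role === "assistant").length];

// Stand-in executor: a real one would drive the OS (see the execute function later in this guide).
const fakeExecute: ToolExecutor = (call) => `ran ${String(call.args.action)}`;

const transcript = runLoop(scriptedModel, fakeExecute, "Click the submit button");
```

The transcript alternates assistant turns and tool results until the scripted model stops requesting tools, which mirrors the "complete the task through iterations" step.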
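There are three main tools available in the Computer Use API: the computer tool (mouse, keyboard, and screenshot actions), the text editor tool (viewing, creating, and editing files), and the bash tool (running shell commands).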
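Computer Use tools in the AI SDK are predefined interfaces that require your own implementation of the execution layer. While the SDK provides the type definitions and structure for these tools, you need to write the code that actually carries out each action, such as capturing screenshots, controlling the mouse and keyboard, and running commands, in an environment you control.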
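Part of the execution layer you write is validating what the model asks for before acting on it. As a minimal sketch (validateCoordinate, Coordinate, and DisplaySize are hypothetical names, not part of the AI SDK), the guard below rejects coordinates that are malformed or fall outside the display size declared to the model, such as the 1920x1080 display used in this guide's computer tool example.

```typescript
// Hypothetical guard for the execution layer: reject coordinates that are not
// an [x, y] pair of integers or that lie outside the declared display.
type Coordinate = [number, number];

interface DisplaySize {
  widthPx: number;
  heightPx: number;
}

function validateCoordinate(coordinate: unknown, display: DisplaySize): Coordinate {
  if (
    !Array.isArray(coordinate) ||
    coordinate.length !== 2 ||
    !coordinate.every((n) => Number.isInteger(n))
  ) {
    throw new Error(`Invalid coordinate: ${JSON.stringify(coordinate)}`);
  }
  const [x, y] = coordinate as Coordinate;
  // Valid pixels run from 0 to width-1 and 0 to height-1.
  if (x < 0 || y < 0 || x >= display.widthPx || y >= display.heightPx) {
    throw new Error(
      `Coordinate (${x}, ${y}) is outside the ${display.widthPx}x${display.heightPx} display`,
    );
  }
  return [x, y];
}
```

Throwing here keeps bad input from ever reaching the OS; your execute function can catch the error and return its message to the model as text.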
The recommended approach is to start with Anthropic's reference implementation, which includes a working implementation of this execution layer.
This reference implementation serves as a foundation to understand the requirements before building your own custom solution.
If you have never used the AI SDK before, start by following the Getting Started guide.
First, ensure you have the AI SDK and Anthropic AI SDK provider installed:
```bash
pnpm add ai@beta @ai-sdk/anthropic@beta
```
You can add Computer Use to your AI SDK applications using provider-defined-client tools. These tools accept various input parameters (like display height and width in the case of the computer tool) and then require that you define an execute function.
Here's how you could set up the Computer Tool with the AI SDK:
```typescript
import { anthropic } from '@ai-sdk/anthropic';
import { getScreenshot, executeComputerAction } from '@/utils/computer-use';

const computerTool = anthropic.tools.computer_20241022({
  displayWidthPx: 1920,
  displayHeightPx: 1080,
  execute: async ({ action, coordinate, text }) => {
    switch (action) {
      case 'screenshot': {
        return {
          type: 'image',
          data: getScreenshot(),
        };
      }
      default: {
        return executeComputerAction(action, coordinate, text);
      }
    }
  },
  experimental_toToolResultContent(result) {
    return typeof result === 'string'
      ? [{ type: 'text', text: result }]
      : [{ type: 'image', data: result.data, mediaType: 'image/png' }];
  },
});
```
The computerTool handles two main actions: taking screenshots via getScreenshot() and executing computer actions like mouse movements and clicks through executeComputerAction(). Remember, you have to implement this execution logic (e.g. the getScreenshot and executeComputerAction functions) to handle the actual computer interactions. The execute function should handle all low-level interactions with the operating system.
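To make the split between the tool definition and the execution layer concrete, here is a minimal sketch of an executeComputerAction-style dispatcher. It is hypothetical: unlike the call in the example above, it takes an extra desktop argument and drives an in-memory FakeDesktop instead of the real operating system, so the dispatch logic can be tested without a VM. The action names are those of the computer_20241022 tool version as we understand them; check the tool's type definitions in @ai-sdk/anthropic before relying on them. In that version, as we understand it, clicks apply at the current cursor position, which is why the model issues mouse_move first.

```typescript
// Hypothetical dispatcher over an in-memory desktop (no real OS calls).
// Action names assumed from the computer_20241022 tool version.
type ComputerAction =
  | "key" | "type" | "mouse_move" | "left_click" | "right_click"
  | "middle_click" | "double_click" | "left_click_drag" | "cursor_position" | "screenshot";

interface FakeDesktop {
  cursor: [number, number];
  typed: string[];
  clicks: string[];
}

function executeComputerAction(
  desktop: FakeDesktop,
  action: ComputerAction,
  coordinate?: [number, number],
  text?: string,
): string {
  switch (action) {
    case "mouse_move":
      if (!coordinate) throw new Error("mouse_move requires a coordinate");
      desktop.cursor = coordinate;
      return `moved cursor to ${coordinate[0]},${coordinate[1]}`;
    case "left_click":
    case "right_click":
    case "middle_click":
    case "double_click":
      // Clicks happen wherever the cursor currently is.
      desktop.clicks.push(`${action}@${desktop.cursor.join(",")}`);
      return `${action} at ${desktop.cursor.join(",")}`;
    case "type":
    case "key":
      if (text === undefined) throw new Error(`${action} requires text`);
      desktop.typed.push(text);
      return `${action}: ${text}`;
    case "cursor_position":
      return `X=${desktop.cursor[0]},Y=${desktop.cursor[1]}`;
    default:
      // screenshot and left_click_drag are not simulated in this sketch.
      throw new Error(`Action not implemented in this sketch: ${action}`);
  }
}
```

Replacing FakeDesktop with calls into a real automation layer inside the VM (Anthropic's reference implementation uses xdotool, for example) keeps the same dispatch shape.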
Finally, to send tool results back to the model, use the experimental_toToolResultContent() function to convert text and image responses into a format the model can process. The AI SDK includes experimental support for these multimodal tool results when using Anthropic's models.
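As we understand it, the image branch of this conversion carries the screenshot as a base64-encoded string. A minimal sketch, assuming your getScreenshot implementation captures raw PNG bytes; toScreenshotResult and toContent are hypothetical helpers, and the content-part shape mirrors the one in the computerTool example above.

```typescript
// Hypothetical helpers (Node.js): encode raw PNG bytes for an image tool result,
// then map the execute result to text or image content parts.
type ScreenshotResult = { type: "image"; data: string };
type ToolResult = string | ScreenshotResult;
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mediaType: string };

function toScreenshotResult(png: Uint8Array): ScreenshotResult {
  // Buffer is Node's built-in binary type; "base64" yields a plain base64 string.
  return { type: "image", data: Buffer.from(png).toString("base64") };
}

function toContent(result: ToolResult): ContentPart[] {
  return typeof result === "string"
    ? [{ type: "text", text: result }]
    : [{ type: "image", data: result.data, mediaType: "image/png" }];
}
```

Keeping encoding in one helper means every code path that returns a screenshot produces the same shape, so the conversion function only has to distinguish text from images.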
Computer Use requires appropriate safety measures like using virtual machines, limiting access to sensitive data, and implementing human oversight for critical actions.
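Human oversight can be enforced inside the execution layer itself. A minimal sketch, assuming you supply an approve callback (for example, one that prompts an operator in your UI); withApproval and Approver are hypothetical names. Rejected actions return a text message instead of throwing, so the model learns the action did not happen and can adjust its plan.

```typescript
// Hypothetical wrapper: actions listed as critical need explicit approval.
type Approver = (description: string) => Promise<boolean>;

function withApproval<I extends { action: string }>(
  execute: (input: I) => Promise<string>,
  criticalActions: ReadonlySet<string>,
  approve: Approver,
): (input: I) => Promise<string> {
  return async (input) => {
    if (criticalActions.has(input.action)) {
      const ok = await approve(
        `Model wants to perform "${input.action}": ${JSON.stringify(input)}`,
      );
      // Report the refusal to the model rather than failing the whole run.
      if (!ok) return `Action "${input.action}" was rejected by the user.`;
    }
    return execute(input);
  };
}
```

Which actions count as critical is a policy decision; typing and key presses are reasonable candidates because they can submit forms or confirm dialogs.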
Once your tool is defined, you can use it with both the generateText and streamText functions.
For one-shot text generation, use generateText:
```typescript
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Move the cursor to the center of the screen and take a screenshot',
  tools: { computer: computerTool },
});

console.log(result.text);
```
For streaming responses, use streamText to receive updates in real time:
```typescript
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const result = streamText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Open the browser and navigate to vercel.com',
  tools: { computer: computerTool },
});

for await (const chunk of result.textStream) {
  console.log(chunk);
}
```
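textStream is an async iterable of text chunks, so an ordinary for await loop consumes it. To show the pattern without an API key, the sketch below uses a hypothetical fakeTextStream generator in place of result.textStream; collectStream (also hypothetical) reports each chunk as it arrives and returns the full text at the end.

```typescript
// Stand-in for result.textStream: an async generator yielding text chunks.
async function* fakeTextStream(): AsyncGenerator<string> {
  for (const chunk of ["Opening ", "the browser", "..."]) {
    yield chunk;
  }
}

// Hand each chunk to a callback (e.g. to update a UI) and accumulate the full text.
async function collectStream(
  stream: AsyncIterable<string>,
  onChunk: (chunk: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    onChunk(chunk);
    full += chunk;
  }
  return full;
}
```

With a real request, pass result.textStream instead of fakeTextStream().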
To allow the model to perform multiple steps without user intervention, specify a maxSteps value. This automatically sends any tool results back to the model to trigger a subsequent generation:
```typescript
const stream = streamText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Open the browser and navigate to vercel.com',
  tools: { computer: computerTool },
  maxSteps: 10, // experiment with this value based on your use case
});
```
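As we understand it, maxSteps caps the number of model calls: generation stops when a step produces no tool calls or when the cap is reached, whichever comes first. The hypothetical runSteps function below sketches that stopping rule with a stand-in nextStep callback; it illustrates the behavior and is not the SDK's implementation.

```typescript
// Hypothetical sketch of a maxSteps-style stopping rule (not the AI SDK's code).
type Step = { toolCalls: string[] };

function runSteps(nextStep: (stepIndex: number) => Step, maxSteps: number): Step[] {
  const steps: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = nextStep(i);
    steps.push(step);
    // A step without tool calls is a final answer: nothing to send back.
    if (step.toolCalls.length === 0) break;
  }
  return steps;
}

// A model that keeps asking for screenshots is cut off at the cap...
const runaway = runSteps(() => ({ toolCalls: ["screenshot"] }), 10);
// ...while one that answers after a single tool call stops on its own.
const finished = runSteps((i) => ({ toolCalls: i === 0 ? ["screenshot"] : [] }), 10);
```

This is why the value matters for Computer Use: every round of screenshots, clicks, and keystrokes consumes a step, so a cap that is too low can cut a task off midway.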
You can combine multiple tools in a single request to enable more complex workflows. The AI SDK supports all three of Claude's Computer Use tools:
```typescript
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { execSync } from 'child_process';

const computerTool = anthropic.tools.computer_20241022({
  // ... (configured as shown earlier)
});

const bashTool = anthropic.tools.bash_20241022({
  execute: async ({ command, restart }) => execSync(command).toString(),
});

const textEditorTool = anthropic.tools.textEditor_20241022({
  execute: async ({ command, path, file_text, insert_line, new_str, old_str, view_range }) => {
    // Handle file operations based on command.
    // executeTextEditorFunction is your own implementation.
    return executeTextEditorFunction({
      command,
      path,
      fileText: file_text,
      insertLine: insert_line,
      newStr: new_str,
      oldStr: old_str,
      viewRange: view_range,
    });
  },
});

const response = await generateText({
  model: anthropic("claude-3-5-sonnet-20241022"),
  prompt: "Create a new file called example.txt, write 'Hello World' to it, and run 'cat example.txt' in the terminal",
  tools: {
    computer: computerTool,
    bash: bashTool,
    str_replace_editor: textEditorTool,
  },
});
```
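The textEditorTool above delegates to executeTextEditorFunction, which you implement. A minimal in-memory sketch follows. It is hypothetical: it keeps files in a Map instead of on disk and simplifies the tool's semantics (for example, view does not list directories). The command names are those of the text editor tool as we understand them: view, create, str_replace, insert, and undo_edit.

```typescript
// Hypothetical in-memory executeTextEditorFunction (no disk access).
type EditorArgs = {
  command: "view" | "create" | "str_replace" | "insert" | "undo_edit";
  path: string;
  fileText?: string;
  insertLine?: number;
  newStr?: string;
  oldStr?: string;
  viewRange?: number[];
};

const files = new Map<string, string>();
const editHistory = new Map<string, string[]>(); // previous versions, for undo_edit

function readFile(path: string): string {
  const content = files.get(path);
  if (content === undefined) throw new Error(`File not found: ${path}`);
  return content;
}

function saveVersion(path: string): void {
  const prev = files.get(path);
  if (prev !== undefined) editHistory.set(path, [...(editHistory.get(path) ?? []), prev]);
}

function executeTextEditorFunction(args: EditorArgs): string {
  const { command, path } = args;
  switch (command) {
    case "view": {
      const lines = readFile(path).split("\n");
      // viewRange is 1-indexed and inclusive; an end of -1 means "to end of file".
      const [start, end] = args.viewRange ?? [1, lines.length];
      const last = end === -1 ? lines.length : end;
      return lines.slice(start - 1, last).map((l, i) => `${start + i}\t${l}`).join("\n");
    }
    case "create":
      saveVersion(path);
      files.set(path, args.fileText ?? "");
      return `Created ${path}`;
    case "str_replace": {
      const content = readFile(path);
      const oldStr = args.oldStr ?? "";
      const count = content.split(oldStr).length - 1;
      if (count !== 1) return `Error: old_str must occur exactly once, found ${count}`;
      saveVersion(path);
      // Function replacer so "$" sequences in newStr are inserted literally.
      files.set(path, content.replace(oldStr, () => args.newStr ?? ""));
      return `Edited ${path}`;
    }
    case "insert": {
      const lines = readFile(path).split("\n");
      saveVersion(path);
      // Insert after line insertLine; 0 inserts at the top of the file.
      lines.splice(args.insertLine ?? 0, 0, args.newStr ?? "");
      files.set(path, lines.join("\n"));
      return `Inserted into ${path}`;
    }
    case "undo_edit": {
      const prev = editHistory.get(path)?.pop();
      if (prev === undefined) return `Error: nothing to undo for ${path}`;
      files.set(path, prev);
      return `Undid last edit to ${path}`;
    }
  }
}
```

Returning error strings (rather than throwing) for ambiguous replacements gives the model a chance to retry with a more specific old_str.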
Always implement appropriate security measures and obtain user consent before enabling Computer Use in production applications.
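One concrete measure for the bash tool is to screen commands before they reach execSync. The sketch below is hypothetical (checkBashCommand and ALLOWED_COMMANDS are illustrative names) and only shows the idea: an allowlist of programs plus rejection of shell metacharacters that could chain or redirect commands. It is not a sandbox; run the tool inside a VM or container regardless.

```typescript
// Hypothetical pre-execution check for bash tool commands.
const ALLOWED_COMMANDS = new Set(["ls", "cat", "pwd", "echo"]);

function checkBashCommand(command: string): { allowed: boolean; reason?: string } {
  // Block chaining, piping, substitution, redirection, and multi-line input.
  if (/[;&|`$<>\n]/.test(command)) {
    return { allowed: false, reason: "shell metacharacters are not allowed" };
  }
  const program = command.trim().split(/\s+/)[0] ?? "";
  if (!ALLOWED_COMMANDS.has(program)) {
    return { allowed: false, reason: `"${program}" is not on the allowlist` };
  }
  return { allowed: true };
}
```

In a bash tool's execute function, a rejected command's reason can be returned to the model as text instead of running execSync.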
To get the best results when using Computer Use:
Remember, Computer Use is a beta feature. Please be aware that it poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using Computer Use to interact with the internet. To minimize risks, consider taking precautions such as running it in a virtual machine, limiting its access to sensitive data, and requiring human oversight for critical actions.