HyperAgent SDK

HyperAgent Class

The HyperAgent class provides an interface for running autonomous web agents. It manages browser instances, task execution, and integration with Model Context Protocol (MCP) servers.

Creating a HyperAgent Instance

import { HyperAgent } from "@/agent";
import { ChatOpenAI } from "@langchain/openai";

// Initialize with default settings (requires OPENAI_API_KEY env var)
const agent = new HyperAgent();

// Or, initialize with custom configuration
const agentWithConfig = new HyperAgent({
    llm: new ChatOpenAI({
        modelName: "gpt-4o",
        apiKey: process.env.OPENAI_API_KEY,
    }), // Specify the LLM
    browserProvider: "Hyperbrowser", // "Local" or "Hyperbrowser"; defaults to "Local"
    hyperbrowserConfig: { /* Hyperbrowser-specific config, used with the Hyperbrowser provider */ },
    localConfig: { /* Playwright launch options, used with the Local provider */ },
    debug: true, // Enable debug logging
    customActions: [ /* Array of custom actions */ ],
});

Initializing

Description: Initializes a new instance of the HyperAgent. Configures the language model, browser provider (local or Hyperbrowser), debug mode, and custom actions.
Parameters:
- params (HyperAgentConfig, optional): Configuration object.
  - llm (BaseChatModel, optional): The language model instance to use. Defaults to GPT-4o if the OPENAI_API_KEY environment variable is set.
  - browserProvider ("Hyperbrowser" | "Local", optional): Specifies whether to use Hyperbrowser's cloud browsers or a local Playwright instance. Defaults to "Local".
  - hyperbrowserConfig (object, optional): Configuration for the Hyperbrowser provider.
  - localConfig (object, optional): Configuration for the local browser provider (Playwright launch options).
  - customActions (AgentActionDefinition[], optional): An array of custom actions to register with the agent.
  - debug (boolean, optional): Enables detailed logging if set to true. Defaults to false. Logs are written to the ./debug directory.
Throws: HyperagentError if no LLM is provided and OPENAI_API_KEY env var is not set.
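
The defaulting rules above can be sketched as a small standalone function. This is only an illustration of the documented behavior, not the SDK's implementation: ConfigError, LlmLike, AgentConfigLike, and resolveConfig are hypothetical names, and the environment is passed in explicitly so the logic can be exercised without real credentials.

```typescript
// Hypothetical stand-ins: the real SDK throws its own HyperagentError and
// accepts a LangChain BaseChatModel for `llm`.
class ConfigError extends Error {}

interface LlmLike {
  readonly label: string;
}

interface AgentConfigLike {
  llm?: LlmLike;
  browserProvider?: "Hyperbrowser" | "Local";
  debug?: boolean;
}

interface ResolvedConfig {
  llm: LlmLike;
  browserProvider: "Hyperbrowser" | "Local";
  debug: boolean;
}

// Applies the documented defaults: GPT-4o when only OPENAI_API_KEY is set,
// the "Local" browser provider, and debug logging off.
function resolveConfig(
  config: AgentConfigLike = {},
  env: Record<string, string | undefined> = {},
): ResolvedConfig {
  let llm = config.llm;
  if (!llm) {
    if (!env["OPENAI_API_KEY"]) {
      throw new ConfigError("No LLM provided and OPENAI_API_KEY is not set");
    }
    llm = { label: "gpt-4o" };
  }
  return {
    llm,
    browserProvider: config.browserProvider ?? "Local",
    debug: config.debug ?? false,
  };
}
```

In a real program the second argument would typically be process.env.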

Browser and Page Management

List all current pages

async getPages(): Promise<HyperPage[]>

Description: Retrieves all currently open pages within the agent's browser context. Each page is enhanced with ai and aiAsync methods for task execution.
Returns: Promise<HyperPage[]> - An array of HyperPage objects.

Usage:

const pages = await agent.getPages();
if (pages.length > 0) {
    await pages[0].ai("Summarize the content of this page.");
}
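
Rather than always taking pages[0], a caller may want the tab showing a particular site. The sketch below assumes only that each page exposes Playwright's url() method; findPageByHost and UrlPage are hypothetical helper names, not part of the SDK.

```typescript
// Minimal structural type: only Playwright's page.url() is relied on here.
interface UrlPage {
  url(): string;
}

// Returns the first page whose URL hostname equals `host`, or undefined.
function findPageByHost<P extends UrlPage>(pages: P[], host: string): P | undefined {
  for (const page of pages) {
    try {
      if (new URL(page.url()).hostname === host) {
        return page;
      }
    } catch (_err) {
      // Skip pages whose URL cannot be parsed.
    }
  }
  return undefined;
}
```

With a real agent this could be called as findPageByHost(await agent.getPages(), "example.com").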

Create a new page

async newPage(): Promise<HyperPage>

Description: Creates and returns a new page (tab) in the agent's browser context. The returned page is enhanced with ai and aiAsync methods.
Returns: Promise<HyperPage> - A new HyperPage object.

Usage:

const newPage = await agent.newPage();
await newPage.goto("https://example.com");
const summary = await newPage.ai("What is this website about?");
console.log(summary);

Get current page

async getCurrentPage(): Promise<Page>

Description: Gets the agent's currently active page. If no page exists or the current page is closed, it creates a new one. Note: This returns a standard Playwright Page object, not a HyperPage.
Returns: Promise<Page> - The current or a new Playwright Page.

Usage:

const currentPage = await agent.getCurrentPage();
await currentPage.goto("https://google.com");
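
The "reuse the current page unless it is missing or closed" rule can be illustrated in isolation. This is a sketch of the described behavior under stated assumptions, not the SDK's code: CurrentPageTracker and PageLike are hypothetical, and the only real API assumed is Playwright's page.isClosed().

```typescript
// Only Playwright's page.isClosed() is assumed on the page type.
interface PageLike {
  isClosed(): boolean;
}

class CurrentPageTracker<P extends PageLike> {
  private current: P | null = null;
  private readonly createPage: () => Promise<P>;

  constructor(createPage: () => Promise<P>) {
    this.createPage = createPage;
  }

  // Returns the tracked page, creating a fresh one if there is none
  // or if the tracked page has been closed.
  async getCurrentPage(): Promise<P> {
    if (this.current === null || this.current.isClosed()) {
      this.current = await this.createPage();
    }
    return this.current;
  }
}
```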

Close agent

async closeAgent(): Promise<void>

Description: Closes the agent, including the browser instance, browser context, and any active MCP connections. Cancels any tasks that are still running or paused.
Returns: Promise<void>

Usage:

await agent.closeAgent();
console.log("Agent closed.");
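
Because closeAgent releases the browser and MCP connections, it is usually worth calling even when the work in between throws. One possible pattern is a try/finally wrapper; withAgent and ClosableAgent are hypothetical names introduced for this sketch.

```typescript
// Anything with a closeAgent() method, such as a HyperAgent instance.
interface ClosableAgent {
  closeAgent(): Promise<void>;
}

// Runs `work` and always closes the agent afterwards, whether `work`
// resolves or rejects. Errors from `work` still propagate to the caller.
async function withAgent<A extends ClosableAgent, T>(
  agent: A,
  work: (agent: A) => Promise<T>,
): Promise<T> {
  try {
    return await work(agent);
  } finally {
    await agent.closeAgent();
  }
}
```

A call might look like withAgent(new HyperAgent(), (agent) => agent.executeTask("...")).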

Task Execution

Execute a task

async executeTask(task: string, params?: TaskParams, initPage?: Page): Promise<TaskOutput>

Description: Executes a task instruction and waits for it to finish. The agent uses its LLM and configured actions to perform the task on a browser page. The returned promise resolves with the final output once the task completes and rejects if the task fails.
Parameters:
- task (string): The natural language instruction for the task (e.g., "Find the contact email on this page").
- params (TaskParams, optional): Additional parameters for the task, such as outputSchema to specify the desired output format using a Zod schema, or maxSteps to limit the number of steps.
- initPage (Page, optional): A specific Playwright Page to start the task on. Defaults to the agent's currentPage.
Returns: Promise<TaskOutput> - The result of the task, typically a string summary or structured data if outputSchema was provided.
Throws: Rethrows any error encountered during task execution.

Usage:

import { z } from "zod"; // Needed for the outputSchema below

const page = await agent.newPage();
await page.goto("https://example.com");
const result = await agent.executeTask(
    "Extract the main heading from www.example.com",
    { outputSchema: z.object({ heading: z.string() }) },
    page
);
console.log(result); // { heading: "Example Domain" }
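
Since executeTask rethrows any error raised during execution, callers that prefer not to use try/catch at every call site could wrap it once. The sketch below is one way to do that; TaskRunner, TaskResult, and tryExecuteTask are hypothetical names, and the runner interface is reduced to the single-argument form for brevity.

```typescript
// Reduced view of the agent: only executeTask(task) is used here.
interface TaskRunner<Out> {
  executeTask(task: string): Promise<Out>;
}

type TaskResult<Out> =
  | { ok: true; output: Out }
  | { ok: false; error: Error };

// Converts a rejected executeTask call into a value the caller can inspect.
async function tryExecuteTask<Out>(
  runner: TaskRunner<Out>,
  task: string,
): Promise<TaskResult<Out>> {
  try {
    return { ok: true, output: await runner.executeTask(task) };
  } catch (err) {
    // Normalize non-Error throwables so callers always get an Error.
    const error = err instanceof Error ? err : new Error(String(err));
    return { ok: false, error };
  }
}
```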

Execute a task asynchronously

async executeTaskAsync(task: string, params?: TaskParams, initPage?: Page): Promise<Task>

Description: Executes a given task instruction asynchronously. It immediately returns a Task control object, allowing management (pause, resume, cancel) of the background task.
Parameters:
- task (string): The natural language instruction for the task (e.g., "Find the contact email on this page").
- params (TaskParams, optional): Additional parameters for the task, such as outputSchema to specify the desired output format using a Zod schema, or maxSteps to limit the number of steps.
- initPage (Page, optional): A specific Playwright Page to start the task on. Defaults to the agent's currentPage.
Returns: Promise<Task> - A Task control object with methods:
- getStatus(): TaskStatus
- pause(): TaskStatus
- resume(): TaskStatus
- cancel(): TaskStatus

Usage:

const taskControl = await agent.executeTaskAsync("Go to news.google.com, and search for international news.");
console.log("Task started with status:", taskControl.getStatus());
// ... later
taskControl.pause();
// ... even later
taskControl.cancel();
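
A common way to use the Task control object is to poll getStatus() until the task reaches a terminal state, and cancel it if that takes too long. The sketch below assumes nothing about the actual TaskStatus values: the caller supplies the isTerminal predicate and the sleep function, and waitForTask and TaskControlLike are hypothetical names.

```typescript
// The subset of the Task control object used by this sketch.
interface TaskControlLike<S> {
  getStatus(): S;
  cancel(): S;
}

// Polls the task up to `maxPolls` times, sleeping between polls.
// Returns the terminal status, or the result of cancel() on timeout.
async function waitForTask<S>(
  task: TaskControlLike<S>,
  isTerminal: (status: S) => boolean,
  maxPolls: number,
  sleep: () => Promise<void>,
): Promise<S> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const current = task.getStatus();
    if (isTerminal(current)) {
      return current;
    }
    await sleep();
  }
  // One last check after the final sleep before giving up.
  const last = task.getStatus();
  return isTerminal(last) ? last : task.cancel();
}
```

A sleep function such as () => new Promise((resolve) => setTimeout(resolve, 1000)) would give a one-second polling interval.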

Model Context Protocol (MCP) Integration

MCP allows extending the agent's capabilities by connecting to external servers that provide additional tools/actions.

Initialize an MCP client

async initializeMCPClient(config: MCPConfig): Promise<void>

Description: Initializes the MCP client and attempts to connect to all servers specified in the configuration. Actions provided by successfully connected servers are registered with the agent.
Parameters:
- config (MCPConfig): Configuration object containing an array of servers (each with connection details like URL and ID).
Returns: Promise<void>

Usage:

await agent.initializeMCPClient({
    servers: [
        { id: "server1", url: "ws://localhost:8080" },
        // ... other servers
    ]
});

Connect to a single MCP server

async connectToMCPServer(serverConfig: MCPServerConfig): Promise<string | null>

Description: Connects to a single MCP server at runtime. Registers actions provided by the server if the connection is successful.
Parameters:
- serverConfig (MCPServerConfig): Configuration for the specific server to connect to.
Returns: Promise<string | null> - The server ID if connection was successful, otherwise null.

Usage:

const serverId = await agent.connectToMCPServer({ id: "runtimeServer", url: "ws://localhost:8081" });
if (serverId) {
    console.log(`Connected to ${serverId}`);
}
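
Because connectToMCPServer returns null on failure rather than throwing, connecting to several servers at runtime means checking each result. A minimal sketch, assuming the id and url fields shown in the usage example; connectAllServers, McpConnector, and ServerConfigLike are hypothetical names.

```typescript
// Shape taken from the usage example; the real MCPServerConfig may have more fields.
interface ServerConfigLike {
  id: string;
  url: string;
}

// The subset of the agent used by this sketch.
interface McpConnector {
  connectToMCPServer(serverConfig: ServerConfigLike): Promise<string | null>;
}

// Connects to each server in order and reports which succeeded and which failed.
async function connectAllServers(
  agent: McpConnector,
  servers: ServerConfigLike[],
): Promise<{ connected: string[]; failed: string[] }> {
  const connected: string[] = [];
  const failed: string[] = [];
  for (const server of servers) {
    const serverId = await agent.connectToMCPServer(server);
    if (serverId !== null) {
      connected.push(serverId);
    } else {
      failed.push(server.id);
    }
  }
  return { connected, failed };
}
```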

Disconnect from an MCP server

disconnectFromMCPServer(serverId: string): boolean

Description: Disconnects from a specific MCP server identified by its ID. Note: This does not automatically unregister the actions provided by that server.
Parameters:
- serverId (string): The ID of the server to disconnect from.
Returns: boolean - true if disconnection was successful or the server wasn't connected, false if an error occurred.

Usage:

const success = agent.disconnectFromMCPServer("server1");
console.log("Disconnected:", success);

Check if an MCP server is connected

isMCPServerConnected(serverId: string): boolean

Description: Checks if the agent is currently connected to a specific MCP server.
Parameters:
- serverId (string): The ID of the server to check.
Returns: boolean - true if connected, false otherwise.

Usage:

if (agent.isMCPServerConnected("server1")) {
    console.log("Server1 is connected.");
}

List MCP server IDs

getMCPServerIds(): string[]

Description: Retrieves the IDs of all currently connected MCP servers.
Returns: string[] - An array of connected server IDs.

Usage:

const connectedServers = agent.getMCPServerIds();
console.log("Connected servers:", connectedServers);
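
The three synchronous MCP methods can be combined to disconnect every server and report any that remain connected. This is a sketch under the documented return values only; disconnectAllServers and McpManagerLike are hypothetical names. Note that, as stated above, actions registered by a server are not unregistered when it is disconnected.

```typescript
// The subset of the agent used by this sketch.
interface McpManagerLike {
  getMCPServerIds(): string[];
  disconnectFromMCPServer(serverId: string): boolean;
  isMCPServerConnected(serverId: string): boolean;
}

// Disconnects every currently connected server.
// Returns the IDs of servers that could not be disconnected.
function disconnectAllServers(agent: McpManagerLike): string[] {
  const stillConnected: string[] = [];
  // Copy the ID list first in case disconnecting mutates the underlying array.
  const serverIds = agent.getMCPServerIds().slice();
  for (const serverId of serverIds) {
    const ok = agent.disconnectFromMCPServer(serverId);
    if (!ok || agent.isMCPServerConnected(serverId)) {
      stillConnected.push(serverId);
    }
  }
  return stillConnected;
}
```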

Get MCP server info

getMCPServerInfo(): Array<{ id: string; toolCount: number; toolNames: string[]; }> | null

Description: Gets information about all connected MCP servers, including their IDs and the tools (actions) they provide.
Returns: Array<{ id: string; toolCount: number; toolNames: string[]; }> | null - An array of server info objects, or null if the MCP client isn't initialized.

Usage:

const serverInfo = agent.getMCPServerInfo();
if (serverInfo) {
    serverInfo.forEach(info => {
        console.log(`Server: ${info.id}, Tools: ${info.toolNames.join(', ')}`);
    });
}
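
Since getMCPServerInfo can return null, an empty array, or a populated array, a small formatter that handles all three cases may be convenient for logging. describeServers and ServerInfoLike are hypothetical names; the field names come from the signature above.

```typescript
// Mirrors the element type in the getMCPServerInfo signature.
interface ServerInfoLike {
  id: string;
  toolCount: number;
  toolNames: string[];
}

// Builds a one-line-per-server summary, covering the null and empty cases.
function describeServers(info: ServerInfoLike[] | null): string {
  if (info === null) {
    return "MCP client not initialized";
  }
  if (info.length === 0) {
    return "No MCP servers connected";
  }
  return info
    .map((s) => `${s.id}: ${s.toolCount} tool(s) [${s.toolNames.join(", ")}]`)
    .join("\n");
}
```

For example, console.log(describeServers(agent.getMCPServerInfo())).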

Utility Methods

Pretty-print an action

pprintAction(action: ActionType): string

Description: Generates a human-readable string representation of an agent action, if a pretty-print function is defined for that action type. Useful for logging or debugging.
Parameters:
- action (ActionType): The action object (containing type and params).
Returns: string - A formatted string representing the action, or an empty string if no specific pretty-print function exists for the action type.

Usage:

// Assuming 'lastAction' is an action object from a task step
console.log(agent.pprintAction(lastAction));
// Example output: "Clicked element with selector '#submit-button'"
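
The "empty string when no pretty-print function exists" behavior can be pictured as a lookup table keyed by action type. This is only an illustrative sketch, not the SDK's implementation: the clickElement action type, its selector parameter, and all names below are hypothetical.

```typescript
// Simplified action shape: a type name plus arbitrary parameters.
interface ActionLike {
  type: string;
  params: Record<string, unknown>;
}

type PrettyPrinter = (params: Record<string, unknown>) => string;

// Hypothetical registry; a real agent would populate this from its action definitions.
const prettyPrinters: Record<string, PrettyPrinter> = {
  clickElement: (params) =>
    `Clicked element with selector '${String(params["selector"])}'`,
};

// Returns the formatted action, or "" when no printer is registered for its type.
function pprintActionSketch(action: ActionLike): string {
  // Own-property check so inherited keys like "toString" are not treated as printers.
  if (!Object.prototype.hasOwnProperty.call(prettyPrinters, action.type)) {
    return "";
  }
  const printer = prettyPrinters[action.type];
  return printer ? printer(action.params) : "";
}
```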

Last updated 1 month ago