This repository demonstrates advanced, agentic patterns built on top of the Realtime API, currently focusing on the implementation of a Conversation Assistant. The goal of this project is to help users participate more effectively in conversations. See the Product Requirements Document (PRD.md) for full details.
The core framework demonstrates:
- Sequential agent handoffs according to a defined agent graph (taking inspiration from OpenAI Swarm)
- Background escalation to more intelligent models like o1-mini for high-stakes decisions
You can use this repo to understand how to build complex, multi-agent realtime voice applications.
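To make those two patterns concrete, here is a minimal sketch of a handoff graph with a background-escalation tool. The `downstreamAgents` and `toolLogic` fields, and the `/api/chat/completions` proxy route, are assumptions about this codebase for illustration, not confirmed API:

```typescript
// Hypothetical sketch: a two-agent handoff graph where the downstream agent
// escalates high-stakes questions to o1-mini via an assumed proxy route.
import { AgentConfig } from "@/app/types";

const escalationAgent: AgentConfig = {
  name: "escalationAgent",
  publicDescription: "Handles high-stakes decisions.",
  instructions: "Consult a more capable model before answering.",
  tools: [
    {
      type: "function",
      name: "consultSmarterModel",
      description: "Escalate a high-stakes question to o1-mini.",
      parameters: {
        type: "object",
        properties: { question: { type: "string" } },
        required: ["question"],
      },
    },
  ],
  // toolLogic (assumed field) runs client-side when the tool is called.
  toolLogic: {
    consultSmarterModel: async ({ question }: { question: string }) => {
      const res = await fetch("/api/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "o1-mini",
          messages: [{ role: "user", content: question }],
        }),
      });
      return await res.json();
    },
  },
};

const triageAgent: AgentConfig = {
  name: "triageAgent",
  publicDescription: "First point of contact.",
  instructions: "Route high-stakes questions to escalationAgent.",
  tools: [],
  downstreamAgents: [escalationAgent], // defines the handoff edge in the agent graph
};
```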
Note: The screenshot may reflect an earlier version of the UI.
- This is a Next.js TypeScript app
- Install dependencies with `npm i`
- Add your `OPENAI_API_KEY` to your env
- Start the server with `npm run dev` or `npm run build-check`
- Open your browser to http://localhost:3000 to see the app. It should automatically connect to the `adhdAssistant` Agent Set.
The primary agent configuration is now focused on the ADHD Assistant, located in `src/app/agentConfigs/adhdAssistant/index.ts`.
Previous examples like `simpleExample`, `customerServiceRetail`, and `frontDeskAuthentication` have been removed to streamline focus on the current project goal.
```typescript
// Example snippet from src/app/agentConfigs/adhdAssistant/index.ts
import { AgentConfig } from "@/app/types";
import { injectTransferTools } from "../utils";

// Define agents for ADHD Assistant workflow
const monitorAgent: AgentConfig = {
  // ... configuration ...
};

const responseAgent: AgentConfig = {
  // ... configuration ...
};

// ... other agents ...

const agents = injectTransferTools([/* ...adhd agents... */]);

export default agents;
```

- While the current focus is the ADHD assistant, the underlying framework is reusable. You can adapt the structure in `src/app/agentConfigs/` to create new agent sets. Add your new config to `src/app/agentConfigs/index.ts` to make it selectable in the UI's "Scenario" dropdown (see the sketch after this list).
- For help creating prompts, refer to the metaprompt here or the Voice Agent Metaprompter GPT.
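For example, registering a new agent set might look like the following. This is a minimal sketch: the `allAgentSets` and `defaultAgentSetKey` export names are assumptions about how `src/app/agentConfigs/index.ts` is structured, and `myNewScenario` is a hypothetical config:

```typescript
// Hypothetical sketch of src/app/agentConfigs/index.ts — the export names
// (allAgentSets, defaultAgentSetKey) and myNewScenario are assumed.
import { AgentConfig } from "@/app/types";
import adhdAssistant from "./adhdAssistant";
import myNewScenario from "./myNewScenario"; // your new agent set

export const allAgentSets: Record<string, AgentConfig[]> = {
  adhdAssistant,
  myNewScenario, // key shown in the UI's "Scenario" dropdown
};

export const defaultAgentSetKey = "adhdAssistant";
```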
- Scenario/Agent Selection: Select the `adhdAssistant` scenario (or others you add) and specific agents using the dropdowns.
- Transcript: Located on the left, showing the conversation flow, including user/assistant messages, tool calls, and agent changes. Click to expand details.
- Agent Answers: A dedicated panel (center-right) now displays responses specifically from the `responseAgent` for clarity (see the sketch after this list). This panel can be collapsed/expanded.
- Dashboard: The right panel, formerly "Logs", now serves as a Dashboard, displaying detailed client/server events, potentially including token usage, agent steps, and timing information in the future. Click events to see the full payload. This panel can be collapsed/expanded.
- Bottom Toolbar: Controls for connecting/disconnecting, muting/unmuting the microphone (push-to-talk has been removed), and toggling the visibility of the Agent Answers and Dashboard panels.
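A minimal sketch of how the Agent Answers panel could select its content, assuming a `useTranscript` hook and item fields (`agentName`, `role`, `itemId`, `title`) suggested by the status table below — treat all of these names as illustrative, not as this repo's actual API:

```tsx
// Hypothetical sketch of the AgentAnswers selection logic. The hook name,
// import path, and item fields are assumptions for illustration only.
import { useTranscript } from "@/app/contexts/TranscriptContext";

export function AgentAnswers() {
  const { transcriptItems } = useTranscript();

  // Show only assistant messages produced by the responseAgent.
  const answers = transcriptItems.filter(
    (item) => item.role === "assistant" && item.agentName === "responseAgent"
  );

  return (
    <ul>
      {answers.map((item) => (
        <li key={item.itemId}>{item.title}</li>
      ))}
    </ul>
  );
}
```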
This section tracks progress toward implementing the features outlined in PRD.md.
| Feature Category | Feature | PRD Phase | Status | Notes |
|---|---|---|---|---|
| Core Infrastructure | Base Project Setup (Next.js, Deps) | - | Done | Existing framework reused. `react-draggable` added. |
| | Agent Configuration Framework | 1 | Done | `AgentConfig` structure exists, ADHD config scaffolded. |
| | Basic UI Shell | 1 | Done | Transcript, Events panel structure exists. |
| | Audio Processing (Basic Input) | 1 | In Progress | Basic connection (`realtimeConnection`) exists, returns `audioTrack`. |
| | Speaker Identification Pipeline | 1 | To Do | Implementing via dual-microphone input channels as per PRD `AudioConfig` (see the sketch below). |
| | Context Management (Basic Storage) | 1 | In Progress | `TranscriptContext` exists, `agentName` added. Needs speaker separation. |
| | Question Detection System | 1 | To Do | Not yet implemented. |
| | Basic Response Generation Logic | 1 | To Do | Agent structure exists, but specific ADHD response logic TBD. |
| Response Enhancement | Advanced Response Generation (LLM) | 2 | To Do | Requires LLM integration beyond basic agent calls for context/history use. |
| | Interruption Handling | 2 | To Do | Not yet implemented. |
| | Context Relevance Scoring | 2 | To Do | Not yet implemented. |
| UI & UX | Configuration Interface | - | To Do | Audio source selection, etc., not built. |
| | Conversation View (Speaker Differentiation) | 1/2 | To Do | Basic transcript exists, needs speaker labels/styling per PRD. |
| | Response Display (`AgentAnswers` panel) | 2 | Done | New `AgentAnswers.tsx` component added and integrated. |
| | Dashboard Panel (`Dashboard` component) | - | Done | New `Dashboard.tsx` component added, replacing Logs panel. |
| | Bottom Toolbar Enhancements | - | Done | Mic mute, panel toggles added. PTT removed. |
| Project Management | Removal of Old Scenarios | - | Done | `customerServiceRetail`, `frontDeskAuthentication` removed. |
| | PRD Definition | - | Done | `PRD.md` created. |
| Optimization & Testing | Performance Optimization | 3 | To Do | |
| | User Testing | 2/3 | To Do | |
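Since the speaker-identification pipeline is the next major piece, here is a minimal sketch of one way dual-microphone capture could work in the browser. The `DualMicConfig` shape below is illustrative only; the authoritative `AudioConfig` lives in PRD.md.

```typescript
// Hypothetical sketch of dual-microphone capture for speaker identification.
// DualMicConfig is illustrative; PRD.md defines the real AudioConfig.
interface DualMicConfig {
  userDeviceId: string;   // microphone pointed at the user
  othersDeviceId: string; // microphone capturing other speakers
}

async function openDualMics(config: DualMicConfig): Promise<{
  userStream: MediaStream;
  othersStream: MediaStream;
}> {
  // Request each channel as a separate MediaStream so downstream
  // transcription can label speakers by their source device.
  const [userStream, othersStream] = await Promise.all([
    navigator.mediaDevices.getUserMedia({
      audio: { deviceId: { exact: config.userDeviceId } },
    }),
    navigator.mediaDevices.getUserMedia({
      audio: { deviceId: { exact: config.othersDeviceId } },
    }),
  ]);
  return { userStream, othersStream };
}
```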