Commit 66a31c1

Adding ElatoAI project — OpenAI Realtime API Speech on ESP32 devices with Arduino and Deno Edge Functions (#1808)
1 parent 22a8c6c commit 66a31c1

6 files changed

+286
-0
lines changed

articles/related_resources.md

+1
@@ -7,6 +7,7 @@ People are writing great tools and papers for improving outputs from GPT. Here a
 - [Arthur Shield](https://www.arthur.ai/get-started): A paid product for detecting toxicity, hallucination, prompt injection, etc.
 - [Baserun](https://baserun.ai/): A paid product for testing, debugging, and monitoring LLM-based apps
 - [Chainlit](https://docs.chainlit.io/overview): A Python library for making chatbot interfaces.
+- [ElatoAI](https://github.com/akdeb/ElatoAI): A platform for running OpenAI Realtime API Speech on ESP32 on Arduino using Deno Edge Runtime and Supabase.
 - [Embedchain](https://github.com/embedchain/embedchain): A Python library for managing and syncing unstructured data with LLMs.
 - [FLAML (A Fast Library for Automated Machine Learning & Tuning)](https://microsoft.github.io/FLAML/docs/Getting-Started/): A Python library for automating selection of models, hyperparameters, and other tunable choices.
 - [Guidance](https://github.com/microsoft/guidance): A handy looking Python library from Microsoft that uses Handlebars templating to interleave generation, prompting, and logical control.

authors.yaml

+5
@@ -248,6 +248,11 @@ msingh-openai:
   website: "https://github.com/msingh-openai"
   avatar: "https://avatars.githubusercontent.com/u/168678187?v=4"
 
+akashdeepdeb:
+  name: "Akashdeep Deb"
+  website: "https://github.com/akdeb"
+  avatar: "https://avatars.githubusercontent.com/u/20175219"
+
 ted-at-openai:
   name: "Ted Sanders"
   website: "https://github.com/ted-at-openai"
examples/voice_solutions/running_realtime_api_speech_on_esp32_arduino_edge_runtime_elatoai.md

+267
@@ -0,0 +1,267 @@
<img src="arduino_ai_speech_assets/elato-alien.png" alt="Elato Logo" width="100%">

# 👾 ElatoAI: Running OpenAI Realtime API Speech on ESP32 on Arduino with Deno Edge Functions

This guide shows how to build an AI voice agent device with real-time speech, powered by the OpenAI Realtime API, an ESP32, Secure WebSockets, and Deno Edge Functions, for up to 10 minutes of uninterrupted conversation from anywhere in the world.

An actively maintained version of this README is available at [ElatoAI](https://github.com/akdeb/ElatoAI).

<div align="center">

[![Discord Follow](https://dcbadge.vercel.app/api/server/KJWxDPBRUj?style=flat)](https://discord.gg/KJWxDPBRUj)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue)](https://opensource.org/licenses/MIT)&ensp;&ensp;&ensp;
![Node.js](https://img.shields.io/badge/Node.js-22.13.0-yellow.svg)
![Next.js](https://img.shields.io/badge/Next.js-14.2.7-brightgreen.svg)
![React](https://img.shields.io/badge/React-18.2.0-blue.svg)

</div>

## Demo Video

https://github.com/user-attachments/assets/aa60e54c-5847-4a68-80b5-5d6b1a5b9328

<div align="center">
    <a href="https://www.youtube.com/watch?v=o1eIAwVll5I">
        <img src="https://img.shields.io/badge/Watch%20Demo-YouTube-red?style=for-the-badge&logo=youtube" alt="Watch Demo on YouTube">
    </a>
</div>

## Hardware Design

The reference implementation uses an ESP32-S3 microcontroller with minimal additional components:

<img src="arduino_ai_speech_assets/pcb-design.png" alt="Hardware Setup" width="100%">

**Required Components:**
- ESP32-S3 development board
- I2S microphone (e.g., INMP441)
- I2S amplifier and speaker (e.g., MAX98357A)
- Push button to start/stop the conversation
- RGB LED for visual feedback
- Optional: touch sensor for alternative control

**Optional hardware:**
A fully assembled PCB and device are available in the [ElatoAI store](https://www.elatoai.com/products).

## 🚀 Quick Start Guide

<a href="https://www.youtube.com/watch?v=bXrNRpGOJWw">
    <img src="https://img.shields.io/badge/Quickstart%20Tutorial-YouTube-yellow?style=for-the-badge&logo=youtube" alt="Watch Quickstart Tutorial on YouTube">
</a>

1. **Clone the repository**

Head over to the [ElatoAI GitHub repository](https://github.com/akdeb/ElatoAI) and clone it.

```bash
git clone https://github.com/akdeb/ElatoAI.git
cd ElatoAI
```

2. **Set your environment variables (OPENAI_API_KEY, SUPABASE_ANON_KEY)**

In the `frontend-nextjs` directory, create a `.env.local` file and set your environment variables.

```bash
cd frontend-nextjs
cp .env.example .env.local

# In .env.local, set your environment variables
# NEXT_PUBLIC_SUPABASE_ANON_KEY=<your-supabase-anon-key>
# OPENAI_API_KEY=<your-openai-api-key>
```

In the `server-deno` directory, create a `.env` file and set your environment variables.

```bash
# From frontend-nextjs, move to the sibling server-deno directory
cd ../server-deno
cp .env.example .env

# In .env, set your environment variables
# SUPABASE_KEY=<your-supabase-anon-key>
# OPENAI_API_KEY=<your-openai-api-key>
```

2. **Start Supabase**

Install the [Supabase CLI](https://supabase.com/docs/guides/local-development/cli/getting-started) and set up your local Supabase backend. From the root directory, run:
```bash
# Homebrew install shown here; see the CLI docs linked above for other platforms
brew install supabase/tap/supabase
supabase start # Starts your local Supabase server with the default migrations and seed data.
```

3. **Set up your NextJS Frontend**

([See the Frontend README](https://github.com/akdeb/ElatoAI/tree/main/frontend-nextjs/README.md))

From the repository root, run the following commands to install dependencies and start the frontend. (**Login credentials:** Email: `[email protected]`, Password: `admin`)
```bash
cd frontend-nextjs
npm install

# Run the development server
npm run dev
```

4. **Start the Deno server**

([See the Deno server README](https://github.com/akdeb/ElatoAI/tree/main/server-deno/README.md))
```bash
# In a new terminal, navigate to the server directory from the repository root
cd server-deno

# Run the server at port 8000
deno run -A --env-file=.env main.ts
```

5. **Set up the ESP32 Device firmware**

([See the ESP32 Device README](https://github.com/akdeb/ElatoAI/tree/main/firmware-arduino/README.md))

In `Config.cpp`, set `ws_server` and `backend_server` to your machine's local IP address. To find it, run `ifconfig` in your console and look under `en0` for the `inet` value, for example `192.168.1.100` (it may be different for your Wifi network). This tells the ESP32 device to connect to the NextJS frontend and Deno server running on your local machine. The ESP32 and your machine must be on the same Wifi network.

6. **Set up the ESP32 Device Wifi**

Build and upload the firmware to your ESP32 device. The ESP32 should open an `ELATO-DEVICE` captive portal for Wifi setup. Connect to it and go to `http://192.168.4.1` to configure the device's Wifi credentials.

7. Once your Wifi credentials are configured, turn the device OFF and ON again. It should then connect to your Wifi network and your server.

8. Now you can talk to your AI Character!
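
If a server fails to start, a common cause is a missing or empty variable from step 2. The following is a minimal TypeScript sketch for checking an env file's contents before launching. `missingEnvKeys` is a hypothetical helper and not part of the ElatoAI repository, and it only understands plain `KEY=value` lines.

```typescript
// Hypothetical helper, not part of the ElatoAI repository.
// Parses simple KEY=value lines and returns the required keys that are
// absent or have an empty value.
function missingEnvKeys(envText: string, required: string[]): string[] {
  const defined: string[] = [];
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (value !== "") defined.push(key);
  }
  return required.filter((key) => defined.indexOf(key) === -1);
}

// Example contents for server-deno/.env (values are placeholders).
const sampleEnv = [
  "# server-deno/.env",
  "SUPABASE_KEY=example-anon-key",
  "OPENAI_API_KEY=",
].join("\n");

// OPENAI_API_KEY is reported because its value is empty.
console.log(missingEnvKeys(sampleEnv, ["SUPABASE_KEY", "OPENAI_API_KEY"]));
```

The same function can be pointed at the contents of `frontend-nextjs/.env.local` with `NEXT_PUBLIC_SUPABASE_ANON_KEY` and `OPENAI_API_KEY` as the required keys.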

## 🚀 Ready to Launch?

1. Register your device by adding your ESP32 Device's MAC Address and a unique user code to the `devices` table in Supabase.
> **Pro Tip:** To find your ESP32-S3 Device's MAC Address, build and upload `test/print_mac_address_test.cpp` using PlatformIO and view the serial monitor.


2. On your frontend client in the [Settings page](http://localhost:3000/home/settings), add the unique user code so that the device is linked to your account in Supabase.


3. If you're testing locally, you can keep the `DEV_MODE` macro in `firmware-arduino/Config.h` enabled, along with the corresponding Deno server env variable, so that both use your local IP addresses.


4. To register additional devices to your account, repeat the steps above.
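
When copying a MAC address from the serial monitor into the `devices` table, inconsistent separators or letter case can cause a lookup mismatch. Below is a small TypeScript sketch that normalizes the address first. The helper names and the `mac_address` / `user_code` column names are assumptions for illustration, so check the actual schema and the format the firmware reports before relying on them.

```typescript
// Hypothetical helper: normalizes a MAC address to upper-case,
// colon-separated form (e.g. "A4:CF:12:0B:9E:7C").
function normalizeMac(input: string): string {
  const hex = input.replace(/[^0-9a-fA-F]/g, "").toUpperCase();
  if (hex.length !== 12) {
    throw new Error("Expected 12 hex digits, got " + hex.length);
  }
  const pairs: string[] = [];
  for (let i = 0; i < hex.length; i += 2) {
    pairs.push(hex.slice(i, i + 2));
  }
  return pairs.join(":");
}

// Assumed row shape; the real `devices` table may use different column names.
interface DeviceRow {
  mac_address: string;
  user_code: string;
}

function buildDeviceRow(mac: string, userCode: string): DeviceRow {
  return { mac_address: normalizeMac(mac), user_code: userCode.trim() };
}

// Example with placeholder values.
console.log(buildDeviceRow("a4-cf-12-0b-9e-7c", " MYCODE01 "));
```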

## Project Architecture

ElatoAI consists of three main components:

1. **Frontend Client** (`Next.js` hosted on Vercel) - to create and talk to your AI agents and 'send' them to your ESP32 device
2. **Edge Server Functions** (`Deno` running on Deno/Supabase Edge) - to handle the websocket connections from the ESP32 device and the OpenAI API calls
3. **ESP32 IoT Client** (`PlatformIO/Arduino`) - to open a websocket connection to the Edge Server Functions and exchange audio with the OpenAI API via the Deno edge server


## 🌟 Key Features

1. **Realtime Speech-to-Speech**: Low-latency speech-to-speech conversations powered by OpenAI's Realtime API.
2. **Create Custom AI Agents**: Create custom agents with different personalities and voices.
3. **Customizable Voices**: Choose from a variety of voices and personalities.
4. **Secure WebSockets**: Reliable, encrypted WebSocket communication.
5. **Server VAD Turn Detection**: Server-side voice activity detection decides when a speaker's turn ends, keeping the conversation flowing smoothly.
6. **Opus Audio Compression**: High-quality audio streaming with minimal bandwidth.
7. **Global Edge Performance**: Low-latency Deno Edge Functions for conversations from anywhere in the world.
8. **ESP32 Arduino Framework**: Optimized and easy-to-use hardware integration.
9. **Conversation History**: View your conversation history.
10. **Device Management and Authentication**: Register and manage your devices.
11. **User Authentication**: Secure user authentication and authorization.
12. **Conversations with WebRTC and Websockets**: Talk to your AI with WebRTC on the NextJS webapp and with websockets on the ESP32.
13. **Volume Control**: Control the volume of the ESP32 speaker from the NextJS webapp.
14. **Realtime Transcripts**: The realtime transcripts of your conversations are stored in the Supabase DB.
15. **OTA Updates**: Over-the-air updates for the ESP32 firmware.
16. **Wifi Management with captive portal**: Connect to your Wifi network from the ESP32 device.
17. **Factory Reset**: Factory reset the ESP32 device from the NextJS webapp.
18. **Button and Touch Support**: Use the button OR touch sensor to control the ESP32 device.
19. **No PSRAM Required**: The ESP32 device does not require PSRAM to run the speech-to-speech AI.
20. **OAuth for Web client**: OAuth for your users to manage their AI characters and devices.

## 🛠 Tech Stack

| Component       | Technology Used                          |
|-----------------|------------------------------------------|
| Frontend        | Next.js, Vercel            |
| Backend         | Supabase DB  |
| Edge Functions  | Edge Functions on Deno / Supabase Edge Runtime          |
| IoT Client      | PlatformIO, Arduino Framework, ESP32-S3  |
| Audio Codec     | Opus                                     |
| Communication   | Secure WebSockets                        |
| Libraries       | ArduinoJson, WebSockets, AsyncWebServer, ESP32_Button, Arduino Audio Tools, ArduinoLibOpus        |

## 📈 Core Use Cases

The [Usecases.md](https://github.com/akdeb/ElatoAI/tree/main/Usecases.md) file outlines the core use cases for the [Elato AI device](https://www.elatoai.com/products) or any other custom conversational AI device.

## 🗺️ High-Level Flow

```mermaid
flowchart TD
    User[User Speech] --> ESP32
    ESP32[ESP32 Device] -->|WebSocket| Edge[Deno Edge Function]
    Edge -->|OpenAI API| OpenAI[OpenAI Realtime API]
    OpenAI --> Edge
    Edge -->|WebSocket| ESP32
    ESP32 --> User[AI Generated Speech]
```
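
To make the middle hop of this flow concrete, here is a minimal TypeScript sketch of the translation an edge function might perform between device audio and Realtime API messages. It is an illustration, not the ElatoAI implementation: the function names are hypothetical, the event names `input_audio_buffer.append` and `response.audio.delta` are assumed from the Realtime API's WebSocket event format and may differ between API versions, and the audio is assumed to be PCM16 already, so Opus encoding and decoding are omitted.

```typescript
// Base64 helpers built on the standard btoa/atob globals.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const out = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    out[i] = binary.charCodeAt(i);
  }
  return out;
}

// Device -> OpenAI: wrap a chunk of PCM16 audio in an append event.
// Event name is an assumption; verify it against the API version you use.
function deviceAudioToOpenAI(pcm: Uint8Array): string {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: bytesToBase64(pcm),
  });
}

// OpenAI -> device: return raw audio bytes for audio delta events,
// or null for any other event (transcripts, session updates, and so on).
function openAIMessageToDeviceAudio(message: string): Uint8Array | null {
  const event = JSON.parse(message) as { type?: string; delta?: string };
  if (event.type === "response.audio.delta" && typeof event.delta === "string") {
    return base64ToBytes(event.delta);
  }
  return null;
}

// Example: a 3-byte chunk going up, and an audio delta coming back down.
console.log(deviceAudioToOpenAI(new Uint8Array([0, 1, 255])));
console.log(openAIMessageToDeviceAudio('{"type":"response.audio.delta","delta":"AAH/"}'));
```

Keeping the translation in pure functions like these leaves the WebSocket wiring on either side free to change without touching the message format.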

## Project Structure

```mermaid
graph TD
    repo[ElatoAI]
    repo --> frontend[Frontend Vercel NextJS]
    repo --> deno[Deno Edge Function]
    repo --> esp32[ESP32 Arduino Client]
    deno --> supabase[Supabase DB]

    frontend --> supabase
    esp32 --> websockets[Secure WebSockets]
    esp32 --> opus[Opus Codec]
    esp32 --> audio_tools[arduino-audio-tools]
    esp32 --> libopus[arduino-libopus]
    esp32 --> ESPAsyncWebServer[ESPAsyncWebServer]
```

## ⚙️ PlatformIO Configuration

```ini
[env:esp32-s3-devkitc-1]
platform = espressif32 @ 6.10.0
board = esp32-s3-devkitc-1
framework = arduino
monitor_speed = 115200

lib_deps =
    bblanchon/ArduinoJson@^7.1.0
    links2004/WebSockets@^2.4.1
    ESP32Async/ESPAsyncWebServer@^3.7.6
    https://github.com/esp-arduino-libs/ESP32_Button.git#v0.0.1
    https://github.com/pschatzmann/arduino-audio-tools.git#v1.0.1
    https://github.com/pschatzmann/arduino-libopus.git#a1.1.0
```

## 📊 Important Stats

- ⚡️ **Latency**: <2 s round trip globally
- 🎧 **Audio Quality**: Opus codec at a bitrate of 12 kbps (high clarity)
- **Uninterrupted Conversations**: Up to 10 minutes of continuous conversation
- 🌎 **Global Availability**: Optimized with edge computing on Deno
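
As a rough check on what the 12 kbps figure means over a full 10-minute session, the arithmetic can be sketched in TypeScript. The comparison against uncompressed audio assumes 16-bit mono PCM at 24 kHz, which is an assumption for illustration and not a figure stated in this guide.

```typescript
// Bytes transferred by a stream of `kbps` kilobits per second over `seconds`.
function streamBytes(kbps: number, seconds: number): number {
  return (kbps * 1000 * seconds) / 8;
}

const sessionSeconds = 10 * 60; // 10-minute conversation

// Opus at 12 kbps, one direction.
const opusBytes = streamBytes(12, sessionSeconds);

// Assumption: uncompressed 16-bit mono PCM at 24 kHz, i.e. 384 kbps.
const pcmKbps = (24000 * 16) / 1000;
const pcmBytes = streamBytes(pcmKbps, sessionSeconds);

console.log(opusBytes);            // 900000 bytes, about 0.9 MB
console.log(pcmBytes);             // 28800000 bytes, about 28.8 MB
console.log(pcmBytes / opusBytes); // 32
```

Under these assumptions, Opus cuts per-direction traffic by a factor of 32, which matters on a microcontroller with a constrained Wifi link.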

## 🛡 Security

- Secure WebSockets (WSS) for encrypted data transfers
- Optional: API Key encryption with 256-bit AES
- Supabase DB for secure authentication
- Supabase RLS for all tables

## 🚫 Limitations
- 3-4 s cold start when connecting to the edge server
- Limited to up to 10 minutes of uninterrupted conversation
- Edge server stops when its wall-clock time limit is exceeded
- No speech interruption detection on the ESP32
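
Because the edge function stops once its wall-clock limit is reached, a client may want to track how much session time remains and wrap up or reconnect before the cutoff. The TypeScript sketch below shows one way to do that. The helper names and the 30-second warning threshold are assumptions for illustration, and the 10-minute limit is taken from the list above.

```typescript
// The 10-minute session limit from the Limitations list, in milliseconds.
const SESSION_LIMIT_MS = 10 * 60 * 1000;

// Hypothetical helper: milliseconds left before the session limit is reached.
function remainingSessionMs(
  startedAtMs: number,
  nowMs: number,
  limitMs: number = SESSION_LIMIT_MS,
): number {
  return Math.max(0, limitMs - (nowMs - startedAtMs));
}

// True once `warnMs` or less remains (30 s is an arbitrary example threshold).
function shouldWarn(startedAtMs: number, nowMs: number, warnMs: number = 30000): boolean {
  return remainingSessionMs(startedAtMs, nowMs) <= warnMs;
}

// Example: 9.5 minutes into a session that started at t = 0.
console.log(remainingSessionMs(0, 570000)); // 30000
console.log(shouldWarn(0, 570000));         // true
```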

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

**If you find this project interesting or useful, drop a GitHub ⭐️ at [ElatoAI](https://github.com/akdeb/ElatoAI). It helps a lot!**

registry.yaml

+13
@@ -1047,6 +1047,19 @@
   tags:
     - embeddings
 
+- title: ElatoAI - Realtime Speech AI Agents for ESP32 on Arduino
+  path: examples/voice_solutions/running_realtime_api_speech_on_esp32_arduino_edge_runtime_elatoai.md
+  date: 2025-05-01
+  authors:
+    - akashdeepdeb
+  tags:
+    - realtime-api
+    - speech
+    - audio
+    - esp32
+    - iot
+    - arduino
+
 - title: Related resources from around the web
   path: articles/related_resources.md
   redirects:
