A lightweight vanilla JavaScript implementation of the Gemini 2.0 Flash Multimodal Live API client. This project provides real-time interaction with Gemini's API through text, audio, video, and screen sharing capabilities.
This is a simplified version of Google's original React implementation, created in response to this issue.
- Real-time text chat with Gemini API
- Audio input/output with visualization
- Motion-detected video streaming
- Screen sharing capabilities
- Function calling support
- Built with vanilla JavaScript (no dependencies)
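The motion-detected video streaming listed above can be pictured as simple frame differencing: compare each new frame with the previous one and only send it when enough pixels have changed. The sketch below is not the project's actual implementation; `motionScore`, `shouldSendFrame`, and both threshold values are hypothetical names and defaults chosen for illustration. It assumes RGBA pixel data such as the `data` array returned by a canvas `getImageData()` call.

```javascript
// Hypothetical helper (not taken from this repository).
// Returns the fraction of pixels whose colour changed noticeably between
// two RGBA frames of equal size.
function motionScore(prev, curr, pixelThreshold = 30) {
  if (prev.length !== curr.length) {
    throw new Error("Frames must have the same dimensions");
  }
  const pixelCount = curr.length / 4;
  let changed = 0;
  for (let i = 0; i < curr.length; i += 4) {
    // Sum of absolute differences over R, G and B; alpha is ignored.
    const diff =
      Math.abs(curr[i] - prev[i]) +
      Math.abs(curr[i + 1] - prev[i + 1]) +
      Math.abs(curr[i + 2] - prev[i + 2]);
    if (diff > pixelThreshold) changed++;
  }
  return pixelCount === 0 ? 0 : changed / pixelCount;
}

// Hypothetical policy: always send the first frame, then send a frame only
// when the share of changed pixels exceeds motionThreshold (assumed default).
function shouldSendFrame(prev, curr, motionThreshold = 0.02) {
  return prev === null || motionScore(prev, curr) > motionThreshold;
}
```

Skipping near-identical frames this way keeps bandwidth low when the camera view is static. The thresholds would need tuning for real lighting and camera noise.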
- Modern web browser with WebRTC, WebSocket, and Web Audio API support
- Google AI Studio API key
- Python 3.0+ or npx http-server (for a local development server)
- Clone the repository
- Set up your API key:
    cp js/config/config.example.js js/config/config.js
    # Edit js/config/config.js with your API key
- Start the development server:
    python -m http.server 8000
  or
    npx http-server -p 8000
- Access the application at http://localhost:8000
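To give a rough idea of what the API key step configures, here is a minimal sketch of a config object and a helper that builds the WebSocket URL from it. The field names (`API_KEY`, `HOST`, `PATH`), the placeholder string, and the `buildWebSocketUrl` helper are all assumptions made for illustration; the real js/config/config.example.js may be organised differently. The service path shown is also an assumption and should be checked against the current Live API documentation.

```javascript
// Hypothetical shape for js/config/config.js; the real file may differ.
const CONFIG = {
  // Placeholder: replace with your Google AI Studio API key.
  API_KEY: "YOUR_API_KEY_HERE",
  HOST: "generativelanguage.googleapis.com",
  // Assumed service path; verify the API version against the current docs.
  PATH: "/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent",
};

// Hypothetical helper: builds the WebSocket URL, refusing to continue while
// the placeholder key is still in place.
function buildWebSocketUrl(config) {
  if (!config.API_KEY || config.API_KEY === "YOUR_API_KEY_HERE") {
    throw new Error("Set your API key in js/config/config.js first");
  }
  const url = new URL(`wss://${config.HOST}${config.PATH}`);
  url.searchParams.set("key", config.API_KEY);
  return url.toString();
}
```

Failing early on a missing key gives a clearer error than a rejected WebSocket handshake. Because the key ends up in client-side code, a setup like this is best kept to local development.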
├── js/
│ ├── audio/ # Audio processing and management
│ ├── config/ # Configuration files
│ ├── core/ # Core functionality (WebSocket, worklets)
│ ├── tools/ # Function calling implementations
│ ├── utils/ # Utility functions
│ ├── video/ # Video and screen sharing
│ └── main.js # Application entry point
├── css/ # Styling
└── index.html # Main HTML file
- Click "Connect" to establish API connection
- Use the interface to:
  - Send text messages
  - Toggle microphone for audio input
  - Enable webcam for video streaming
  - Share your screen
- Monitor the logs panel for real-time feedback
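When the microphone is toggled on, browser audio arrives as 32-bit float samples, while streaming APIs of this kind typically expect 16-bit PCM encoded as base64. The sketch below shows that conversion in isolation. The function names are hypothetical and not taken from this repository, and the exact sample rate and message envelope the API expects are not covered here.

```javascript
// Hypothetical helper: convert Web Audio float samples (range -1..1)
// to 16-bit signed PCM.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Hypothetical helper: base64-encode the raw PCM bytes for a JSON message.
// Int16Array uses the platform byte order, which is little-endian on
// essentially all current browsers and devices.
function pcmToBase64(pcm) {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

In a real pipeline this would run on small chunks from an audio worklet, so the per-chunk string building stays cheap.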
Custom tools can be added to extend functionality. See js/tools/README.md for implementation details.
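As a rough illustration of what a custom tool might look like, the sketch below pairs a declaration (name, description, parameters) with an execute method, plus a small dispatcher that routes a function call to the right tool. The interface shown (`getDeclaration`, `execute`, `handleToolCall`) and the parameter schema style are assumptions for illustration only; js/tools/README.md is the authority on what this project actually expects.

```javascript
// Hypothetical tool; the interface this project requires is described in
// js/tools/README.md and may differ from this sketch.
class TemperatureConverterTool {
  // Describes the function so the model knows when and how to call it.
  getDeclaration() {
    return {
      name: "convertTemperature",
      description: "Convert a temperature from Celsius to Fahrenheit.",
      parameters: {
        type: "object",
        properties: {
          celsius: { type: "number", description: "Temperature in degrees Celsius" },
        },
        required: ["celsius"],
      },
    };
  }

  // Runs the tool with the arguments supplied in the function call.
  execute(args) {
    if (!args || typeof args.celsius !== "number") {
      throw new Error("celsius must be a number");
    }
    return { fahrenheit: (args.celsius * 9) / 5 + 32 };
  }
}

// Hypothetical dispatcher: looks up the tool by name and wraps the result
// (or any error) in a plain object that could be sent back to the model.
function handleToolCall(tools, call) {
  const tool = tools.get(call.name);
  if (!tool) {
    return { name: call.name, error: `Unknown tool: ${call.name}` };
  }
  try {
    return { name: call.name, response: tool.execute(call.args) };
  } catch (err) {
    return { name: call.name, error: err.message };
  }
}
```

A usage example under the same assumptions: register the tool in a `Map` keyed by its declared name, then call `handleToolCall(tools, { name: "convertTemperature", args: { celsius: 100 } })`. Returning errors as data rather than throwing lets the model see what went wrong and retry.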
Contributions are welcome! Please feel free to submit issues and pull requests.
This project is licensed under the MIT License.