ontaptom/multimodal-live-api

Multimodal Live API with Gemini

This repository provides quickstart examples for using the Gemini Multimodal Live API, showcasing how to interact with Gemini using various input and output modalities.

Note: This is an experimental API that may change in the future. The implementation uses API version 'v1beta1' for VertexAI and 'v1alpha' for Gemini Developer API (Google AI Studio). Documentation from Google is still evolving.

Examples

1. Text-to-Audio Chat (text-to-audio-liveapi.py)

  • Type messages in your terminal
  • Listen to Gemini's responses through your speakers
  • Simple command-line interface
  • Type exit to quit

2. Voice-to-Voice Chat (audio-to-audio-liveapi.py)

  • Speak directly to Gemini using your microphone
  • Receive audio responses through your speakers
  • Real-time voice conversation
  • Press Ctrl+C to quit
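Both examples play Gemini's reply as raw audio rather than as a file. To inspect what came back, or to listen without working speakers, the bytes can be wrapped in a WAV header using only the standard library. The sketch below assumes the 24 kHz, 16-bit, mono PCM format that the Live API documents for its audio output; `pcm_to_wav` and `pcm_duration_seconds` are hypothetical helpers, not part of the scripts.

```python
import io
import wave

# Assumed Live API output format: 16-bit little-endian mono PCM at 24 kHz.
RECEIVE_SAMPLE_RATE = 24_000
SAMPLE_WIDTH_BYTES = 2
CHANNELS = 1


def pcm_to_wav(pcm: bytes, rate: int = RECEIVE_SAMPLE_RATE) -> bytes:
    """Wrap raw PCM bytes in a WAV container so any audio player can open them."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes, rate: int = RECEIVE_SAMPLE_RATE) -> float:
    """Playback length of a PCM buffer: bytes / (rate * sample width * channels)."""
    return len(pcm) / (rate * SAMPLE_WIDTH_BYTES * CHANNELS)
```

For example, `pathlib.Path("reply.wav").write_bytes(pcm_to_wav(audio_bytes))` saves a playable file. At this format one second of audio is 48,000 bytes. For live playback, the same parameters would go into a PyAudio output stream (`format=pyaudio.paInt16, channels=1, rate=24000, output=True`).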

Requirements

google-genai
pyaudio

Python 3.11+ is required for the voice-to-voice example, as it uses asyncio.TaskGroup for concurrent task management.

Setup

  1. Clone the repository:
git clone https://github.com/ontaptom/multimodal-live-api.git
cd multimodal-live-api
  2. Install dependencies:
pip install -r requirements.txt
  3. Choose your authentication method:

Option A: Google AI Studio API Key

  • Visit aistudio.google.com to get your API key
  • Set the environment variable: export GOOGLE_API_KEY=your_api_key
  • In both scripts, set use_vertexai = False

Option B: Vertex AI

  • Ensure you have a Google Cloud project set up
  • Set use_vertexai = True in both scripts
  • Update the PROJECT_ID variable with your project ID
  • Ensure you have the necessary permissions, have enabled the Vertex AI API, and have local Application Default Credentials (for example, via gcloud auth application-default login)

Usage

Text-to-Audio Chat:

python text-to-audio-liveapi.py
  • Type your message at the "You: " prompt
  • Listen to Gemini's response
  • Type "quit" to exit
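The text-to-audio flow boils down to a loop: read a line, send it as one complete turn, then play audio chunks until the model's turn ends. The sketch below is written against an injected session object, assumed to offer `send(input=..., end_of_turn=True)` and an async-iterable `receive()` whose messages carry raw audio in `.data`, which is how the google-genai live session behaved in early releases (method names may differ in newer ones). `read_line` and `play` are hypothetical callbacks, and accepting both "exit" and "quit" is an assumption.

```python
import asyncio

EXIT_WORDS = {"exit", "quit"}  # assumption: accept either word


async def chat_loop(session, read_line=input, play=print) -> None:
    """Send each typed line as one turn and play the audio of the reply."""
    while True:
        # input() blocks, so run it in a thread to keep the event loop free.
        text = await asyncio.to_thread(read_line, "You: ")
        if text.strip().lower() in EXIT_WORDS:
            break
        if not text.strip():
            continue  # don't send empty turns
        await session.send(input=text, end_of_turn=True)
        # receive() is assumed to stop once the model's turn is complete.
        async for response in session.receive():
            if response.data:  # raw PCM bytes, when this message has audio
                play(response.data)
```

In the real script the session would come from `async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:`, and `play` would write to a PyAudio output stream.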

Voice-to-Voice Chat:

python audio-to-audio-liveapi.py
  • Start speaking into your microphone
  • Listen to Gemini's responses
  • Press Ctrl+C to exit

💡 Headphones are recommended when using the voice-to-voice chat to prevent audio feedback loops.

Implementation Details

Google's documentation for this API is still incomplete, and the client must be configured differently depending on which authentication method you use:

API Version Differences

  • VertexAI uses v1beta1
  • Google AI Studio uses v1alpha

Model Name Format

  • VertexAI: "gemini-2.0-flash-exp" (bare model name), at least with API version v1beta1
  • Google AI Studio: "models/gemini-2.0-flash-exp" (requires the "models/" prefix), at least with API version v1alpha
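Because the prefix rule is easy to get wrong when switching backends, a tiny helper can derive the right form from either spelling of the name. This is a hypothetical convenience function, not part of the scripts, and it encodes only the behaviour observed here for v1beta1 and v1alpha, not an official rule.

```python
def live_model_name(model: str, use_vertexai: bool) -> str:
    """Return `model` in the form the chosen backend expects.

    Assumption (observed behaviour, not documented): Vertex AI (v1beta1)
    takes the bare name, Google AI Studio (v1alpha) needs "models/".
    """
    bare = model.removeprefix("models/")  # accept either form as input
    return bare if use_vertexai else f"models/{bare}"
```

For example, `live_model_name("gemini-2.0-flash-exp", use_vertexai=False)` returns `"models/gemini-2.0-flash-exp"`, and calling it again on its own output leaves the name unchanged.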

Configuration Format

  • VertexAI: a LiveConnectConfig object using the Modality enum, at least with API version v1beta1
  • Google AI Studio: a plain dictionary with a nested "generation_config" key, at least with API version v1alpha

Here's how this looks in code:

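from google import genai
from google.genai.types import HttpOptions, LiveConnectConfig, Modality

use_vertexai = False  # True for Vertex AI, False for Google AI Studio
PROJECT_ID = "your-project-id"  # Vertex AI only: replace with your project ID

if use_vertexai:
    # Vertex AI: API version v1beta1, bare model name, typed config object
    client = genai.Client(
        vertexai=True,
        project=PROJECT_ID,
        location="us-central1",
        http_options=HttpOptions(api_version="v1beta1"),
    )
    MODEL = "gemini-2.0-flash-exp"  # Just the model name
    CONFIG = LiveConnectConfig(response_modalities=[Modality.AUDIO])
else:
    # Google AI Studio: API version v1alpha, "models/" prefix, dict config.
    # The API key is read from the GOOGLE_API_KEY environment variable.
    client = genai.Client(http_options={"api_version": "v1alpha"})
    MODEL = "models/gemini-2.0-flash-exp"  # Note the "models/" prefix
    CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}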

These differences are critical for successful operation with either authentication method.

License

Copyright 2025

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Google Cloud credits are provided for this project. #VertexAISprint
