This repository provides quickstart examples for using the Gemini Multimodal Live API, showcasing how to interact with Gemini using various input and output modalities.
Note: This is an experimental API that may change in the future. The implementation uses API version 'v1beta1' for VertexAI and 'v1alpha' for Gemini Developer API (Google AI Studio). Documentation from Google is still evolving.
- Type messages in your terminal
- Listen to Gemini's responses through your speakers
- Simple command-line interface
- Type `exit` to quit
- Speak directly to Gemini using your microphone
- Receive audio responses through your speakers
- Real-time voice conversation
- Press `Ctrl+C` to quit
The examples depend on two packages (see `requirements.txt`):

- `google-genai`
- `pyaudio`
Python 3.11+ is required for the voice-to-voice example, as it uses `asyncio.TaskGroup` for concurrent task management.
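For reference, here is a minimal sketch of that pattern; the coroutine names are placeholders, not the repo's actual functions:

```python
import asyncio

async def capture_microphone():
    # Placeholder: the real script reads mic audio and streams it to Gemini.
    await asyncio.sleep(1)

async def play_responses():
    # Placeholder: the real script receives Gemini's audio and plays it.
    await asyncio.sleep(1)

async def main():
    # TaskGroup (new in Python 3.11) runs both coroutines concurrently;
    # if either raises, the other is cancelled and the error propagates.
    async with asyncio.TaskGroup() as tg:
        tg.create_task(capture_microphone())
        tg.create_task(play_responses())

asyncio.run(main())
```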
- Clone the repository:

  ```bash
  git clone https://github.com/ontaptom/multimodal-live-api.git
  cd multimodal-live-api
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Choose your authentication method:

  **Option 1: Google AI Studio**

  - Visit aistudio.google.com to get your API key
  - Set the environment variable:

    ```bash
    export GOOGLE_API_KEY=your_api_key
    ```

  - In both scripts, set `use_vertexai = False`

  **Option 2: Vertex AI**

  - Ensure you have a Google Cloud Project set up
  - Set `use_vertexai = True` in both scripts (a sketch of this toggle follows below)
  - Update the `PROJECT_ID` variable with your project ID
  - Ensure you have the necessary permissions and have enabled the Vertex AI API
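For illustration, here is a sketch of how the `use_vertexai` flag could select the client; the exact wiring in the scripts may differ, and `PROJECT_ID` is a placeholder:

```python
from google import genai
from google.genai.types import HttpOptions

use_vertexai = False  # set True to use Vertex AI instead of Google AI Studio
PROJECT_ID = "your-project-id"

if use_vertexai:
    client = genai.Client(
        vertexai=True,
        project=PROJECT_ID,
        location="us-central1",
        http_options=HttpOptions(api_version="v1beta1"),
    )
else:
    # Picks up the GOOGLE_API_KEY environment variable set above.
    client = genai.Client(http_options={"api_version": "v1alpha"})
```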
Run the text-to-audio example:

```bash
python text-to-audio-liveapi.py
```
- Type your message at the "You: " prompt
- Listen to Gemini's response
- Type "quit" to exit
Run the voice-to-voice example:

```bash
python audio-to-audio-liveapi.py
```
- Start speaking into your microphone
- Listen to Gemini's responses
- Press `Ctrl+C` to exit
💡 Headphones are recommended when using the voice-to-voice chat to prevent audio feedback loops.
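Both scripts rely on PyAudio for playback. As a rough sketch, an output stream for Gemini's audio might look like this, assuming 16-bit PCM mono at 24 kHz (the Live API's documented output format); the scripts' actual stream settings may differ:

```python
import pyaudio

pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paInt16,  # 16-bit signed PCM
    channels=1,              # mono
    rate=24000,              # assumed output sample rate in Hz
    output=True,
)

def play_chunk(chunk: bytes) -> None:
    """Write one chunk of raw PCM bytes to the default output device."""
    stream.write(chunk)
```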
Google has not yet fully documented this API, and there are important differences in how you must configure the client depending on which authentication method you use:
- **API version:** VertexAI uses `v1beta1`; Google AI Studio uses `v1alpha`
- **Model name:** VertexAI takes `"gemini-2.0-flash-exp"` (just the model name), at least in the `v1beta1` API version; Google AI Studio requires `"models/gemini-2.0-flash-exp"` (note the `models/` prefix), at least in the `v1alpha` API version
- **Config format:** VertexAI uses a `LiveConnectConfig` object with the `Modality` enum, at least in the `v1beta1` API version; Google AI Studio uses a dictionary with a nested `"generation_config"` object, at least in the `v1alpha` API version
Here's how this looks in code:
```python
from google import genai
from google.genai.types import HttpOptions, LiveConnectConfig, Modality

# For VertexAI:
PROJECT_ID = "your-project-id"  # your Google Cloud project
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location='us-central1',
    http_options=HttpOptions(api_version="v1beta1"),
)
MODEL = "gemini-2.0-flash-exp"  # Just the model name
CONFIG = LiveConnectConfig(response_modalities=[Modality.AUDIO])
```

```python
from google import genai

# For Google AI Studio (reads GOOGLE_API_KEY from the environment):
client = genai.Client(
    http_options={"api_version": "v1alpha"}
)
MODEL = "models/gemini-2.0-flash-exp"  # Note the "models/" prefix
CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}
```
These differences are critical for successful operation with either authentication method.
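For context, here is a hedged sketch of a single text-in, audio-out turn using the Google AI Studio configuration above. The session methods shown (`send` with `end_of_turn`, the `response.data` convenience property) match early google-genai releases and may have changed since:

```python
import asyncio
from google import genai

client = genai.Client(http_options={"api_version": "v1alpha"})
MODEL = "models/gemini-2.0-flash-exp"
CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}

async def main():
    # Open a live session, send one text turn, and collect the audio reply.
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        async for response in session.receive():
            if response.data:  # raw PCM audio bytes
                pass  # feed these to a PyAudio output stream

asyncio.run(main())
```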
Copyright 2025
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.