This repository provides quickstart examples for using the Gemini Multimodal Live API, showcasing how to interact with Gemini using various input and output modalities.
Note: This is an experimental API that may change in the future. The implementation uses API version 'v1beta1' for VertexAI and 'v1alpha' for Gemini Developer API (Google AI Studio). Documentation from Google is still evolving.
- Type messages in your terminal
- Listen to Gemini's responses through your speakers
- Simple command-line interface
- Type `exit` to quit
- Speak directly to Gemini using your microphone
- Receive audio responses through your speakers
- Real-time voice conversation
- Press `Ctrl+C` to quit
The examples depend on two packages (see `requirements.txt`):

- `google-genai`
- `pyaudio`
Python 3.11+ is required for the voice-to-voice example, as it uses `asyncio.TaskGroup` for concurrent task management.
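For reference, here is a minimal sketch of that pattern; the coroutine names are placeholders, not the repo's actual functions:

```python
import asyncio

async def capture_microphone():
    # Placeholder: the real script reads mic audio and streams it to Gemini.
    await asyncio.sleep(1)

async def play_responses():
    # Placeholder: the real script receives Gemini's audio and plays it.
    await asyncio.sleep(1)

async def main():
    # TaskGroup (new in Python 3.11) runs both coroutines concurrently;
    # if either raises, the other is cancelled and the error propagates.
    async with asyncio.TaskGroup() as tg:
        tg.create_task(capture_microphone())
        tg.create_task(play_responses())

asyncio.run(main())
```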
- Clone the repository:

  ```bash
  git clone https://github.com/ontaptom/multimodal-live-api.git
  cd multimodal-live-api
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Choose your authentication method:

  **Option 1: Google AI Studio**

  - Visit aistudio.google.com to get your API key
  - Set the environment variable:

    ```bash
    export GOOGLE_API_KEY=your_api_key
    ```

  - In both scripts, set `use_vertexai = False`

  **Option 2: Vertex AI**

  - Ensure you have a Google Cloud Project set up
  - Set `use_vertexai = True` in both scripts (a sketch of this toggle follows below)
  - Update the `PROJECT_ID` variable with your project ID
  - Ensure you have the necessary permissions and have enabled the Vertex AI API
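For illustration, here is a sketch of how the `use_vertexai` flag could select the client; the exact wiring in the scripts may differ, and `PROJECT_ID` is a placeholder:

```python
from google import genai
from google.genai.types import HttpOptions

use_vertexai = False  # set True to use Vertex AI instead of Google AI Studio
PROJECT_ID = "your-project-id"

if use_vertexai:
    client = genai.Client(
        vertexai=True,
        project=PROJECT_ID,
        location="us-central1",
        http_options=HttpOptions(api_version="v1beta1"),
    )
else:
    # Picks up the GOOGLE_API_KEY environment variable set above.
    client = genai.Client(http_options={"api_version": "v1alpha"})
```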
Run the text-to-audio example:

```bash
python text-to-audio-liveapi.py
```
- Type your message at the "You: " prompt
- Listen to Gemini's response
- Type "quit" to exit
Run the voice-to-voice example:

```bash
python audio-to-audio-liveapi.py
```
- Start speaking into your microphone
- Listen to Gemini's responses
- Press `Ctrl+C` to exit
💡 Headphones are recommended when using the voice-to-voice chat to prevent audio feedback loops.
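Both scripts rely on PyAudio for playback. As a rough sketch, an output stream for Gemini's audio might look like this, assuming 16-bit PCM mono at 24 kHz (the Live API's documented output format); the scripts' actual stream settings may differ:

```python
import pyaudio

pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paInt16,  # 16-bit signed PCM
    channels=1,              # mono
    rate=24000,              # assumed output sample rate in Hz
    output=True,
)

def play_chunk(chunk: bytes) -> None:
    """Write one chunk of raw PCM bytes to the default output device."""
    stream.write(chunk)
```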
Google has not yet fully documented this API, and there are important differences in how you must configure the client depending on which authentication method you use:
- **API version:** VertexAI uses `v1beta1`; Google AI Studio uses `v1alpha`
- **Model name:** VertexAI takes `"gemini-2.0-flash-exp"` (just the model name), at least in the `v1beta1` API version; Google AI Studio requires `"models/gemini-2.0-flash-exp"` (note the `models/` prefix), at least in the `v1alpha` API version
- **Config format:** VertexAI uses a `LiveConnectConfig` object with the `Modality` enum, at least in the `v1beta1` API version; Google AI Studio uses a dictionary with a nested `"generation_config"` object, at least in the `v1alpha` API version
Here's how this looks in code:
```python
from google import genai
from google.genai.types import HttpOptions, LiveConnectConfig, Modality

# For VertexAI:
PROJECT_ID = "your-project-id"  # your Google Cloud project
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location='us-central1',
    http_options=HttpOptions(api_version="v1beta1"),
)
MODEL = "gemini-2.0-flash-exp"  # Just the model name
CONFIG = LiveConnectConfig(response_modalities=[Modality.AUDIO])
```

```python
from google import genai

# For Google AI Studio (reads GOOGLE_API_KEY from the environment):
client = genai.Client(
    http_options={"api_version": "v1alpha"}
)
MODEL = "models/gemini-2.0-flash-exp"  # Note the "models/" prefix
CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}
```
These differences are critical for successful operation with either authentication method.
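For context, here is a hedged sketch of a single text-in, audio-out turn using the Google AI Studio configuration above. The session methods shown (`send` with `end_of_turn`, the `response.data` convenience property) match early google-genai releases and may have changed since:

```python
import asyncio
from google import genai

client = genai.Client(http_options={"api_version": "v1alpha"})
MODEL = "models/gemini-2.0-flash-exp"
CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}

async def main():
    # Open a live session, send one text turn, and collect the audio reply.
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        async for response in session.receive():
            if response.data:  # raw PCM audio bytes
                pass  # feed these to a PyAudio output stream

asyncio.run(main())
```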
Copyright 2025
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.