Generating text, summarizing, and analyzing content are just some of the tasks you can accomplish with Gemma open models. This tutorial shows you how to get started running Gemma using Hugging Face Transformers, with both text and image input, to generate text content. The Transformers Python library provides an API for accessing pre-trained generative AI models, including Gemma. For more information, see the Transformers documentation.
Setup
Before starting this tutorial, complete the following steps:
- Get access to Gemma by logging into Hugging Face and selecting Acknowledge license for a Gemma model.
- Select a Colab runtime with sufficient resources to run the Gemma model size you want to run. Learn more.
- Generate a Hugging Face Access Token and add it to your Colab environment.
Configure Access Token
Add your access token to Colab to enable downloading of Gemma models from the Hugging Face web site. Use the Colab Secrets feature to securely save your token without adding it to your working code.
To add your Hugging Face Access Token as a Secret:
- Open the Secrets tab by selecting the key icon on the left side of the interface, or select Tools > Command palette, type "secrets", and press Enter.
- Select Add new secret to add a new secret entry.
- In the Name field, enter HF_TOKEN.
- In the Value field, enter the text of your Hugging Face Access Token.
- In the Notebook access field, select the switch to enable access.
Once you have saved your Access Token as the HF_TOKEN secret, you can access and set it within your Colab notebook environment using the following code:
from google.colab import userdata
from huggingface_hub import login
# Login into Hugging Face Hub
hf_token = userdata.get('HF_TOKEN') # If you are running inside a Google Colab
login(hf_token)
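If you are running the notebook outside of Colab, you can supply the same token another way, for example from an environment variable. The following is a minimal sketch that assumes you have exported an HF_TOKEN environment variable in your shell:
import os
from huggingface_hub import login

# Outside Colab: read the token from an environment variable instead of Colab Secrets
login(os.environ["HF_TOKEN"])  # assumes HF_TOKEN is set in your shell environment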
Install Python packages
Install the Hugging Face libraries required for running the Gemma model and making requests.
# Install Pytorch & other libraries
%pip install "torch>=2.4.0"
# Install a transformers version that supports Gemma 3 (>= 4.51.3)
%pip install "transformers>=4.51.3"
Generate text from text
Prompting a Gemma model with text to get a text response is the simplest way to use Gemma and works with nearly all Gemma variants. This section shows how to use the Hugging Face Transformers library to load and configure a Gemma model for text-to-text generation.
Load model
Use the torch and transformers libraries to create an instance of a model execution pipeline class with Gemma. When using a model for generating output or following directions, select an instruction tuned (IT) model, which typically has "it" in the model ID string. Using the pipeline object, you specify the Gemma variant you want to use and the type of task you want to perform, specifically "text-generation" for text-to-text generation, as shown in the following code example:
import torch
from transformers import pipeline
pipeline = pipeline(
task="text-generation",
model="google/gemma-3-4b-it",
device=0, # "cuda" for Colab, "msu" for iOS devices
torch_dtype=torch.bfloat16
)
Gemma supports only a few task settings for generation. For more information on the available task settings, see the Hugging Face Pipelines task() documentation. Use the torch data type torch.bfloat16 to reduce the precision of the model and the compute resources needed, without significantly impacting the output quality of the model. For the device setting, you can use "cuda" for Colab, "mps" for Apple silicon devices, or just set this to 0 (zero) to specify the first GPU on your system. For more information about using the Pipeline class, see the Hugging Face Pipelines documentation.
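For example, if you are running on an Apple silicon Mac instead of Colab, you could load the same pipeline with an explicit device string. This is a minimal sketch; the only change from the earlier example is the device value:
import torch
from transformers import pipeline

# Same text-generation pipeline, but with an explicit device string
pipeline = pipeline(
    task="text-generation",
    model="google/gemma-3-4b-it",
    device="mps",  # use "cuda" on Colab or another CUDA GPU machine
    torch_dtype=torch.bfloat16
)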
Run text generation
Once you have the Gemma model loaded and configured in a pipeline object, you can send prompts to the model. The following example code shows a basic request using the text_inputs parameter:
pipeline(text_inputs="roses are red")
[{'generated_text': 'roses are red, violets are blue, \ni love you more than you ever knew.\n\n**Explanation'}]
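The pipeline returns a list of dictionaries, one per generated sequence. To limit the response length and read out just the generated string, you can pass the max_new_tokens parameter and index into the result, as in this short sketch:
# Limit the response length and extract the generated string
outputs = pipeline(text_inputs="roses are red", max_new_tokens=40)
print(outputs[0]["generated_text"])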
Use a prompt template
When generating content with more complex prompting, use a prompt template to structure your request. A prompt template allows you to specify input from specific roles, such as user or model, and is a required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma:
messages = [
[
{
"role": "system",
"content": [{"type": "text", "text": "You are a helpful assistant."},]
},
{
"role": "user",
"content": [{"type": "text", "text": "Roses are red..."},]
},
],
]
pipeline(messages, max_new_tokens=50)
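Because the template carries a role for each turn, you can also use it for follow-up questions by appending the model's reply and your next message before calling the pipeline again. The sketch below assumes a single conversation (not wrapped in an outer batch list) and that the returned generated_text holds the chat history with the new assistant turn appended; inspect the output on your setup if the structure differs:
# A single conversation (not wrapped in an outer batch list)
chat = [
    {"role": "user", "content": [{"type": "text", "text": "Roses are red..."}]},
]

outputs = pipeline(chat, max_new_tokens=50)

# The last entry is expected to be the newly generated assistant turn
print(outputs[0]["generated_text"][-1])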
Generate text from image data
Starting with Gemma 3, for model sizes 4B and higher, you can use image data as part of your prompt. This section shows how to use the Transformers library to load and configure a Gemma model to use image data and text input to generate text output.
Load model
When loading a Gemma model for use with image data, you configure the Transformers pipeline instance specifically for use with images. In particular, you must select a pipeline configuration that can handle visual data by setting the task parameter to "image-text-to-text", as shown in the following code example:
import torch
from transformers import pipeline
pipeline = pipeline(
task="image-text-to-text", # required for image input
model="google/gemma-3-4b-it",
device=0,
torch_dtype=torch.bfloat16
)
Run text generation
Once you have the Gemma model configured to handle image input with a pipeline instance, you can send prompts with images to the model. Use the <start_of_image> token to add the image to the text of your prompt. The following example code shows a basic request that passes an image and a text prompt to the pipeline:
pipeline(
"https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg",
text="<start_of_image> What is shown in this image?"
)
[{'input_text': '<start_of_image> What is shown in this image?', 'generated_text': '<start_of_image> What is shown in this image?\n\nThis image showcases a traditional Indian Thali. A Thali is a platter that contains a variety'}]
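Image input does not have to be a URL. The pipeline also accepts an image object loaded locally, for example with the Pillow library. A brief sketch, where the file name is a placeholder for an image on your own system:
from PIL import Image

# Load an image from the local file system (placeholder file name)
image = Image.open("thali-indian-plate.jpg")

pipeline(
    image,
    text="<start_of_image> What is shown in this image?"
)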
Use a prompt template
When generating content with more complex prompting, use a prompt template to structure your request. A prompt template allows you to specify input from specific roles, such as user or model, and is a required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma:
messages = [
{
"role": "user",
"content": [
{"type": "image", "url": "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg"},
{"type": "text", "text": "What is shown in this image?"},
]
},
{
"role": "assistant",
"content": [
{"type": "text", "text": "This image shows"},
],
},
]
pipeline(text=messages, max_new_tokens=50, return_full_text=False)
You can include multiple images in your prompt by including additional "type": "image" entries in the content list.
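For example, a user turn that asks the model to compare two images could be structured as in the following sketch, where the second URL is a placeholder for your own image:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg"},
            {"type": "image", "url": "https://example.com/your-second-image.jpg"},  # placeholder URL
            {"type": "text", "text": "What do these two images have in common?"},
        ]
    },
]

pipeline(text=messages, max_new_tokens=50, return_full_text=False)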
Next steps
Build and explore more with Gemma models: