Run Gemma with Hugging Face Transformers


Generating text, summarizing, and analyzing content are just some of the tasks you can accomplish with Gemma open models. This tutorial shows you how to get started running Gemma with Hugging Face Transformers, using both text and image input to generate text content. The Transformers Python library provides an API for accessing pre-trained generative AI models, including Gemma. For more information, see the Transformers documentation.

Setup

Before starting this tutorial, complete the following steps:

  • Get access to Gemma by logging into Hugging Face and selecting Acknowledge license for a Gemma model.
  • Select a Colab runtime with sufficient resources to run the Gemma model size you want to run. Learn more.
  • Generate a Hugging Face Access Token and add it to your Colab environment.

Configure Access Token

Add your access token to Colab to enable downloading of Gemma models from the Hugging Face web site. Use the Colab Secrets feature to securely save your token without adding it to your working code.

To add your Hugging Face Access Token as a Secret:

  1. Open the secrets tab by selecting the key icon on the left side of the interface, or select Tools > Command palette, type secrets, and press Enter.
  2. Select Add new secret to add a new secret entry.
  3. In the Name field, enter HF_TOKEN.
  4. In the Value field, enter the text of your Hugging Face Access Token.
  5. In the Notebook access field, select the switch to enable access.

Once you have saved your Access Token under the name HF_TOKEN, you can access and set it within your Colab notebook environment using the following code:

from google.colab import userdata
from huggingface_hub import login

# Login into Hugging Face Hub
hf_token = userdata.get('HF_TOKEN') # If you are running inside a Google Colab
login(hf_token)
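
If you are running this code outside of Google Colab, you can read the token from an environment variable instead. The following is an optional sketch that assumes you have exported your token as HF_TOKEN in your shell:

import os
from huggingface_hub import login

# Outside of Colab: read the token from an environment variable
# (assumes the token was exported as HF_TOKEN).
hf_token = os.environ.get("HF_TOKEN")
login(hf_token)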

Install Python packages

Install the Hugging Face libraries required for running the Gemma model and making requests.

# Install Pytorch & other libraries
%pip install "torch>=2.4.0"

# Install a transformers version that supports Gemma 3 (>= 4.51.3)
%pip install "transformers>=4.51.3"

Generate text from text

Prompting a Gemma model with text to get a text response is the simplest way to use Gemma and works with nearly all Gemma variants. This section shows how to use the Hugging Face Transformers library to load and configure a Gemma model for text-to-text generation.

Load model

Use the torch and transformers libraries to create an instance of a model execution pipeline class with Gemma. When using a model for generating output or following directions, select an instruction tuned (IT) model, which typically includes "it" in the model ID string. Using the pipeline object, you specify the Gemma variant you want to use and the type of task you want to perform, specifically "text-generation" for text-to-text generation, as shown in the following code example:

import torch
from transformers import pipeline

pipeline = pipeline(
    task="text-generation",
    model="google/gemma-3-4b-it",
    device=0, # "cuda" for Colab, "msu" for iOS devices
    torch_dtype=torch.bfloat16
)

Gemma supports only a few task settings for generation. For more information on the available task settings, see the Hugging Face Pipelines task documentation. Use the torch data type torch.bfloat16 to reduce the precision of the model and the compute resources needed, without significantly impacting the output quality of the model. For the device setting, you can use "cuda" for Colab, "mps" for Apple silicon devices, or just set it to 0 (zero) to specify the first GPU on your system. For more information about using the Pipeline class, see the Hugging Face Pipelines documentation.
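
If you want the same pipeline code to run on machines with different hardware, one option is to pick the device value at runtime. This is an optional sketch, not part of the tutorial's setup, and assumes a standard PyTorch installation:

import torch

# Pick an accelerator if one is available, otherwise fall back to the CPU.
if torch.cuda.is_available():
    device = 0        # first CUDA GPU, for example on Colab
elif torch.backends.mps.is_available():
    device = "mps"    # Apple silicon GPU
else:
    device = "cpu"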

Run text generation

Once you have the Gemma model loaded and configured in a pipeline object, you can send prompts to the model. The following example code shows a basic request using the text_inputs parameter:

pipeline(text_inputs="roses are red")
[{'generated_text': 'roses are red, violets are blue, \ni love you more than you ever knew.\n\n**Explanation'}]
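
The response above is truncated because the pipeline generates only a small number of tokens by default. You can pass the standard max_new_tokens generation parameter to allow a longer completion, for example:

# Allow the model to generate up to 100 new tokens.
pipeline(text_inputs="roses are red", max_new_tokens=100)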

Use a prompt template

When generating content with more complex prompting, use a prompt template to structure your input. A prompt template allows you to specify input from specific roles, such as user or model, and is a required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma:

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."},]
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Roses are red..."},]
        },
    ],
]

pipeline(messages, max_new_tokens=50)
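
Because the template tracks roles across turns, you can continue a chat by appending the model's previous reply as an assistant turn and your next question as a new user turn, then calling the pipeline again. The following is a minimal sketch; the assistant text is written in by hand purely for illustration:

# Continue the conversation: append the previous reply and a new user turn.
# The assistant text below is filled in by hand for illustration only.
followup = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."},]
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Roses are red..."},]
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "...violets are blue."},]
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Now continue the poem for two more lines."},]
        },
    ],
]

pipeline(followup, max_new_tokens=50)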

Generate text from image data

Starting with Gemma 3, for model sizes 4B and higher, you can use image data as part of your prompt. This section shows how to use the Transformers library to load and configure a Gemma model to use image data and text input to generate text output.

Load model

When loading a Gemma model for use with image data, you configure the Transformer pipeline instance specifically for use with images. In particular, you must select a pipeline configuration that can handle visual data by setting the task parameter to "image-text-to-text", as shown in the following code example:

import torch
from transformers import pipeline

pipeline = pipeline(
    task="image-text-to-text", # required for image input
    model="google/gemma-3-4b-it",
    device=0,
    torch_dtype=torch.bfloat16
)

Run text generation

Once you have the Gemma model configured to handle image input with a pipeline instance, you can send prompts with images to the model. Use the <start_of_image> token to add the image to the text of your prompt. The following example code shows a basic request that passes an image URL along with the text parameter:

pipeline(
    "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg",
    text="<start_of_image> What is shown in this image?"
)
[{'input_text': '<start_of_image> What is shown in this image?',
  'generated_text': '<start_of_image> What is shown in this image?\n\nThis image showcases a traditional Indian Thali. A Thali is a platter that contains a variety'}]
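
Note that generated_text in the output above repeats the prompt before the model's answer. If you only want the newly generated text, you can pass return_full_text=False, the same parameter used in the prompt-template example below:

# Return only the newly generated text, without repeating the prompt.
pipeline(
    "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg",
    text="<start_of_image> What is shown in this image?",
    return_full_text=False
)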

Use a prompt template

When generating content with more complex prompting, use a prompt template to structure your input. A prompt template allows you to specify input from specific roles, such as user or model, and is a required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma:

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg"},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "This image shows"},
        ],
    },
]

pipeline(text=messages, max_new_tokens=50, return_full_text=False)

You can include multiple images in your prompt by adding additional "type": "image" entries in the content list.
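
For example, a single user turn with two images might look like the following sketch; the second image URL is a placeholder that you would replace with your own:

# A single user turn that includes two images.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg"},
            {"type": "image", "url": "https://example.com/your-second-image.jpg"},  # placeholder: replace with your own image
            {"type": "text", "text": "What is different between these two images?"},
        ]
    },
]

pipeline(text=messages, max_new_tokens=50, return_full_text=False)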

Next steps

Build and explore more with Gemma models: