93 changes: 93 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,93 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

### Running tests
```bash
pytest
```

To record API interactions when adding new tests:
```bash
PYTEST_GEMINI_API_KEY="$(llm keys get gemini)" pytest --record-mode once
```

### Updating README documentation
The README uses cog for dynamic content generation. To check that the cog-generated sections are up to date:
```bash
python -m cogapp --check README.md
```

### Installing development environment
```bash
python3 -m venv venv
source venv/bin/activate
llm install -e '.[test]'
```

## Architecture Overview

This is an LLM plugin that provides access to Google's Gemini API models. Key components:

### Core Model Implementation
- **llm_gemini.py**: Main plugin module containing all model implementations
- `GeminiModel` class handles text generation for 45+ Gemini models
- `GeminiEmbeddingModel` handles embedding models (text-embedding-004, gemini-embedding-exp-03-07 with size variants)
- Model registration via `register_models()` hookimpl with both sync and async versions

### Model Configuration & Capabilities
- **Model Sets**: Models are categorized by capabilities:
- `GOOGLE_SEARCH_MODELS`: 44 models supporting Google search grounding
- `THINKING_BUDGET_MODELS`: 8 models supporting "thinking" mode for reasoning
- `NO_VISION_MODELS`: 2 models without multi-modal support (gemma-3-1b-it, gemma-3n-e4b-it)
- `ATTACHMENT_TYPES`: 21 supported MIME types for multi-modal inputs

- **Model Capability Flags**:
- `can_vision`: Multi-modal support for images/audio/video
- `can_google_search`: Web grounding capability
- `can_thinking_budget`: Reasoning with thinking tokens
- `can_schema`: JSON schema support (excluded for flash-thinking and gemma-3 models)
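The registration logic derives these flags from set membership at registration time. A condensed sketch (the model sets below are truncated to a few illustrative entries; the real sets contain the counts listed above):

```python
# Truncated illustrative samples of the plugin's model sets.
GOOGLE_SEARCH_MODELS = {"gemini-2.0-flash", "gemini-2.5-flash-preview-05-20"}
THINKING_BUDGET_MODELS = {"gemini-2.5-flash-preview-05-20"}
URL_CONTEXT_MODELS = {"gemini-2.0-flash", "gemini-2.5-flash-preview-05-20"}
NO_VISION_MODELS = {"gemma-3-1b-it", "gemma-3n-e4b-it"}


def capabilities(model_id):
    """Derive the capability flags for one model, mirroring register_models()."""
    return {
        "can_google_search": model_id in GOOGLE_SEARCH_MODELS,
        "can_thinking_budget": model_id in THINKING_BUDGET_MODELS,
        "can_url_context": model_id in URL_CONTEXT_MODELS,
        # Vision is the default; only a small deny-list lacks it:
        "can_vision": model_id not in NO_VISION_MODELS,
        # Schema support is excluded for flash-thinking and gemma-3 models:
        "can_schema": "flash-thinking" not in model_id
        and "gemma-3" not in model_id,
    }
```
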

### Options System
- **Base Options** (via `Options` class):
- `temperature`, `max_output_tokens`, `top_p`, `top_k`
- `json_object`: Force JSON output
- `timeout`: Request timeout (httpx)
- `code_execution`: Enable Python sandbox execution
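These options are translated into the Gemini API's request body. A hypothetical sketch of that translation, assuming the public API's camelCase `generationConfig` keys (the exact mapping the plugin uses is not shown in this document):

```python
def build_generation_config(options):
    # Assumed mapping from plugin option names to Gemini API
    # generationConfig keys (camelCase, per the public REST API).
    mapping = {
        "temperature": "temperature",
        "max_output_tokens": "maxOutputTokens",
        "top_p": "topP",
        "top_k": "topK",
    }
    config = {
        api_key: options[opt]
        for opt, api_key in mapping.items()
        if options.get(opt) is not None
    }
    if options.get("json_object"):
        # json_object forces JSON output via the response MIME type.
        config["responseMimeType"] = "application/json"
    return config
```
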

- **Extended Options** (via inheritance):
- `OptionsWithGoogleSearch`: Adds `google_search` flag
- `OptionsWithThinkingBudget`: Adds `thinking_budget` parameter
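The capability flags then select which Options class each model gets. A simplified sketch of the selection logic, using plain classes with a `fields` set in place of the plugin's pydantic models:

```python
class Options:
    fields = {"temperature", "max_output_tokens", "top_p", "top_k",
              "json_object", "timeout", "code_execution"}

class OptionsWithGoogleSearch(Options):
    fields = Options.fields | {"google_search"}

class OptionsWithUrlContext(Options):
    fields = Options.fields | {"url_context"}

class OptionsWithGoogleSearchAndUrlContext(OptionsWithGoogleSearch):
    fields = OptionsWithGoogleSearch.fields | {"url_context"}

class OptionsWithThinkingBudget(OptionsWithGoogleSearchAndUrlContext):
    fields = OptionsWithGoogleSearchAndUrlContext.fields | {"thinking_budget"}


def pick_options(can_thinking_budget, can_google_search, can_url_context):
    # Hierarchical assignment: the most capable combination wins, so a
    # model with several features gets the correct combined Options class.
    if can_thinking_budget:
        return OptionsWithThinkingBudget
    elif can_google_search and can_url_context:
        return OptionsWithGoogleSearchAndUrlContext
    elif can_url_context:
        return OptionsWithUrlContext
    elif can_google_search:
        return OptionsWithGoogleSearch
    return Options
```
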

### Advanced Features
- **Schema Cleanup**: `cleanup_schema()` removes unsupported JSON schema properties
- **Google Search API Compatibility**: Handles both `google_search_retrieval` (older) and `google_search` (newer) tool formats
- **Token Usage Tracking**: Separates candidate vs thinking tokens for accurate billing
- **Safety Settings**: All categories set to BLOCK_NONE by default
- **Response Processing**: Handles executableCode, codeExecutionResult, functionCall parts
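A hypothetical sketch of what a `cleanup_schema()`-style pass looks like; the specific keys stripped here are an assumption for illustration, not the plugin's actual list:

```python
# Assumed examples of schema keys the Gemini API rejects.
UNSUPPORTED_KEYS = {"$schema", "additionalProperties", "title"}


def cleanup_schema(schema):
    """Recursively drop unsupported properties from a JSON schema."""
    if isinstance(schema, dict):
        return {
            k: cleanup_schema(v)
            for k, v in schema.items()
            if k not in UNSUPPORTED_KEYS
        }
    if isinstance(schema, list):
        return [cleanup_schema(item) for item in schema]
    return schema
```
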

### CLI Commands
- **`llm gemini models`**: Lists all Gemini API models
- `--method` flag for filtering by supported methods
- `--key` for API key override
- **`llm gemini files`**: Lists uploaded files in Gemini API

### Testing Infrastructure
- **pytest-recording**: VCR-style test recording with API key filtering
- **Test Coverage**:
- Prompt generation with various options
- Embedding model variants
- Tool/function calling
- CLI commands
- JSON schema cleanup
- Async model support

### Implementation Details
- **Model ID System**: Internal `gemini_model_id` vs external `gemini/` prefixed `model_id`
- **Message Building**: Complex conversation history with tool results and attachments
- **Streaming**: Uses ijson for efficient JSON streaming from API responses
- **MIME Type Resolution**: Custom handling (e.g., audio/mpeg → audio/mp3)
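The MIME override idea can be sketched as follows; only the audio/mpeg → audio/mp3 mapping is taken from this document, and the function name is illustrative:

```python
import mimetypes

# Python's mimetypes guesses audio/mpeg for .mp3 files, but the
# Gemini API expects audio/mp3, so an override table remaps it.
MIME_OVERRIDES = {"audio/mpeg": "audio/mp3"}


def resolve_mime_type(path):
    guessed, _ = mimetypes.guess_type(path)
    return MIME_OVERRIDES.get(guessed, guessed)
```
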

The plugin integrates with the LLM ecosystem via entry points defined in pyproject.toml.
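Assuming the standard `llm` plugin convention, the entry point registration in pyproject.toml looks roughly like this (a sketch, not the file's verbatim contents):

```toml
[project.entry-points.llm]
gemini = "llm_gemini"
```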
30 changes: 30 additions & 0 deletions README.md
@@ -170,6 +170,36 @@ llm -m gemini-2.0-flash -o google_search 1 \

Use `llm logs -c --json` after running a prompt to see the full JSON response, which includes [additional information](https://github.com/simonw/llm-gemini/pull/29#issuecomment-2606201877) about grounded results.

### URL Context

Some Gemini 2.0+ models support URL Context, which allows the model to retrieve and analyze content from specified URLs as part of its response.

This feature is currently experimental. It supports up to 20 URLs per request, with a daily quota of 1,500 requests via the API.

Supported models:
- `gemini-2.5-pro-preview-05-06`
- `gemini-2.5-flash-preview-05-20`
- `gemini-2.0-flash`
- `gemini-2.0-flash-live-001`

To enable URL context, use `-o url_context 1` and provide URLs as attachments:

```bash
llm -m gemini-2.0-flash -o url_context 1 \
'Compare these articles' \
-a https://example.com/article1 \
-a https://example.com/article2
```

You can also combine URL context with Google search:

```bash
llm -m gemini-2.5-pro-preview-05-06 \
-o url_context 1 -o google_search 1 \
'Find recent news about this topic and compare with this article' \
-a https://example.com/article
```

### Chat

To chat interactively with the model, run `llm chat`:
43 changes: 39 additions & 4 deletions llm_gemini.py
@@ -64,6 +64,13 @@
"gemini-2.5-flash-preview-05-20",
}

URL_CONTEXT_MODELS = {
"gemini-2.5-pro-preview-05-06",
"gemini-2.5-flash-preview-05-20",
"gemini-2.0-flash",
"gemini-2.0-flash-live-001",
}

NO_VISION_MODELS = {"gemma-3-1b-it", "gemma-3n-e4b-it"}

ATTACHMENT_TYPES = {
@@ -129,6 +136,7 @@ def register_models(register):
"gemini-2.0-flash-thinking-exp-01-21",
# Released 5th Feb 2025:
"gemini-2.0-flash",
"gemini-2.0-flash-live-001",
"gemini-2.0-pro-exp-02-05",
# Released 25th Feb 2025:
"gemini-2.0-flash-lite",
@@ -145,6 +153,7 @@
):
can_google_search = model_id in GOOGLE_SEARCH_MODELS
can_thinking_budget = model_id in THINKING_BUDGET_MODELS
can_url_context = model_id in URL_CONTEXT_MODELS
can_vision = model_id not in NO_VISION_MODELS
can_schema = "flash-thinking" not in model_id and "gemma-3" not in model_id
register(
@@ -154,13 +163,15 @@
can_google_search=can_google_search,
can_thinking_budget=can_thinking_budget,
can_schema=can_schema,
can_url_context=can_url_context,
),
AsyncGeminiPro(
model_id,
can_vision=can_vision,
can_google_search=can_google_search,
can_thinking_budget=can_thinking_budget,
can_schema=can_schema,
can_url_context=can_url_context,
),
aliases=(model_id,),
)
@@ -265,7 +276,19 @@ class OptionsWithGoogleSearch(Options):
default=None,
)

class OptionsWithThinkingBudget(OptionsWithGoogleSearch):
class OptionsWithUrlContext(Options):
url_context: Optional[bool] = Field(
description="Enables the model to retrieve and analyze content from specified URLs",
default=None,
)

class OptionsWithGoogleSearchAndUrlContext(OptionsWithGoogleSearch):
url_context: Optional[bool] = Field(
description="Enables the model to retrieve and analyze content from specified URLs",
default=None,
)

class OptionsWithThinkingBudget(OptionsWithGoogleSearchAndUrlContext):
thinking_budget: Optional[int] = Field(
description="Indicates the thinking budget in tokens. Set to 0 to disable.",
default=None,
@@ -278,16 +301,26 @@ def __init__(
can_google_search=False,
can_thinking_budget=False,
can_schema=False,
can_url_context=False,
):
self.model_id = "gemini/{}".format(gemini_model_id)
self.gemini_model_id = gemini_model_id
self.can_google_search = can_google_search
self.can_url_context = can_url_context
self.supports_schema = can_schema
if can_google_search:
self.Options = self.OptionsWithGoogleSearch
self.can_thinking_budget = can_thinking_budget

# Set Options class based on capabilities - hierarchical assignment
# to ensure models with multiple features get the correct combined Options class
if can_thinking_budget:
self.Options = self.OptionsWithThinkingBudget
elif can_google_search and can_url_context:
self.Options = self.OptionsWithGoogleSearchAndUrlContext
elif can_url_context:
self.Options = self.OptionsWithUrlContext
elif can_google_search:
self.Options = self.OptionsWithGoogleSearch

self.can_thinking_budget = can_thinking_budget
if can_vision:
self.attachment_types = ATTACHMENT_TYPES

@@ -391,6 +424,8 @@ def build_request_body(self, prompt, conversation):
else "google_search"
)
tools.append({tool_name: {}})
if prompt.options and self.can_url_context and prompt.options.url_context:
tools.append({"url_context": {}})
if prompt.tools:
tools.append(
{