93 changes: 93 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,93 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

### Running tests
```bash
pytest
```

To record API interactions when adding new tests:
```bash
PYTEST_GEMINI_API_KEY="$(llm keys get gemini)" pytest --record-mode once
```

### Updating README documentation
The README uses cog for dynamic content generation. To check that the cog-generated sections are up to date:
```bash
python -m cogapp --check README.md
```

### Installing development environment
```bash
python3 -m venv venv
source venv/bin/activate
llm install -e '.[test]'
```

## Architecture Overview

This is an LLM plugin that provides access to Google's Gemini API models. Key components:

### Core Model Implementation
- **llm_gemini.py**: Main plugin module containing all model implementations
- `GeminiModel` class handles text generation for 45+ Gemini models
- `GeminiEmbeddingModel` handles embedding models (text-embedding-004, gemini-embedding-exp-03-07 with size variants)
- Model registration via `register_models()` hookimpl with both sync and async versions

### Model Configuration & Capabilities
- **Model Sets**: Models are categorized by capabilities:
- `GOOGLE_SEARCH_MODELS`: 44 models supporting Google search grounding
- `THINKING_BUDGET_MODELS`: 8 models supporting "thinking" mode for reasoning
- `NO_VISION_MODELS`: 2 models without multi-modal support (gemma-3-1b-it, gemma-3n-e4b-it)
- `ATTACHMENT_TYPES`: 21 supported MIME types for multi-modal inputs

- **Model Capability Flags**:
- `can_vision`: Multi-modal support for images/audio/video
- `can_google_search`: Web grounding capability
- `can_thinking_budget`: Reasoning with thinking tokens
- `can_schema`: JSON schema support (excluded for flash-thinking and gemma-3 models)
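The registration logic derives these flags from set membership at registration time. A condensed sketch (the model sets below are truncated to a few illustrative entries; the real sets contain the counts listed above):

```python
# Truncated illustrative samples of the plugin's model sets.
GOOGLE_SEARCH_MODELS = {"gemini-2.0-flash", "gemini-2.5-flash-preview-05-20"}
THINKING_BUDGET_MODELS = {"gemini-2.5-flash-preview-05-20"}
URL_CONTEXT_MODELS = {"gemini-2.0-flash", "gemini-2.5-flash-preview-05-20"}
NO_VISION_MODELS = {"gemma-3-1b-it", "gemma-3n-e4b-it"}


def capabilities(model_id):
    """Derive the capability flags for one model, mirroring register_models()."""
    return {
        "can_google_search": model_id in GOOGLE_SEARCH_MODELS,
        "can_thinking_budget": model_id in THINKING_BUDGET_MODELS,
        "can_url_context": model_id in URL_CONTEXT_MODELS,
        # Vision is the default; only a small deny-list lacks it:
        "can_vision": model_id not in NO_VISION_MODELS,
        # Schema support is excluded for flash-thinking and gemma-3 models:
        "can_schema": "flash-thinking" not in model_id
        and "gemma-3" not in model_id,
    }
```
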

### Options System
- **Base Options** (via `Options` class):
- `temperature`, `max_output_tokens`, `top_p`, `top_k`
- `json_object`: Force JSON output
- `timeout`: Request timeout (httpx)
- `code_execution`: Enable Python sandbox execution
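These options are translated into the Gemini API's request body. A hypothetical sketch of that translation, assuming the public API's camelCase `generationConfig` keys (the exact mapping the plugin uses is not shown in this document):

```python
def build_generation_config(options):
    # Assumed mapping from plugin option names to Gemini API
    # generationConfig keys (camelCase, per the public REST API).
    mapping = {
        "temperature": "temperature",
        "max_output_tokens": "maxOutputTokens",
        "top_p": "topP",
        "top_k": "topK",
    }
    config = {
        api_key: options[opt]
        for opt, api_key in mapping.items()
        if options.get(opt) is not None
    }
    if options.get("json_object"):
        # json_object forces JSON output via the response MIME type.
        config["responseMimeType"] = "application/json"
    return config
```
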

- **Extended Options** (via inheritance):
- `OptionsWithGoogleSearch`: Adds `google_search` flag
- `OptionsWithThinkingBudget`: Adds `thinking_budget` parameter
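The capability flags then select which Options class each model gets. A simplified sketch of the selection logic, using plain classes with a `fields` set in place of the plugin's pydantic models:

```python
class Options:
    fields = {"temperature", "max_output_tokens", "top_p", "top_k",
              "json_object", "timeout", "code_execution"}

class OptionsWithGoogleSearch(Options):
    fields = Options.fields | {"google_search"}

class OptionsWithUrlContext(Options):
    fields = Options.fields | {"url_context"}

class OptionsWithGoogleSearchAndUrlContext(OptionsWithGoogleSearch):
    fields = OptionsWithGoogleSearch.fields | {"url_context"}

class OptionsWithThinkingBudget(OptionsWithGoogleSearchAndUrlContext):
    fields = OptionsWithGoogleSearchAndUrlContext.fields | {"thinking_budget"}


def pick_options(can_thinking_budget, can_google_search, can_url_context):
    # Hierarchical assignment: the most capable combination wins, so a
    # model with several features gets the correct combined Options class.
    if can_thinking_budget:
        return OptionsWithThinkingBudget
    elif can_google_search and can_url_context:
        return OptionsWithGoogleSearchAndUrlContext
    elif can_url_context:
        return OptionsWithUrlContext
    elif can_google_search:
        return OptionsWithGoogleSearch
    return Options
```
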

### Advanced Features
- **Schema Cleanup**: `cleanup_schema()` removes unsupported JSON schema properties
- **Google Search API Compatibility**: Handles both `google_search_retrieval` (older) and `google_search` (newer) tool formats
- **Token Usage Tracking**: Separates candidate vs thinking tokens for accurate billing
- **Safety Settings**: All categories set to BLOCK_NONE by default
- **Response Processing**: Handles executableCode, codeExecutionResult, functionCall parts
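A hypothetical sketch of what a `cleanup_schema()`-style pass looks like; the specific keys stripped here are an assumption for illustration, not the plugin's actual list:

```python
# Assumed examples of schema keys the Gemini API rejects.
UNSUPPORTED_KEYS = {"$schema", "additionalProperties", "title"}


def cleanup_schema(schema):
    """Recursively drop unsupported properties from a JSON schema."""
    if isinstance(schema, dict):
        return {
            k: cleanup_schema(v)
            for k, v in schema.items()
            if k not in UNSUPPORTED_KEYS
        }
    if isinstance(schema, list):
        return [cleanup_schema(item) for item in schema]
    return schema
```
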

### CLI Commands
- **`llm gemini models`**: Lists all Gemini API models
- `--method` flag for filtering by supported methods
- `--key` for API key override
- **`llm gemini files`**: Lists uploaded files in Gemini API

### Testing Infrastructure
- **pytest-recording**: VCR-style test recording with API key filtering
- **Test Coverage**:
- Prompt generation with various options
- Embedding model variants
- Tool/function calling
- CLI commands
- JSON schema cleanup
- Async model support

### Implementation Details
- **Model ID System**: Internal `gemini_model_id` vs external `gemini/` prefixed `model_id`
- **Message Building**: Complex conversation history with tool results and attachments
- **Streaming**: Uses ijson for efficient JSON streaming from API responses
- **MIME Type Resolution**: Custom handling (e.g., audio/mpeg → audio/mp3)
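The MIME override idea can be sketched as follows; only the audio/mpeg → audio/mp3 mapping is taken from this document, and the function name is illustrative:

```python
import mimetypes

# Python's mimetypes guesses audio/mpeg for .mp3 files, but the
# Gemini API expects audio/mp3, so an override table remaps it.
MIME_OVERRIDES = {"audio/mpeg": "audio/mp3"}


def resolve_mime_type(path):
    guessed, _ = mimetypes.guess_type(path)
    return MIME_OVERRIDES.get(guessed, guessed)
```
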

The plugin integrates with the LLM ecosystem via entry points defined in pyproject.toml.
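Assuming the standard `llm` plugin convention, the entry point registration in pyproject.toml looks roughly like this (a sketch, not the file's verbatim contents):

```toml
[project.entry-points.llm]
gemini = "llm_gemini"
```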
30 changes: 30 additions & 0 deletions README.md
@@ -170,6 +170,36 @@ llm -m gemini-2.0-flash -o google_search 1 \

Use `llm logs -c --json` after running a prompt to see the full JSON response, which includes [additional information](https://github.com/simonw/llm-gemini/pull/29#issuecomment-2606201877) about grounded results.

### URL Context

Some Gemini 2.0+ models support URL Context, which allows the model to retrieve and analyze content from specified URLs as part of its response.

This feature is currently experimental. It supports up to 20 URLs per request, with a daily quota of 1,500 requests via the API.

Supported models:
- `gemini-2.5-pro-preview-05-06`
- `gemini-2.5-flash-preview-05-20`
- `gemini-2.0-flash`
- `gemini-2.0-flash-live-001`

To enable URL context, use `-o url_context 1` and provide URLs as attachments:

```bash
llm -m gemini-2.0-flash -o url_context 1 \
'Compare these articles' \
-a https://example.com/article1 \
-a https://example.com/article2
```

You can also combine URL context with Google search:

```bash
llm -m gemini-2.5-pro-preview-05-06 \
-o url_context 1 -o google_search 1 \
'Find recent news about this topic and compare with this article' \
-a https://example.com/article
```

### Chat

To chat interactively with the model, run `llm chat`:
43 changes: 39 additions & 4 deletions llm_gemini.py
@@ -64,6 +64,13 @@
"gemini-2.5-flash-preview-05-20",
}

URL_CONTEXT_MODELS = {
"gemini-2.5-pro-preview-05-06",
"gemini-2.5-flash-preview-05-20",
"gemini-2.0-flash",
"gemini-2.0-flash-live-001",
}

NO_VISION_MODELS = {"gemma-3-1b-it", "gemma-3n-e4b-it"}

ATTACHMENT_TYPES = {
@@ -129,6 +136,7 @@ def register_models(register):
"gemini-2.0-flash-thinking-exp-01-21",
# Released 5th Feb 2025:
"gemini-2.0-flash",
"gemini-2.0-flash-live-001",
"gemini-2.0-pro-exp-02-05",
# Released 25th Feb 2025:
"gemini-2.0-flash-lite",
@@ -145,6 +153,7 @@
):
can_google_search = model_id in GOOGLE_SEARCH_MODELS
can_thinking_budget = model_id in THINKING_BUDGET_MODELS
can_url_context = model_id in URL_CONTEXT_MODELS
can_vision = model_id not in NO_VISION_MODELS
can_schema = "flash-thinking" not in model_id and "gemma-3" not in model_id
register(
@@ -154,13 +163,15 @@
can_google_search=can_google_search,
can_thinking_budget=can_thinking_budget,
can_schema=can_schema,
can_url_context=can_url_context,
),
AsyncGeminiPro(
model_id,
can_vision=can_vision,
can_google_search=can_google_search,
can_thinking_budget=can_thinking_budget,
can_schema=can_schema,
can_url_context=can_url_context,
),
aliases=(model_id,),
)
@@ -265,7 +276,19 @@ class OptionsWithGoogleSearch(Options):
default=None,
)

class OptionsWithThinkingBudget(OptionsWithGoogleSearch):
class OptionsWithUrlContext(Options):
url_context: Optional[bool] = Field(
description="Enables the model to retrieve and analyze content from specified URLs",
default=None,
)

class OptionsWithGoogleSearchAndUrlContext(OptionsWithGoogleSearch):
url_context: Optional[bool] = Field(
description="Enables the model to retrieve and analyze content from specified URLs",
default=None,
)

class OptionsWithThinkingBudget(OptionsWithGoogleSearchAndUrlContext):
thinking_budget: Optional[int] = Field(
description="Indicates the thinking budget in tokens. Set to 0 to disable.",
default=None,
@@ -278,16 +301,26 @@ def __init__(
can_google_search=False,
can_thinking_budget=False,
can_schema=False,
can_url_context=False,
):
self.model_id = "gemini/{}".format(gemini_model_id)
self.gemini_model_id = gemini_model_id
self.can_google_search = can_google_search
self.can_url_context = can_url_context
self.supports_schema = can_schema
if can_google_search:
self.Options = self.OptionsWithGoogleSearch
self.can_thinking_budget = can_thinking_budget

# Set Options class based on capabilities - hierarchical assignment
# to ensure models with multiple features get the correct combined Options class
if can_thinking_budget:
self.Options = self.OptionsWithThinkingBudget
elif can_google_search and can_url_context:
self.Options = self.OptionsWithGoogleSearchAndUrlContext
elif can_url_context:
self.Options = self.OptionsWithUrlContext
elif can_google_search:
self.Options = self.OptionsWithGoogleSearch

self.can_thinking_budget = can_thinking_budget
if can_vision:
self.attachment_types = ATTACHMENT_TYPES

@@ -391,6 +424,8 @@ def build_request_body(self, prompt, conversation):
else "google_search"
)
tools.append({tool_name: {}})
if prompt.options and self.can_url_context and prompt.options.url_context:
tools.append({"url_context": {}})
if prompt.tools:
tools.append(
{