
PaliGemma2 ViT Heap Buffer Overflow #762

@Somet2mes

Description

Hi, I previously reported a heap buffer overflow issue in the PaliGemma2 ViT image processing path. The Google Security Team responded that this is not considered a security vulnerability, with the rationale that:

"Users are recommended to run untrusted models in a sandbox, since models are inherently equivalent to code. Running untrusted models is not something that we would consider within the threat model of gemma.cpp."

I understand and respect this position. However, I respectfully disagree with this assessment. Here's why I think this is a real security issue:

  1. Loading a model file is not the same as executing code. Users download models from HuggingFace and community platforms expecting to load data, not run arbitrary code. The inference engine should validate model inputs, just like any data parser should.

  2. The open-source LLM ecosystem cannot control malicious models. Models are shared freely across platforms. The only defense is for inference engines to handle model files robustly. Treating models as "equivalent to code" shifts all responsibility to users, which is unrealistic.

  3. This is a logic bug, not a "malicious model" scenario. Unlike embedding shellcode in weights, this exploits a buffer size mismatch in the C++ implementation. It passes all validation checks and triggers at runtime.

  4. Inconsistency with similar projects. llama.cpp treats similar issues as security bugs and fixes them. This inconsistency confuses users about proper security practices.

That said, I understand you have a different threat model. I'm providing the technical details so you can decide whether to fix this, regardless of how it's classified.

Technical Details

Root Cause:

  • Image::GetPatch (paligemma/image.cc:237) always copies 588 floats (14×14×3)
  • Buffer size is calculated from vit_config.patch_width (gemma/vit.cc:298)
  • When patch_width != 14, the buffer is too small for the fixed-size copy

Trigger:
A malformed .sbs file with patch_width=1 passes shape validation but causes heap overflow at runtime.

Code Analysis:

// gemma/vit.cc:298 - Buffer allocation
size_t patch_size = patch_width * patch_width * 3;  // Can be 3 if patch_width=1

// paligemma/image.cc:237 - Fixed-size copy
for (size_t y = 0; y < kPatchSize; ++y) {  // kPatchSize=14, always copies 588 floats
    std::memcpy(out, in, kBytesPerRow);     // Overflows if patch_size < 588
    ...
}
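
To make the size mismatch concrete, here is a minimal standalone sketch. This is not the actual gemma.cpp source; the constants and buffer sizing are simplified from the snippets above, but the pattern is the same: a destination buffer sized from a model-controlled patch_width receives a copy that always assumes 14×14×3 floats.

#include <cstddef>
#include <cstring>
#include <vector>

// Illustrative only: simplified from the snippets above, not the real code.
constexpr size_t kPatchSize = 14;    // hardcoded patch side, as in Image::GetPatch
constexpr size_t kNumChannels = 3;

// Always copies kPatchSize rows of kPatchSize * kNumChannels floats (588 total).
void CopyPatch(const float* in, float* out) {
  const size_t floats_per_row = kPatchSize * kNumChannels;
  for (size_t y = 0; y < kPatchSize; ++y) {
    std::memcpy(out + y * floats_per_row, in + y * floats_per_row,
                floats_per_row * sizeof(float));
  }
}

int main() {
  const size_t patch_width = 1;  // value taken from a malformed .sbs file
  std::vector<float> src(kPatchSize * kPatchSize * kNumChannels, 0.0f);
  // Destination sized from patch_width: 1 * 1 * 3 = 3 floats, but CopyPatch
  // writes 588 floats -> heap buffer overflow.
  std::vector<float> dst(patch_width * patch_width * kNumChannels);
  CopyPatch(src.data(), dst.data());
  return 0;
}

Compiled with -fsanitize=address, this sketch reports a heap-buffer-overflow write on the first memcpy, which is the same failure mode as the real code path.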

Impact:

  • Heap corruption
  • Process crash
  • Undefined behavior

The heap buffer overflow is triggered as shown in the attached screenshot.

Proposed Fix

Option 1: Runtime Validation (Simple)

// In EmbedImagePatches (gemma/vit.cc:296)
if (patch_width != 14) {
    HWY_ABORT("Invalid patch_width=%zu, only 14 is supported", patch_width);
}

Option 2: Dynamic Sizing (Proper Fix)

// Modify Image::GetPatch to accept patch_width parameter
void GetPatch(size_t patch_idx, float* dest, size_t patch_width) {
    const size_t kNumChannels = 3;
    const size_t bytes_per_row = patch_width * kNumChannels * sizeof(float);
    // Use patch_width for copy size instead of hardcoded kPatchSize
}
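
For reference, a fuller version of Option 2 might look like the sketch below. It assumes a hypothetical row-major, interleaved-RGB float layout with width() and data() accessors on Image; the actual members in paligemma/image.cc may differ, so treat this as an outline rather than a drop-in patch.

// Sketch only: assumes Image stores row-major interleaved RGB floats and
// exposes width() and data() accessors; the real class may differ.
void Image::GetPatch(size_t patch_idx, float* dest, size_t patch_width) const {
  const size_t kNumChannels = 3;
  const size_t patches_per_row = width() / patch_width;
  const size_t x0 = (patch_idx % patches_per_row) * patch_width;
  const size_t y0 = (patch_idx / patches_per_row) * patch_width;
  const size_t bytes_per_row = patch_width * kNumChannels * sizeof(float);
  for (size_t y = 0; y < patch_width; ++y) {
    const float* src = data() + ((y0 + y) * width() + x0) * kNumChannels;
    std::memcpy(dest + y * patch_width * kNumChannels, src, bytes_per_row);
  }
}

The call site in EmbedImagePatches would then allocate patch_width * patch_width * 3 floats and pass the same patch_width, so the allocation and the copy can no longer get out of sync.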

Option 3: Shape Validation at Load Time

// In TensorLoader
if (tensor_name == "img_emb_kernel") {
    if (shape[1] != 588) {  // 14*14*3
        HWY_ABORT("Invalid img_emb_kernel shape, expected (..., 588)");
    }
}

Reproduction

I can provide detailed reproduction steps if needed, though creating the malformed .sbs file is complex (requires modifying conversion scripts and understanding tensor shape registration).

Request

Would you consider fixing this as a robustness improvement? Even if it's outside the security threat model, it would prevent crashes from malformed model files.

I'm happy to submit a PR if you'd like.
