-
Notifications
You must be signed in to change notification settings - Fork 573
Description
hi , I previously reported a heap buffer overflow issue in the PaliGemma2 ViT image processing path . The Google Security Team responded that this is not considered a security vulnerability, with the rationale that:
"Users are recommended to run untrusted models in a sandbox, since models are inherently equivalent to code. Running untrusted models is not something that we would consider within the threat model of gemma.cpp."
I understand and respect this position. However, I respectfully disagree with this assessment.Here's why I think this is a real security issue:
-
Loading a model file is not the same as executing code. Users download models from HuggingFace and community platforms expecting to load data, not run arbitrary code. The inference engine should validate model inputs, just like any data parser should.
-
The open-source LLM ecosystem cannot control malicious models. Models are shared freely across platforms. The only defense is for inference engines to handle model files robustly. Treating models as "equivalent to code" shifts all responsibility to users, which is unrealistic.
-
This is a logic bug, not a "malicious model" scenario. Unlike embedding shellcode in weights, this exploits a buffer size mismatch in the C++ implementation. It passes all validation checks and triggers at runtime.
-
Inconsistency with similar projects. llama.cpp treats similar issues as security bugs and fixes them. This inconsistency confuses users about proper security practices.
That said, I understand you have a different threat model. I'm providing the technical details so you can decide whether to fix this, regardless of how it's classified.
Technical Details
Root Cause:
Image::GetPatch(paligemma/image.cc:237) always copies 588 floats (14×14×3)- Buffer size is calculated from
vit_config.patch_width(gemma/vit.cc:298) - When
patch_width != 14, the buffer is too small for the fixed-size copy
Trigger:
A malformed .sbs file with patch_width=1 passes shape validation but causes heap overflow at runtime.
Code Analysis:
// gemma/vit.cc:298 - Buffer allocation
size_t patch_size = patch_width * patch_width * 3; // Can be 3 if patch_width=1
// paligemma/image.cc:237 - Fixed-size copy
for (size_t y = 0; y < kPatchSize; ++y) { // kPatchSize=14, always copies 588 floats
std::memcpy(out, in, kBytesPerRow); // Overflows if patch_size < 588
...
}Impact:
- Heap corruption
- Process crash
- Undefined behavior
Heap Buffer Overflow Trigged like.
Proposed Fix
Option 1: Runtime Validation (Simple)
// In EmbedImagePatches (gemma/vit.cc:296)
if (patch_width != 14) {
HWY_ABORT("Invalid patch_width=%zu, only 14 is supported", patch_width);
}Option 2: Dynamic Sizing (Proper Fix)
// Modify Image::GetPatch to accept patch_width parameter
void GetPatch(size_t patch_idx, float* dest, size_t patch_width) {
const size_t kNumChannels = 3;
const size_t bytes_per_row = patch_width * kNumChannels * sizeof(float);
// Use patch_width for copy size instead of hardcoded kPatchSize
}Option 3: Shape Validation at Load Time
// In TensorLoader
if (tensor_name == "img_emb_kernel") {
if (shape[1] != 588) { // 14*14*3
HWY_ABORT("Invalid img_emb_kernel shape, expected (..., 588)");
}
}Reproduction
I can provide detailed reproduction steps if needed, though creating the malformed .sbs file is complex (requires modifying conversion scripts and understanding tensor shape registration).
Request
Would you consider fixing this as a robustness improvement? Even if it's outside the security threat model, it would prevent crashes from malformed model files.
I'm happy to submit a PR if you'd like.