ValueError: Number of images does not match number of special image tokens in the input text. Got 256 image tokens in the text but 256 tokens from image embeddings. #2751

trinh-hoang-hiep · 2025-03-19T11:39:34Z

Gemma has a problem when input embeddings (inputs_embeds) are passed to model.generate(): I am forced to pass input_ids as well, otherwise it raises this error.

Possible cause: integer vs. float comparison.

When special_image_mask is computed from input_ids, the comparison is between integer values, which gives exact results.
When it is computed from inputs_embeds, the float vectors produced by the embedding layer are compared with the embedding vector of the special image token. Float comparisons can suffer from precision issues, so some values may be misidentified, or tiny deviations may make the comparison return True or False inconsistently.

        if input_ids is None:
            special_image_mask = inputs_embeds == self.get_input_embeddings()(
                torch.tensor(self.config.image_token_index, dtype=torch.long, device=inputs_embeds.device)
            )
        else:
            special_image_mask = (input_ids == self.config.image_token_index).unsqueeze(-1)
            special_image_mask = special_image_mask.expand_as(inputs_embeds).to(inputs_embeds.device)

With input embeddings, special_image_mask.sum() is tensor(655363, device='cuda:0');
with input_ids, it is tensor(655360, device='cuda:0').
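One possible reading of the three extra elements, offered as an assumption and not a confirmed diagnosis: the embedding branch compares element by element, so any other token whose embedding shares a value with the image token's embedding in a single dimension also sets a True in the mask. The sketch below reproduces that effect with plain Python lists so it runs without dependencies. The table values, token ids, and variable names are hypothetical and are not taken from Gemma.

```python
# Toy embedding table: 4 tokens, hidden size 3. All values are made up.
IMAGE_TOKEN_ID = 2
embedding_table = [
    [0.5, -1.0, 0.25],   # token 0
    [0.75, 2.0, -0.5],   # token 1: shares the value 2.0 at dim 1 with the image token
    [1.5, 2.0, 3.0],     # token 2: image placeholder token
    [-0.25, 0.0, 1.0],   # token 3
]

input_ids = [0, 2, 2, 1, 3]  # two image placeholders in the sequence
inputs_embeds = [embedding_table[tok] for tok in input_ids]
image_vec = embedding_table[IMAGE_TOKEN_ID]

# Element-wise comparison, analogous to `inputs_embeds == image_embedding`
# broadcast over the sequence.
elementwise_mask = [
    [x == y for x, y in zip(row, image_vec)] for row in inputs_embeds
]
elementwise_count = sum(sum(row) for row in elementwise_mask)

# Mask built from ids and expanded over the hidden dimension.
ids_mask = [[tok == IMAGE_TOKEN_ID] * len(image_vec) for tok in input_ids]
ids_count = sum(sum(row) for row in ids_mask)

# Stricter variant: a position counts only if its whole vector matches.
rowwise_mask = [[all(row)] * len(image_vec) for row in elementwise_mask]
rowwise_count = sum(sum(row) for row in rowwise_mask)

print(elementwise_count, ids_count, rowwise_count)  # 7 6 6
```

In PyTorch terms the stricter variant would roughly correspond to reducing the comparison with .all(-1, keepdim=True) before expanding it. That is a suggestion only; it is not the fix from the linked repository, which is no longer reachable.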


trinh-hoang-hiep · 2025-03-21T04:43:04Z

I solved it here: https://github.com/trinh-hoang-hiep/prompt-tuning-gemma3

pySilver · 2025-04-06T07:57:16Z

@trinh-hoang-hiep can you tell us a little more? I am facing an identical error and your link returns a 404.

trinh-hoang-hiep closed this as completed Mar 21, 2025

trinh-hoang-hiep reopened this Mar 21, 2025
