
Feat: add Kwai-Keye transformers #39292

Open · wants to merge 13 commits into main

Conversation

@Kwai-Keye commented on Jul 9, 2025

Add support for Kwai-Keye/Keye-VL-8B-Preview model

Description

This pull request adds support for the Keye-VL-8B-Preview model developed by Kwai-Keye. Keye-VL-8B-Preview is a vision-language model with strong performance on video understanding, visual perception, and reasoning tasks.

The model repository can be found at https://huggingface.co/Kwai-Keye/Keye-VL-8B-Preview.

Key Changes

  1. Added model configuration files for Keye-VL-8B-Preview

  2. Implemented model architecture code based on the official specifications

  3. Added tokenizer support for the model's specific tokenization requirements (a standalone loading sketch follows this list)

  4. Included example usage scripts in the documentation
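As a quick sanity check for item 3, the tokenizer can be loaded on its own via AutoTokenizer. This snippet is an illustration only and is not part of this PR's code:

from transformers import AutoTokenizer

# Load only the tokenizer; the full processor (tokenizer + image processor)
# is shown in the usage example below.
tokenizer = AutoTokenizer.from_pretrained("Kwai-Keye/Keye-VL-8B-Preview", trust_remote_code=True)
print(tokenizer("Describe this image.")["input_ids"])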

Model Architecture

The model consists of two components (a configuration sketch follows the list):

  • A SigLIP vision encoder for processing image/video inputs
  • A Qwen3 decoder for language understanding and generation
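
To see how the two towers are composed, the snippet below inspects the checkpoint's nested configuration. The vision_config/text_config field names are assumptions based on how other transformers vision-language models nest their configs, not something this PR confirms:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Kwai-Keye/Keye-VL-8B-Preview", trust_remote_code=True)
# Field names below are assumed from the usual transformers VLM layout.
print(type(config.vision_config).__name__)  # SigLIP-style vision encoder config
print(type(config.text_config).__name__)    # Qwen3-style decoder config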

Usage Example

import torch
from transformers import KeyeForConditionalGeneration, AutoProcessor
from PIL import Image
import requests

# Load the model in half precision with SDPA attention, sharded across available devices.
model = KeyeForConditionalGeneration.from_pretrained(
    "Kwai-Keye/Keye-VL-8B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained("Kwai-Keye/Keye-VL-8B-Preview", trust_remote_code=True)

# Example image URL and a chat-style message that references it.
url = "https://s1-11508.kwimgs.com/kos/nlav11508/mllm_all/ziran_jiafeimao_11.jpg"
messages = [
    {
        "role":"user",
        "content":[
            {
                "type":"image",
                "image": url,
            },
            {
                "type":"text",
                "text":"Describe this image."
            }
        ]
    }
]

# Download the image and wrap it in a list to match the single image message.
image_inputs = [Image.open(requests.get(url, stream=True).raw)]

# Render the chat messages into the model's prompt format.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=None,
    padding=True,
    return_tensors="pt",
).to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
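
For video inputs, a hedged variant of the call above passes frames through the processor's videos argument. The frame format shown (a list of PIL frames per video) mirrors other transformers video-language processors and is an assumption, not verified against this PR; frame_0.jpg and so on are placeholder files:

# Hypothetical video variant: the accepted frame format is assumed from
# other transformers video-language processors, not confirmed by this PR.
frames = [Image.open(f"frame_{i}.jpg") for i in range(8)]  # placeholder frames
video_messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": frames},
            {"type": "text", "text": "Describe this video."},
        ],
    }
]
video_text = processor.apply_chat_template(
    video_messages, tokenize=False, add_generation_prompt=True
)
video_inputs = processor(
    text=[video_text], videos=[frames], padding=True, return_tensors="pt"
).to(model.device)
video_ids = model.generate(**video_inputs, max_new_tokens=128)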

Checklist

  • Model code is properly formatted and follows transformers coding guidelines
  • Documentation is updated with usage examples
  • All new and existing tests pass locally with the changes

We believe that integrating Keye-VL-8B-Preview into the transformers library will provide users with another powerful option for vision-language tasks. We welcome any feedback or suggestions for improving this integration.

github-actions bot commented on Jul 9, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto
