
Conversation

@danielafrimi (Contributor) commented Aug 26, 2025:

Adds a new multimodal model implementation: vllm/model_executor/models/nano_nemotron_vl.py

For online serving, do the following:

1. Start the server:

   ```bash
   vllm serve <hf-card> --runner generate --max-model-len 8192 --trust_remote_code
   ```

2. Send a text prompt for video/image (a fuller client-side sketch follows below):

   ```python
   messages = [{
       "role": "user",
       "content": [
           {
               "type": "text",
               # "text": "What's in this image?",  # when the <image> placeholder is in the prompt
               "text": "What's in this video and what's on the TV?",  # <video> placeholder is in the prompt
           },
           {
               "type": "video_url",
               "video_url": {"url": video_url},
               # For an image instead:
               # "type": "image_url",
               # "image_url": {"url": image_url},
           },
       ],
   }]
   ```
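As a usage illustration (not part of the original PR description), here is a minimal end-to-end sketch of step 2 against the OpenAI-compatible API started in step 1. The base URL, model name, and video URL are placeholder assumptions.

```python
# Minimal sketch: assumes the `vllm serve` instance from step 1 is listening on
# localhost:8000; the model name and video URL below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

video_url = "https://example.com/sample_video.mp4"  # placeholder

response = client.chat.completions.create(
    model="<hf-card>",  # the same model card passed to `vllm serve`
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What's in this video and what's on the TV?"},
            {"type": "video_url", "video_url": {"url": video_url}},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```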

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces support for the NemotronH Nano VLM model, with the core logic implemented in the new vllm/model_executor/models/nano_nemotron_vl.py file. The implementation is comprehensive, but I've identified a critical issue in the model registry that would prevent the model from loading, along with a few high-severity issues including a debug print statement, a questionable hardcoded limit, and a stateful implementation detail that could be refactored for better robustness.

@DarkLight1337 (Member) left a comment:

Thanks for implementing this in vLLM! Some initial comments

@danielafrimi (Contributor, Author) commented:

@DarkLight1337 Thanks for reviewing!

I'll ping you back.

@DarkLight1337 (Member) left a comment:

Can you add this model to tests/models/multimodal/processing/test_common.py to validate the processing logic?

@DarkLight1337 (Member) commented Aug 28, 2025:

Also need to add it to the test registry tests/models/registry.py

@danielafrimi (Contributor, Author) commented:

Thanks!

@DarkLight1337 There isn't an HF card for this model yet; we will release it in the upcoming month.
So for adding the model to tests/models/registry.py, I don't know the exact model repo name yet (I can add it after the model release).
The same goes for tests/models/multimodal/processing/test_common.py, where it looks like I need to add the HF model_id.

What do you think?

@DarkLight1337 (Member) commented:

Inside tests/models/registry.py, you can add an entry with a dummy model name and set is_available_online=False; that should skip the test for the model (see the sketch below).
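For illustration only (not part of the original comment), such an entry might look roughly like the sketch below. It assumes the registry's existing _HfExamplesInfo helper and its multimodal example-model mapping; the architecture key and repo id are placeholders.

```python
# Hypothetical sketch of an entry in tests/models/registry.py.
# The architecture key and repo id are placeholders; _HfExamplesInfo and the
# mapping name are assumed to match the registry's existing layout.
_MULTIMODAL_EXAMPLE_MODELS = {
    # ... existing entries ...
    "NanoNemotronVLForConditionalGeneration": _HfExamplesInfo(
        "nvidia/NanoNemotron-VL-dummy",  # dummy repo id until the HF card is released
        is_available_online=False,       # skips tests that need the online model
        trust_remote_code=True,
    ),
}
```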

@danielafrimi (Contributor, Author) commented:

@DarkLight1337 Thanks for the review.
Added the model to the registry.

Review comment from a Member:

Let's inherit from InternVL processor to avoid duplicate code

Review comment from @danielafrimi (Contributor, Author):

@DarkLight1337 BaseNanoNemotronVLProcessor and BaseInternVLProcessor don't follow exactly the same logic.
For example:

- In NanoNemotron we don't use min/max/max_dynamic_patch, but we do persist the normalization attributes.
- Image/video processing is largely similar between the two, but NanoNemotron assumes the image/video placeholders are already in the prompt, which leads to slightly different handling (and slightly different processing itself).

Review comment from a Member:

Ok sure, thanks for the explanation

Review comment from a Member:

Is the vision model part not implemented in vLLM yet?

Review comment from @danielafrimi (Contributor, Author):

Yes, I have a draft PR that implements the RADIO model natively in vLLM.
After the current PR is merged, I'll rebase that one:

PR

Commit history (each commit Signed-off-by: Daniel Afrimi <[email protected]>):

- clean
- rename file
- CR
- refactor
- add video support
- add model to test regisry
- online serving works
- online serving works
- wip-remove some code
- online serving works after changing chat_template
Review thread on the following diff hunk:

```python
return (self.language_model.mamba_cache.
        get_seqlen_agnostic_capture_inputs(batch_size))

@classmethod
```
Review comment from @danielafrimi (Contributor, Author):

@DarkLight1337 I added both class methods, which are required to calculate the mamba page size. However, those implementations now exist both in the LLM model and in the VLM one (see vllm/model_executor/models/config.py).

What do you think about this workaround? I'm wondering whether there is a nicer implementation for this.

Review comment from a Member:

@tdoublep probably has a better understanding of this

Review comment from a Member:

The implementation here looks good to me. We need these class methods get_mamba_state_shape_from_config because in vllm/model_executor/models/config.py we do some manipulation of the cache config (e.g., modifying the attention block size) and we need to be able to do this without actually instantiating the model.
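As an aside, purely for illustration (this is not the PR's actual code), the sketch below shows the kind of classmethod being discussed: it derives the mamba state shapes from the config alone, so vllm/model_executor/models/config.py can size the cache without instantiating the model. All attribute names and shape formulas here are simplified placeholders.

```python
# Illustrative sketch only: attribute names and shape formulas are simplified
# placeholders, not the real vLLM implementation.
class NanoNemotronVLSketch:

    @classmethod
    def get_mamba_state_shape_from_config(cls, text_config):
        """Compute the (conv_state, ssm_state) shapes from the config alone,
        so cache-config adjustments (e.g. the attention block size) can be
        made without building the model."""
        conv_state_shape = (
            text_config.mamba_num_heads * text_config.mamba_head_dim,
            text_config.conv_kernel - 1,
        )
        ssm_state_shape = (
            text_config.mamba_num_heads,
            text_config.mamba_head_dim,
            text_config.ssm_state_size,
        )
        return conv_state_shape, ssm_state_shape
```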

@danielafrimi (Contributor, Author) commented:

@DarkLight1337 Fixed the above and added some comments. Let me know what you think.

@danielafrimi (Contributor, Author) commented:

@DarkLight1337 Can we approve it, or are other changes needed?

@DarkLight1337 (Member) left a comment:

LGTM then, thanks

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 9, 2025 13:47
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 9, 2025
@vllm-bot vllm-bot merged commit 72d3010 into vllm-project:main Sep 10, 2025
38 of 41 checks passed
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

Labels: new-model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed)
