
feat: pass an image as part of the evaluation #88

Open
giladgd opened this issue Nov 5, 2023 · 11 comments
Labels: new feature (New feature or request), roadmap (Part of the roadmap for node-llama-cpp: https://github.com/orgs/withcatai/projects/1)

Comments

@giladgd
Contributor

giladgd commented Nov 5, 2023

This will be implemented once llama.cpp's support for it is stable.
Hopefully, there will be an official API for this after ggml-org/llama.cpp#11292 is implemented.

@giladgd giladgd self-assigned this Nov 5, 2023
@giladgd giladgd converted this from a draft issue Nov 5, 2023
@giladgd giladgd added new feature New feature or request roadmap Part of the roadmap for node-llama-cpp (https://github.com/orgs/withcatai/projects/1) labels Nov 5, 2023
@samlhuillier

Interested in this kind of multimodal support. Any update on progress?

@fozziethebeat

Does this encompass adding support for running LLaVA models, or should that be a separate feature request? I noticed that llama-cpp-python already includes LLaVA support from llama.cpp, so this shouldn't be too hard once the bindings are set up.

@giladgd
Contributor Author

giladgd commented Dec 3, 2023

I haven't started working on this yet, but it is planned as part of the roadmap.
The plan is to add support for llama.cpp's ability to pass an image to a model, which currently supports only LLaVA.

I'll work on this once llama.cpp's API for it is final, to avoid frequent breaking API changes (unlike what happens in some other libraries).
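
To illustrate the direction, here's a purely hypothetical sketch of what passing an image alongside a prompt could eventually look like. node-llama-cpp has no image API yet; the `images` option below is an assumed name, not a real export, and the final API may look entirely different:

```typescript
// Hypothetical sketch only — the image-related parts are NOT part of node-llama-cpp's API yet.
import {getLlama, LlamaChatSession} from "node-llama-cpp";
import fs from "node:fs/promises";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/llava-model.gguf"});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// Assumed option name: `images`. The actual shape will depend on
// llama.cpp's finalized multimodal API.
const image = await fs.readFile("photo.jpg");
const answer = await session.prompt("What is in this picture?", {
    images: [image]
});
console.log(answer);
```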

@fozziethebeat

Makes sense. Hopefully llama.cpp finalizes that API.

@AlexTech314

Any update on this? Would love to leverage multimodal models! Love the library so far :)

@giladgd
Contributor Author

giladgd commented Mar 28, 2025

An official API for this is in active development in llama.cpp; once it's ready, we can start working on adding support for it in node-llama-cpp.
It appears that the API being worked on will also include support for other modalities, such as audio and video, so this is going to be a major feature once it lands (node-llama-cpp will support all of them).

@AlexTech314

Beautiful. Looking forward to it, this library is insane.

@wisng

wisng commented Apr 18, 2025

Any updates on this feature? It seems there was some experimental support for Gemma 3 vision last week: ggml-org/llama.cpp#12344

@giladgd
Contributor Author

giladgd commented May 9, 2025

@wisng I'm waiting for an official stable API for this, which is still in the works.

@Mihailoff

🔥 Multimodal support arrived in llama-server: ggml-org/llama.cpp#12898 | documentation

Perhaps we can't call it stable, but it is there now.
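
Until native support lands in node-llama-cpp, a minimal sketch of sending an image to llama-server's OpenAI-compatible endpoint from Node.js. This assumes the server was started with a multimodal model and its projector (e.g. `llama-server -m model.gguf --mmproj mmproj.gguf`) as described in the linked documentation, and that it's running locally on the default port:

```typescript
// Sketch: send an image to llama-server's OpenAI-compatible chat completions endpoint.
// Assumes multimodal support is enabled on the server (started with --mmproj).
import fs from "node:fs/promises";

const imageBase64 = (await fs.readFile("photo.jpg")).toString("base64");

const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify({
        messages: [{
            role: "user",
            content: [
                {type: "text", text: "What is in this picture?"},
                {type: "image_url", image_url: {url: `data:image/jpeg;base64,${imageBase64}`}}
            ]
        }]
    })
});

const data = await res.json();
console.log(data.choices[0].message.content);
```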

@giladgd
Contributor Author

giladgd commented May 16, 2025

I've started poking around with the mtmd API to integrate multimodality into node-llama-cpp.
There have been too many breaking changes around it recently, so I'll wait a bit longer before spending more time on the integration, but it's coming up!
Can't commit to a timeline yet, but I'll release a few beta versions with it before a stable release to gather feedback and iron out any bugs.
