llama-run: add support for downloading models from ModelScope #13370

Merged: 1 commit merged into ggml-org:master on May 9, 2025

Conversation

yeahdongcn
Contributor


To better support users in mainland China, this PR adds support for using ModelScope as a model endpoint in llama-run.

Testing Done

❯ ./build/bin/llama-run modelscope://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
 58% |█████████████████████████████████████████████████████████████                                             |  448.59 MB/ 772.71 MB   2.79 MB/s    4m 31s
❯ ./build/bin/llama-run ms://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
 10% |██████████                                                                                                |    8.51 MB/  84.12 MB   4.79 MB/s       15s
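
For context, a minimal sketch of how such a scheme prefix might be resolved into a download URL. The host, the "/resolve/master/" path layout, and the helper names below are assumptions for illustration only, not the actual llama-run code from this PR:

```cpp
// Sketch: translate "modelscope://org/repo/file.gguf" (or the short "ms://"
// form) into an HTTPS download URL. All names and the URL layout here are
// illustrative assumptions, not llama-run's real implementation.
#include <iostream>
#include <optional>
#include <string>

// Strip a known scheme prefix; returns std::nullopt if the prefix is absent.
static std::optional<std::string> strip_prefix(const std::string & s, const std::string & prefix) {
    if (s.rfind(prefix, 0) == 0) {
        return s.substr(prefix.size());
    }
    return std::nullopt;
}

static std::optional<std::string> modelscope_url(const std::string & model_ref) {
    const char * prefixes[] = { "modelscope://", "ms://" };
    std::optional<std::string> rest;
    for (const char * prefix : prefixes) {
        if ((rest = strip_prefix(model_ref, prefix))) {
            break;
        }
    }
    if (!rest) {
        return std::nullopt; // not a ModelScope reference
    }
    // Split "org/repo/file.gguf" into the repository part and the file name.
    const auto last_slash = rest->rfind('/');
    if (last_slash == std::string::npos) {
        return std::nullopt; // malformed reference
    }
    const std::string repo = rest->substr(0, last_slash);
    const std::string file = rest->substr(last_slash + 1);
    // The "resolve/master" segment mirrors ModelScope's web download paths;
    // the revision handling in llama-run may differ.
    return "https://modelscope.cn/models/" + repo + "/resolve/master/" + file;
}

int main() {
    const std::string ref = "ms://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf";
    if (auto url = modelscope_url(ref)) {
        std::cout << *url << "\n";
    }
    return 0;
}
```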

@yeahdongcn
Contributor Author

Hi @ericcurtin @ggerganov,

When you have a moment, could you please take a look at this? Thanks!

@ericcurtin
Collaborator

I recommend adding this to RamaLama as well; there's a Python 3 implementation of this there.

@ericcurtin ericcurtin merged commit 0527771 into ggml-org:master May 9, 2025
46 checks passed
@yeahdongcn
Contributor Author

> I recommend adding this to RamaLama as well; there's a Python 3 implementation of this there.

Sounds good — I’ll start working on this.

@ericcurtin
Collaborator

"ramalama run" has jumped ahead of "llama-run" in terms of functionality because it uses llama-server from llama.cpp which is a more complete inferencing implementation than llama-run

gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 9, 2025
* origin/master: (39 commits)
server : vision support via libmtmd (ggml-org#12898)
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (ggml-org#12858)
metal : optimize MoE for large batches (ggml-org#13388)
CUDA: FA support for Deepseek (Ampere or newer) (ggml-org#13306)
llama : do not crash if there is no CPU backend (ggml-org#13395)
CUDA: fix crash on large batch size for MoE models (ggml-org#13384)
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (ggml-org#13389)
llama-run: add support for downloading models from ModelScope (ggml-org#13370)
mtmd : fix batch_view for m-rope (ggml-org#13397)
llama : one-off chat template fix for Mistral-Small-2503 (ggml-org#13398)
rpc : add rpc_msg_set_tensor_hash_req (ggml-org#13353)
vulkan: Allow up to 4096 elements for mul_mat_id row_ids (ggml-org#13326)
server : (webui) rename has_multimodal --> modalities (ggml-org#13393)
ci : limit write permission to only the release step + fixes (ggml-org#13392)
mtmd : Expose helper_decode_image_chunk (ggml-org#13366)
server : (webui) fix a very small misalignment (ggml-org#13387)
server : (webui) revamp the input area, plus many small UI improvements (ggml-org#13365)
convert : support rope_scaling type and rope_type (ggml-org#13349)
mtmd : fix the calculation of n_tokens for smolvlm (ggml-org#13381)
context : allow cache-less context for embeddings (ggml-org#13108)
...