Replies: 1 comment
Likely you are running out of context, because you set just --ctx-size 1024. Try replacing this argument with the following in order to utilize the full context size of the model:
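For example, assuming the usual llama.cpp convention where --ctx-size 0 means the context length is loaded from the model's metadata, the command would become something like:

./llama-server --model "C:\Users\admin\.lmstudio\models\lmstudio-community\DeepSeek-R1-0528-Qwen3-8B-GGUF\DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf" --port 10000 --ctx-size 0 --n-gpu-layers 40 --alias "DeepSeek-R1-8B"

Keep in mind that a larger context also means a larger KV cache, so GPU memory usage will grow accordingly.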
Forgive me, I'm a total noob to LLMs and llama.cpp.
I'm trying to couple llama-server.exe with an Open WebUI frontend. I have it working, but after the first prompt goes through and finishes, my GPU/CPU usage stays maxed out as though it's still generating.
This is what I run:
./llama-server --model "C:\Users\admin\.lmstudio\models\lmstudio-community\DeepSeek-R1-0528-Qwen3-8B-GGUF\DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf" --port 10000 --ctx-size 1024 --n-gpu-layers 40 --alias "DeepSeek-R1-8B"
This is the relevant part of the response after the prompt and before I manually terminate: