
[Bug]: Chatting with Ollama models uses much more VRAM than running them in cmd #7981

Open
@konn-submarine-bu

Description


Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

.

RAGFlow image version

nightly

Other environment information

Actual behavior

When I execute `ollama run qwen3:32b` and then check `ollama ps`, it shows about 45 GB of VRAM occupied.
But when I bind the same model to RAGFlow, it reports needing 71 GB of VRAM, and about 30% of the parameters end up loaded on the CPU.
I assumed this was an Ollama problem, but I uninstalled and reinstalled several Ollama versions and the issue persists.
I don't know whether it is caused by RAGFlow.
Could you clarify how RAGFlow invokes the Ollama server?
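
One likely lever here (an assumption on my side, not something confirmed in RAGFlow's code) is the context window: RAGFlow may be requesting a much larger `num_ctx` than the `ollama run` default, and a bigger context window grows the KV cache, which can push part of the model off the GPU. Below is a minimal sketch to test whether context size alone explains the 45 GB vs 71 GB gap, using Ollama's standard `/api/chat` endpoint; the endpoint URL is the Ollama default and the `num_ctx` values are arbitrary probes:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

def probe(num_ctx: int) -> None:
    """Load qwen3:32b with an explicit context window, then inspect
    `ollama ps` to see how much memory the load actually required."""
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": "qwen3:32b",
            "messages": [{"role": "user", "content": "hi"}],
            "stream": False,
            # num_ctx is the knob under test: a larger context window
            # means a larger KV cache, so a model that fits in VRAM at
            # the default context size may spill onto the CPU.
            "options": {"num_ctx": num_ctx},
        },
        timeout=600,
    )
    resp.raise_for_status()
    print(f"num_ctx={num_ctx}: request succeeded")

for ctx in (2048, 8192, 32768):
    probe(ctx)
    input("Check the SIZE and PROCESSOR columns of `ollama ps`, then press Enter...")
```

If the reported size jumps with `num_ctx` the same way it does when binding the model to RAGFlow, the discrepancy comes from the context window being passed in the request, not from Ollama itself.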

Expected behavior

No response

Steps to reproduce

Binding qwen3:32b to RAGFlow fails because of the extra VRAM demand, even though the same model runs smoothly from the command line.

Additional information

No response
