
feat: Support GPU-CPU offloading fallback when model doesn’t fit in GPU (like Ollama) #5499

@eckartal

Description


Problem Statement

In Ollama, if a model doesn’t fully fit in GPU memory, it automatically offloads the remainder to the CPU. Jan lacks this behavior: if the model can’t fit entirely in GPU memory, loading fails instead of falling back.

User comment: https://www.reddit.com/r/LocalLLaMA/comments/1lf5yog/comment/mylpbzi/?utm_source=share

Feature Idea

Implement GPU-CPU fallback support similar to Ollama’s: when GPU memory is insufficient, place as many layers as fit in VRAM and keep the rest on the CPU so the model still loads. This would improve compatibility across more hardware setups and prevent runtime load failures. A rough sketch of the layer-split logic is below.
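A minimal sketch of what such a fallback could look like, not tied to Jan’s actual APIs: given the model size, its layer count, and the VRAM currently free, compute how many whole layers fit on the GPU and leave the remainder on the CPU. All names (`planOffload`, the headroom factor, the example sizes) are hypothetical; the resulting count would map onto whatever the inference backend exposes (e.g. llama.cpp’s `n_gpu_layers`).

```typescript
// Hypothetical sketch: decide how many transformer layers to offload to the
// GPU given the available VRAM; remaining layers stay on the CPU instead of
// failing the load outright.

interface OffloadPlan {
  gpuLayers: number; // layers placed in VRAM (0 = pure CPU inference)
  cpuLayers: number; // remaining layers served from system RAM
}

function planOffload(
  totalLayers: number,   // e.g. 32 for a 7B llama-style model
  modelBytes: number,    // size of the weights on disk
  freeVramBytes: number, // VRAM reported free by the GPU backend
  vramHeadroom = 0.9     // keep ~10% VRAM free for KV cache / activations
): OffloadPlan {
  // Rough per-layer cost: assume weights are spread evenly across layers.
  const bytesPerLayer = modelBytes / totalLayers;
  const usableVram = freeVramBytes * vramHeadroom;

  // How many whole layers fit in the usable VRAM budget.
  const gpuLayers = Math.max(
    0,
    Math.min(totalLayers, Math.floor(usableVram / bytesPerLayer))
  );

  return { gpuLayers, cpuLayers: totalLayers - gpuLayers };
}

// Example: a ~4.1 GB quantized model with 32 layers on a GPU with 3 GB free.
const plan = planOffload(32, 4.1e9, 3e9);
console.log(plan); // { gpuLayers: 21, cpuLayers: 11 }
```

In practice the split would need to account for the KV cache and context length as well, but even this coarse estimate would let the model load in degraded (partially CPU) mode rather than erroring out.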
