Problem Statement
In Ollama, if a model doesn’t fully fit in GPU memory, the remainder is automatically offloaded to CPU. Jan lacks this behavior: if the model can’t fit entirely in GPU memory, loading fails instead of falling back to CPU.
User comment: https://www.reddit.com/r/LocalLLaMA/comments/1lf5yog/comment/mylpbzi/?utm_source=share
Feature Idea
Implement GPU-CPU fallback support similar to Ollama's, so that models load seamlessly even when GPU memory is insufficient, for example by offloading only as many layers as fit in VRAM and running the rest on CPU (see the sketch below). This would improve compatibility across more hardware setups and prevent runtime errors.
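For illustration only, here is a minimal sketch of how such a fallback could work, assuming the inference backend (e.g. llama.cpp) accepts an `n_gpu_layers`-style setting. The names (`ModelMeta`, `planGpuLayers`, `loadWithFallback`, `loadModel`) are hypothetical placeholders, not Jan's actual API.

```ts
// Hypothetical sketch of GPU-CPU fallback: estimate how many layers fit in
// free VRAM, then retry with fewer layers (down to CPU-only) if loading fails.

interface ModelMeta {
  layerCount: number;    // total transformer layers in the model file
  bytesPerLayer: number; // rough per-layer weight size
  overheadBytes: number; // KV cache, scratch buffers, etc.
}

function planGpuLayers(meta: ModelMeta, freeVramBytes: number): number {
  // Reserve the fixed overhead first, then see how many layers still fit.
  const usable = freeVramBytes - meta.overheadBytes;
  if (usable <= 0) return 0;                    // pure CPU fallback
  const fit = Math.floor(usable / meta.bytesPerLayer);
  return Math.min(fit, meta.layerCount);        // cap at full offload
}

async function loadWithFallback(
  modelPath: string,
  meta: ModelMeta,
  freeVramBytes: number,
  loadModel: (path: string, nGpuLayers: number) => Promise<boolean>,
): Promise<void> {
  let nGpuLayers = planGpuLayers(meta, freeVramBytes);
  // If the estimate was too optimistic and loading still fails,
  // retry with progressively fewer layers, ending at CPU-only.
  while (nGpuLayers >= 0) {
    if (await loadModel(modelPath, nGpuLayers)) {
      console.log(`Loaded ${modelPath} with ${nGpuLayers} layers on GPU`);
      return;
    }
    nGpuLayers = nGpuLayers > 0 ? Math.floor(nGpuLayers / 2) : -1;
  }
  throw new Error("Model could not be loaded even in CPU-only mode");
}
```

This mirrors the general approach Ollama takes: estimate available VRAM, offload the number of layers that fit, and keep the rest on CPU rather than refusing to load.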