Problem Statement
In Ollama, if a model doesn’t fully fit in GPU memory, the remainder is automatically offloaded to CPU. Jan lacks this behavior: if the model can’t fit entirely in GPU memory, loading fails instead of falling back to CPU.
User comment: https://www.reddit.com/r/LocalLLaMA/comments/1lf5yog/comment/mylpbzi/?utm_source=share
Feature Idea
Implement GPU-CPU fallback support similar to Ollama's, so that models load seamlessly even when GPU memory is insufficient, for example by offloading only as many layers as fit in VRAM and running the rest on CPU (see the sketch below). This would improve compatibility across more hardware setups and prevent runtime errors.
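For illustration only, here is a minimal sketch of how such a fallback could work, assuming the inference backend (e.g. llama.cpp) accepts an `n_gpu_layers`-style setting. The names (`ModelMeta`, `planGpuLayers`, `loadWithFallback`, `loadModel`) are hypothetical placeholders, not Jan's actual API.

```ts
// Hypothetical sketch of GPU-CPU fallback: estimate how many layers fit in
// free VRAM, then retry with fewer layers (down to CPU-only) if loading fails.

interface ModelMeta {
  layerCount: number;    // total transformer layers in the model file
  bytesPerLayer: number; // rough per-layer weight size
  overheadBytes: number; // KV cache, scratch buffers, etc.
}

function planGpuLayers(meta: ModelMeta, freeVramBytes: number): number {
  // Reserve the fixed overhead first, then see how many layers still fit.
  const usable = freeVramBytes - meta.overheadBytes;
  if (usable <= 0) return 0;                    // pure CPU fallback
  const fit = Math.floor(usable / meta.bytesPerLayer);
  return Math.min(fit, meta.layerCount);        // cap at full offload
}

async function loadWithFallback(
  modelPath: string,
  meta: ModelMeta,
  freeVramBytes: number,
  loadModel: (path: string, nGpuLayers: number) => Promise<boolean>,
): Promise<void> {
  let nGpuLayers = planGpuLayers(meta, freeVramBytes);
  // If the estimate was too optimistic and loading still fails,
  // retry with progressively fewer layers, ending at CPU-only.
  while (nGpuLayers >= 0) {
    if (await loadModel(modelPath, nGpuLayers)) {
      console.log(`Loaded ${modelPath} with ${nGpuLayers} layers on GPU`);
      return;
    }
    nGpuLayers = nGpuLayers > 0 ? Math.floor(nGpuLayers / 2) : -1;
  }
  throw new Error("Model could not be loaded even in CPU-only mode");
}
```

This mirrors the general approach Ollama takes: estimate available VRAM, offload the number of layers that fit, and keep the rest on CPU rather than refusing to load.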