Problem description & steps to reproduce
I'm seeing quite frequent CUDA illegal memory access errors.
I've seen this with Scout and/or Maverick before too, but this particular instance is when running mradermacher's Dolphin-Mistral-24B-Venice-Edition-i1-GGUF/Dolphin-Mistral-24B-Venice-Edition.i1-IQ4_XS.gguf.
I believe I have ruled out a card/VRAM/PCIe issue, since I consistently get this error across various combinations of CUDA_VISIBLE_DEVICES (and I've confirmed in nvtop that the cards previously reported as problematic are excluded from each run). The cards are on risers, but short, high-quality ones, with a different riser on each card, so I don't think the risers are the cause. The cards also run at different PCIe versions and lane counts.
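One thing worth keeping in mind when masking cards is that CUDA renumbers the visible devices from 0, so "current device: 0" in the log below refers to the first visible card under the current mask, not a fixed physical slot. A minimal standalone sketch (not llama.cpp code; the file name is arbitrary) to confirm which GPUs CUDA actually sees:

// list_devices.cu - illustrative only; compile with: nvcc list_devices.cu -o list_devices
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Devices are renumbered 0..count-1 within the CUDA_VISIBLE_DEVICES mask,
    // so "device 0" in the llama.cpp log means the first *visible* card.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s (PCI %04x:%02x:%02x)\n",
               i, prop.name, prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}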
First Bad Commit
No response
Relevant log output
llama-cpp-chat-1 | /app/ggml/src/ggml-cuda/ggml-cuda.cu:75: CUDA error
llama-cpp-chat-1 | CUDA error: an illegal memory access was encountered
llama-cpp-chat-1 | current device: 0, in function ggml_backend_cuda_cpy_tensor_async at /app/ggml/src/ggml-cuda/ggml-cuda.cu:2428
llama-cpp-chat-1 | cudaMemcpyPeerAsync(dst->data, cuda_ctx_dst->device, src->data, cuda_ctx_src->device, ggml_nbytes(dst), cuda_ctx_src->stream())
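The failing call is a GPU-to-GPU peer copy, so the copy path itself can be exercised outside llama.cpp. The following is only a sketch (not the project's code; the device indices and buffer size are assumptions) that checks whether two visible devices report peer access and then issues the same kind of cudaMemcpyPeerAsync; if the peer path between a given pair of cards is broken, this may fail at the synchronize with a similar error.

// p2p_copy_check.cu - illustrative sketch, not llama.cpp code
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do { \
    cudaError_t e = (call); \
    if (e != cudaSuccess) { \
        fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e)); \
        return 1; \
    } \
} while (0)

int main() {
    const int src_dev = 0, dst_dev = 1;      // assumed pair; adjust per CUDA_VISIBLE_DEVICES
    const size_t nbytes = 64 * 1024 * 1024;  // arbitrary 64 MiB buffer

    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, src_dev, dst_dev));
    CHECK(cudaDeviceCanAccessPeer(&can10, dst_dev, src_dev));
    // If peer access is unavailable, the runtime stages the copy through host memory instead.
    printf("peer access %d->%d: %d, %d->%d: %d\n", src_dev, dst_dev, can01, dst_dev, src_dev, can10);

    void *src = nullptr, *dst = nullptr;
    CHECK(cudaSetDevice(src_dev));
    CHECK(cudaMalloc(&src, nbytes));
    CHECK(cudaSetDevice(dst_dev));
    CHECK(cudaMalloc(&dst, nbytes));

    // Same call shape as the peer copy in ggml_backend_cuda_cpy_tensor_async.
    CHECK(cudaSetDevice(src_dev));
    CHECK(cudaMemcpyPeerAsync(dst, dst_dev, src, src_dev, nbytes, 0 /* default stream */));
    CHECK(cudaDeviceSynchronize());  // async errors are only reported at synchronization
    printf("peer copy of %zu bytes completed\n", nbytes);
    return 0;
}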
Just got this with a Q6_K_L model, so not an I-quant issue.
llama-cpp-chat-1 | /app/ggml/src/ggml-cuda/ggml-cuda.cu:75: CUDA error
llama-cpp-chat-1 | CUDA error: an illegal memory access was encountered
llama-cpp-chat-1 | current device: 0, in function ggml_backend_cuda_cpy_tensor_async at /app/ggml/src/ggml-cuda/ggml-cuda.cu:2428
llama-cpp-chat-1 | cudaMemcpyPeerAsync(dst->data, cuda_ctx_dst->device, src->data, cuda_ctx_src->device, ggml_nbytes(dst), cuda_ctx_src->stream())
4x Founders Edition 3090s, MSI MAG X670E Tomahawk, short (15–20 cm) PCIe riser cables, 7950X CPU, 128GB system RAM. The risers seem to be high quality, and are unfortunately necessary to fit the 3-slot cards to the board. Definitely a bit more janky than I'd like, though, and could be causing some PCIe bus issue.
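If the riser cables are a suspect, one rough way to stress the inter-GPU transfer path in isolation is to loop a peer copy and verify the data on the host each round; a flaky PCIe link usually shows up as a CUDA error or a data mismatch without anything as complex as llama-server involved. A sketch under the assumption of two visible devices (iteration count and buffer size are arbitrary):

// p2p_stress.cu - rough integrity test of the GPU<->GPU copy path (illustrative only)
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const int src_dev = 0, dst_dev = 1;     // assumed device pair
    const size_t n = 16 * 1024 * 1024;      // 16M floats = 64 MiB per copy
    std::vector<float> h_in(n), h_out(n);

    float *d_src = nullptr, *d_dst = nullptr;
    cudaSetDevice(src_dev);  cudaMalloc(&d_src, n * sizeof(float));
    cudaSetDevice(dst_dev);  cudaMalloc(&d_dst, n * sizeof(float));

    for (int iter = 0; iter < 100; ++iter) {
        // Fill the source buffer with a per-iteration pattern.
        for (size_t i = 0; i < n; ++i) h_in[i] = (float)(iter + i % 997);

        cudaSetDevice(src_dev);
        cudaMemcpy(d_src, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpyPeer(d_dst, dst_dev, d_src, src_dev, n * sizeof(float));

        cudaSetDevice(dst_dev);
        cudaMemcpy(h_out.data(), d_dst, n * sizeof(float), cudaMemcpyDeviceToHost);

        // Any failure in the sequence above is sticky until retrieved here.
        cudaError_t e = cudaGetLastError();
        if (e != cudaSuccess) {
            fprintf(stderr, "iteration %d: CUDA error: %s\n", iter, cudaGetErrorString(e));
            return 1;
        }
        if (memcmp(h_in.data(), h_out.data(), n * sizeof(float)) != 0) {
            fprintf(stderr, "iteration %d: data mismatch after peer copy\n", iter);
            return 1;
        }
    }
    printf("100 peer copies verified OK\n");
    return 0;
}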
Name and Version
ghcr.io/ggerganov/llama.cpp:server-cuda
docker.compose.image=sha256:f608f747701dc4df42f89cab23a6d6b556889f4454737395e988fd3a94e41b45
llama-cpp-chat-1 | build: 5332 (7c28a74) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line