Misc. bug: Illegal CUDA memory access in ggml_backend_cuda_cpy_tensor_async #13449

Open
lee-b opened this issue May 11, 2025 · 5 comments

@lee-b

lee-b commented May 11, 2025

Name and Version

ghcr.io/ggerganov/llama.cpp:server-cuda

docker.compose.image=sha256:f608f747701dc4df42f89cab23a6d6b556889f4454737395e988fd3a94e41b45

llama-cpp-chat-1 | build: 5332 (7c28a74) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

image: ghcr.io/ggerganov/llama.cpp:server-cuda

    command:
      - "-a"
      - "llama-cpp-chat"

      - "-a"
      - "llama-cpp-chat"

      - "-ctv"
      - "q4_0"

      - "-ctv"
      - "q4_0"

      - "--slot-save-path"
      - "/prompt-cache/slot-saves"

      - "-fa"
      - "-m"
      - "/models/mradermacher--Dolphin-Mistral-24B-Venice-Edition-i1-GGUF/Dolphin-Mistral-24B-Venice-Edition.i1-IQ4_XS.gguf"
      - "-np"
      - "2"
      - "-c"
      - "131072"
      - "-ngl"
      - "500"
      - "--cache-reuse"
      - "25"


    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: [ "0", "2", "3" ]
              capabilities: [gpu]

Problem description & steps to reproduce

I'm seeing quite frequent CUDA illegal memory access errors.

I've seen this with Scout and/or Maverick before too, but this particular instance occurred while running mradermacher's Dolphin-Mistral-24B-Venice-Edition-i1-GGUF/Dolphin-Mistral-24B-Venice-Edition.i1-IQ4_XS.gguf.

I believe I have ruled out a card/VRAM/PCIe issue, since I consistently get this error across various combinations of CUDA_VISIBLE_DEVICES (and I confirmed in nvtop that the cards previously reported as problematic were excluded in those runs). The cards are on risers, but they are short, high-quality ones, with a different riser on each card, so I don't think that's relevant. The cards also run at different PCIe versions and lane counts.
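Since the crash happens in a peer-to-peer copy, one more thing worth checking is which GPU pairs on this board actually report direct peer-access capability. The following is only a hedged diagnostic sketch using standard CUDA runtime calls (not llama.cpp code); device indices are whatever the driver enumerates:

// peer_check.cu -- print peer-access capability for every GPU pair.
// Compile with: nvcc -o peer_check peer_check.cu
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess) {
        fprintf(stderr, "failed to query device count\n");
        return 1;
    }
    for (int src = 0; src < n; ++src) {
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, src, dst);   // capability query only
            printf("device %d -> device %d : peer access %s\n",
                   src, dst, can ? "supported" : "NOT supported");
        }
    }
    return 0;
}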

First Bad Commit

No response

Relevant log output

llama-cpp-chat-1  | /app/ggml/src/ggml-cuda/ggml-cuda.cu:75: CUDA error
llama-cpp-chat-1  | CUDA error: an illegal memory access was encountered
llama-cpp-chat-1  |   current device: 0, in function ggml_backend_cuda_cpy_tensor_async at /app/ggml/src/ggml-cuda/ggml-cuda.cu:2428
llama-cpp-chat-1  |   cudaMemcpyPeerAsync(dst->data, cuda_ctx_dst->device, src->data, cuda_ctx_src->device, ggml_nbytes(dst), cuda_ctx_src->stream())
@lee-b
Author

lee-b commented May 11, 2025

Possibly related (though with a different crash point): a CUDA illegal memory access issue with Scout, #13281.

@lee-b
Author

lee-b commented May 11, 2025

Just got this with a Q6_K_L model, so not an I-quant issue.

llama-cpp-chat-1  | /app/ggml/src/ggml-cuda/ggml-cuda.cu:75: CUDA error
llama-cpp-chat-1  | CUDA error: an illegal memory access was encountered
llama-cpp-chat-1  |   current device: 0, in function ggml_backend_cuda_cpy_tensor_async at /app/ggml/src/ggml-cuda/ggml-cuda.cu:2428
llama-cpp-chat-1  |   cudaMemcpyPeerAsync(dst->data, cuda_ctx_dst->device, src->data, cuda_ctx_src->device, ggml_nbytes(dst), cuda_ctx_src->stream())

@JohannesGaessler
Collaborator

What hardware are you using?

@JohannesGaessler
Collaborator

This isn't going to fix the bug, but you can try compiling with GGML_CUDA_NO_PEER_COPY.
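For anyone trying this: GGML_CUDA_NO_PEER_COPY is a compile-time switch, so it has to be set when building llama.cpp/ggml rather than at runtime. The sketch below only illustrates the general shape of such a flag (the function name and fallback behaviour are made up for illustration, not the actual ggml source): when the flag is defined, the backend declines the async peer copy so the caller falls back to a slower, non-peer path.

// Illustrative only -- NOT the ggml source.
#include <cuda_runtime.h>
#include <cstddef>

static bool try_async_peer_copy(void * dst, int dst_dev,
                                const void * src, int src_dev,
                                size_t nbytes, cudaStream_t stream) {
#ifdef GGML_CUDA_NO_PEER_COPY
    // Flag defined at build time: report "not handled" so the caller
    // uses a plain (non-peer) copy path instead.
    (void) dst; (void) dst_dev; (void) src; (void) src_dev;
    (void) nbytes; (void) stream;
    return false;
#else
    // Default path: direct device-to-device async copy, as in the backtrace.
    return cudaMemcpyPeerAsync(dst, dst_dev, src, src_dev, nbytes, stream) == cudaSuccess;
#endif
}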

@lee-b
Author

lee-b commented May 11, 2025

What hardware are you using?

4x Founders Edition 3090s, an MSI MAG X670E Tomahawk, short (15–20 cm) PCIe riser cables, a 7950X CPU, and 128 GB of system RAM. The risers seem to be high quality, and unfortunately they're necessary to fit the 3-slot cards on the board. It's definitely a bit more janky than I'd like, though, and could be causing some PCIe bus issue.
