
CUDA: fix FlashAttention on Turing #13415


Merged

Conversation

JohannesGaessler
Collaborator

Fixes #13306 (comment).

The problem is that the V data was not being loaded correctly for nstages == 0 (no asynchronous data loading available); I forgot to test this configuration prior to merging. Sorry!
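For illustration, here is a minimal, hypothetical sketch of the two tile-loading paths in a FlashAttention-style kernel. The tile shape, the `load_V_tile` kernel, and the use of `nstages` as a template parameter are assumptions made for this sketch and are not the actual ggml CUDA code; the point is that when cp.async is unavailable (`nstages == 0`, e.g. on Turing), the V tile has to be staged into shared memory with a plain synchronous copy, and an indexing mistake in that path corrupts the attention output.

```cuda
// Hypothetical, simplified sketch (not the actual fattn kernel code):
// nstages > 0  -> asynchronous loads via cp.async (Ampere and newer).
// nstages == 0 -> plain synchronous loads (Turing, which lacks cp.async).
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_pipeline.h>

constexpr int TILE_ROWS = 16;
constexpr int TILE_COLS = 64;

template <int nstages>
__global__ void load_V_tile(const half * __restrict__ V, half * __restrict__ out,
                            const int stride_V, const int row0) {
    __shared__ half tile_V[TILE_ROWS*TILE_COLS];

    if (nstages > 0) {
#if __CUDA_ARCH__ >= 800
        // Asynchronous path: each thread issues 16-byte cp.async copies.
        constexpr int vals_per_cpy = 16 / sizeof(half); // 8 half values per copy
        for (int i = threadIdx.x; i < TILE_ROWS*TILE_COLS/vals_per_cpy; i += blockDim.x) {
            const int row = i / (TILE_COLS/vals_per_cpy);
            const int col = (i % (TILE_COLS/vals_per_cpy)) * vals_per_cpy;
            __pipeline_memcpy_async(tile_V + row*TILE_COLS + col,
                                    V + (row0 + row)*stride_V + col, 16);
        }
        __pipeline_commit();
        __pipeline_wait_prior(0);
#endif
    } else {
        // Synchronous fallback: threads copy individual elements.
        // An indexing mistake here corrupts the V tile and hence the attention output.
        for (int i = threadIdx.x; i < TILE_ROWS*TILE_COLS; i += blockDim.x) {
            const int row = i / TILE_COLS;
            const int col = i % TILE_COLS;
            tile_V[row*TILE_COLS + col] = V[(row0 + row)*stride_V + col];
        }
    }
    __syncthreads();

    // Write the tile back out so the load can be checked from the host.
    for (int i = threadIdx.x; i < TILE_ROWS*TILE_COLS; i += blockDim.x) {
        out[i] = tile_V[i];
    }
}

int main() {
    const int stride_V = TILE_COLS;
    half *V, *out;
    cudaMallocManaged(&V,   TILE_ROWS*stride_V*sizeof(half));
    cudaMallocManaged(&out, TILE_ROWS*TILE_COLS*sizeof(half));
    for (int i = 0; i < TILE_ROWS*stride_V; ++i) V[i] = __float2half((float) i);

    load_V_tile<0><<<1, 128>>>(V, out, stride_V, 0); // exercise the nstages == 0 path
    cudaDeviceSynchronize();

    printf("out[0]=%f out[last]=%f\n",
           __half2float(out[0]), __half2float(out[TILE_ROWS*TILE_COLS - 1]));
    cudaFree(V); cudaFree(out);
    return 0;
}
```

Building the example with nvcc and running it exercises only the synchronous path; that is the configuration (no asynchronous data loading) which was broken before this PR.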

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels on May 9, 2025
Dampfinchen commented May 9, 2025

Nothing to apologize for, thanks for fixing it! :)

Fast fix, fast test - I can confirm it works again after this PR. Thanks for your great work!

JohannesGaessler merged commit d891942 into ggml-org:master May 10, 2025
39 of 40 checks passed
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request May 14, 2025