Misc. bug: Compute pipeline creation failed when using Flash Attention on macOS/Vulkan #13450
Comments
Hmm, nothing obviously wrong in the validation logs. Are you able to capture the Metal shader that is failing?
Sure, I ran test-backend-ops with only this test enabled:
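(For reference, something like the following restricts test-backend-ops to a single op; the -o filter and the FLASH_ATTN_EXT op name are assumptions for illustration, not the command from the original run.)

# run only the flash-attention tests (op name assumed)
./test-backend-ops -o FLASH_ATTN_EXT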
Is there source for the translated Metal shader in there? I don't know how to decode that.
You're right, the trace seems to be corrupted. Maybe because test-backend-ops is crashing too soon? The trace was done through MoltenVK like this: I'm currently trying to do it through Xcode by attaching the debugger to the process, but that also seems to crash before it can capture anything useful. I can see in the terminal that the pipeline creation error has already happened, but there is nothing to capture in Xcode: nothing in the "FPS" tab, and the "Capture GPU Workload" option is greyed out.
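(One possible culprit, assumed here rather than confirmed in the thread: Metal only allows GPU captures triggered programmatically, which is how MoltenVK initiates them, when the captured process has MTL_CAPTURE_ENABLED set in its environment.)

# assumed prerequisite for GPU captures triggered outside Xcode's own launch flow
export MTL_CAPTURE_ENABLED=1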
Is there a way to enable debug output for MoltenVK? On our side it's successfully reporting:
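(For reference, MoltenVK also reads logging-related configuration from environment variables; the variable names and values below are assumptions based on MoltenVK's documented configuration options, not something confirmed in this thread.)

# assumed MoltenVK configuration variables; higher log levels are more verbose
export MVK_CONFIG_LOG_LEVEL=3
export MVK_CONFIG_DEBUG=1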
I think I got something more useful by installing the debug version of MoltenVK. The trace is still corrupted, but the logs are more verbose and there is what looks like shader code in there. It's quite long, so I'm attaching it as a text file.
Thanks, there are several errors related to use of gl_WorkGroupSize.x as a constant expression:
Maybe we need to declare another variable using the same spec id and use that for the shared memory variables?
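A minimal sketch of that idea (illustrative names, not the actual patch): tie local_size_x to a specialization constant id, declare a constant with the same id, and size the shared arrays with that constant instead of gl_WorkGroupSize.x:

// bind the workgroup size and a named constant to the same spec id (id 0 assumed)
layout(local_size_x_id = 0) in;
layout(constant_id = 0) const uint WorkGroupSize = 128; // default, overridden at pipeline creation
// size shared memory with the spec constant rather than gl_WorkGroupSize.x,
// which the SPIR-V -> MSL translation rejects as a constant expression
shared float tmpsh[WorkGroupSize];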
@soerenkampschroer please try this change: e1c331f
Your commit seems to have fixed the issue, no more errors:
❯ ./llama-bench -ngl 99 -m ~/.cache/sanctum/models/bartowski/Qwen2.5-Coder-3B-GGUF/Qwen2.5-Coder-3B-Q4_K_M.gguf -fa 0,1
Just wanted to clear up that the performance gap is not as big with larger models. I haven't had time to test it thoroughly, but the numbers for larger models are much better:
./llama-bench -ngl 30 -m ~/models/bartowski/Qwen_Qwen3-30B-A3B-GGUF/Qwen_Qwen3-30B-A3B-Q4_K_M.gguf -fa 0,1
./llama-bench -ngl 100 -m ~/models/bartowski/Qwen_Qwen3-14B-GGUF/Qwen_Qwen3-14B-Q4_K_M.gguf -fa 0,1
Name and Version
version: 5335 (d891942)
built with Apple clang version 17.0.0 (clang-1700.0.13.3) for x86_64-apple-darwin24.4.0
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server, llama-bench, llama-cli
Command line
VK_LOADER_DEBUG=all ./llama-bench -ngl 99 -m ~/models/bartowski/Qwen2.5-Coder-3B-GGUF/Qwen2.5-Coder-3B-Q4_K_M.gguf -fa 1
Problem description & steps to reproduce
Flash Attention is not working on macOS / Vulkan. Trying to use it (-fa 1) results in the following error:
This is running on:
@jeffbolznv I've attached the full logs of a llama-bench run with validation layers enabled below.
First Bad Commit
dc1d2ad
#13324
Relevant log output