Misc. bug: Compute pipeline creation failed when using Flash Attention on macOS/Vulkan #13450

Closed
soerenkampschroer opened this issue May 11, 2025 · 10 comments · Fixed by #13517

@soerenkampschroer

soerenkampschroer commented May 11, 2025

Name and Version

version: 5335 (d891942)
built with Apple clang version 17.0.0 (clang-1700.0.13.3) for x86_64-apple-darwin24.4.0

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server, llama-bench, llama-cli

Command line

VK_LOADER_DEBUG=all ./llama-bench -ngl 99 -m ~/models/bartowski/Qwen2.5-Coder-3B-GGUF/Qwen2.5-Coder-3B-Q4_K_M.gguf -fa 1

Problem description & steps to reproduce

Flash Attention does not work on macOS with the Vulkan backend. Enabling it (-fa 1) results in the following error:

ggml_vulkan: Compute pipeline creation failed for flash_attn_f32_f16_D128_aligned_f32accf16
ggml_vulkan: vk::Device::createComputePipeline: ErrorInitializationFailed
libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
[1]    44288 abort      ./llama-server --port 2108 -m  --n-gpu-layers 200 -fa

This is running on:

  • Intel Mac (15.4.1)
  • AMD 6800
  • MoltenVK v1.3.0
  • Vulkan SDK v1.4.313

@jeffbolznv I've attached the full logs of a llama-bench run with validation layers enabled below.

First Bad Commit

dc1d2ad
#13324

Relevant log output

❯ VK_LOADER_DEBUG=all ./llama-bench -ngl 99 -m ~/.cache/sanctum/models/bartowski/Qwen2.5-Coder-3B-GGUF/Qwen2.5-Coder-3B-Q4_K_M.gguf -fa 1
[Vulkan Loader] INFO:           Vulkan Loader Version 1.4.315
[Vulkan Loader] INFO:           No valid vk_loader_settings.json file found, no loader settings will be active
[Vulkan Loader] LAYER:          Searching for implicit layer manifest files
[Vulkan Loader] LAYER:             In following locations:
[Vulkan Loader] LAYER:                /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.config/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:             Found no files
[Vulkan Loader] INFO:           No valid vk_loader_settings.json file found, no loader settings will be active
[Vulkan Loader] LAYER:          Searching for implicit layer manifest files
[Vulkan Loader] LAYER:             In following locations:
[Vulkan Loader] LAYER:                /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.config/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:             Found no files
[Vulkan Loader] DRIVER:         Searching for driver manifest files
[Vulkan Loader] DRIVER:            In following locations:
[Vulkan Loader] DRIVER:               /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/icd.d
[Vulkan Loader] DRIVER:               /Users/soeren/.config/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/etc/xdg/vulkan/icd.d
[Vulkan Loader] DRIVER:               /etc/xdg/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/etc/vulkan/icd.d
[Vulkan Loader] DRIVER:               /etc/vulkan/icd.d
[Vulkan Loader] DRIVER:               /Users/soeren/.local/share/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/share/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/share/vulkan/icd.d
[Vulkan Loader] DRIVER:            Found the following files:
[Vulkan Loader] DRIVER:               /usr/local/etc/vulkan/icd.d/MoltenVK_icd.json
[Vulkan Loader] DRIVER:         Found ICD manifest file /usr/local/etc/vulkan/icd.d/MoltenVK_icd.json, version 1.0.0
[Vulkan Loader] DEBUG | DRIVER: Searching for ICD drivers named ../../../lib/libMoltenVK.dylib
[Vulkan Loader] DRIVER:         Searching for driver manifest files
[Vulkan Loader] DRIVER:            In following locations:
[Vulkan Loader] DRIVER:               /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/icd.d
[Vulkan Loader] DRIVER:               /Users/soeren/.config/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/etc/xdg/vulkan/icd.d
[Vulkan Loader] DRIVER:               /etc/xdg/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/etc/vulkan/icd.d
[Vulkan Loader] DRIVER:               /etc/vulkan/icd.d
[Vulkan Loader] DRIVER:               /Users/soeren/.local/share/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/share/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/share/vulkan/icd.d
[Vulkan Loader] DRIVER:            Found the following files:
[Vulkan Loader] DRIVER:               /usr/local/etc/vulkan/icd.d/MoltenVK_icd.json
[Vulkan Loader] DRIVER:         Found ICD manifest file /usr/local/etc/vulkan/icd.d/MoltenVK_icd.json, version 1.0.0
[Vulkan Loader] DEBUG | DRIVER: Searching for ICD drivers named ../../../lib/libMoltenVK.dylib
[Vulkan Loader] LAYER:          Searching for implicit layer manifest files
[Vulkan Loader] LAYER:             In following locations:
[Vulkan Loader] LAYER:                /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.config/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:             Found no files
[Vulkan Loader] INFO:           No valid vk_loader_settings.json file found, no loader settings will be active
[Vulkan Loader] LAYER:          Searching for implicit layer manifest files
[Vulkan Loader] LAYER:             In following locations:
[Vulkan Loader] LAYER:                /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.config/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:             Found no files
[Vulkan Loader] DRIVER:         Searching for driver manifest files
[Vulkan Loader] DRIVER:            In following locations:
[Vulkan Loader] DRIVER:               /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/icd.d
[Vulkan Loader] DRIVER:               /Users/soeren/.config/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/etc/xdg/vulkan/icd.d
[Vulkan Loader] DRIVER:               /etc/xdg/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/etc/vulkan/icd.d
[Vulkan Loader] DRIVER:               /etc/vulkan/icd.d
[Vulkan Loader] DRIVER:               /Users/soeren/.local/share/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/share/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/share/vulkan/icd.d
[Vulkan Loader] DRIVER:            Found the following files:
[Vulkan Loader] DRIVER:               /usr/local/etc/vulkan/icd.d/MoltenVK_icd.json
[Vulkan Loader] DRIVER:         Found ICD manifest file /usr/local/etc/vulkan/icd.d/MoltenVK_icd.json, version 1.0.0
[Vulkan Loader] DEBUG | DRIVER: Searching for ICD drivers named ../../../lib/libMoltenVK.dylib
[Vulkan Loader] LAYER:          Searching for implicit layer manifest files
[Vulkan Loader] LAYER:             In following locations:
[Vulkan Loader] LAYER:                /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.config/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:             Found no files
ggml_vulkan: Validation layers enabled
[Vulkan Loader] INFO:           No valid vk_loader_settings.json file found, no loader settings will be active
[Vulkan Loader] INFO:           Portability enumeration bit was set, enumerating portability drivers.
[Vulkan Loader] LAYER:          Searching for implicit layer manifest files
[Vulkan Loader] LAYER:             In following locations:
[Vulkan Loader] LAYER:                /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.config/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/xdg/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /etc/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /Users/soeren/.local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/local/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:                /usr/share/vulkan/implicit_layer.d
[Vulkan Loader] LAYER:             Found no files
[Vulkan Loader] LAYER:          Searching for explicit layer manifest files
[Vulkan Loader] LAYER:             In following locations:
[Vulkan Loader] LAYER:                /usr/local/opt/vulkan-validationlayers/share/vulkan/explicit_layer.d
[Vulkan Loader] LAYER:             Found the following files:
[Vulkan Loader] LAYER:                /usr/local/opt/vulkan-validationlayers/share/vulkan/explicit_layer.d/VkLayer_khronos_validation.json
[Vulkan Loader] INFO:           Found manifest file /usr/local/opt/vulkan-validationlayers/share/vulkan/explicit_layer.d/VkLayer_khronos_validation.json (file version 1.2.0)
[Vulkan Loader] DRIVER:         Searching for driver manifest files
[Vulkan Loader] DRIVER:            In following locations:
[Vulkan Loader] DRIVER:               /Users/soeren/Documents/Projects/llm/llama.cpp-vulkan/build/bin/vulkan/icd.d
[Vulkan Loader] DRIVER:               /Users/soeren/.config/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/etc/xdg/vulkan/icd.d
[Vulkan Loader] DRIVER:               /etc/xdg/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/etc/vulkan/icd.d
[Vulkan Loader] DRIVER:               /etc/vulkan/icd.d
[Vulkan Loader] DRIVER:               /Users/soeren/.local/share/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/local/share/vulkan/icd.d
[Vulkan Loader] DRIVER:               /usr/share/vulkan/icd.d
[Vulkan Loader] DRIVER:            Found the following files:
[Vulkan Loader] DRIVER:               /usr/local/etc/vulkan/icd.d/MoltenVK_icd.json
[Vulkan Loader] DRIVER:         Found ICD manifest file /usr/local/etc/vulkan/icd.d/MoltenVK_icd.json, version 1.0.0
[Vulkan Loader] DEBUG | DRIVER: Searching for ICD drivers named ../../../lib/libMoltenVK.dylib
[Vulkan Loader] WARNING | LAYER: env var 'VK_INSTANCE_LAYERS' defined and adding layers "VK_LAYER_KHRONOS_validation"
[Vulkan Loader] WARNING | LAYER: env var 'VK_INSTANCE_LAYERS' defined and adding layers "VK_LAYER_KHRONOS_validation"
[Vulkan Loader] DEBUG | LAYER:  Loading layer library /usr/local/opt/vulkan-validationlayers/lib/libVkLayer_khronos_validation.dylib
[Vulkan Loader] INFO | LAYER:   Insert instance layer "VK_LAYER_KHRONOS_validation" (/usr/local/opt/vulkan-validationlayers/lib/libVkLayer_khronos_validation.dylib)
[Vulkan Loader] LAYER:          vkCreateInstance layer callstack setup to:
[Vulkan Loader] LAYER:             <Application>
[Vulkan Loader] LAYER:               ||
[Vulkan Loader] LAYER:             <Loader>
[Vulkan Loader] LAYER:               ||
[Vulkan Loader] LAYER:             VK_LAYER_KHRONOS_validation
[Vulkan Loader] LAYER:                     Type: Explicit
[Vulkan Loader] LAYER:                     Enabled By: Environment Variable VK_INSTANCE_LAYERS
[Vulkan Loader] LAYER:                     Manifest: /usr/local/opt/vulkan-validationlayers/share/vulkan/explicit_layer.d/VkLayer_khronos_validation.json
[Vulkan Loader] LAYER:                     Library:  /usr/local/opt/vulkan-validationlayers/lib/libVkLayer_khronos_validation.dylib
[Vulkan Loader] LAYER:               ||
[Vulkan Loader] LAYER:             <Drivers>
Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateInstance(): Attempting to enable deprecated extension VK_EXT_validation_features, but this extension has been deprecated by VK_EXT_layer_settings.

Validation Warning: [ BestPractices-specialuse-extension ] | MessageID = 0x675dc32e
vkCreateInstance(): Attempting to enable extension VK_EXT_validation_features, but this extension is intended to support use by applications when debugging and it is strongly recommended that it be otherwise avoided.

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | threads | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -: | --------------: | -------------------: |
[Vulkan Loader] INFO | LAYER:   Inserted device layer "VK_LAYER_KHRONOS_validation" (/usr/local/opt/vulkan-validationlayers/lib/libVkLayer_khronos_validation.dylib)
[Vulkan Loader] DRIVER:         vkCreateDevice layer callstack setup to:
[Vulkan Loader] DRIVER:            <Application>
[Vulkan Loader] DRIVER:              ||
[Vulkan Loader] DRIVER:            <Loader>
[Vulkan Loader] DRIVER:              ||
[Vulkan Loader] LAYER:             VK_LAYER_KHRONOS_validation
[Vulkan Loader] LAYER:                     Type: Explicit
[Vulkan Loader] LAYER:                     Enabled By: Environment Variable VK_INSTANCE_LAYERS
[Vulkan Loader] LAYER:                     Manifest: /usr/local/opt/vulkan-validationlayers/share/vulkan/explicit_layer.d/VkLayer_khronos_validation.json
[Vulkan Loader] LAYER:                     Library:  /usr/local/opt/vulkan-validationlayers/lib/libVkLayer_khronos_validation.dylib
[Vulkan Loader] LAYER:               ||
[Vulkan Loader] DRIVER:            <Device>
Validation Error: [ VUID-VkDeviceCreateInfo-pProperties-04451 ] | MessageID = 0x3a3b6ca0
vkCreateDevice(): VK_KHR_portability_subset must be enabled because physical device VkPhysicalDevice 0x600002793be0 supports it.
The Vulkan spec states: If the VK_KHR_portability_subset extension is included in pProperties of vkEnumerateDeviceExtensionProperties, ppEnabledExtensionNames must include "VK_KHR_portability_subset" (https://docs.vulkan.org/spec/latest/chapters/devsandqueues.html#VUID-VkDeviceCreateInfo-pProperties-04451)
Objects: 1
    [0] VkPhysicalDevice 0x600002793be0

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_maintenance4, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x7ff32f00f200

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_EXT_subgroup_size_control, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x7ff32f00f200

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_16bit_storage, but this extension has been promoted to 1.1.0 (0x00401000).
Objects: 1
    [0] VkInstance 0x7ff32f00f200

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_shader_non_semantic_info, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x7ff32f00f200

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_shader_float16_int8, but this extension has been promoted to 1.2.0 (0x00402000).
Objects: 1
    [0] VkInstance 0x7ff32f00f200

[Vulkan Loader] DRIVER:                Using "AMD Radeon RX 6800" with driver: "/usr/local/etc/vulkan/icd.d/../../../lib/libMoltenVK.dylib"
Validation Warning: [ BestPractices ] | MessageID = 0xda0b64be
vkFreeMemory(): VK Object VkBuffer 0x80000000008 still has a reference to mem obj VkDeviceMemory 0x90000000009.
Objects: 2
    [0] VkBuffer 0x80000000008
    [1] VkDeviceMemory 0x90000000009

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
(Warning - This VUID has now been reported 10 times, which is the duplicated_message_limit value, this will be the last time reporting it).
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-Error-Result ] | MessageID = 0x53c1342f
vkCreateComputePipelines(): Returned error VK_ERROR_INITIALIZATION_FAILED.

ggml_vulkan: Compute pipeline creation failed for flash_attn_f32_f16_D128_aligned_f32accf16
ggml_vulkan: vk::Device::createComputePipeline: ErrorInitializationFailed
libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
[1]    23747 abort      VK_LOADER_DEBUG=all ./llama-bench -ngl 99 -m  -fa 1
@jeffbolznv
Collaborator

Hmm, nothing obviously wrong in the validation logs. Are you able to capture the Metal shader that is failing?

@soerenkampschroer
Author

Sure, I ran test-backend-ops with only this test enabled:

ggml_vulkan: Compute pipeline creation failed for flash_attn_f32_f16_D64_aligned_f32acc_smallrowsf16

llama.gputrace.zip

@jeffbolznv
Collaborator

Is there source for the translated Metal shader in there? I don't know how to decode that.

@soerenkampschroer
Author

You're right, the trace seems to be corrupted. Maybe because test-backend-ops is crashing too soon? I captured the trace through MoltenVK like this:
export METAL_CAPTURE_ENABLED=1 MVK_CONFIG_AUTO_GPU_CAPTURE_SCOPE=1 MVK_CONFIG_AUTO_GPU_CAPTURE_OUTPUT_FILE=llama.gputrace

I'm currently trying to do it through Xcode by attaching the debugger to the process, but that also seems to crash before it can capture anything useful. I can see in the terminal that the pipeline creation error has already happened, but there is nothing to capture in Xcode. Nothing shows up in the "FPS" tab and the option "Capture GPU Workload" is greyed out.

@0cc4m
Collaborator

0cc4m commented May 12, 2025

Is there a way to enable debug output for MoltenVK? On our side it's successfully reporting vk::Device::createComputePipeline: ErrorInitializationFailed, which causes the crash. MoltenVK should have information about what exactly caused the pipeline creation to fail.
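
The two error lines in the report fit a common pattern: pipeline creation fails, nothing gets stored for that pipeline, and a later lookup with `std::unordered_map::at` throws `std::out_of_range`. The sketch below only illustrates that pattern. `Pipeline`, `PipelineMap`, `register_pipeline`, and the two lookup helpers are hypothetical names, not ggml-vulkan's actual code, and which map the backend really queries is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled compute pipeline (not a ggml-vulkan type).
struct Pipeline {
    std::string name;
};

// Hypothetical registry: an entry exists only if creation succeeded.
using PipelineMap = std::unordered_map<std::string, Pipeline>;

// Simulates pipeline creation. On failure nothing is inserted, so a later
// lookup of `name` has no entry to find.
inline void register_pipeline(PipelineMap & map, const std::string & name, bool creation_ok) {
    assert(!name.empty());
    if (creation_ok) {
        map.emplace(name, Pipeline{name});
    }
}

// Unchecked lookup: std::unordered_map::at throws std::out_of_range for a
// missing key. If nothing catches it, the process is terminated.
inline const Pipeline & lookup_unchecked(const PipelineMap & map, const std::string & name) {
    return map.at(name);
}

// Checked lookup: returns nullptr for a missing key instead of throwing,
// so the caller can report the failed pipeline and fall back.
inline const Pipeline * lookup_checked(const PipelineMap & map, const std::string & name) {
    auto it = map.find(name);
    return it == map.end() ? nullptr : &it->second;
}
```

With a failed creation, `lookup_unchecked` throws while `lookup_checked` returns `nullptr`, which is the difference between an abort and a recoverable error.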

@soerenkampschroer
Author

I think I got something more useful by installing the debug version of MoltenVK. The trace is still corrupted, but the logs are more verbose and there is what looks like shader code in there. It's quite long, so I'm attaching it as a text file.

moltenvk-debug-logs.txt

@jeffbolznv
Collaborator

Thanks, there are several errors related to use of gl_WorkGroupSize.x as a constant expression:

[mvk-error] VK_ERROR_INITIALIZATION_FAILED: Shader library compile failed (Error code 3):
program_source:199:39: error: non-type template argument is not a constant expression
    threadgroup spvUnsafeArray<float, _1008> _1011;
                                      ^~~~~
program_source:199:39: note: initializer of '_1008' is not a constant expression
program_source:168:15: note: declared here
constant uint _1008 = gl_WorkGroupSize.x;
              ^

Maybe we need to declare another variable using the same spec id and use that for the shared memory variables?
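
MSL is C++-based, so the rule behind this error can be shown in plain C++: a non-type template argument (here, an array length) must be a constant expression, and a `const` variable initialized from a run-time value is not one. The sketch below is only an analogue of the problem and of the "use a separate constant" idea. `runtime_workgroup_size`, `kScratchSize`, and `scratch_capacity` are hypothetical names, and the real change would be made in the GLSL shader source, which is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for a value that is only known at run time, comparable to a
// workgroup size that the driver supplies when the pipeline is built.
inline unsigned runtime_workgroup_size() { return 128; }

// This would NOT compile, for the same reason as the MSL above:
//
//   const unsigned wg = runtime_workgroup_size();
//   std::array<float, wg> scratch;
//   // error: non-type template argument is not a constant expression
//   // note: initializer of 'wg' is not a constant expression
//
// A separate constant that is itself a constant expression can be used
// as the array length instead.
constexpr std::size_t kScratchSize = 128;

// Builds a scratch array sized by the compile-time constant and returns
// its element count.
inline std::size_t scratch_capacity() {
    std::array<float, kScratchSize> scratch{};
    return scratch.size();
}
```

The run-time value can still be used for indexing and loop bounds; only the array length needs the compile-time constant.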

@jeffbolznv
Collaborator

@soerenkampschroer please try this change: e1c331f

@soerenkampschroer
Author

Your commit seems to have fixed the issue, no more errors:

❯ ./llama-bench -ngl 99 -m ~/.cache/sanctum/models/bartowski/Qwen2.5-Coder-3B-GGUF/Qwen2.5-Coder-3B-Q4_K_M.gguf -fa 0,1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none

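| model                          |       size |     params | backend     | threads | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ----------- | ------: | -: | --------------: | -------------------: |
| qwen2 3B Q4_K - Medium         |   1.79 GiB |     3.09 B | Vulkan,BLAS |       6 |  0 |           pp512 |        859.49 ± 0.56 |
| qwen2 3B Q4_K - Medium         |   1.79 GiB |     3.09 B | Vulkan,BLAS |       6 |  0 |           tg128 |        112.31 ± 3.79 |
| qwen2 3B Q4_K - Medium         |   1.79 GiB |     3.09 B | Vulkan,BLAS |       6 |  1 |           pp512 |        711.90 ± 0.31 |
| qwen2 3B Q4_K - Medium         |   1.79 GiB |     3.09 B | Vulkan,BLAS |       6 |  1 |           tg128 |         76.10 ± 0.24 |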

@soerenkampschroer
Author

Just wanted to clarify that the performance gap is not as big with larger models. I haven't had the time to test it thoroughly, but the numbers for larger models are much better:

./llama-bench -ngl 30 -m ~/models/bartowski/Qwen_Qwen3-30B-A3B-GGUF/Qwen_Qwen3-30B-A3B-Q4_K_M.gguf -fa 0,1

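| model                          |       size |     params | backend     | threads | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ----------- | ------: | -: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_K - Medium |  17.35 GiB |    30.53 B | Vulkan,BLAS |       6 |  0 |           pp512 |         49.16 ± 0.17 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.35 GiB |    30.53 B | Vulkan,BLAS |       6 |  0 |           tg128 |         24.77 ± 0.93 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.35 GiB |    30.53 B | Vulkan,BLAS |       6 |  1 |           pp512 |         47.20 ± 0.68 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.35 GiB |    30.53 B | Vulkan,BLAS |       6 |  1 |           tg128 |         23.18 ± 0.27 |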

./llama-bench -ngl 100 -m ~/models/bartowski/Qwen_Qwen3-14B-GGUF/Qwen_Qwen3-14B-Q4_K_M.gguf -fa 0,1

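| model                          |       size |     params | backend     | threads | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ----------- | ------: | -: | --------------: | -------------------: |
| qwen3 14B Q4_K - Medium        |   8.38 GiB |    14.77 B | Vulkan,BLAS |       6 |  0 |           pp512 |        196.93 ± 0.40 |
| qwen3 14B Q4_K - Medium        |   8.38 GiB |    14.77 B | Vulkan,BLAS |       6 |  0 |           tg128 |         40.23 ± 0.43 |
| qwen3 14B Q4_K - Medium        |   8.38 GiB |    14.77 B | Vulkan,BLAS |       6 |  1 |           pp512 |        170.01 ± 0.13 |
| qwen3 14B Q4_K - Medium        |   8.38 GiB |    14.77 B | Vulkan,BLAS |       6 |  1 |           tg128 |         35.14 ± 0.17 |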
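
To put a number on "not as big", the relative change between the -fa 0 and -fa 1 rows can be computed from the mean t/s values reported in this thread. This is a small sketch with hypothetical helper names (`FaComparison`, `kRuns`, `fa_change_percent`, `print_fa_changes`); it ignores the ± spread and is not part of llama-bench.

```cpp
#include <cassert>
#include <cstdio>

// One llama-bench comparison: mean throughput (t/s) with Flash Attention
// off (-fa 0) and on (-fa 1).
struct FaComparison {
    const char * model;
    const char * test;
    double tps_fa_off;
    double tps_fa_on;
};

// Mean t/s values copied from the benchmark results in this thread.
static const FaComparison kRuns[] = {
    {"qwen2 3B",         "pp512", 859.49, 711.90},
    {"qwen2 3B",         "tg128", 112.31,  76.10},
    {"qwen3 14B",        "pp512", 196.93, 170.01},
    {"qwen3 14B",        "tg128",  40.23,  35.14},
    {"qwen3moe 30B.A3B", "pp512",  49.16,  47.20},
    {"qwen3moe 30B.A3B", "tg128",  24.77,  23.18},
};

// Percentage change in throughput when -fa 1 is used instead of -fa 0.
// Negative values mean Flash Attention was slower.
inline double fa_change_percent(double tps_fa_off, double tps_fa_on) {
    assert(tps_fa_off > 0.0);
    return (tps_fa_on / tps_fa_off - 1.0) * 100.0;
}

// Prints one line per run: model, test, and the relative change.
inline void print_fa_changes() {
    for (const FaComparison & run : kRuns) {
        std::printf("%-18s %-6s %+6.1f%%\n", run.model, run.test,
                    fa_change_percent(run.tps_fa_off, run.tps_fa_on));
    }
}
```

By this measure the 3B model loses about 17% (pp512) and 32% (tg128) with Flash Attention enabled, the 14B model about 14% and 13%, and the 30B-A3B model about 4% and 6%. Note that the 30B-A3B run used -ngl 30, so it is not directly comparable to the fully offloaded runs.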
