[ET-VK] Introduce generic export pass for fusing Q/DQ nodes #10771

pytorchbot · 2025-05-08T06:36:16Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #10525 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/220/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/220/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/220/orig
@diff-train-skip-merge

cc @SS-JIA @manuelcandales @cbilgin

Pull Request resolved: #10525

## Context

When quantizing models with the PT2E quantization flow, quantize/dequantize nodes are inserted into the graph. However, these quantize/dequantize nodes must be fused with operators such as `aten.linear.default` to produce nodes corresponding to quantized operators (e.g. `weight_int8pack_mm`) in order for quantized operator implementations to be called at runtime.

Currently, the op fusion is done by the `fuse_dequant_linear.py` pass. However, this pass only handles one specific fusion pattern, which generates a `weight_int8pack_mm` operator. As more quantized operators are to be supported in ET-VK via the PT2E quantization flow, a more generic fusion pass is needed that can handle a variety of fusion patterns.

## Changes

* Introduce the `FuseQuantizedOpsTransform()` pass. I elected to introduce a new pass under the `backends/vulkan/_passes` directory, as opposed to modifying the existing pass, because I anticipate the majority of the fusion patterns to be specific to ET-VK.
* Remove the existing `FuseDequantLinearPass()`.
* Switch to using the `FuseQuantizedOpsTransform` pass instead of the old `FuseDequantLinear` pass.
* Add the `test_vulkan_passes` Python test to test export passes.
* Refactor parts of the `test_vulkan_delegate` Python test to improve code organization.

Introduce the `linear_qcsnw` nomenclature:

* q - quantized
* c - per-channel / channelwise
* s - symmetric
* n - number of bits (qcs4w for 4-bit quant, qcs8w for 8-bit quant)
* w - weight quantized

Add a custom op, `linear_qcs4w`, for 4-bit weight quantized linear, and add the ability for the quantized op fusion pass to produce this op.

Slightly rename and refactor the quantization config retrieval functions in the `VulkanQuantizer` to improve clarity and API flexibility.

ghstack-source-id: 282688199
@exported-using-ghexport

Differential Revision: [D73794042](https://our.internmc.facebook.com/intern/diff/D73794042/)
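To make the `linear_qcsnw` nomenclature concrete, here is a minimal pure-Python sketch of per-channel, symmetric, n-bit weight quantization followed by a linear layer that consumes the quantized weight. The function names `quantize_weight_qcsnw` and `linear_qcsnw` and the max-abs scale choice are assumptions made for illustration; they are not the ET-VK implementation or the `VulkanQuantizer` API.

```python
# Illustrative only: these helpers are hypothetical and not part of ExecuTorch.

def quantize_weight_qcsnw(weight, n_bits):
    """Quantize each output channel (row) of `weight` symmetrically to n_bits.

    q: values are stored as integers
    c: one scale per output channel (row)
    s: symmetric, so there is no zero point
    n: n_bits selects the integer range (4 -> [-8, 7], 8 -> [-128, 127])
    w: only the weight is quantized; activations stay in floating point
    """
    qmax = 2 ** (n_bits - 1) - 1
    qmin = -(2 ** (n_bits - 1))
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Assumed scale choice: map the largest magnitude in the row to qmax.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q_rows.append([min(qmax, max(qmin, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def linear_qcsnw(x, q_weight, scales):
    """Compute x @ W^T, where row i of W is approximated by q_weight[i] * scales[i]."""
    return [
        sum(xi * qi for xi, qi in zip(x, q_row)) * scale
        for q_row, scale in zip(q_weight, scales)
    ]


weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -4.0]]
x = [1.0, 2.0, 3.0]

q4, s4 = quantize_weight_qcsnw(weight, n_bits=4)  # "qcs4w"
q8, s8 = quantize_weight_qcsnw(weight, n_bits=8)  # "qcs8w"
y4 = linear_qcsnw(x, q4, s4)
y8 = linear_qcsnw(x, q8, s8)
```

Because the scale is per row, the scale multiplication can be applied once per output element after the integer dot product, which is what makes the per-channel symmetric layout convenient for a fused kernel.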

pytorch-bot · 2025-05-08T06:36:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10771

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…10771)

## Context

When quantizing models with the PT2E quantization flow, quantize/dequantize nodes are inserted into the graph. However, these quantize/dequantize nodes must be fused with operators such as `aten.linear.default` to produce nodes corresponding to quantized operators (e.g. `weight_int8pack_mm`) in order for quantized operator implementations to be called at runtime.

Currently, the op fusion is done by the `fuse_dequant_linear.py` pass. However, this pass only handles one specific fusion pattern, which generates a `weight_int8pack_mm` operator. As more quantized operators are to be supported in ET-VK via the PT2E quantization flow, a more generic fusion pass is needed that can handle a variety of fusion patterns.

## Changes

* Introduce the `FuseQuantizedOpsTransform()` pass. I elected to introduce a new pass under the `backends/vulkan/_passes` directory, as opposed to modifying the existing pass, because I anticipate the majority of the fusion patterns to be specific to ET-VK.
* Remove the existing `FuseDequantLinearPass()`.
* Switch to using the `FuseQuantizedOpsTransform` pass instead of the old `FuseDequantLinear` pass.
* Add the `test_vulkan_passes` Python test to test export passes.
* Make some small refactors to the `test_vulkan_delegate` Python test to improve code organization.

Differential Revision: [D73794042](https://our.internmc.facebook.com/intern/diff/D73794042/)
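The fusion idea described in the Context section can be sketched on a toy graph. The `Node` class and `fuse_dq_linear` function below are hypothetical stand-ins written for illustration; they are not `torch.fx`, not the `FuseQuantizedOpsTransform` pass, and the argument layout given to the fused node is an assumption.

```python
from dataclasses import dataclass, field


# Toy graph IR for illustration only; this is NOT torch.fx.
@dataclass
class Node:
    name: str
    op: str
    args: list = field(default_factory=list)  # inputs: other Node objects


def fuse_dq_linear(nodes):
    """Rewrite linear(x, dequantize(qw, scales)) into a single fused node.

    Matching linear nodes are modified in place. Returns a new node list
    with dequantize nodes that no longer have any users removed.
    """
    for node in nodes:
        if node.op != "linear":
            continue
        x, w = node.args[0], node.args[1]
        if isinstance(w, Node) and w.op == "dequantize":
            qw, scales = w.args[0], w.args[1]
            # Fused op name taken from the PR text; the argument order
            # (x, qw, scales) is an assumption of this sketch.
            node.op = "weight_int8pack_mm"
            node.args = [x, qw, scales]
    # Drop dequantize nodes that nothing references any more.
    used = {id(a) for n in nodes for a in n.args if isinstance(a, Node)}
    return [n for n in nodes if n.op != "dequantize" or id(n) in used]


x = Node("x", "placeholder")
qw = Node("qw", "placeholder")
scales = Node("scales", "placeholder")
dq = Node("dq", "dequantize", [qw, scales])
lin = Node("lin", "linear", [x, dq])

fused = fuse_dq_linear([x, qw, scales, dq, lin])
```

A more generic pass in this style would presumably hold several such pattern/replacement pairs rather than one hard-coded match, which is the limitation of the single-pattern approach that the description calls out.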

pytorchbot requested review from jackzhxng, iseeyuan, larryliu0820, swolchok, SS-JIA and kimishpatel as code owners May 8, 2025 06:36

facebook-github-bot added the CLA Signed label May 8, 2025

SS-JIA added the module: vulkan and release notes: vulkan labels and removed the module: vulkan label May 8, 2025

SS-JIA approved these changes May 8, 2025

SS-JIA merged commit b1d00e2 into main May 8, 2025
82 of 85 checks passed

SS-JIA deleted the gh/SS-JIA/220/orig branch May 8, 2025 06:39
