Qualcomm AI Engine Direct - Add block quantization to llama #10225

chunit-quic · 2025-04-16T06:29:25Z

Add CLI argument to use block quantization for llama

pytorch-bot · 2025-04-16T06:29:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10225

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 5b3d1ec with merge base 4559a61:

NEW FAILURE - The following job has failed:

Check Labels / Check labels (gh)
RuntimeError: Error checking labels: PR does not have required labels

This comment was automatically generated by Dr. CI and updates every 15 minutes.

chunit-quic · 2025-04-16T06:30:51Z

Hi @cccclai,

It's a straightforward PR. The goal is to add an argument to support LPBQ (Low Power Block Quantization) for LLaMA. Thank you!

- Add CLI argument to use block quantization for llama
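
For readers unfamiliar with the idea behind the new argument, the following is a minimal sketch of generic block-wise weight quantization in plain Python. It is not the code from this PR and does not reproduce Qualcomm's LPBQ scheme. The function names `block_quantize` and `block_dequantize`, the `block_size` and `bits` parameters, and the symmetric signed-integer mapping are all assumptions made for illustration. The point it shows is that each block of weights gets its own scale, so one outlier only degrades precision inside its own block.

```python
def block_quantize(weights, block_size, bits=4):
    """Symmetric per-block quantization of a flat list of floats.

    Hypothetical helper for illustration only: one scale is computed per
    block of `block_size` consecutive weights.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    qmax = 2 ** (bits - 1) - 1  # largest magnitude, e.g. 7 for 4-bit signed
    quantized, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Guard against an all-zero block, where any scale would do.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer level and clamp to [-qmax, qmax].
        quantized.extend(max(-qmax, min(qmax, round(w / scale))) for w in block)
    return quantized, scales


def block_dequantize(quantized, scales, block_size):
    """Map integer levels back to floats using each block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]


# Two blocks with very different ranges: each keeps its own scale.
weights = [0.7, -0.3, 0.0, 0.1, 7.0, -3.0, 1.0, 0.0]
q, scales = block_quantize(weights, block_size=4)
restored = block_dequantize(q, scales, block_size=4)
```

With a single scale for the whole tensor, the first four weights in this example would all round to zero. With per-block scales they keep their relative magnitudes.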

cccclai · 2025-04-16T17:10:22Z

Oh wow, this is awesome. Any chance you have a data point on this?

facebook-github-bot · 2025-04-16T17:10:43Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cccclai

Thanks!

…10225) - Add CLI argument to use block quantization for llama Co-authored-by: Chun-I Tsai <chunit@qti.qualcomm.com>

chunit-quic requested a review from cccclai as a code owner April 16, 2025 06:29

facebook-github-bot added the CLA Signed label Apr 16, 2025

Qualcomm AI Engine Direct - Add block quantization to llama

5b3d1ec

- Add CLI argument to use block quantization for llama

chunit-quic force-pushed the add_llama_block_quant branch from 8da6241 to 5b3d1ec Compare April 16, 2025 07:09

cccclai added the release notes: qualcomm label Apr 16, 2025

cccclai approved these changes Apr 16, 2025

cccclai merged commit e6c7b30 into pytorch:main Apr 16, 2025
89 of 90 checks passed
