[new blog post] introducing auto-round #2826
Conversation
@IlyasMoutawwakil @SunMarc Please help review the PR. Thanks.
cc @MekkCyber
Great blog post 🔥!
Small nit: for images other than the thumbnail, it's better to keep them here: https://huggingface.co/datasets/huggingface/documentation-images. You can open a PR, add them, and ping me or someone else to merge it.
Example of how to use the images after adding them: https://github.com/huggingface/blog/blob/main/1_58_llm_extreme_quantization.md
_blog.yml
Outdated
date: April 23, 2025
tags:
- llms
- inference
- quantization
- intel
Reminder to change the date before publishing.
Looks good!
There are many unnecessary line breaks (in the middle of sentences), probably from copy-pasting.
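One way to catch such breaks before pushing is a small script that rejoins hard-wrapped prose. The sketch below is only an illustration: `unwrap_paragraphs` is a hypothetical helper, not part of the blog tooling, and it assumes that headings, list items, block quotes, tables, images, and fenced code should be left untouched. It also ignores Markdown's trailing-two-spaces hard break, so review the result by hand.

```python
import re

# Lines that open a Markdown structure and must never be merged into prose.
_STRUCTURAL = re.compile(r"^\s*(#{1,6}\s|[-*+]\s|\d+\.\s|>|\||!\[)")


def unwrap_paragraphs(markdown):
    """Join hard-wrapped prose lines; keep structural lines and code fences as-is."""
    out = []
    in_fence = False
    prev_is_prose = False  # True when the last emitted line may absorb the next one
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            # Toggle fence state and copy the fence marker verbatim.
            in_fence = not in_fence
            out.append(line)
            prev_is_prose = False
        elif in_fence or not stripped or _STRUCTURAL.match(line):
            # Code, blank lines, and structural lines are copied unchanged.
            out.append(line)
            prev_is_prose = False
        elif prev_is_prose:
            # Continuation of a wrapped sentence: merge into the previous line.
            out[-1] = out[-1].rstrip() + " " + stripped
        else:
            out.append(line)
            prev_is_prose = True
    return "\n".join(out)
```

Running it over the draft and diffing the result against the original makes it easy to see which breaks were accidental.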
Thanks for this blog post! One minor concern I have is that it is a bit too detailed in places and feels a bit too much like documentation. It could be interesting to make it a bit lighter (fewer snippets) and link to the transformers docs for more details. WDYT?
autoround.md
Outdated
# What is AutoRound?

**AutoRound** is a weight-only post-training quantization (PTQ) method developed by Intel. It uses signed gradient
descent to jointly optimize weight rounding and clipping ranges, enabling accurate low-bit quantization (e.g.,
INT2 - INT8) with minimal accuracy loss in most scenarios. For example, at INT2, it outperforms popular baselines by up to **2.1x higher in relative accuracy**.

Despite its strong performance, AutoRound is fast and lightweight: quantizing a 72B model takes just **37 minutes on an
A100 GPU** under light mode. It also supports mixed-bit tuning, lm-head quantization, GPTQ/AWQ/GGUF format exporting, and flexible tuning recipes.
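To make "signed gradient descent to jointly optimize weight rounding and clipping ranges" concrete, here is a toy sketch in plain Python. It is not the auto-round library's implementation: the function names (`fake_quant`, `output_loss`, `sign_round`) are hypothetical, and the setup is deliberately simplified to one weight vector, symmetric per-tensor quantization, a single clipping factor `alpha`, a straight-through estimator for the rounding step, and a step size of `1 / steps` (all assumptions made for illustration). The real method is applied to transformer blocks with calibration data.

```python
import math


def fake_quant(w, v, alpha, bits):
    """Quantize-dequantize w with per-weight rounding offsets v and clip factor alpha."""
    qmax = 2 ** (bits - 1) - 1  # assumes bits >= 2
    qmin = -qmax - 1
    scale = alpha * max(abs(wi) for wi in w) / qmax  # assumes w is not all zeros
    rounded = [math.floor(wi / scale + vi + 0.5) for wi, vi in zip(w, v)]
    q = [min(max(r, qmin), qmax) for r in rounded]
    return [scale * qi for qi in q], rounded, q, scale


def output_loss(w, w_hat, xs):
    """Squared error of the layer output x.w when w is replaced by w_hat."""
    errs = [sum(xi * (a - b) for xi, a, b in zip(x, w, w_hat)) for x in xs]
    return sum(e * e for e in errs), errs


def sign(g):
    return (g > 0) - (g < 0)


def sign_round(w, xs, bits=4, steps=200):
    """Tune v and alpha with sign-of-gradient updates; return the best point seen."""
    lr = 1.0 / steps  # assumption for this sketch
    v, alpha = [0.0] * len(w), 1.0  # start from plain round-to-nearest
    best_loss, best_v, best_alpha = float("inf"), list(v), alpha
    for _ in range(steps):
        w_hat, rounded, q, scale = fake_quant(w, v, alpha, bits)
        loss, errs = output_loss(w, w_hat, xs)
        if loss < best_loss:
            best_loss, best_v, best_alpha = loss, list(v), alpha
        # dLoss/dw_hat_i = -2 * sum_n err_n * x_n[i]
        g_w = [-2.0 * sum(e * x[i] for e, x in zip(errs, xs)) for i in range(len(w))]
        g_alpha = 0.0
        for i in range(len(w)):
            inside = rounded[i] == q[i]  # False when the value was clipped
            # Straight-through estimator: round() acts as identity in the backward pass.
            g_v = g_w[i] * scale if inside else 0.0
            d_scale = q[i] - w[i] / scale if inside else q[i]
            g_alpha += g_w[i] * d_scale * scale / alpha
            # Signed update, with offsets kept within [-0.5, 0.5].
            v[i] = min(max(v[i] - lr * sign(g_v), -0.5), 0.5)
        # Signed update of the clipping factor, kept within [0.5, 1.0].
        alpha = min(max(alpha - lr * sign(g_alpha), 0.5), 1.0)
    return best_loss, best_v, best_alpha
```

Because the loop starts at `v = 0` and `alpha = 1` (round-to-nearest) and returns the best parameters it has seen, the returned loss is never worse than the round-to-nearest baseline on the calibration inputs `xs`. How much it improves depends on the data.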
I feel like it could be a nice addition to give more details on how auto-round quantization works!
To explain the AutoRound algorithm in detail, some background and mathematical concepts need to be introduced.
To simplify this blog, I’ve included an image that provides an overview of the algorithm and recommend that readers refer to our paper for more details.
I've removed some details. Please check whether any other parts need to be deleted or refined.
Thanks, the PR is here. I will switch to these images after it is merged.
Most issues have been addressed. Please review it again when you have a moment.
Thanks for iterating! This is much better. Let us know when you want this merged!
Everything is ready. Could you kindly merge it at your convenience? Thank you in advance!
@IlyasMoutawwakil @MekkCyber please review and approve. Thanks.
Thank you @wenhuach21 @SunMarc @IlyasMoutawwakil @MekkCyber!
Don't forget to link authors to the Intel org, like I've done in dc8c591. This brings more visibility to the organization and displays the blog post on the org page.
Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles require additional reviews. Alternatively, you can write a community article following the process here.
Preparing the Article
You're not quite done yet, though. Please make sure to follow this process (as documented here):
md file. You can also specify guest or org for the authors.
Here is an example of a complete PR: #2382
Getting a Review
Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.
Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews (e.g., check for proper metadata) rather than content reviews unless explicitly asked.