v0.3.1

Latest

qiyuxinlin released this 17 May 07:28

32f3d7b

🚀 New Features

Intel Arc support @aubreyli @rnwang04

⚡ Performance Improvements

DeepSeek-R1 Q4 decoding @ 7.5 tokens/s
Measured on a single-socket Xeon platform with DDR5 memory at 4800 MT/s and an Intel Arc A770 GPU; enabling dual-NUMA delivers additional speedups.
Easy benchmarking
Try it yourself with the local_chat script to see these gains firsthand.
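
For readers who want to check a tokens/s figure on their own hardware, the sketch below shows one common way to compute decode throughput from a stream of generated tokens. It is not taken from the local_chat script: `decode_tokens_per_second` and `fake_stream` are hypothetical names, and excluding the first token (whose latency is dominated by prefill) is an assumption about how decoding speed is defined here.

```python
import time
from typing import Callable, Iterable, Iterator


def decode_tokens_per_second(
    token_stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return decode throughput in tokens/s for a stream of generated tokens.

    Assumption: the first token is excluded from the count because its
    latency mostly reflects prefill rather than decoding.
    """
    it = iter(token_stream)
    try:
        next(it)  # wait for the first token (prefill); not counted
    except StopIteration:
        return 0.0  # empty stream: nothing was decoded

    start = clock()
    decoded = 0
    for _ in it:
        decoded += 1
    elapsed = clock() - start
    return decoded / elapsed if elapsed > 0 else 0.0


def fake_stream(n_tokens: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output."""
    for i in range(n_tokens):
        time.sleep(delay_s)  # simulate per-token decode latency
        yield f"tok{i}"
```

Calling `decode_tokens_per_second(fake_stream(20, 0.01))` returns the simulated rate; in a real measurement, the stream would be whatever iterator yields tokens from the model being tested.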

🔜 What’s Next

Balance_serve integration
We’re working to seamlessly merge Intel GPU operators into the balance_serve backend for end-to-end support and streamlined maintenance.

Contributors

aubreyli and rnwang04
