v0.3.1

@qiyuxinlin qiyuxinlin released this 17 May 07:28
· 12 commits to main since this release
32f3d7b

🚀 New Features

⚡ Performance Improvements

  • DeepSeek-R1 Q4 decoding @ 7.5 tokens/s
    Measured on a single-socket Xeon platform with DDR5 memory at 4800 MT/s and an A770 GPU; enabling dual-NUMA delivers additional speedups.

  • Easy benchmarking
    Try it yourself with the local_chat script to see these gains firsthand.
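
To cross-check a decode rate such as the 7.5 tokens/s figure above, a small timing helper can be wrapped around any token stream. The following is a minimal sketch only: `decode_throughput` and its `clock` parameter are hypothetical names introduced here for illustration and are not part of KTransformers or the local_chat script. It assumes the generator yields one item per decoded token, and it starts timing at the first token so that prefill latency is excluded.

```python
import time
from typing import Callable, Iterable


def decode_throughput(
    token_stream: Iterable[object],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return decoded tokens per second for a stream of tokens.

    Hypothetical helper, not a KTransformers API. The first token is
    consumed before the timer starts, so only the decode phase is measured.
    """
    it = iter(token_stream)
    try:
        next(it)  # wait for the first token (end of prefill)
    except StopIteration:
        return 0.0  # empty stream: nothing was decoded

    start = clock()
    count = 0
    for _ in it:  # count every token produced after the first one
        count += 1
    elapsed = clock() - start

    return count / elapsed if elapsed > 0 else 0.0
```

As a rough sense of scale, at 7.5 tokens/s a 300-token reply takes about 40 seconds to decode.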

🔜 What’s Next

  • balance_serve integration
    We’re working to seamlessly merge Intel GPU operators into the balance_serve backend for end-to-end support and streamlined maintenance.