
Wt/camb/interface #31


Closed: wanted to merge 8 commits from the wt/camb/interface branch

Conversation

JackWeiw

Motivation

This PR changes the LLM op interface of paged_prefill_attention.

Modification

Add the relevant params to the dlinfer attention backend and kernel.
Added the params requested by tmo (a usage sketch follows the list below):
cu_seq_lens_kv (Tensor): The cumulative sequence lengths of the key/value sequences.
max_kv_seq_len (int): The maximum length of any key/value sequence.
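
For illustration, here is a minimal sketch of how these two parameters are typically derived from per-sequence key/value lengths before being passed to the attention op. The call shown in the trailing comment is an assumption for readability, not the exact dlinfer signature.

```python
import torch

# Hypothetical example batch: lengths of each key/value sequence.
kv_seq_lens = torch.tensor([5, 12, 7], dtype=torch.int32)

# cu_seq_lens_kv: cumulative key/value sequence lengths -> [0, 5, 17, 24]
cu_seq_lens_kv = torch.nn.functional.pad(torch.cumsum(kv_seq_lens, dim=0), (1, 0))

# max_kv_seq_len: maximum length of any key/value sequence -> 12
max_kv_seq_len = int(kv_seq_lens.max())

# These would then be forwarded to the paged prefill attention op, e.g.
# (illustrative call only, not the exact dlinfer API):
# out = paged_prefill_attention(q, k_cache, v_cache, block_table,
#                               cu_seq_lens_kv=cu_seq_lens_kv,
#                               max_kv_seq_len=max_kv_seq_len, ...)
```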

JackWeiw and others added 8 commits on December 30, 2024 at 10:50
* support w8a8 smooth_quant and loading

* optimize int8

* fix fp8 kernels

* update docs for w8a8

* resolve comments

* resolve comments

* fix ut

* disable not quant last norm

* disable quant last norm for cogvlm and minicpmv26 models

---------

Co-authored-by: grimoire <[email protected]>
* first

* better tuning

* restore tuning value
* remove threadsafe

* optimize performance

* 22.4

* 22.5

* delete jsonl

* add docs

* fix link

* rst

* remove sleep req step

* remove scheduler sleep

* fix ut

* recovery async engine
* Update ascend get_started.md

* Update ascend get_started.md

* fix Dockerfile_aarch64_ascend
@JackWeiw JackWeiw closed this Jan 14, 2025
@JackWeiw JackWeiw deleted the wt/camb/interface branch January 14, 2025 09:07