Slow inference speed with Mini-InternVL-Chat-4B-V1-5 #2902
Closed
zhuchen1109 started this conversation in General
Replies: 1 comment 1 reply
-
Try upgrading to a newer version. This kernel seems to have been removed a couple of versions ago.
-
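I am running inference with Mini-InternVL-Chat-4B-V1-5 on an A800. The input is about 2200 tokens and the output is 5 tokens, and inference is very slow. Profiling with the ns tool shows that the _fwd_kernel function takes an abnormally long time. I printed the key parameters used to compute the grid: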

max_seqlen 2243
q shape [2243, 32, 96]
k shape [75, 64, 32, 96]
BLOCK_M 256
grid (9,32,1)
Corresponding code:
Could someone explain what makes this so slow? The grid does not look reasonable to me; for internvl2-8b the grid is [62,8,1].
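
For reference, the printed grid is consistent with the first dimension being the ceiling of max_seqlen / BLOCK_M, the second being the number of query heads, and the third being the batch size. The sketch below only reproduces that arithmetic. The function name `launch_grid` and the batch size of 1 are assumptions made for illustration; this is not the actual kernel launch code from the library.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return (a + b - 1) // b


def launch_grid(max_seqlen: int, num_heads: int, batch: int, block_m: int):
    # Hypothetical helper: one program per (query block, head, sequence).
    # The real kernel may order or compute these dimensions differently.
    return (cdiv(max_seqlen, block_m), num_heads, batch)


# Values taken from the printout above; batch=1 is an assumption.
grid = launch_grid(max_seqlen=2243, num_heads=32, batch=1, block_m=256)
total_programs = grid[0] * grid[1] * grid[2]
print(grid, total_programs)
```

Under these assumptions each program handles a block of up to 256 query rows, so the grid has only 9 blocks along the sequence dimension.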
