[Neuron] Add multi-LoRA support for Neuron. #18284
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from da77995 to 3230f31 (Compare)
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 3230f31 to 54cbd07 (Compare)
Thanks for contributing @aws-satyajith. The changes look good to me. It would be better if we could support dynamic LoRA loading, so that we don't need to pass lora-modules as an argument.
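For context, the static flow this comment refers to looks roughly like the sketch below: adapters are registered when the engine is constructed, so their paths have to be supplied up front, which is what dynamic loading would avoid. The base model name, adapter path, and device="neuron" setting are placeholders for illustration, not values taken from this PR.

```python
# Minimal sketch of static multi-LoRA usage (placeholder names and paths).
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder base model
    enable_lora=True,
    max_loras=2,
    device="neuron",                   # assumes a Neuron-enabled build of vLLM
)

# The adapter path is fixed at request time; with dynamic loading, new adapters
# could be attached to a running engine instead of being known at startup.
out = llm.generate(
    "Write a haiku about accelerators.",
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("sql-adapter", 1, "/path/to/sql_lora"),  # placeholder path
)
print(out[0].outputs[0].text)
```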
@WoosukKwon @simon-mo could we get a quick review on this PR so we can merge before the V0 freeze in 0.9.0? 🙏
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Satyajith Chilappagari <[email protected]>
Force-pushed from 54cbd07 to 0790f87 (Compare)
vllm/engine/arg_utils.py (outdated):

    max_lora_rank: int = LoRAConfig.max_lora_rank
    fully_sharded_loras: bool = LoRAConfig.fully_sharded_loras
    max_cpu_loras: Optional[int] = LoRAConfig.max_cpu_loras
    lora_modules: Optional[LoRAModulePath] = None
Can we use override_neuron_config to include the LoRA modules, instead of adding an additional argument lora_modules?
Makes sense. Since the regular usage diverges from dynamic loading anyway, I think moving lora_modules into override_neuron_config is doable. I'll make the change and publish a new revision.
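A hypothetical sketch of what that could look like, assuming the Neuron override dictionary grows a lora_modules entry; the key name and structure here are illustrative only, not the merged interface.

```python
# Hypothetical: folding adapter paths into override_neuron_config instead of a
# separate lora_modules engine argument. Model name and paths are placeholders.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",   # placeholder base model
    device="neuron",                    # assumes a Neuron-enabled build of vLLM
    enable_lora=True,
    override_neuron_config={
        # Assumed key: map adapter names to checkpoint paths so the Neuron
        # model builder can pick them up at compile time.
        "lora_modules": {
            "sql-adapter": "/path/to/sql_lora",
            "chat-adapter": "/path/to/chat_lora",
        },
    },
)
```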
Considering that most of the changes in this PR are within Neuron, after addressing the comments above, overall LGTM.
Signed-off-by: Satyajith Chilappagari <[email protected]>
Force-pushed from 73a0573 to fefba85 (Compare)
@jeejeelee @simon-mo it seems like all checks are complete; we know that entrypoints has been flaky recently.
This is an automatic vacation reply from QQ Mail.
Hello, I am currently on vacation and unable to reply to your email in person. I will get back to you as soon as possible after my vacation ends.
Signed-off-by: Satyajith Chilappagari <[email protected]>
Signed-off-by: amit <[email protected]>
Add multi-LoRA support for Neuron.
RFC: #15970
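As a rough illustration of the feature (not code from this PR), a multi-adapter workload might look like the sketch below; the base model, adapter names, paths, and device="neuron" setting are assumptions for illustration.

```python
# Sketch: different requests targeting different LoRA adapters on the same engine.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder base model
    enable_lora=True,
    max_loras=2,
    device="neuron",                   # assumes a Neuron-enabled build of vLLM
)

params = SamplingParams(max_tokens=32)
adapters = [
    LoRARequest("sql-adapter", 1, "/path/to/sql_lora"),    # placeholder path
    LoRARequest("chat-adapter", 2, "/path/to/chat_lora"),  # placeholder path
]

# Each prompt is served with its own adapter by the same engine instance.
for prompt, lora in zip(["Translate to SQL: list all users", "Say hello"], adapters):
    out = llm.generate(prompt, params, lora_request=lora)
    print(lora.lora_name, "->", out[0].outputs[0].text)
```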