Fine-tuning Qwen2.5-Omni-7B with additional new layers on the audio tower #4070

Open
hyunbin70 opened this issue May 3, 2025 · 1 comment

@hyunbin70

Hi, first of all, thanks a lot for your amazing work!

I’m currently trying to fine-tune the Qwen2.5-Omni-7B model on my custom dataset, which only contains audio-related data.

I’d like to freeze all parts related to the LLM and ViT, and also freeze the audio encoder.
The only part I want to fine-tune from the original model is the audio aligner (i.e., the projection layer). Additionally, I want to add a 1x1 convolutional layer before the audio encoder and train this layer together with the projection layer.
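For concreteness, here is roughly what I have in mind in plain PyTorch (a sketch only: the attribute names such as `thinker.audio_tower`, the aligner's `proj` name, and the 128-dim feature size are my guesses, so I would confirm them with `print(model)` on the loaded checkpoint):

```python
# Rough sketch of the intended wiring in plain PyTorch.
# NOTE: `thinker.audio_tower`, the `proj` aligner name, and the 128 feature
# bins are guesses -- inspect print(model) for the real module paths/shapes.
import torch
import torch.nn as nn

try:
    from transformers import Qwen2_5OmniForConditionalGeneration as QwenOmni
except ImportError:  # older transformers releases exposed the model under another name
    from transformers import Qwen2_5OmniModel as QwenOmni


class AudioTowerWithConv(nn.Module):
    """Prepends a trainable 1x1 conv to the (frozen) audio encoder."""

    def __init__(self, audio_tower: nn.Module, feat_dim: int = 128):
        super().__init__()
        self.pre_conv = nn.Conv1d(feat_dim, feat_dim, kernel_size=1)
        self.audio_tower = audio_tower

    def forward(self, input_features, *args, **kwargs):
        # Assumes input_features is (batch, feat_dim, time); check the real layout.
        # A forwarding wrapper like this may also need to expose extra attributes
        # (e.g. config/dtype), depending on how the thinker calls the encoder.
        return self.audio_tower(self.pre_conv(input_features), *args, **kwargs)


model = QwenOmni.from_pretrained("Qwen/Qwen2.5-Omni-7B", torch_dtype=torch.bfloat16)
model.thinker.audio_tower = AudioTowerWithConv(model.thinker.audio_tower)

# Freeze everything, then unfreeze only the new conv and the audio aligner.
model.requires_grad_(False)
for name, param in model.named_parameters():
    if "pre_conv" in name or "audio_tower.proj" in name:  # aligner name is a guess
        param.requires_grad_(True)
```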

Given this setup, where would be the best place to start? I’ve looked at the following examples, but I’m still a bit unsure:
ms-swift/examples/train/multimodal/lora_llm_full_vit
ms-swift/examples/custom/model.py

Looking forward to your response — thank you again!

@Jintao-Huang
Collaborator

You can refer to ms-swift/examples/custom/model.py to customize a model, where the get_function needs to return the model and tokenizer.
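For example, the get_function could look roughly like this (the function name, signature, and keyword arguments below are only illustrative; copy the exact shape from examples/custom/model.py, and the conv-grafting step is just a placeholder for your own modification):

```python
# Hypothetical get_function in the spirit of examples/custom/model.py; the only
# hard requirement is that it returns (model, tokenizer/processor). The exact
# signature should be copied from that file -- this one is illustrative.
from transformers import AutoProcessor

try:
    from transformers import Qwen2_5OmniForConditionalGeneration as QwenOmni
except ImportError:  # older transformers releases used a different class name
    from transformers import Qwen2_5OmniModel as QwenOmni


def get_qwen_omni_with_audio_conv(model_dir, *, load_model=True, model_kwargs=None, **kwargs):
    processor = AutoProcessor.from_pretrained(model_dir)
    model = None
    if load_model:
        model = QwenOmni.from_pretrained(model_dir, **(model_kwargs or {}))
        # Graft the extra 1x1 conv in front of the audio encoder here,
        # e.g. with a wrapper module like the AudioTowerWithConv sketch above.
    return model, processor
```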

Then specify --freeze_llm false and use --trainable_parameters to mark the additional layers that need to be trained. The drawback of this approach is that it will save all of the weights.
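A quick way to find the values to pass to --trainable_parameters is to print the parameter names once and pick the prefixes of your new conv and the audio aligner (the prefixes shown in the comment are placeholders, and you should double-check the exact matching rule in the docs):

```python
# Print the parameter names once to find the prefixes to pass to
# --trainable_parameters (it matches parameter-name prefixes; check the
# ms-swift docs for the exact matching rule).
try:
    from transformers import Qwen2_5OmniForConditionalGeneration as QwenOmni
except ImportError:
    from transformers import Qwen2_5OmniModel as QwenOmni

model = QwenOmni.from_pretrained("Qwen/Qwen2.5-Omni-7B")
for name, param in model.named_parameters():
    if "audio" in name:
        print(name, tuple(param.shape))

# Example (placeholder prefixes -- use the ones from your own printout):
#   --trainable_parameters thinker.audio_tower.pre_conv thinker.audio_tower.proj
```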

Alternatively, you can refer to ms-swift/examples/train/multimodal/lora_llm_full_vit to customize the tuner, defining its from_pretrained and save_pretrained methods. Finally, use merge-lora to obtain the complete weights.
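A very rough sketch of the shape such a tuner can take is below. The import path, method signatures, and registration mechanism are assumptions modeled on that example's plugin file, so copy the real custom_plugin.py and adapt it rather than using this verbatim; the trainable prefixes are placeholders.

```python
# Sketch of a custom tuner modeled on
# examples/train/multimodal/lora_llm_full_vit (custom plugin).
# Import path, signatures, and registration are assumptions from that example.
import os
import torch

from swift.plugin import Tuner, extra_tuners  # assumed import path, as in the example


class AudioAlignerTuner(Tuner):
    # Placeholder prefixes for the new 1x1 conv and the audio aligner.
    TRAINABLE_PREFIXES = ("thinker.audio_tower.pre_conv", "thinker.audio_tower.proj")

    @staticmethod
    def prepare_model(args, model):
        # Freeze everything, then unfreeze only the layers we care about.
        model.requires_grad_(False)
        for name, param in model.named_parameters():
            if name.startswith(AudioAlignerTuner.TRAINABLE_PREFIXES):
                param.requires_grad_(True)
        return model

    @staticmethod
    def save_pretrained(model, save_directory, state_dict=None, safe_serialization=True, **kwargs):
        # Save only the trainable weights instead of the full checkpoint.
        os.makedirs(save_directory, exist_ok=True)
        trainable = {n: p.detach().cpu() for n, p in model.named_parameters() if p.requires_grad}
        torch.save(trainable, os.path.join(save_directory, "trainable_params.bin"))

    @staticmethod
    def from_pretrained(model, model_id, **kwargs):
        # Load the small trainable-weight file back on top of the base model.
        ckpt = torch.load(os.path.join(model_id, "trainable_params.bin"), map_location="cpu")
        model.load_state_dict(ckpt, strict=False)
        return model


# Register the tuner so it can be selected from the CLI, as done in the example.
extra_tuners["audio_aligner"] = AudioAlignerTuner
```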
