Hi, first of all, thanks a lot for your amazing work!
I’m currently trying to fine-tune the Qwen2.5-Omni-7B model on my custom dataset, which only contains audio-related data.
I’d like to freeze all parts related to the LLM and ViT, and also freeze the audio encoder.
The only part I want to fine-tune from the original model is the audio aligner (i.e., the projection layer). Additionally, I want to add a 1x1 convolutional layer before the audio encoder and train this layer together with the projection layer.
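To make the intended wiring concrete, here's a rough sketch of what I have in mind (just an illustration; the wrapper class and the (batch, feature_dim, num_frames) input layout are my assumptions, not taken from the actual Qwen2.5-Omni code):

```python
import torch
import torch.nn as nn


class AudioEncoderWithPreConv(nn.Module):
    """Trainable 1x1 conv in front of a frozen audio encoder."""

    def __init__(self, audio_encoder: nn.Module, feature_dim: int):
        super().__init__()
        # 1x1 conv over the feature dimension; kernel_size=1 keeps the time
        # resolution unchanged, so the encoder sees the same input shape.
        self.pre_conv = nn.Conv1d(feature_dim, feature_dim, kernel_size=1)
        self.audio_encoder = audio_encoder

    def forward(self, input_features: torch.Tensor, **kwargs):
        # input_features: (batch, feature_dim, num_frames), Whisper-style.
        return self.audio_encoder(self.pre_conv(input_features), **kwargs)
```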
Given this setup, where would be the best place to start? I’ve looked at the following examples, but I’m still a bit unsure:
ms-swift/examples/train/multimodal/lora_llm_full_vit
ms-swift/examples/custom/model.py
Looking forward to your response — thank you again!
You can refer to ms-swift/examples/custom/model.py to customize a model, where the get_function needs to return the model and tokenizer.
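A rough sketch of such a get_function, reusing the wrapper idea from the question above (hedged: the Qwen2_5OmniForConditionalGeneration class, the thinker.audio_tower attribute path, and feature_dim=128 are assumptions to verify against your transformers version; the signature follows the pattern of the get_function examples in that file):

```python
from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration


def get_model_tokenizer_omni(model_dir, model_info, model_kwargs, load_model=True, **kwargs):
    processor = AutoProcessor.from_pretrained(model_dir)
    model = None
    if load_model:
        model = Qwen2_5OmniForConditionalGeneration.from_pretrained(model_dir, **model_kwargs)
        # Wrap the audio encoder with the trainable 1x1 conv
        # (AudioEncoderWithPreConv as sketched in the question).
        audio_encoder = model.thinker.audio_tower  # attribute path is an assumption
        model.thinker.audio_tower = AudioEncoderWithPreConv(audio_encoder, feature_dim=128)
    return model, processor
```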
Then, specify --freeze_llm true --freeze_vit true --freeze_aligner false so that only the aligner remains trainable, and use --trainable_parameters to add the extra layers (such as the new conv) to the trainable set. However, the issue with this approach is that it will save all the weights.
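For example, something along these lines (the parameter prefix passed to --trainable_parameters is an assumption; print model.named_parameters() to find the real names):

```shell
swift sft \
    --model Qwen/Qwen2.5-Omni-7B \
    --train_type full \
    --freeze_llm true \
    --freeze_vit true \
    --freeze_aligner false \
    --trainable_parameters thinker.audio_tower.pre_conv \
    --dataset <your_audio_dataset>
```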
Alternatively, you can refer to ms-swift/examples/train/multimodal/lora_llm_full_vit to customize the tuner method, and define from_pretrained and save_pretrained. Finally, use merge-lora to obtain the complete weights.
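A skeleton of the tuner approach might look like the following (the Tuner/extra_tuners plugin interface and method signatures are modeled on that example and may differ across ms-swift versions; the parameter-name prefixes are assumptions to verify against the real model):

```python
import os

import torch

from swift.plugin import Tuner, extra_tuners

# Parameter-name prefixes to keep trainable; assumptions, verify via named_parameters().
TRAINABLE_PREFIXES = ('thinker.audio_tower.pre_conv', 'thinker.audio_tower.proj')


class AudioAlignerTuner(Tuner):

    @staticmethod
    def prepare_model(args, model):
        # Freeze everything, then re-enable only the new conv and the aligner.
        model.requires_grad_(False)
        for name, param in model.named_parameters():
            if name.startswith(TRAINABLE_PREFIXES):
                param.requires_grad_(True)
        return model

    @staticmethod
    def save_pretrained(model, save_directory, state_dict=None, safe_serialization=True, **kwargs):
        # Save only the trainable subset instead of the full checkpoint.
        os.makedirs(save_directory, exist_ok=True)
        if state_dict is None:
            state_dict = model.state_dict()
        state_dict = {k: v for k, v in state_dict.items() if k.startswith(TRAINABLE_PREFIXES)}
        torch.save(state_dict, os.path.join(save_directory, 'trainable.bin'))

    @staticmethod
    def from_pretrained(model, model_id, **kwargs):
        # Load the partial weights back into the full model when resuming or deploying.
        state_dict = torch.load(os.path.join(model_id, 'trainable.bin'), map_location='cpu')
        model.load_state_dict(state_dict, strict=False)
        return model


# Register the tuner; select it via --train_type audio_aligner together with
# --external_plugins pointing at this file, as in the referenced example.
extra_tuners['audio_aligner'] = AudioAlignerTuner
```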