Hi, first of all, thanks a lot for your amazing work!
I’m currently trying to fine-tune the Qwen2.5-Omni-7B model on my custom dataset, which only contains audio-related data.
I’d like to freeze all parts related to the LLM and ViT, and also freeze the audio encoder.
The only part I want to fine-tune from the original model is the audio aligner (i.e., the projection layer). Additionally, I want to add a 1x1 convolutional layer before the audio encoder and train this layer together with the projection layer.
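To make the intended wiring concrete, here's a rough sketch of what I have in mind (just an illustration; the wrapper class and the (batch, feature_dim, num_frames) input layout are my assumptions, not taken from the actual Qwen2.5-Omni code):

```python
import torch
import torch.nn as nn


class AudioEncoderWithPreConv(nn.Module):
    """Trainable 1x1 conv in front of a frozen audio encoder."""

    def __init__(self, audio_encoder: nn.Module, feature_dim: int):
        super().__init__()
        # 1x1 conv over the feature dimension; kernel_size=1 keeps the time
        # resolution unchanged, so the encoder sees the same input shape.
        self.pre_conv = nn.Conv1d(feature_dim, feature_dim, kernel_size=1)
        self.audio_encoder = audio_encoder

    def forward(self, input_features: torch.Tensor, **kwargs):
        # input_features: (batch, feature_dim, num_frames), Whisper-style.
        return self.audio_encoder(self.pre_conv(input_features), **kwargs)
```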
Given this setup, where would be the best place to start? I’ve looked at the following examples, but I’m still a bit unsure:
ms-swift/examples/train/multimodal/lora_llm_full_vit
ms-swift/examples/custom/model.py
Looking forward to your response — thank you again!
You can refer to ms-swift/examples/custom/model.py to customize a model, where the get_function needs to return the model and tokenizer.
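A rough sketch of such a get_function, reusing the wrapper idea from the question above (hedged: the Qwen2_5OmniForConditionalGeneration class, the thinker.audio_tower attribute path, and feature_dim=128 are assumptions to verify against your transformers version; the signature follows the pattern of the get_function examples in that file):

```python
from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration


def get_model_tokenizer_omni(model_dir, model_info, model_kwargs, load_model=True, **kwargs):
    processor = AutoProcessor.from_pretrained(model_dir)
    model = None
    if load_model:
        model = Qwen2_5OmniForConditionalGeneration.from_pretrained(model_dir, **model_kwargs)
        # Wrap the audio encoder with the trainable 1x1 conv
        # (AudioEncoderWithPreConv as sketched in the question).
        audio_encoder = model.thinker.audio_tower  # attribute path is an assumption
        model.thinker.audio_tower = AudioEncoderWithPreConv(audio_encoder, feature_dim=128)
    return model, processor
```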
Then, specify --freeze_llm true --freeze_vit true --freeze_aligner false so that only the aligner remains trainable, and use --trainable_parameters to add the extra layers (such as the new conv) to the trainable set. However, the issue with this approach is that it will save all the weights.
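For example, something along these lines (the parameter prefix passed to --trainable_parameters is an assumption; print model.named_parameters() to find the real names):

```shell
swift sft \
    --model Qwen/Qwen2.5-Omni-7B \
    --train_type full \
    --freeze_llm true \
    --freeze_vit true \
    --freeze_aligner false \
    --trainable_parameters thinker.audio_tower.pre_conv \
    --dataset <your_audio_dataset>
```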
Alternatively, you can refer to ms-swift/examples/train/multimodal/lora_llm_full_vit to customize the tuner method, and define from_pretrained and save_pretrained. Finally, use merge-lora to obtain the complete weights.
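A skeleton of the tuner approach might look like the following (the Tuner/extra_tuners plugin interface and method signatures are modeled on that example and may differ across ms-swift versions; the parameter-name prefixes are assumptions to verify against the real model):

```python
import os

import torch

from swift.plugin import Tuner, extra_tuners

# Parameter-name prefixes to keep trainable; assumptions, verify via named_parameters().
TRAINABLE_PREFIXES = ('thinker.audio_tower.pre_conv', 'thinker.audio_tower.proj')


class AudioAlignerTuner(Tuner):

    @staticmethod
    def prepare_model(args, model):
        # Freeze everything, then re-enable only the new conv and the aligner.
        model.requires_grad_(False)
        for name, param in model.named_parameters():
            if name.startswith(TRAINABLE_PREFIXES):
                param.requires_grad_(True)
        return model

    @staticmethod
    def save_pretrained(model, save_directory, state_dict=None, safe_serialization=True, **kwargs):
        # Save only the trainable subset instead of the full checkpoint.
        os.makedirs(save_directory, exist_ok=True)
        if state_dict is None:
            state_dict = model.state_dict()
        state_dict = {k: v for k, v in state_dict.items() if k.startswith(TRAINABLE_PREFIXES)}
        torch.save(state_dict, os.path.join(save_directory, 'trainable.bin'))

    @staticmethod
    def from_pretrained(model, model_id, **kwargs):
        # Load the partial weights back into the full model when resuming or deploying.
        state_dict = torch.load(os.path.join(model_id, 'trainable.bin'), map_location='cpu')
        model.load_state_dict(state_dict, strict=False)
        return model


# Register the tuner; select it via --train_type audio_aligner together with
# --external_plugins pointing at this file, as in the referenced example.
extra_tuners['audio_aligner'] = AudioAlignerTuner
```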