
Tensor dimension mismatch during inference after LoRA fine-tuning Qwen2-Audio-Instruct #4128


Open
SylviaZiyaZhou opened this issue May 8, 2025 · 2 comments

Comments

@SylviaZiyaZhou commented May 8, 2025

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    PtEngine, RequestConfig, safe_snapshot_download, get_model_tokenizer, get_template, InferRequest
)
from swift.tuners import Swift
# Adjust the following lines as needed
model = 'ms-swift/models/qwen/Qwen2-Audio-7B-Instruct'
lora_checkpoint = safe_snapshot_download('ms-swift/output/v5-20250506-212353/checkpoint-37404')  # change to your checkpoint_dir
template_type = None  # None: use the model's default template_type
default_system = None  # None: use the model's default default_system

# Load the model and chat template
model, tokenizer = get_model_tokenizer(model)
model = Swift.from_pretrained(model, lora_checkpoint)
template_type = template_type or model.model_meta.template
template = get_template(template_type, tokenizer, default_system=default_system)
engine = PtEngine.from_model_template(model, template, max_batch_size=2)
request_config = RequestConfig(max_tokens=8192, temperature=0)

# Two infer_requests are used here to demonstrate batch inference
infer_requests = [
    InferRequest(messages=[{'role': 'user', 'content': 'Please verify if the given audio samples are from the same speaker or not.speaker prompt 1: <audio> \n speaker prompt 2: <audio>'}],
                 audios=["/aifs4su/mmdata/rawdata/speech/aishell1/data_aishell/wav/test/S0766/BAC009S0766W0172.wav", "/aifs4su/mmdata/rawdata/speech/aishell1/data_aishell/wav/test/S0914/BAC009S0914W0298.wav"]),
    InferRequest(messages=[{'role': 'user', 'content': 'Kindly determine the language spoken in the given audio clip.reference speech: <audio>'}],
                 audios=["/aifs4su/mmdata/rawdata/speech/aishell1/data_aishell/wav/test/S0905/BAC009S0905W0388.wav"]),
]
resp_list = engine.infer(infer_requests, request_config)
query0 = infer_requests[0].messages[0]['content']
print(f'response0: {resp_list[0].choices[0].message.content}')
print(f'response1: {resp_list[1].choices[0].message.content}')
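
To narrow down whether the mismatch comes from batching two requests with different numbers of <audio> tags, a minimal diagnostic variant of the call above (same engine, template, and requests; no new APIs assumed) runs them one at a time:

# Diagnostic sketch: infer each request separately so each forward pass sees a
# single audio layout. If this succeeds while the batched call fails, the
# problem is likely in how the batch of 2 is padded/masked.
for i, req in enumerate(infer_requests):
    single_resp = engine.infer([req], request_config)
    print(f'response{i}: {single_resp[0].choices[0].message.content}')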

@SylviaZiyaZhou (Author)

The error message is as follows:

[Image: screenshot of the error; the full traceback is reproduced as text in the next comment]

@SylviaZiyaZhou (Author)

[INFO:swift] Loading the model using model_dir: /aifs4su/ziyaz/ms-swift/output/v5-20250506-212353/checkpoint-37404
[INFO:swift] Loading the model using model_dir: /aifs4su/ziyaz/ms-swift/models/qwen/Qwen2-Audio-7B-Instruct
[INFO:swift] Setting torch_dtype: torch.bfloat16
[WARNING:swift] Please install the package: pip install "transformers>=4.45,<4.49" -U.
[INFO:swift] model_kwargs: {'device_map': 'cuda:0'}
Sliding Window Attention is enabled but not implemented for sdpa; unexpected results may be encountered.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:09<00:00, 1.81s/it]
/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:550: UserWarning: Model with tie_word_embeddings=True and the tied_target_modules=['language_model.model.embed_tokens'] are part of the adapter. This can lead to complications, for example when merging the adapter or converting your model to formats other than safetensors. See for example huggingface/peft#2018.
warnings.warn(
0%| | 0/2 [00:00<?, ?it/s][INFO:swift] Setting sampling_rate: 16000. You can adjust this hyperparameter through the environment variable: SAMPLING_RATE.
[WARNING:swift] max_model_len(8192) - num_tokens(62) < max_tokens(8192). Setting max_tokens: 8130
Expanding inputs for audio tokens in Qwen2Audio should be done in processing.
Traceback (most recent call last):
  File "/aifs4su/ziyaz/ms-swift/qwen2-audio-tuning/infer_swift.py", line 29, in <module>
    resp_list = engine.infer(infer_requests, request_config)
  File "/aifs4su/ziyaz/ms-swift/swift/llm/infer/infer_engine/pt_engine.py", line 542, in infer
    res += self._infer(
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/aifs4su/ziyaz/ms-swift/swift/llm/infer/infer_engine/pt_engine.py", line 507, in _infer
    res = infer_func(**kwargs)
  File "/aifs4su/ziyaz/ms-swift/swift/llm/infer/infer_engine/pt_engine.py", line 365, in _infer_full
    output = dict(template.generate(self.model, **generate_kwargs))
  File "/aifs4su/ziyaz/ms-swift/swift/llm/template/base.py", line 507, in generate
    return model.generate(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/peft/peft_model.py", line 1875, in generate
    outputs = self.base_model.generate(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/generation/utils.py", line 2465, in generate
    result = self._sample(
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/generation/utils.py", line 3434, in _sample
    outputs = model_forward(**model_inputs, return_dict=True)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/models/qwen2_audio/modeling_qwen2_audio.py", line 1141, in forward
    outputs = self.language_model(
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
    output = func(self, *args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 823, in forward
    outputs: BaseModelOutputWithPast = self.model(
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
    output = func(self, *args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 549, in forward
    layer_outputs = decoder_layer(
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 262, in forward
    hidden_states, self_attn_weights = self.self_attn(
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 194, in forward
    attn_output, attn_weights = attention_interface(
  File "/aifs4su/ziyaz/miniconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py", line 54, in sdpa_attention_forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: The expanded size of the tensor (394) must match the existing size (63) at non-singleton dimension 3. Target sizes: [2, 32, 1, 394]. Tensor sizes: [2, 1, 1, 63]
0%|
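
Reading the final error directly: at a decode step the attention scores have shape [2, 32, 1, 394] (batch 2, 32 heads, 1 query position, 394 cached key/value positions), while the attention mask covers only 63 positions ([2, 1, 1, 63]), so it cannot be broadcast against the scores. A minimal, hypothetical sketch (shapes copied from the error message; the head dimension is an assumption, and this is not the model's actual code path) that reproduces the same class of mismatch:

import torch
import torch.nn.functional as F

# Shapes taken from the error message above; head_dim is assumed (not shown in the log).
head_dim = 128
q = torch.randn(2, 32, 1, head_dim)    # one query position (a single decode step)
k = torch.randn(2, 32, 394, head_dim)  # 394 cached key positions
v = torch.randn(2, 32, 394, head_dim)
attn_mask = torch.ones(2, 1, 1, 63, dtype=torch.bool)  # mask built for only 63 positions

try:
    F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
except RuntimeError as e:
    # The mask cannot be broadcast to [2, 32, 1, 394], analogous to the failure above.
    print(e)

Under that reading, 63 is close to the un-expanded token count reported earlier in the log (num_tokens(62)), while 394 matches the sequence length after the audio tokens are expanded, so the mask may not be accounting for the expanded audio placeholders; the transformers version warning earlier in the log could be relevant here.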
