
LoRA fine-tuning of gte embedding: inference results after merging differ greatly from the fine-tuned results #4084


Open
yesl16 opened this issue May 5, 2025 · 1 comment

yesl16 commented May 5, 2025

I fine-tuned gte embedding with LoRA. When I run inference with the merged model, the results differ greatly from the fine-tuned results; they are even worse than the original base model.

Training command:

```shell
swift sft \
    --model 'iic/gte_Qwen2-1.5B-instruct' \
    --train_type lora \
    --dataset '/workspace/train_df.csv' \
    --val_dataset '/workspace/test_df.csv' \
    --torch_dtype bfloat16 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 16 \
    --eval_steps 100 \
    --save_steps 100 \
    --eval_strategy steps \
    --use_chat_template false \
    --save_total_limit 2 \
    --logging_steps 5 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --learning_rate 5e-6 \
    --deepspeed zero3 \
    --dataloader_num_workers 4 \
    --task_type embedding \
    --loss_type cosine_similarity \
    --dataloader_drop_last true
```
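For reference, a loss named `cosine_similarity` typically regresses the cosine similarity of each pair's embeddings onto the pair's label (as in sentence-transformers' `CosineSimilarityLoss`); whether swift's `--loss_type cosine_similarity` matches this exactly is an assumption. A minimal numpy sketch of that objective:

```python
import numpy as np

def cosine_similarity_loss(emb_a, emb_b, labels):
    """MSE between the pairwise cosine similarity and the target label.

    emb_a, emb_b: (batch, dim) embeddings of the two sides of each pair.
    labels: (batch,) target similarities, e.g. 1.0 for positive pairs.
    Sketch of the usual CosineSimilarityLoss; swift's internal
    implementation may differ (this is an assumption, not confirmed).
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = (a * b).sum(axis=1)          # per-pair cosine similarity
    return np.mean((cos - labels) ** 2)

# Identical vectors labeled 1.0 incur zero loss.
emb = np.array([[1.0, 0.0], [0.0, 2.0]])
print(cosine_similarity_loss(emb, emb, np.array([1.0, 1.0])))  # → 0.0
```

If this is the objective, the label column of `train_df.csv` should contain similarity targets rather than class ids, which is worth double-checking when fine-tuned results look worse than the base model.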

[training log screenshots omitted]

Merge command:

```shell
swift export \
    --adapters /workspace/output/v1/checkpoint-800 \
    --merge_lora true
```

After merging, inference with SentenceTransformer:

```python
from sentence_transformers import SentenceTransformer

# model = SentenceTransformer("iic/gte_Qwen2-1.5B-instruct", trust_remote_code=True)
model = SentenceTransformer("/workspace/output/v1/checkpoint-800-merged", trust_remote_code=True)

# In case you want to reduce the maximum length:
model.max_seq_length = 8192

queries = [...]
documents = [...]

query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

scores = (query_embeddings @ document_embeddings.T)
```
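One thing to note when comparing base and merged checkpoints: `@` gives raw dot products, which only equal cosine similarities when the embeddings are unit-norm. If the merged export does not carry over the original SentenceTransformer normalization/pooling config (an assumption about a possible failure mode, not a confirmed cause), the raw scores will live on a different scale. Normalizing explicitly makes the comparison fairer; a small sketch:

```python
import numpy as np

def cosine_scores(query_embeddings, document_embeddings):
    """Cosine-similarity score matrix, independent of embedding scale.

    Explicit L2-normalization means the scores are comparable even if one
    checkpoint's pipeline normalizes its outputs and the other's does not.
    """
    q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    d = document_embeddings / np.linalg.norm(document_embeddings, axis=1, keepdims=True)
    return q @ d.T

# A query aligned with the first document scores 1.0 regardless of magnitude.
q = np.array([[3.0, 0.0]])
d = np.array([[1.0, 0.0], [0.0, 5.0]])
print(cosine_scores(q, d))  # → [[1. 0.]]
```

It is also worth checking that the merged checkpoint directory still contains the `modules.json` / pooling config that `SentenceTransformer` expects; if only the transformer weights were exported, pooling may silently fall back to a different strategy.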

Scores on the first ten rows of the test set, computed with the base iic/gte_Qwen2-1.5B-instruct model:
[image: score matrix]

Scores on the same ten rows, computed with the model merged after LoRA fine-tuning of iic/gte_Qwen2-1.5B-instruct:
[image: score matrix]


AriesJin commented May 6, 2025

Hi, I fine-tuned on my own data and the results also got worse. Did you find the cause? I also noticed that the GPU memory usage is extremely high.
