[Bug]: set_seed is called after model initialization, so the model's initial parameters are randomly generated on every run #6546

Closed
1 task done
vvwomen opened this issue Jul 28, 2023 · 1 comment
Assignees
Labels
bug Something isn't working triage

Comments

vvwomen commented Jul 28, 2023

Software environment

- paddlepaddle:paddle.version.commit:b166581a3c89cc74ae0737480eda5c4d093eed7f
- paddlepaddle-gpu: paddle.version.commit:b166581a3c89cc74ae0737480eda5c4d093eed7f
- paddlenlp: version: v2.5.2, commit: e40e40be094f98d71179cd1a73f07650cc22c455

Duplicate issues

  • I have searched the existing issues

Bug description

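When the ernie-3.0-xbase-zh model in PaddleNLP is used for a sequence classification task, the pretrained weight file does not contain weights for the final linear layer, so that layer has to be randomly initialized. However, set_seed is called inside the Trainer, which runs after the model has already been constructed, so the seed has no effect on model initialization. Running the script twice on CUDA gives different loss values.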
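The ordering problem described above can be illustrated with a minimal, framework-agnostic sketch. It uses Python's built-in `random` module as a stand-in for Paddle's global RNG (where the analogous call would be `paddle.seed`), and `init_head` is a hypothetical placeholder for the randomly initialized final linear layer. It is not the PaddleNLP code path, only a demonstration of why seeding after initialization cannot make the initial weights reproducible.

```python
import random


def init_head(num_weights=4):
    """Hypothetical stand-in for a randomly initialized classifier head."""
    return [random.gauss(0.0, 0.02) for _ in range(num_weights)]


def run_seed_after_init(seed):
    head = init_head()   # weights are drawn before the seed is fixed
    random.seed(seed)    # too late to influence the head
    return head


def run_seed_before_init(seed):
    random.seed(seed)    # fix the RNG state first
    return init_head()   # weights now depend only on the seed


# Put the global RNG into different states to mimic two separate processes.
random.seed(1)
late_a = run_seed_after_init(10)
random.seed(2)
late_b = run_seed_after_init(10)

random.seed(1)
early_a = run_seed_before_init(10)
random.seed(2)
early_b = run_seed_before_init(10)

print("seed after init, runs match: ", late_a == late_b)
print("seed before init, runs match:", early_a == early_b)
```

Under these assumptions, only the variant that seeds before constructing the head yields identical weights across runs, which suggests that the seed would need to be set before the model is created in the training script.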

Steps to reproduce & code

Run the following under https://github.com/PaddlePaddle/PaddleNLP/tree/v2.5.2/model_zoo/ernie-3.0:

python run_token_cls.py \
    --model_name_or_path ernie-3.0-xbase-zh \
    --dataset msra_ner \
    --output_dir ./checkpoint \
    --overwrite_output_dir True \
    --do_train \
    --config=configs/default.yml \
    --logging_steps 1 \
    --max_steps 100 \
    --seed 10 \
    --device cpu \
    --fp16 False

@vvwomen vvwomen added the bug Something isn't working label Jul 28, 2023
Contributor

w5688414 commented May 7, 2024

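This may be related to the following factors:
1. Linear layers are all initialized in the following way; please check whether the initialization is consistent across runs.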

if isinstance(layer.weight, paddle.Tensor):

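2. The dataset is shuffled during training, so identical results across runs cannot be fully guaranteed.

If you want the runs to align exactly, train in fp32, set the seed, set every shuffle option to false, and initialize the final layer manually.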
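The recipe above (fixed seed, no shuffling, manual initialization of the final layer) can be sketched in plain Python. This is only an illustration under stated assumptions: `manual_head_init`, `batches`, and `one_run` are hypothetical helpers, the weights are plain nested lists rather than Paddle tensors, and the standard deviation of 0.02 is an arbitrary placeholder rather than a value taken from PaddleNLP.

```python
import random


def manual_head_init(in_features, out_features, seed, std=0.02):
    """Fill a weight matrix from a private RNG so global RNG state is irrelevant."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(out_features)]
            for _ in range(in_features)]


def batches(examples, batch_size, shuffle=False, seed=None):
    """Yield batches; with shuffle=False the order is the dataset order."""
    order = list(range(len(examples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]


def one_run(seed):
    # Hypothetical "run": initialize the head, then record the batch order.
    head = manual_head_init(in_features=3, out_features=2, seed=seed)
    seen = list(batches(list(range(7)), batch_size=3, shuffle=False))
    return head, seen


first = one_run(seed=10)
random.random()  # disturb the global RNG between runs
second = one_run(seed=10)
print("runs identical:", first == second)
```

Because the head is filled from its own seeded generator and the data order is fixed, the two runs start from the same weights and see the same batches, regardless of what else has consumed the global RNG.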

@paddle-bot paddle-bot bot closed this as completed May 13, 2025