Skip to content

dataset中指定多个数据集,图文混合训练的问题 #3939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
zhang123434 opened this issue Apr 19, 2025 · 1 comment
Open

dataset中指定多个数据集,图文混合训练的问题 #3939

zhang123434 opened this issue Apr 19, 2025 · 1 comment

Comments

@zhang123434
Copy link

zhang123434 commented Apr 19, 2025

@Jintao-Huang
--dataset 中指定两个数据集之后训练的时候是将多个数据集的数据放在一起,然后再打乱吗?

输入到dataset中的一个jsonl中可以同时包含纯文本偏好数据,图像理解偏好数据,视频偏好数据吗?

期待您的解答,感谢!

@Jintao-Huang
Copy link
Collaborator

  1. --dataset 中指定两个数据集之后训练的时候是将多个数据集的数据放在一起,然后再打乱吗?
    在main分支中,默认会对dataset进行打乱,由dataset_shuffle参数控制
  2. 输入到dataset中的一个jsonl中可以同时包含纯文本偏好数据,图像理解偏好数据,视频偏好数据吗?
    可以的

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants