Custom GRPO training dataset fails to load #3981

Closed
Hspix opened this issue Apr 24, 2025 · 1 comment


Hspix commented Apr 24, 2025

Describe the bug
Following the messages format used for GRPO, I saved the data in a JSON file. The file content looks like this:

[
    {
        "messages": [
            {
                "role": "user",
                "content": "......"
            },
            {
                "role": "assistant",
                "content": "......"
            }
        ],
        "other_field": "other field value"
    }
]
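Before handing a file like this to the trainer, it can help to check that every record has the shape shown above. The following is a minimal sketch using only the Python standard library; the function name `find_problems`, the set of allowed roles, and the requirement that `content` be a plain string are assumptions made for illustration, not rules taken from ms-swift.

```python
import json

# Assumption: roles commonly seen in chat-style datasets; adjust as needed.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def find_problems(records):
    """Return a list of human-readable problems found in `records`."""
    if not isinstance(records, list):
        return ["top level is not a JSON array"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not an object")
            continue
        messages = rec.get("messages")
        if not isinstance(messages, list) or not messages:
            problems.append(f"record {i}: 'messages' is missing or not a non-empty list")
            continue
        for j, msg in enumerate(messages):
            if not isinstance(msg, dict):
                problems.append(f"record {i}, message {j}: not an object")
                continue
            if msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"record {i}, message {j}: unexpected role {msg.get('role')!r}")
            # Assumption: text-only data, so content must be a string.
            if not isinstance(msg.get("content"), str):
                problems.append(f"record {i}, message {j}: 'content' is not a string")
    return problems


sample = json.loads("""
[
    {
        "messages": [
            {"role": "user", "content": "......"},
            {"role": "assistant", "content": "......"}
        ]
    }
]
""")
print(find_problems(sample))
```

To check a real file, load it with `json.load` and pass the result to `find_problems`; an empty list means no structural problems of the kinds listed were found.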

Your hardware and system info
ms-swift is run from the image modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.6.0-vllm0.8.3-modelscope1.25.0-swift3.3.0.post1
CUDA: 12.8

Error output
Generating train split: 0 examples [00:00, ? examples/s]
Failed to load JSON from file 'message.json' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Column() changed from object to array in row 0
Generating train split: 0 examples [00:00, ? examples/s]
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/lib/python3.11/site-packages/datasets/packaged_modules/json/json.py", line 160, in _generate_tables
[rank0]: df = pandas_read_json(f)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/datasets/packaged_modules/json/json.py", line 38, in pandas_read_json
[rank0]: return pd.read_json(path_or_buf, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/pandas/io/json/_json.py", line 815, in read_json
[rank0]: return json_reader.read()
[rank0]: ^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/pandas/io/json/_json.py", line 1025, in read
[rank0]: obj = self._get_object_parser(self.data)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/pandas/io/json/_json.py", line 1051, in _get_object_parser
[rank0]: obj = FrameParser(json, **kwargs).parse()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/pandas/io/json/_json.py", line 1187, in parse
[rank0]: self._parse()
[rank0]: File "/usr/local/lib/python3.11/site-packages/pandas/io/json/_json.py", line 1403, in _parse
[rank0]: ujson_loads(json, precise_float=self.precise_float), dtype=None
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: Expected object or value

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/lib/python3.11/site-packages/datasets/builder.py", line 1854, in _prepare_split_single
[rank0]: for _, table in generator:
[rank0]: File "/usr/local/lib/python3.11/site-packages/datasets/packaged_modules/json/json.py", line 163, in _generate_tables
[rank0]: raise e
[rank0]: File "/usr/local/lib/python3.11/site-packages/datasets/packaged_modules/json/json.py", line 137, in _generate_tables
[rank0]: pa_table = paj.read_json(
[rank0]: ^^^^^^^^^^^^^^
[rank0]: File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
[rank0]: File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
[rank0]: File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
[rank0]: pyarrow.lib.ArrowInvalid: JSON parse error: Column() changed from object to array in row 0
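The thread does not record what resolved this. The traceback does show two parsers rejecting the file in turn: `pyarrow`'s JSON reader, which expects line-delimited JSON, and then the `pandas` fallback, whose `Expected object or value` can be a sign that the file is not strictly valid JSON. One way to narrow this down is to parse the file strictly with the standard library and, if it parses, rewrite it as JSON Lines. The sketch below does that; the helper name `json_array_to_jsonl` is hypothetical, and whether the trainer accepts the resulting `.jsonl` file is an assumption to verify against the ms-swift documentation.

```python
import json
import tempfile
from pathlib import Path


def json_array_to_jsonl(src, dst):
    """Strictly parse `src` as a JSON array and write one object per line to `dst`.

    Returns the number of records written. Raises ValueError naming the line
    and column of the first syntax error if `src` is not valid JSON.
    """
    # utf-8-sig also accepts files that start with a byte-order mark.
    text = Path(src).read_text(encoding="utf-8-sig")
    try:
        records = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"{src}: invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    if not isinstance(records, list):
        raise ValueError(f"{src}: expected a top-level JSON array")
    with open(dst, "w", encoding="utf-8") as out:
        for rec in records:
            # ensure_ascii=False keeps non-ASCII text readable in the output.
            out.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return len(records)


# Demonstration on a throwaway file; replace the paths with the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "message.json"
    dst = Path(tmp) / "message.jsonl"
    src.write_text('[{"messages": [{"role": "user", "content": "hi"}]}]', encoding="utf-8")
    print(json_array_to_jsonl(src, dst))
    print(dst.read_text(encoding="utf-8"))
```

If the strict parse fails, the reported line and column point at the first syntax problem (a trailing comma, an unescaped quote, a truncated file), which is usually quicker to act on than the nested traceback.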

Hspix closed this as completed Apr 24, 2025
zhangansen commented

Did you manage to solve this?
