Describe the bug
An error is raised when using Streaming + Packing + resume_from_checkpoint. It appears to occur while skipping the batches that have already been trained.
Error log:
[rank0]: File "/usr/local/lib/python3.12/dist-packages/swift/cli/sft.py", line 7, in <module>
[rank0]: sft_main()
[rank0]: File "/usr/local/lib/python3.12/dist-packages/swift/llm/train/sft.py", line 281, in sft_main
[rank0]: return SwiftSft(args).main()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/swift/llm/base.py", line 47, in main
[rank0]: result = self.run()
[rank0]: ^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/swift/llm/train/sft.py", line 147, in run
[rank0]: return self.train(trainer)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/swift/llm/train/sft.py", line 207, in train
[rank0]: trainer.train(trainer.args.resume_from_checkpoint)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/swift/trainers/mixin.py", line 321, in train
[rank0]: res = super().train(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 2241, in train
[rank0]: return inner_training_loop(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 2482, in _inner_training_loop
[rank0]: epoch_dataloader = skip_first_batches(epoch_dataloader, steps_trained_in_current_epoch)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/accelerate/data_loader.py", line 1338, in skip_first_batches
[rank0]: dataset = dataloader.dataset
[rank0]: ^^^^^^^^^^^^^^^^^^
[rank0]: AttributeError: 'DataLoaderDispatcher' object has no attribute 'dataset'
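The traceback ends at a `dataloader.dataset` lookup on an object that has no such attribute. The sketch below illustrates that failure mode in isolation and one generic way to skip batches by consuming the iterator instead. It is only an illustration under stated assumptions: `DispatcherLikeLoader` and `skip_batches_by_iteration` are hypothetical names, not part of accelerate, transformers, or ms-swift, and this is not a confirmed fix for the issue.

```python
from itertools import islice


class DispatcherLikeLoader:
    """Hypothetical stand-in for a dataloader wrapper that exposes no `.dataset` attribute."""

    def __init__(self, batches):
        self._batches = batches

    def __iter__(self):
        return iter(self._batches)


def skip_batches_by_iteration(loader, num_batches):
    """Skip the first `num_batches` batches by consuming the iterator.

    Hypothetical helper: it never touches `loader.dataset`, so it works
    for any iterable of batches.
    """
    return islice(iter(loader), num_batches, None)


loader = DispatcherLikeLoader([[0, 1], [2, 3], [4, 5], [6, 7]])

# Accessing `loader.dataset` would raise AttributeError, as in the log above.
has_dataset = hasattr(loader, "dataset")

# Resume after two already-trained batches.
remaining = list(skip_batches_by_iteration(loader, 2))
```

Skipping by iteration still reads and discards the skipped batches, so on a streaming dataset it costs as much as replaying them.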
Launch script:
Your hardware and system info
torch==2.5.1
ms-swift==3.4.0