Environment: Ubuntu Server 22.04, Python 3.11, CUDA 11.8. Training fails with the following error:
```
[2025-05-01 18:32:21,169][root][INFO] - Validate epoch: 1, rank: 0
[2025-05-01 18:32:21,172][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 1, after: 1
[2025-05-01 18:32:21,291][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 1, after: 1
Error executing job with overrides: ['++model=iic/SenseVoiceSmall', '++train_data_set_list=/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/data/train_test.jsonl', '++valid_data_set_list=/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/data/val_test.jsonl', '++dataset_conf.data_split_num=1', '++dataset_conf.batch_sampler=BatchSampler', '++dataset_conf.batch_size=6000', '++dataset_conf.sort_size=1024', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=8', '++train_conf.max_epoch=150', '++train_conf.log_interval=1', '++train_conf.resume=true', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=10', '++train_conf.avg_nbest_model=10', '++train_conf.use_deepspeed=false', '++train_conf.deepspeed_config=/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../ds_stage1.json', '++optim_conf.lr=0.0002', '++output_dir=./outputs']
Traceback (most recent call last):
  File "/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../../funasr/bin/train_ds.py", line 228, in <module>
    main_hydra()
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
            ^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
        ^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../../funasr/bin/train_ds.py", line 56, in main_hydra
    main(**kwargs)
  File "/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../../funasr/bin/train_ds.py", line 201, in main
    trainer.validate_epoch(model=model, dataloader_val=dataloader_val, epoch=epoch + 1)
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/train_utils/trainer_ds.py", line 771, in validate_epoch
    self.forward_step(model, batch, loss_dict=loss_dict)
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/train_utils/trainer_ds.py", line 670, in forward_step
    retval = model(**batch)
             ^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 697, in forward
    encoder_out, encoder_out_lens = self.encode(speech, speech_lengths, text)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 759, in encode
    [[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 759, in <listcomp>
    [[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]
      ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 6491
E0501 18:32:25.772000 140269714682944 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 77200) of binary: /root/miniconda3/envs/funasr/bin/python3.11
Traceback (most recent call last):
  File "/root/miniconda3/envs/funasr/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
../../../funasr/bin/train_ds.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-05-01_18:32:25
  host      : localhost.localdomain
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 77200)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
^C
```
Does anyone know how to fix this?
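The traceback narrows this down: the crash happens inside `validate_epoch`, where the SenseVoice model's `encode` reads column 3 of the tokenized target (`text[:, 3]`) and looks each value up in `self.textnorm_int_dict`, and token 6491 is not a key. A guess (not confirmed in this thread) is that column 3 is meant to hold a text-normalization tag such as `<|withitn|>` or `<|woitn|>`, so the `KeyError` would mean some entries were tokenized without that tag in place. The sketch below mimics the lookup in plain Python; the token IDs `WITH_ITN_ID` and `WO_ITN_ID`, the embedding indices, and the row layout are all hypothetical:

```python
# Hypothetical token IDs for the two text-normalization tags; the real values
# come from the SenseVoice tokenizer and are NOT reproduced here.
WITH_ITN_ID = 1001
WO_ITN_ID = 1002

# Stand-in for the model's textnorm_int_dict: textnorm token ID -> embedding
# index (the indices 14 and 15 are illustrative only).
textnorm_int_dict = {WITH_ITN_ID: 14, WO_ITN_ID: 15}


def textnorm_indices(text_rows):
    """Mimic the failing line: look up the token at position 3 of every row."""
    return [[textnorm_int_dict[int(row[3])]] for row in text_rows]


def find_bad_rows(text_rows):
    """Return (row_index, token) pairs that would raise KeyError in the lookup."""
    bad = []
    for i, row in enumerate(text_rows):
        token = int(row[3]) if len(row) > 3 else None  # None: row too short
        if token not in textnorm_int_dict:
            bad.append((i, token))
    return bad


# Assumed layout: three tag tokens, the textnorm tag at index 3, then text tokens.
rows = [
    [10, 20, 30, WITH_ITN_ID, 500],  # well-formed: position 3 is a textnorm tag
    [10, 20, 30, 6491, 501],         # position 3 holds an ordinary token
]
print(find_bad_rows(rows))  # [(1, 6491)]
```

Since the error is raised while validating, the validation list (`val_test.jsonl` in the overrides above) is the first file worth checking, though the same `forward` path presumably runs on training batches too.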
Hi, has this been solved yet?
No.
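Because the `KeyError` comes from the text-normalization lookup at `text[:, 3]`, one thing worth checking is whether every entry in the training and validation jsonl files carries its tag fields. This sketch assumes the SenseVoice fine-tuning jsonl layout, in which each entry has `text_language`, `emo_target`, `event_target` and `with_or_wo_itn` fields; those field names and the allowed `with_or_wo_itn` values are assumptions taken from SenseVoice fine-tuning examples, so verify them against your FunASR version. It reports lines that are not valid JSON, lack those fields, or carry an unexpected `with_or_wo_itn` value:

```python
import json
import sys

# ASSUMPTION: field names and allowed tag values follow the SenseVoice
# fine-tuning jsonl examples; check them against your FunASR version.
REQUIRED_TAG_FIELDS = ("text_language", "emo_target", "event_target", "with_or_wo_itn")
ALLOWED_ITN_TAGS = {"<|withitn|>", "<|woitn|>"}


def check_jsonl_lines(lines):
    """Return (line_number, problem) tuples for entries that look malformed."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:  # ignore blank lines
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(entry, dict):
            problems.append((lineno, "entry is not a JSON object"))
            continue
        missing = [name for name in REQUIRED_TAG_FIELDS if name not in entry]
        if missing:
            problems.append((lineno, "missing fields: " + ", ".join(missing)))
        elif entry["with_or_wo_itn"] not in ALLOWED_ITN_TAGS:
            problems.append((lineno, f"unexpected with_or_wo_itn: {entry['with_or_wo_itn']!r}"))
    return problems


def check_jsonl_file(path):
    """Run check_jsonl_lines over a jsonl file on disk."""
    with open(path, encoding="utf-8") as fh:
        return check_jsonl_lines(fh)


if __name__ == "__main__":
    # Usage (script name is hypothetical): python check_jsonl.py train.jsonl valid.jsonl
    for path in sys.argv[1:]:
        for lineno, problem in check_jsonl_file(path):
            print(f"{path}:{lineno}: {problem}")
```

An empty report does not rule out the data, since the tags could be present but tokenized differently than the model expects; it only eliminates the simplest cause.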