KeyError: 6491 when training on dialect data based on SenseVoiceSmall #2502

Open
lukeewin opened this issue May 1, 2025 · 2 comments
@lukeewin
lukeewin commented May 1, 2025

Environment: Ubuntu Server 22.04
python: 3.11
cuda: 11.8
Running training fails with the following error:

[2025-05-01 18:32:21,169][root][INFO] - Validate epoch: 1, rank: 0

[2025-05-01 18:32:21,172][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 1, after: 1
[2025-05-01 18:32:21,291][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 1, after: 1
Error executing job with overrides: ['++model=iic/SenseVoiceSmall', '++train_data_set_list=/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/data/train_test.jsonl', '++valid_data_set_list=/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/data/val_test.jsonl', '++dataset_conf.data_split_num=1', '++dataset_conf.batch_sampler=BatchSampler', '++dataset_conf.batch_size=6000', '++dataset_conf.sort_size=1024', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=8', '++train_conf.max_epoch=150', '++train_conf.log_interval=1', '++train_conf.resume=true', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=10', '++train_conf.avg_nbest_model=10', '++train_conf.use_deepspeed=false', '++train_conf.deepspeed_config=/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../ds_stage1.json', '++optim_conf.lr=0.0002', '++output_dir=./outputs']
Traceback (most recent call last):
  File "/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../../funasr/bin/train_ds.py", line 228, in <module>
    main_hydra()
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
            ^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
        ^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../../funasr/bin/train_ds.py", line 56, in main_hydra
    main(**kwargs)
  File "/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../../funasr/bin/train_ds.py", line 201, in main
    trainer.validate_epoch(model=model, dataloader_val=dataloader_val, epoch=epoch + 1)
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/train_utils/trainer_ds.py", line 771, in validate_epoch
    self.forward_step(model, batch, loss_dict=loss_dict)
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/train_utils/trainer_ds.py", line 670, in forward_step
    retval = model(**batch)
             ^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 697, in forward
    encoder_out, encoder_out_lens = self.encode(speech, speech_lengths, text)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 759, in encode
    [[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 759, in <listcomp>
    [[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]
      ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 6491
E0501 18:32:25.772000 140269714682944 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 77200) of binary: /root/miniconda3/envs/funasr/bin/python3.11
Traceback (most recent call last):
  File "/root/miniconda3/envs/funasr/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
../../../funasr/bin/train_ds.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-05-01_18:32:25
  host      : localhost.localdomain
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 77200)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
^C

Does anyone know how to fix this?
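
For anyone reading the traceback: the failing line looks up `int(style)` for each `style` in `text[:, 3]` inside `self.textnorm_int_dict`, so the token at index 3 of at least one target sequence was `6491`, which is not a key of that dict. A minimal sketch for locating such rows before training is below. It is not FunASR code: `find_bad_rows` is a hypothetical helper, the keys in `TEXTNORM_INT_DICT` are assumed placeholder values (the real ones are defined in `funasr/models/sense_voice/model.py` and should be copied from there), and the sample batch is made up.

```python
# Assumed placeholder for the model's textnorm_int_dict; copy the real
# mapping from funasr/models/sense_voice/model.py before relying on this.
TEXTNORM_INT_DICT = {25016: 14, 25017: 15}


def find_bad_rows(token_rows, allowed=TEXTNORM_INT_DICT, position=3):
    """Return (row_index, token_id) for every row whose token at
    `position` is not a key of `allowed`. Rows too short to have that
    position are reported with token_id None."""
    bad = []
    for i, row in enumerate(token_rows):
        if len(row) <= position:
            bad.append((i, None))
        elif int(row[position]) not in allowed:
            bad.append((i, int(row[position])))
    return bad


# Made-up token IDs: the second row carries 6491 at index 3, which is
# the situation the KeyError in the traceback describes.
batch = [
    [24885, 25004, 24993, 25016, 9, 10],
    [24885, 25004, 24993, 6491, 9, 10],
]
print(find_bad_rows(batch))  # [(1, 6491)]
```

If a check like this flags rows, one possible explanation (a guess, not confirmed anywhere in this thread) is that the training or validation jsonl entries do not produce the prefix tokens the model expects at the start of each target, so an ordinary text token ends up at index 3.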

@deegy666
Hello, has this been resolved yet?

@lukeewin
Author

Hello, has this been resolved yet?
