
Empty Recordings File in utils/fix_data_dir.sh Script #4920

Open
@pri1712


I am using the WeSpeaker pipeline and the Kaldi toolkit for a speaker diarization task, with ResNet as the feature extractor. While filtering my segments file with the script utils/fix_data_dir.sh, I ran into an issue: the script filters my segments file down to zero lines because the temporary file /tmp/kaldi.XXXX/recordings has no entries.

Link to the script: https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/fix_data_dir.sh

This is a part of the error output:
utils/fix_data_dir.sh: filtered /data1/XYZ/ABC/speaker_diarization/SHARC_check/tools_wespk/data/ABC_dev_fbank_seg/old_dir/segments from 8310 to 0 lines based on filter /tmp/kaldi.3oKA/recordings.
I found that the file /tmp/kaldi.XXXX/recordings generated by the script is empty, which causes the script to filter out all lines from the segments file.

  1. What might be causing the /tmp/kaldi.XXXX/recordings file to be empty?

  2. Are there any known issues or additional steps required to ensure the recordings file is correctly populated?

If required, I can provide the formats of the relevant files so they can be checked for formatting mismatches between segments and wav.scp, which is being used to generate the /tmp/kaldi.XXXX/recordings file.
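
The consistency check between segments and wav.scp mentioned above could be sketched roughly as follows. This is only a diagnostic sketch, not part of Kaldi or of fix_data_dir.sh. It assumes the usual Kaldi data-directory layout, where the second field of each segments line is the recording ID and the first field of each wav.scp line is the recording ID. The function names and the sample IDs are hypothetical. If the two files share no recording IDs (for example because of a naming or whitespace mismatch), that would be consistent with the recordings list ending up empty.

```python
import tempfile
from pathlib import Path


def recording_ids_in_segments(segments_path):
    """Return the set of recording IDs (assumed to be field 2) in a segments file."""
    ids = set()
    with open(segments_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                ids.add(fields[1])
    return ids


def recording_ids_in_wav_scp(wav_scp_path):
    """Return the set of recording IDs (assumed to be field 1) in a wav.scp file."""
    ids = set()
    with open(wav_scp_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split(maxsplit=1)
            if fields:  # skip blank lines
                ids.add(fields[0])
    return ids


def compare_recording_ids(segments_path, wav_scp_path):
    """Report which recording IDs the two files share and which appear in only one."""
    seg_ids = recording_ids_in_segments(segments_path)
    wav_ids = recording_ids_in_wav_scp(wav_scp_path)
    return {
        "shared": sorted(seg_ids & wav_ids),
        "only_in_segments": sorted(seg_ids - wav_ids),
        "only_in_wav_scp": sorted(wav_ids - seg_ids),
    }


if __name__ == "__main__":
    # Hypothetical sample data; replace the paths with a real data directory.
    with tempfile.TemporaryDirectory() as tmp:
        segments = Path(tmp) / "segments"
        wav_scp = Path(tmp) / "wav.scp"
        segments.write_text(
            "utt1 rec_A 0.00 1.50\nutt2 rec_B 0.00 2.00\n", encoding="utf-8"
        )
        wav_scp.write_text(
            "rec_A /path/to/a.wav\nrec_C /path/to/c.wav\n", encoding="utf-8"
        )
        for key, ids in compare_recording_ids(segments, wav_scp).items():
            print(f"{key}: {ids}")
```

An empty "shared" list on the real files would point to an ID mismatch between segments and wav.scp rather than to a problem in the script itself.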

Thanks

Labels: bug, stale