
ConcatDataset Behavior and Equal Dataset Length Requirement #95

@lianabagh

Description


The current implementation of the `ConcatDataset` class assumes that every dataset it wraps has the same length, and `__getitem__` only behaves correctly under that assumption. `__len__` reports the sum of the individual lengths, but `__getitem__` interleaves the datasets round-robin: it selects dataset `idx % len(self.datasets)` and then requests item `idx // len(self.datasets)` from it. When the lengths differ, this mapping can ask a shorter dataset for an index past its end, and it can leave the last items of a longer dataset unreachable. The equal-length requirement therefore limits the flexibility of `ConcatDataset` when combining datasets of different sizes.

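```python
class ConcatDataset(AudioDataset):
    def __init__(self, datasets: list):
        self.datasets = datasets

    def __len__(self):
        # Total length is the sum of the individual dataset lengths...
        return sum([len(d) for d in self.datasets])

    def __getitem__(self, idx):
        # ...but items are fetched round-robin, which assumes equal lengths:
        # dataset idx % N, then item idx // N (N = number of datasets).
        dataset = self.datasets[idx % len(self.datasets)]
        return dataset[idx // len(self.datasets)]
```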
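To make the failure concrete, here is a minimal, self-contained sketch that uses plain Python lists in place of `AudioDataset` instances (an assumption for illustration). `interleaved_index` reproduces the mapping used by `__getitem__` above, and `OffsetConcatDataset` is a hypothetical alternative that locates an item through cumulative lengths, similar in spirit to `torch.utils.data.ConcatDataset`. Neither is the project's code.

```python
import bisect
from itertools import accumulate
from typing import List, Sequence, Tuple


def interleaved_index(lengths: Sequence[int], idx: int) -> Tuple[int, int]:
    """Mirror the issue's mapping: pick a dataset round-robin, then divide."""
    n = len(lengths)
    return idx % n, idx // n


class OffsetConcatDataset:
    """Hypothetical ConcatDataset that supports datasets of different lengths.

    Items are laid out back to back: all of datasets[0], then datasets[1], ...
    """

    def __init__(self, datasets: List[Sequence]):
        self.datasets = datasets
        # cumulative_sizes[i] = total number of items in datasets[0..i]
        self.cumulative_sizes = list(accumulate(len(d) for d in datasets))

    def __len__(self) -> int:
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx: int):
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError(f"index {idx} out of range for length {len(self)}")
        # The first dataset whose cumulative size exceeds idx holds the item.
        d = bisect.bisect_right(self.cumulative_sizes, idx)
        offset = self.cumulative_sizes[d - 1] if d > 0 else 0
        return self.datasets[d][idx - offset]


if __name__ == "__main__":
    a, b = ["a0", "a1"], ["b0", "b1", "b2", "b3"]
    lengths = [len(a), len(b)]

    # Round-robin mapping over sum(lengths) == 6 indices.
    for idx in range(sum(lengths)):
        d, i = interleaved_index(lengths, idx)
        status = "ok" if i < lengths[d] else "out of range"
        print(idx, "->", (d, i), status)

    concat = OffsetConcatDataset([a, b])
    print([concat[i] for i in range(len(concat))])
```

With lengths 2 and 4, index 4 maps to item 2 of the first dataset (out of range), and item 3 of the second dataset is never produced. The offset-based sketch reaches every item exactly once, but it lays items out back to back instead of round-robin, so the order of `concat[i]` changes. Whether that matters depends on how indices are sampled; it is a design trade-off, not something the issue prescribes.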

Default Length in AudioDataset:
Additionally, the `AudioDataset` class has a variable named `n_examples` that sets the dataset's default length to 1000. Because this default is fixed rather than derived from the underlying data, the length a dataset reports may not match the number of items it actually contains.
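One way to avoid a fixed default is to fall back to the real item count unless a length is requested explicitly. The sketch below is hypothetical: the `SizedDataset` class, its `items` argument, and the `n_examples=None` default are assumptions for illustration and do not reflect `AudioDataset`'s actual signature. Wrapping around when `n_examples` exceeds the item count is likewise just one possible choice.

```python
from typing import Optional, Sequence


class SizedDataset:
    """Hypothetical dataset whose length defaults to its real item count.

    `items` and the `n_examples=None` default are illustrative assumptions,
    not AudioDataset's actual signature.
    """

    def __init__(self, items: Sequence, n_examples: Optional[int] = None):
        self.items = list(items)
        # Fall back to the real number of items instead of a fixed 1000.
        self.n_examples = len(self.items) if n_examples is None else n_examples
        if not self.items and self.n_examples > 0:
            raise ValueError("n_examples > 0 requires at least one item")

    def __len__(self) -> int:
        return self.n_examples

    def __getitem__(self, idx: int):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Wrap around when an explicit n_examples exceeds the item count.
        return self.items[idx % len(self.items)]


small = SizedDataset(["x0", "x1", "x2"])
print(len(small))  # 3
oversampled = SizedDataset(["x0", "x1", "x2"], n_examples=1000)
print(len(oversampled), oversampled[4])  # 1000 x1
```

With this default, `len()` reflects the data unless a caller deliberately asks for a different epoch size.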
