During my internship at ByteDance, I implemented a framework-independent Dataloader that can be used with PyTorch, TensorFlow, and other deep learning frameworks. The MultiDataloader repo is a toy demo that shows the performance gap between different dataloaders (NaiveDataloader, MultiDataloader, ...).
Note that this repo is only a toy demo, not the real implementation.
- Clone the repo.
- Use the `cd` command to go to the `MultiDataloader/` folder.
- Run `sh distribute.sh` (this requires `setuptools` to be installed).
- After a successful install, go to the `test/` folder and run `test.py`.
Let's assume loading one batch of data costs `t` seconds, so loading `B` batches naively costs `B * t` in total. If `n` workers load batches in parallel, the cost drops to roughly `B * t / n`, i.e. an ideal speed-up of `n`.
It should be noted that this formula doesn't take subprocess creation and data transmission time into account.
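The ideal timing model above can be sketched in a few lines. The notation (`t`, `B`, `n` as arguments) is mine, not the repo's; this is just arithmetic, not the actual MultiDataloader code.

```python
import math

def naive_time(num_batches, t):
    """Serial loading: every batch waits for the previous one."""
    return num_batches * t

def ideal_parallel_time(num_batches, t, num_workers):
    """Ideal parallel loading: batches are processed in rounds of
    `num_workers`, ignoring subprocess creation and transmission time."""
    return math.ceil(num_batches / num_workers) * t

# Plugging in the benchmark setting from this README:
# 100,000 samples with batch_size 256 -> 391 batches, t = 0.01 s, 8 workers.
batches = math.ceil(100_000 / 256)
print(naive_time(batches, 0.01))               # roughly 3.91 s
print(ideal_parallel_time(batches, 0.01, 8))   # roughly 0.49 s
```

The ideal numbers land close to the measured 4.30904 s and 0.54476 s below; the gap is the overhead the formula ignores.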
It's very difficult to measure an exact speed-up rate because it is hardware-dependent, but MultiDataloader clearly boosts loading speed.
I ran a rough test on 100,000 fake samples (no data is actually loaded; each load just sleeps 0.01 s) with `batch_size` set to 256 and `num_worker` set to 8, averaged over 10 runs on a personal machine (MacBook Pro 2021, 10 cores). The naive dataloader took 4.30904 s while MultiDataloader took 0.54476 s, a speed-up of more than 7×.
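The benchmark idea can be reproduced in miniature: each "load" just sleeps, so parallel workers overlap the waiting. This is a hypothetical sketch scaled down from the README's 100,000 samples so it runs in well under a second, and it uses threads rather than the repo's subprocesses (sleeping releases the GIL, so threads suffice for fake work); the names here are mine, not the repo's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LOAD_TIME = 0.01    # seconds per batch, mimicking the fake 0.01 s load
NUM_BATCHES = 32    # scaled down from the README's benchmark
NUM_WORKERS = 8

def load_batch(i):
    time.sleep(LOAD_TIME)  # stand-in for real disk/decode work
    return i

def naive_loader():
    """Load every batch serially, like NaiveDataloader."""
    start = time.perf_counter()
    batches = [load_batch(i) for i in range(NUM_BATCHES)]
    return batches, time.perf_counter() - start

def parallel_loader():
    """Load batches with a pool of workers, like MultiDataloader."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        batches = list(pool.map(load_batch, range(NUM_BATCHES)))
    return batches, time.perf_counter() - start

if __name__ == "__main__":
    b1, t1 = naive_loader()
    b2, t2 = parallel_loader()
    assert b1 == b2  # same data either way, just loaded faster
    print(f"naive: {t1:.3f} s  parallel: {t2:.3f} s  speed-up: {t1 / t2:.1f}x")
```

With the fake work being pure sleep, the measured speed-up approaches the ideal factor of 8, minus pool start-up and scheduling overhead.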
The blog post *DataLoaders Explained: Building a Multi-Process Data Loader from Scratch* inspired me a lot at the beginning.