During my internship at ByteDance, I implemented a framework-independent Dataloader that can be used with PyTorch, TensorFlow, and other deep learning frameworks. The MultiDataloader repo is a toy demo that shows the performance gap between different dataloaders (NaiveDataloader, MultiDataloader, ...).
Note that this repo is only a toy demo, not the real implementation.
- Clone the repo.
- Use the `cd` command to go to the `MultiDataloader/` folder.
- Run `sh distribute.sh` (this requires `setuptools` to be installed).
- After a successful install, go to the `test/` folder and run `test.py`.
Let's assume loading one batch of data costs `t` seconds, so loading `B` batches naively costs `B * t` in total. If `n` workers load batches in parallel, the cost drops to roughly `B * t / n`, i.e. an ideal speed-up of `n`.
It should be noted that this formula doesn't take subprocess creation and data transmission time into account.
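The ideal timing model above can be sketched in a few lines. The notation (`t`, `B`, `n` as arguments) is mine, not the repo's; this is just arithmetic, not the actual MultiDataloader code.

```python
import math

def naive_time(num_batches, t):
    """Serial loading: every batch waits for the previous one."""
    return num_batches * t

def ideal_parallel_time(num_batches, t, num_workers):
    """Ideal parallel loading: batches are processed in rounds of
    `num_workers`, ignoring subprocess creation and transmission time."""
    return math.ceil(num_batches / num_workers) * t

# Plugging in the benchmark setting from this README:
# 100,000 samples with batch_size 256 -> 391 batches, t = 0.01 s, 8 workers.
batches = math.ceil(100_000 / 256)
print(naive_time(batches, 0.01))               # roughly 3.91 s
print(ideal_parallel_time(batches, 0.01, 8))   # roughly 0.49 s
```

The ideal numbers land close to the measured 4.30904 s and 0.54476 s below; the gap is the overhead the formula ignores.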
It's very difficult to measure an exact speed-up rate because it is hardware-dependent, but MultiDataloader clearly boosts loading speed.
I ran a rough test on 100,000 fake samples (no data is actually loaded; each load just sleeps 0.01 s) with `batch_size` set to 256 and `num_worker` set to 8, averaged over 10 runs on a personal machine (MacBook Pro 2021, 10 cores). The naive dataloader took 4.30904 s while MultiDataloader took 0.54476 s, a speed-up of more than 7×.
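The benchmark idea can be reproduced in miniature: each "load" just sleeps, so parallel workers overlap the waiting. This is a hypothetical sketch scaled down from the README's 100,000 samples so it runs in well under a second, and it uses threads rather than the repo's subprocesses (sleeping releases the GIL, so threads suffice for fake work); the names here are mine, not the repo's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LOAD_TIME = 0.01    # seconds per batch, mimicking the fake 0.01 s load
NUM_BATCHES = 32    # scaled down from the README's benchmark
NUM_WORKERS = 8

def load_batch(i):
    time.sleep(LOAD_TIME)  # stand-in for real disk/decode work
    return i

def naive_loader():
    """Load every batch serially, like NaiveDataloader."""
    start = time.perf_counter()
    batches = [load_batch(i) for i in range(NUM_BATCHES)]
    return batches, time.perf_counter() - start

def parallel_loader():
    """Load batches with a pool of workers, like MultiDataloader."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        batches = list(pool.map(load_batch, range(NUM_BATCHES)))
    return batches, time.perf_counter() - start

if __name__ == "__main__":
    b1, t1 = naive_loader()
    b2, t2 = parallel_loader()
    assert b1 == b2  # same data either way, just loaded faster
    print(f"naive: {t1:.3f} s  parallel: {t2:.3f} s  speed-up: {t1 / t2:.1f}x")
```

With the fake work being pure sleep, the measured speed-up approaches the ideal factor of 8, minus pool start-up and scheduling overhead.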
The blog post *DataLoaders Explained: Building a Multi-Process Data Loader from Scratch* inspired me a lot at the beginning.