# DongyuXu77/MultiDataloader


During an internship at ByteDance, I implemented an independent dataloader that can be used with PyTorch, TensorFlow, and other deep learning frameworks. This repo, MultiDataloader, is a toy demo that shows the performance gap between different dataloaders (NaiveDataloader, MultiDataloader, ...).

Note that this repo is just a toy demo, not the real implementation.

## Installation

  1. Clone the repo.
  2. `cd` into the `MultiDataloader/` folder.
  3. Run `sh distribute.sh` (this requires `setuptools` to be installed).
  4. After the installation succeeds, go to the `test/` folder and run `test.py`.
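The steps above can be condensed into a short shell session (the clone URL is assumed from the repository name):

```shell
git clone https://github.com/DongyuXu77/MultiDataloader.git
cd MultiDataloader/
sh distribute.sh   # requires setuptools
cd test/
python test.py
```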

## Theoretical speed-up rate

Let's assume that loading one batch of data costs $C_l$ seconds and that one training step costs $C_t$ seconds.

If $C_l \lt C_t$:

$T_M = C_l + C_t \cdot n$, where $n$ is the number of training steps per epoch and $M$ denotes MultiDataloader.

$T_N = (C_l + C_t) \cdot n$, where $N$ denotes NaiveDataloader.

$S_{rate} = \frac{T_N}{T_M} = \frac{(C_l + C_t)\,n}{C_l + C_t\,n}$, where $S_{rate}$ is the theoretical speed-up rate.

Note that the formula does not account for the time spent creating subprocesses or transmitting data between them.
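As a quick numeric check of the formula, here is a small sketch; the cost values are made-up examples, not measurements from this repo:

```python
def speedup_rate(c_l: float, c_t: float, n: int) -> float:
    """Theoretical speed-up when loading overlaps training (assumes C_l < C_t).

    T_M = C_l + C_t * n   : only the first load is exposed; later loads
                            hide behind training steps.
    T_N = (C_l + C_t) * n : every step pays loading and training in sequence.
    """
    t_multi = c_l + c_t * n
    t_naive = (c_l + c_t) * n
    return t_naive / t_multi

# Made-up example: loading costs 0.01 s, training 0.02 s, 1000 steps per epoch.
print(round(speedup_rate(0.01, 0.02, 1000), 3))  # → 1.499
```

As $n$ grows, the rate approaches $(C_l + C_t) / C_t$, so when loading is about as expensive as training the best case is roughly 2x.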

## Experiments

It is difficult to measure an exact speed-up rate because it is hardware dependent, but MultiDataloader clearly boosts loading speed.

I ran a rough test on 100,000 fake samples (no data is actually loaded; each load just sleeps 0.01 s) with `batch_size` set to 256 and `num_worker` set to 8, averaged over 10 runs on a personal computer (MacBook Pro 2021, 10 cores). The naive_dataloader takes 4.30904 s while MultiDataloader takes 0.54476 s, a speed-up of more than 7x.
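The benchmark idea can be sketched as follows: a naive loop pays loading plus training on every step, while a background worker process keeps a queue of batches filled so training overlaps with loading. All names and timings here are illustrative, not the repo's actual implementation:

```python
import multiprocessing as mp
import time

def load_batch(idx):
    """Simulate loading one batch: sleep instead of doing real I/O."""
    time.sleep(0.001)
    return idx

def _producer(num_batches, queue):
    """Worker process: load batches ahead of the training loop."""
    for i in range(num_batches):
        queue.put(load_batch(i))
    queue.put(None)  # sentinel: no more batches

def naive_dataloader_run(num_batches, train_cost):
    """Load and train sequentially, as a NaiveDataloader would."""
    start = time.perf_counter()
    for i in range(num_batches):
        load_batch(i)
        time.sleep(train_cost)  # simulated training step
    return time.perf_counter() - start

def multi_dataloader_run(num_batches, train_cost):
    """Load in a subprocess so loading overlaps with training."""
    queue = mp.Queue(maxsize=8)
    worker = mp.Process(target=_producer, args=(num_batches, queue))
    worker.start()
    start = time.perf_counter()
    while queue.get() is not None:
        time.sleep(train_cost)  # simulated training step
    elapsed = time.perf_counter() - start
    worker.join()
    return elapsed

if __name__ == "__main__":
    naive = naive_dataloader_run(100, 0.002)
    multi = multi_dataloader_run(100, 0.002)
    print(f"naive: {naive:.3f}s  multi: {multi:.3f}s  ratio: {naive / multi:.2f}x")
```

With a single producer process, the overlap hides the loading cost behind training; the repo's `num_worker` setting spreads loading across several such workers so that even $C_l \gt C_t$ workloads can keep the training loop fed.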

## Reference

The blog post *DataLoaders Explained: Building a Multi-Process Data Loader from Scratch* was a major inspiration at the start of this project.
