saur-abh/mlops-with-dvc

 
 

MLOps - Devfest

This project was designed for a talk at #devfestindia-2020.

Topics to be covered

  1. Data versioning
  2. Building training pipelines
  3. Versioning models
  4. Deploying using docker and kubernetes

Slides

WIP

Setup and Run project

  1. Clone the repo

     git clone <repo_url>
     cd <repo-name>

  2. Pull the data from gdrive

     dvc pull -r gdrive

     DVC will ask for authorization; enter the auth key from the URL it shows.

  3. Run the application

     python app/app.py

  4. Update the model or any other ML step, then reproduce the pipeline and push the results

     dvc repro
     dvc push -r gdrive
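
Why does `dvc repro` only rerun the steps you touched? DVC records content hashes of each stage's dependencies and compares them on the next run. The snippet below is a rough, simplified sketch of that idea, not DVC's actual implementation: the function names, the JSON lock file, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return a hex digest of the file's contents (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stage_is_stale(deps: list[Path], lock_file: Path) -> bool:
    """Return True when any dependency hash differs from the recorded one."""
    current = {str(p): file_hash(p) for p in deps}
    if not lock_file.exists():
        return True  # the stage has never been run
    recorded = json.loads(lock_file.read_text())
    return current != recorded


def record_run(deps: list[Path], lock_file: Path) -> None:
    """Store the current dependency hashes after a successful run."""
    lock_file.write_text(json.dumps({str(p): file_hash(p) for p in deps}))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "model.py"       # hypothetical stand-in for src/model.py
        lock = Path(tmp) / "stage.lock.json"  # hypothetical file, not the dvc.lock format
        script.write_text("split = 0.2\n")
        print(stage_is_stale([script], lock))  # True: nothing recorded yet
        record_run([script], lock)
        print(stage_is_stale([script], lock))  # False: dependency unchanged
        script.write_text("split = 0.3\n")
        print(stage_is_stale([script], lock))  # True: dependency was edited
```

Hashing contents rather than checking timestamps means a stage is rerun only when a dependency really changed, which is why editing one script does not force the whole pipeline to run again.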

DVC Notes

How is the pipeline created?

dvc run -n preprocess -d src/preprocess.py -d assets/original_data/train.csv -o assets/preprocessed/ python src/preprocess.py
dvc run -n featurize -d src/featurize.py -d assets/preprocessed/train.csv -o assets/featurized/ python src/featurize.py
dvc run -fn train_test_eval  -d src/model.py -d assets/featurized -p model.random,model.split  -o assets/models  -M assets/eval/scores.json  python src/model.py 
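
The stages above form a pipeline because each stage's `-d` dependencies point into an earlier stage's `-o` outputs. As a minimal sketch of how that ordering can be derived, the snippet below transcribes only the data paths from the commands (script dependencies are left out for brevity) and sorts the stages topologically. `STAGES`, `produced_by`, and `execution_order` are hypothetical names for illustration; DVC resolves the graph internally in its own way.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from pathlib import PurePosixPath

# Data dependencies and outputs transcribed from the dvc run commands above.
STAGES = {
    "preprocess": {
        "deps": ["assets/original_data/train.csv"],
        "outs": ["assets/preprocessed/"],
    },
    "featurize": {
        "deps": ["assets/preprocessed/train.csv"],
        "outs": ["assets/featurized/"],
    },
    "train_test_eval": {
        "deps": ["assets/featurized"],
        "outs": ["assets/models", "assets/eval/scores.json"],
    },
}


def produced_by(dep, stages):
    """Return the name of the stage whose output contains `dep`, or None."""
    dep_path = PurePosixPath(dep)
    for name, stage in stages.items():
        for out in stage["outs"]:
            out_path = PurePosixPath(out)
            # A dependency matches if it is the output itself or lives inside it.
            if dep_path == out_path or out_path in dep_path.parents:
                return name
    return None


def execution_order(stages):
    """Order stages so that every stage runs after the stages it depends on."""
    graph = {}
    for name, stage in stages.items():
        upstream = {produced_by(dep, stages) for dep in stage["deps"]}
        upstream.discard(None)  # raw inputs are not produced by any stage
        graph[name] = upstream
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(STAGES))
    # ['preprocess', 'featurize', 'train_test_eval']
```

To inspect the real graph for this repository, `dvc dag` prints the stages and their connections.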

About

A trial project to understand ops around a Machine Learning project
