Integrating MLflow with Apache Spark
Apache Spark is a highly scalable and popular big data framework for processing data at large scale. For more details and documentation, please go to https://spark.apache.org/. As a big data tool, it can be used to speed up parts of your ML workflow, since it can be applied at either the training or the inference stage.
In this particular case, we will illustrate how to use the model developed in the previous section on the Databricks environment to scale the batch-inference job to larger amounts of data.
In order to explore the Spark integration with MLflow, we will execute the following steps:
- Create a new notebook named inference_job_spark in Python, linking it to a running cluster where the bitpred_poc.ipynb notebook was just created.
- Upload your data to dbfs through the File/Upload data link in the environment.
- Execute the following script in a cell of the notebook, changing the logged_model and df filenames for the ones...