
Hands-On Data Science and Python Machine Learning: Perform data mining and machine learning efficiently using Python and Spark

Product type: Paperback
Published: July 2017
Publisher: Packt
ISBN-13: 9781787280748
Length: 420 pages
Edition: 1st
Languages: Python
Tools: NumPy
Concepts: Data Mining
Author: Frank Kane


Table of Contents (11 chapters)

Preface

1. Getting Started

2. Statistics and Probability Refresher, and Python Practice

3. Matplotlib and Advanced Probability Concepts

4. Predictive Models

5. Machine Learning with Python

6. Recommender Systems

7. More Data Mining and Machine Learning Techniques

8. Dealing with Real-World Data

9. Apache Spark - Machine Learning on Big Data

10. Testing and Experimental Design

K-fold cross-validation to avoid overfitting

Earlier in the book, we talked about train/test as a good way of preventing overfitting and of actually measuring how well your model performs on data it has never seen before. We can take that to the next level with a technique called k-fold cross-validation. So, let's look at this powerful tool for fighting overfitting, k-fold cross-validation, and learn how it works.
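In its standard formulation, k-fold cross-validation splits the data into k folds of roughly equal size, trains the model k times (each time holding out a different fold as the test set and training on the remaining k - 1 folds), and averages the k test scores. The following is a minimal pure-Python sketch under stated assumptions: `k_fold_indices`, `fit_line`, `mean_squared_error`, and `cross_validate` are hypothetical helpers, and the model is a simple least-squares line scored by mean squared error, chosen only for illustration. Libraries such as scikit-learn provide ready-made tools for this (for example, `sklearn.model_selection.KFold` and `cross_val_score`).

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds.

    Yields one (train_indices, test_indices) pair per fold.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the folds are reproducible
    # Spread any remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared difference between the line's predictions and ys."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, k, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train, test in k_fold_indices(len(xs), k, seed):
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mean_squared_error(model,
                                         [xs[i] for i in test],
                                         [ys[i] for i in test]))
    return sum(scores) / len(scores), scores


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [float(i) for i in range(50)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]  # noisy straight line
    mean_mse, per_fold = cross_validate(xs, ys, k=5)
    print(f"per-fold MSE: {[round(s, 2) for s in per_fold]}")
    print(f"mean MSE over {len(per_fold)} folds: {mean_mse:.2f}")
```

Because every sample lands in exactly one test fold, each data point is used for evaluation once and for training k - 1 times, so the averaged score depends less on one lucky or unlucky split than a single train/test split does.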

To recall how train/test works: we split all of the data that we're building a machine learning model from into two segments, a training dataset and a test dataset. We train our model using only the data in the training dataset, and then we evaluate its performance using the data that we reserved for the test dataset. That prevents us from overfitting to the data that we have because we'...
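A minimal sketch of the train/test idea, assuming plain Python lists of paired `xs`/`ys` values. `train_test_split`, `nearest_neighbour_model`, and `mse` are hypothetical helpers written for illustration (the name `train_test_split` mirrors a scikit-learn function, but this is not that implementation). The model deliberately memorizes its training data, so scoring it on that same data looks perfect, while the held-out test set exposes its real error.

```python
import random


def train_test_split(xs, ys, test_fraction=0.2, seed=0):
    """Shuffle paired samples, then hold out test_fraction of them for testing.

    Returns (train_xs, train_ys, test_xs, test_ys).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = max(1, round(len(xs) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([xs[i] for i in train_idx], [ys[i] for i in train_idx],
            [xs[i] for i in test_idx], [ys[i] for i in test_idx])


def nearest_neighbour_model(train_xs, train_ys):
    """A deliberately overfit model: predict the y of the closest training x."""
    pairs = list(zip(train_xs, train_ys))

    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def mse(predict, xs, ys):
    """Mean squared error of predict() over the given samples."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [i / 10 for i in range(100)]
    ys = [x + rng.gauss(0, 0.5) for x in xs]  # noisy y = x
    train_xs, train_ys, test_xs, test_ys = train_test_split(xs, ys)
    model = nearest_neighbour_model(train_xs, train_ys)  # fit on training data only
    print("train MSE:", mse(model, train_xs, train_ys))  # 0.0: it memorized these points
    print("test MSE:", mse(model, test_xs, test_ys))     # larger on data it never saw
```

The gap between the two numbers is the point: judged only on its training data, the memorizing model looks flawless, and only the reserved test data reveals that it has overfit.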


Frank Kane spent nine years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers. He holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology and teaches others about big data analysis.
