You're reading from scikit-learn Cookbook Over 80 recipes for machine learning in Python with scikit-learn

Product type Paperback

Published in Dec 2025

Publisher Packt

ISBN-13 9781836644453

Length 388 pages

Edition 3rd Edition

Languages

Python

Tools

Scikit-learn

Concepts

Machine Learning

Author (1):

John Sukup

Table of Contents (17) Chapters

Preface

1. Chapter 1: Common Conventions and API Elements of scikit-learn

2. Chapter 2: Pre-Model Workflow and Data Preprocessing

3. Chapter 3: Dimensionality Reduction Techniques

4. Chapter 4: Building Models with Distance Metrics and Nearest Neighbors

5. Chapter 5: Linear Models and Regularization

6. Chapter 6: Advanced Logistic Regression and Extensions

7. Chapter 7: Support Vector Machines and Kernel Methods

8. Chapter 8: Tree-Based Algorithms and Ensemble Methods

9. Chapter 9: Text Processing and Multiclass Classification

10. Chapter 10: Clustering Techniques

11. Chapter 11: Novelty and Outlier Detection

12. Chapter 12: Cross-Validation and Model Evaluation Techniques

13. Chapter 13: Deploying scikit-learn Models in Production

14. Chapter 14: Unlock Your Exclusive Benefits

Unlock this Book’s Free Benefits in 3 Easy Steps

15. Index

Why subscribe?

16. Other Books You May Enjoy

Understanding estimators

So, what exactly is an estimator anyway? The concept of estimators lies at the heart of scikit-learn. Estimators are Python objects (in the object-oriented programming (OOP) sense) that implement algorithms for learning from data, and they expose a consistent interface across the entire library. Every estimator in scikit-learn, whether a model or a transformer, implements fit(), which learns from data. Models (predictors) add predict(), which makes predictions on new data based on the trained model, while transformers add transform() instead. The fit() and predict() methods, both of which were mentioned previously, are the two you will use most often. Learning from data and then predicting on new data is the raison d’être of ML.
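
As a rough illustration of that shared interface, the sketch below puts a transformer and a predictor side by side. StandardScaler and the toy data are choices made for this example only; they are not taken from the text above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Toy data chosen purely for illustration
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 2.0, 3.0, 3.5, 5.0])

# A transformer: fit() learns the column mean and scale,
# transform() applies them
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# A predictor: fit() learns the coefficients, predict() applies them
model = LinearRegression()
model.fit(X_scaled, y)
y_pred = model.predict(X_scaled)

# Both objects share fit(); only the second step differs
print(hasattr(scaler, "predict"))   # False
print(hasattr(model, "transform"))  # False
print(y_pred.shape)                 # (5,)
```

Both objects learn through fit(); what differs is the method used afterward, which is why the two kinds of estimator can be chained in a workflow.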

For example, with LinearRegression(), one of the simplest yet often most powerful ML models, calling fit() with training data lets the model learn the least-squares coefficients for predicting outcomes. Afterward, predict() can be used on new data to generate predictions:

from sklearn.linear_model import LinearRegression
import numpy as np
# Example data
X = np.array([[1], [2], [3], [4], [5]])  # Feature matrix
y = np.array([1, 2, 3, 3.5, 5])  # Target values
# Create and fit the model
model = LinearRegression()
model.fit(X, y)
# Predict values for new data
X_new = np.array([[6], [7]])
predictions = model.predict(X_new)
print(predictions)
# Output (approximately):
# [5.75 6.7]
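
To see what fit() actually learned in the LinearRegression example above, one option is to inspect the fitted attributes coef_ and intercept_ and reproduce predict() by hand. This is a minimal sketch using the same toy data; the variable names slope, intercept, and manual are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as in the example above
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 3.5, 5])

# fit() returns the fitted estimator, so the calls can be chained
model = LinearRegression().fit(X, y)

# Learned parameters are stored in attributes ending with an underscore
slope = model.coef_[0]
intercept = model.intercept_
print(f"{slope:.2f} {intercept:.2f}")  # 0.95 0.05

# predict() is equivalent to applying the learned line by hand
X_new = np.array([[6], [7]])
manual = X_new[:, 0] * slope + intercept
print(np.allclose(manual, model.predict(X_new)))  # True
```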

Some estimators also provide a shortcut method, fit_predict(), that combines these operations into a single API call. It is not universal: it is offered mainly by clustering and outlier detection estimators, and LinearRegression(), for instance, does not implement it. There is a reason why scikit-learn keeps fit() and predict() separate as well as offering fit_predict(). Typically, fit_predict() is applied when you want predictions for the same dataset the model was trained on, which is often the case in unsupervised learning. KMeans is one example: the training data contains no target variable for us to predict. In supervised learning scenarios where we do have a target, fit() is applied to the training data, and predict() is applied to the holdout dataset.

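This is not to say you can’t use separate fit() and predict() calls in unsupervised learning scenarios; datasets can still be split into training, validation, and testing sets. When you simply want labels for the data you fit on, fit_predict() does both in one step: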

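# LinearRegression does not implement fit_predict(),
# so this example uses clustering instead:
import numpy as np
from sklearn.cluster import KMeans
# Example data
X = np.array([[1], [2], [3], [4], [5]])
# KMeans clustering example; random_state makes runs repeatable
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)
# Example output (cluster numbering is arbitrary, and the middle
# point may join either cluster):
# [0 0 0 1 1]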
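
For comparison, here is a small sketch of the separate fit() and predict() route with KMeans, assigning previously unseen points to clusters learned from a training set. The data and the names X_train and X_new are made up for illustration, with two well-separated groups so the result is unambiguous.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups (illustrative data)
X_train = np.array([[1.0], [1.5], [2.0], [8.0], [8.5], [9.0]])
X_new = np.array([[1.2], [8.8]])

# Learn the cluster centers from the training data only
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X_train)
train_labels = kmeans.labels_

# Assign unseen points to the nearest learned center
new_labels = kmeans.predict(X_new)

# Label numbering is arbitrary, so compare memberships, not raw values
print(new_labels[0] == train_labels[0])   # True
print(new_labels[1] == train_labels[-1])  # True
print(new_labels[0] != new_labels[1])     # True
```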

scikit-learn’s design ensures that whether you are working with simple linear regression or more complex algorithms such as random forests, the pattern remains the same, promoting consistency and ease of use.
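
One way to see this consistency is to run two quite different models through an identical loop. The sketch below assumes a synthetic dataset from make_regression and a simple train/holdout split; the dataset, the hyperparameters, and the results dictionary are illustrative choices, not something prescribed by the text.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only)
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = [
    LinearRegression(),
    RandomForestRegressor(n_estimators=50, random_state=0),
]

results = {}
for model in models:
    model.fit(X_train, y_train)          # learn from the training split
    predictions = model.predict(X_test)  # predict on the holdout split
    results[type(model).__name__] = predictions
    print(type(model).__name__, predictions.shape)  # shape is (50,) for each
```

Swapping in another regressor only requires changing the list; the fit() and predict() calls stay the same.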

Throughout this book, we will explore various estimators, including LinearRegression() (Chapter 5), DecisionTreeClassifier() (Chapter 8), and KNeighborsClassifier() (Chapter 4), while demonstrating how to use them to train models, evaluate performance, and make predictions, all using the familiar fit() and predict() structure.
