scikit-learn Cookbook
Over 80 recipes for machine learning in Python with scikit-learn

Product type: Paperback
Published: Dec 2025
Publisher: Packt
ISBN-13: 9781836644453
Length: 388 pages
Edition: 3rd Edition
Author: John Sukup

Table of Contents (17 chapters)
Preface
Chapter 1: Common Conventions and API Elements of scikit-learn
Chapter 2: Pre-Model Workflow and Data Preprocessing
Chapter 3: Dimensionality Reduction Techniques
Chapter 4: Building Models with Distance Metrics and Nearest Neighbors
Chapter 5: Linear Models and Regularization
Chapter 6: Advanced Logistic Regression and Extensions
Chapter 7: Support Vector Machines and Kernel Methods
Chapter 8: Tree-Based Algorithms and Ensemble Methods
Chapter 9: Text Processing and Multiclass Classification
Chapter 10: Clustering Techniques
Chapter 11: Novelty and Outlier Detection
Chapter 12: Cross-Validation and Model Evaluation Techniques
Chapter 13: Deploying scikit-learn Models in Production
Chapter 14: Unlock Your Exclusive Benefits
Index
Other Books You May Enjoy

Understanding estimators

So, what exactly is an estimator, anyway? The concept of estimators lies at the heart of scikit-learn. Estimators are objects, in the sense of Python’s object-oriented programming (OOP), that implement algorithms for learning from data, and they behave consistently across the entire library. Every estimator in scikit-learn, whether a model or a transformer, follows a simple and intuitive interface. The two most essential methods of any estimator are fit() and predict(), both of which were mentioned previously. The fit() method trains the model by learning from data, while predict() makes predictions on new data based on the trained model. This is the raison d’être of ML.

For example, in one of the simplest—yet often most powerful—ML models, LinearRegression(), calling fit() with training data allows the model to learn the optimal coefficients for predicting outcomes. Afterward, predict() can be used on new data to generate predictions:

from sklearn.linear_model import LinearRegression
import numpy as np
# Example data
X = np.array([[1], [2], [3], [4], [5]])  # Feature matrix
y = np.array([1, 2, 3, 3.5, 5])  # Target values
# Create and fit the model
model = LinearRegression()
model.fit(X, y)
# Predict values for new data
X_new = np.array([[6], [7]])
predictions = model.predict(X_new)
print(predictions)
# Output:
# [5.75 6.7]
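
After fitting, the learned parameters are available as attributes on the estimator, which makes for a quick sanity check (coef_ and intercept_ are standard attributes of a fitted LinearRegression):

# The parameters learned by fit() are exposed as attributes
print(model.coef_)       # slope, approximately [0.95]
print(model.intercept_)  # intercept, approximately 0.05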

Many estimators also provide a convenient shortcut method, fit_predict(), that combines these two operations into a single API call. There is a reason scikit-learn keeps fit() and predict() separate while also offering fit_predict(): the shortcut is typically used when you want predictions on the same dataset the model was trained on, which is often the case in unsupervised learning. KMeans, shown shortly, is a good example, since its training data contains no target variable we are trying to predict. In supervised learning scenarios where we do have a target, fit() is applied to the training data and predict() to a holdout dataset.
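
To make that supervised fit()/predict() workflow concrete, here is a minimal sketch (the data and split proportions are illustrative, not from the chapter):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np
# Illustrative data with a roughly linear trend
X = np.arange(20).reshape(-1, 1)
y = 0.9 * np.arange(20) + np.random.default_rng(0).normal(0, 0.5, 20)
# Hold out a test set; fit() sees only the training portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)   # learn from training data
print(model.predict(X_test))  # predict on the holdout set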

This is not to say that fit_predict() is your only option in unsupervised learning. Even without a target, datasets can still be split into training, validation, and testing sets, with fit() and predict() applied separately (see the sketch after the following example):

# fit_predict() is not available on LinearRegression;
# clustering is its typical use case:
from sklearn.cluster import KMeans
import numpy as np
# Example data
X = np.array([[1], [2], [3], [4], [5]])
# KMeans clustering example (random_state added for reproducibility)
kmeans = KMeans(n_clusters=2, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)
# Output (cluster numbering is arbitrary and may be swapped):
# [0 0 0 1 1]
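
Because KMeans also implements fit() and predict() as separate methods, the same split-style workflow applies to clustering. A minimal sketch, continuing from the example above (the new points are illustrative):

# Fit on the data above, then assign clusters to unseen points
kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(X)  # learn the cluster centers
X_unseen = np.array([[0], [6]])
print(kmeans.predict(X_unseen))  # each point gets its nearest center's label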

scikit-learn’s design ensures that whether you are working with simple linear regression or more complex algorithms such as random forests, the pattern remains the same, promoting consistency and ease of use.
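
To see that consistency in action, the earlier regression example can be rewritten with a random forest by swapping in a different estimator class; the fit()/predict() calls are unchanged (a minimal sketch, with illustrative hyperparameters):

from sklearn.ensemble import RandomForestRegressor
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 3.5, 5])
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)  # same interface as LinearRegression
print(forest.predict(np.array([[6], [7]])))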

Throughout this book, we will explore various estimators, including LinearRegression() (Chapter 5), DecisionTreeClassifier() (Chapter 8), and KNeighborsClassifier() (Chapter 4), while demonstrating how to use them to train models, evaluate performance, and make predictions, all using the familiar fit() and predict() structure.
