scikit-learn Cookbook: Over 80 recipes for machine learning in Python with scikit-learn, Third Edition

Author: John Sukup
Early Access: publishing in Dec 2025
Paperback: Dec 2025, 388 pages, 3rd Edition
eBook: $35.98 (regular price $39.99)
Paperback: $49.99
Subscription: free trial, then renews at $19.99 per month

What do you get with Print?

  • Instant access to your digital copy whilst your Print order is shipped
  • Paperback book shipped to your preferred address
  • Redeem a companion digital copy on all Print orders
  • Access this title in our online reader with advanced features
  • DRM-free: read whenever, wherever, and however you want

scikit-learn Cookbook

Working with Metadata: Tags and More

Scikit-learn uses metadata, such as estimator tags, to describe what an estimator can do and to control how it behaves in contexts such as cross-validation and pipeline processing. For example, tags record whether an estimator can handle multi-output data or missing values and which output types it supports, which lets scikit-learn's utilities adapt their behavior to each estimator dynamically.

In scikit-learn, metadata also refers to auxiliary data passed alongside X and y, such as sample weights or group labels, and this information is used to control the flow of data between the different steps of a Pipeline. Objects that handle such metadata come in two varieties: routers, which pass metadata on to other objects (e.g., Pipeline() or GridSearchCV()), and consumers, which use that metadata in their calculations (e.g., an estimator whose fit() accepts sample_weight). This mechanism is known as Metadata Routing in scikit-learn.

More on metadata routing

Metadata routing in scikit-learn is a feature that allows users to control how metadata is passed between router and consumer objects in a...

Best Practices for API Usage

Once you get a feel for the underlying scikit-learn programming paradigm, you realize how powerful it is! When working with scikit-learn’s API, following best practices ensures that your code remains clear, modular, and maintainable. This includes leveraging reusable components like pipelines, adhering to the consistent fit(), predict(), and transform() methods, and making effective use of hyperparameter tuning tools like GridSearchCV(). Keeping models and data processing steps modular allows for easy debugging and scaling of your machine learning workflows.
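As an illustration of these practices, here is a small sketch that keeps preprocessing and the model in one Pipeline() and tunes the classifier's C parameter with GridSearchCV(). The synthetic data from make_classification() and the grid values are arbitrary assumptions, not the book's own example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Preprocessing and model live in one reusable, modular object
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameters of a step are addressed as <step name>__<parameter>
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5)
# Each CV split fits a fresh copy of the pipeline, so the scaler only sees training folds
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Held-out accuracy: {search.score(X_test, y_test):.3f}")
```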

Here are a few additional best practices and key takeaways to keep in mind as we move forward and explore some of this chapter's concepts in more granular detail:

  • Uniform API: All estimators in scikit-learn follow the same basic pattern of fit(), transform() (for transformers), and predict() methods, making code more readable...

Summary

In this chapter, we began with a high-level overview of the scikit-learn library and some of its most important features we will explore moving forward. Keep in mind, there are many additional features we haven’t yet talked about that we may stumble upon in later chapters. When applicable, callout boxes will be provided for clarity.

In the next chapter, we will begin to build our Cookbook with recipes for one of the most important stages in ML model development: data preprocessing. Let’s get going!

Handling missing data

Missing data can arise from various sources, including human error, technical failures, or data corruption. It is important to address missing values before training ML models, as most algorithms cannot handle them directly, and most scikit-learn estimators raise an error when they detect missing values (NaN) in your training data. Sometimes, with large enough datasets, we can simply drop the records that contain missing values with little impact on the resulting model, but this isn't always viable. Thankfully, scikit-learn provides several strategies for imputing missing values, allowing practitioners to fill in gaps with estimated values based on available data. This recipe introduces three of the most commonly used methods for imputing missing values in a dataset with scikit-learn.

Getting ready

To begin, we will create a toy dataset composed of random, quantitative data, 10 features, and several missing data values randomly spread throughout. We will...
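A minimal sketch of such a setup follows. It assumes the three imputation methods are SimpleImputer, KNNImputer, and IterativeImputer; the later reference to iterative_imputed_df suggests the last one is used, but the exact choices, their parameters, and the roughly 10% missing rate here are assumptions. IterativeImputer is still experimental in scikit-learn and must be enabled with the sklearn.experimental import shown.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

rng = np.random.default_rng(0)
# Toy dataset: 100 rows, 10 random quantitative features
X = rng.normal(loc=50, scale=10, size=(100, 10))
# Assumption: knock out roughly 10% of the entries at random
X[rng.random(X.shape) < 0.10] = np.nan
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(10)])

imputers = {
    "mean": SimpleImputer(strategy="mean"),                   # column mean of observed values
    "knn": KNNImputer(n_neighbors=5),                        # average of the 5 nearest rows
    "iterative": IterativeImputer(max_iter=10, random_state=0),  # model each feature from the others
}

imputed = {}
for name, imputer in imputers.items():
    imputed[name] = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    print(name, int(imputed[name].isna().sum().sum()), "missing values remain")
```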

Scaling techniques

When working with datasets, features can have vastly different scales. For instance, a feature representing age may range from 0 to 100, while another feature representing income could range from 0 to 100,000. Many ML algorithms, such as KNN (which relies on distances between points) and gradient descent-based methods (e.g., models trained with SGDRegressor or SGDClassifier), are sensitive to these differences in scale. Scaling helps ensure that no single feature dominates the learning process simply because of its units. This recipe covers the three most commonly used scaling techniques in ML.

Two key concepts are defined below. These terms are sometimes used interchangeably, but they are not the same and should not be treated as such!

  • Standardization (Z-score transformation) rescales each feature to have a mean of 0 and a standard deviation of 1
  • Normalization (min-max scaling) rescales each feature so its values fall between 0 and 1
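The two definitions above can be checked numerically. This sketch uses StandardScaler for standardization and MinMaxScaler for min-max normalization on hypothetical age/income data; the third technique the recipe covers is not named in this excerpt, so it is not shown.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: age (years) and income (currency units) on very different scales
X = np.array([
    [25, 40_000],
    [32, 52_000],
    [47, 98_000],
    [51, 61_000],
    [62, 120_000],
], dtype=float)

# z = (x - mean) / std, computed per column (population std, ddof=0)
standardized = StandardScaler().fit_transform(X)
# x' = (x - min) / (max - min), computed per column
normalized = MinMaxScaler().fit_transform(X)

print(standardized.mean(axis=0).round(6), standardized.std(axis=0).round(6))
print(normalized.min(axis=0), normalized.max(axis=0))
```

After scaling, both columns contribute on comparable ranges, so a distance-based model such as KNN is no longer dominated by income.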

Getting ready

We will use the previously defined iterative_imputed_df DataFrame...

Encoding categorical variables

Categorical variables are a common feature in many datasets, representing discrete values such as categories, labels, or groups. However, most ML algorithms (well, computers in general, it should be said) require numerical input, making it essential to convert categorical data into a suitable format.

Categorical variables can be divided into two main types:

  • Nominal variables: These represent categories without any intrinsic ordering (e.g., color, brand)
  • Ordinal variables: These have a clear ordering among categories (e.g., ratings from 1 to 5)

Choosing the right encoding method depends on the type of categorical variable and the specific requirements of the ML algorithm being used. These recipes teach us how to convert non-numeric variables into a numeric representation that our training algorithms can utilize appropriately.

Getting ready

To begin, like we did earlier, we will create a toy dataset, only this time, our features...

Introduction to pipelines in scikit-learn

In ML, managing the workflow of data preprocessing and model training can become complex, especially when multiple steps are involved (which is almost always the case in the real world). The Pipeline() class in scikit-learn offers a powerful solution to streamline this process. By allowing users to chain together various preprocessing steps and model training into a single object, pipelines enhance code efficiency and reduce the likelihood of errors. This recipe will introduce you to the concept of pipelines, demonstrating how to create and utilize them effectively for data preprocessing in scikit-learn. We’ll be utilizing pipelines throughout the book as we add more steps to our model development workflow.

What is a pipeline?

A pipeline in scikit-learn is essentially a sequence of steps that are executed in order. Each step in the pipeline consists of a name and an associated transformer or estimator (refer to Chapter 1 if you...
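The structure described above can be sketched as follows, assuming synthetic regression data with a few missing values. Each step is a (name, object) tuple; the step names, the median imputation, and the Ridge model are illustrative choices, not necessarily the ones used later in the book.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # sprinkle in roughly 5% missing values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each step is a (name, object) pair; every step except the last must be a transformer
pipe = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", Ridge(alpha=1.0)),
])

# fit() calls fit_transform() on each transformer in order, then fit() on the final estimator
pipe.fit(X_train, y_train)
print(list(pipe.named_steps))
print(f"Test R^2: {pipe.score(X_test, y_test):.3f}")
```

Because the imputer and scaler are fitted inside pipe.fit(), their statistics come only from the training split, and pipe.predict() applies the same transformations to new data automatically.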

Feature engineering

Feature engineering is really an umbrella term that generally refers to two main activities: feature extraction and feature selection. Effective feature engineering can significantly enhance model performance by providing algorithms with more informative inputs and reducing or removing noisy and/or uninformative ones. These recipes will teach common approaches to feature engineering using existing features to generate new features that may (“may” being the keyword) improve model performance.

Understanding feature engineering

Feature engineering encompasses two main activities:

  1. Creating new features (feature extraction): This involves transforming existing data into new variables that may capture important patterns or relationships. For example, you might derive a total spending feature by combining price and quantity features.
  2. Selecting relevant features (feature selection): This process identifies and retains the most informative...

Practical exercises on data preprocessing

In this chapter, we’ve covered several methods commonly applied to data preprocessing. Now it’s time to put it all together! Can you guess what tool might be helpful for this exercise? You got it: the Pipeline() class!

How to do it…

For these exercises, we will use a publicly available dataset, California Housing, which scikit-learn can load with its fetch_california_housing() function (the data is downloaded and cached on first use). The dataset contains 20,640 records and 8 numeric features, where the target value (what we are trying to predict with our model) is the median house value of a California district, expressed in units of $100,000.

You are tasked with building a comprehensive data pipeline composed of steps you learned in this chapter. In the Jupyter notebook for Chapter 2, you will find an incomplete code block at the end called Comprehensive Pipeline, where you should add your code to complete the following steps:

  1. Load the California Housing dataset.
  2. Split the data.
  3. Create a comprehensive pipeline with...

Key benefits

  • Solve complex business problems with data-driven approaches
  • Master tools associated with developing predictive and prescriptive models
  • Build robust ML pipelines for real-world applications, avoiding common pitfalls
  • Free with your book: PDF Copy, AI Assistant, and Next-Gen Reader

Description

Trusted by data scientists, ML engineers, and software developers alike, scikit-learn offers a versatile, user-friendly framework for implementing a wide range of ML algorithms, enabling the efficient development and deployment of predictive models in real-world applications. This third edition of scikit-learn Cookbook will help you master ML with real-world examples and scikit-learn 1.5 features. This updated edition takes you on a journey from understanding the fundamentals of ML and data preprocessing, through implementing advanced algorithms and techniques, to deploying and optimizing ML models in production. Along the way, you’ll explore practical, step-by-step recipes that cover everything from feature engineering and model selection to hyperparameter tuning and model evaluation, all using scikit-learn. By the end of this book, you’ll have gained the knowledge and skills needed to confidently build, evaluate, and deploy sophisticated ML models using scikit-learn, ready to tackle a wide range of data-driven challenges.

Who is this book for?

This book is for data scientists as well as machine learning and software development professionals looking to deepen their understanding of advanced ML techniques. To get the most out of this book, you should have proficiency in Python programming and familiarity with commonly used ML libraries, e.g., pandas, NumPy, matplotlib, and SciPy. An understanding of basic ML concepts, such as linear regression, decision trees, and model evaluation metrics, will be helpful. Familiarity with mathematical concepts such as linear algebra, calculus, and probability will also be invaluable.

What you will learn

  • Implement a variety of ML algorithms, from basic classifiers to complex ensemble methods, using scikit-learn
  • Perform data preprocessing, feature engineering, and model selection to prepare datasets for optimal model performance
  • Optimize ML models through hyperparameter tuning and cross-validation techniques to improve accuracy and reliability
  • Deploy ML models for scalable, maintainable real-world applications
  • Evaluate and interpret models with advanced metrics and visualizations in scikit-learn
  • Explore comprehensive, hands-on recipes tailored to scikit-learn version 1.5

Estimated delivery fee (deliver to Chile)

Standard delivery 10 - 13 business days

$19.95

Premium delivery 3 - 6 business days

$40.95
(Includes tracking information)

Product Details

Publication date : Dec 19, 2025
Length: 388 pages
Edition : 3rd
Language : English
ISBN-13 : 9781836644453



Table of Contents

16 Chapters
Chapter 1: Common Conventions and API Elements of scikit-learn
Chapter 2: Pre-Model Workflow and Data Preprocessing
Chapter 3: Dimensionality Reduction Techniques
Chapter 4: Building Models with Distance Metrics and Nearest Neighbors
Chapter 5: Linear Models and Regularization
Chapter 6: Advanced Logistic Regression and Extensions
Chapter 7: Support Vector Machines and Kernel Methods
Chapter 8: Tree-Based Algorithms and Ensemble Methods
Chapter 9: Text Processing and Multiclass Classification
Chapter 10: Clustering Techniques
Chapter 11: Novelty and Outlier Detection
Chapter 12: Cross-Validation and Model Evaluation Techniques
Chapter 13: Deploying scikit-learn Models in Production
Chapter 14: Unlock Your Exclusive Benefits
Index
Other Books You May Enjoy

FAQs

What is the digital copy I get with my Print order?

When you buy any Print edition of our Books, you can redeem (for free) the eBook edition of the Print Book you’ve purchased. This gives you instant access to your book via PDF, EPUB, or our online Reader as soon as you place your order.

What is the delivery time and cost of print books?

Shipping Details

USA:


Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the Americas: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K. time will start printing on the next business day, so the estimated delivery times start from the next day as well. Orders received after 5 PM U.K. time (in our internal systems) on a business day, or anytime on the weekend, will begin printing on the business day after next. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. Democratic Republic of the Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What are customs duties/charges?

Customs duties are charges levied on goods when they cross international borders. They are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

Customs duties or localized taxes may apply to shipments to countries outside the EU27. These are charged by the recipient country, must be paid by the customer, and are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction ID. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you, then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e., where Packt Publishing agrees to replace your printed book because it arrived damaged or with a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items (damaged, defective, or incorrect).
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following payment methods:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal