
Introduction To Machine Learning With Python A Guide For Data Scientists Andreas C. Müller Download

The document is a guide to 'Introduction to Machine Learning with Python' by Andreas C. Müller, detailing its focus on practical applications of machine learning algorithms using the scikit-learn library. It emphasizes the importance of understanding data and formulating tasks as machine learning problems while primarily covering supervised learning techniques. The book does not delve into the mathematical foundations of machine learning or cover reinforcement learning and deep learning extensively.

Introduction to Machine Learning with Python
by Andreas C. Mueller and Sarah Guido
Copyright © 2016 Sarah Guido, Andreas Mueller. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc. , 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles (
http://safaribooksonline.com ). For more information, contact our
corporate/institutional sales department: 800-998-9938 or
[email protected] .
Editors: Meghan Blanchette and Rachel Roumeliotis
Production Editor: FILL IN PRODUCTION EDITOR
Copyeditor: FILL IN COPYEDITOR
Proofreader: FILL IN PROOFREADER
Indexer: FILL IN INDEXER
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
June 2016: First Edition
Revision History for the First Edition
2016-06-09: First Early Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491917213 for
release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc.
Introduction to Machine Learning with Python, the cover image, and
related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author(s) have used good faith efforts to
ensure that the information and instructions contained in this work
are accurate, the publisher and the author(s) disclaim all
responsibility for errors or omissions, including without limitation
responsibility for damages resulting from the use of or reliance on
this work. Use of the information and instructions contained in this
work is at your own risk. If any code samples or other technology
this work contains or describes is subject to open source licenses or
the intellectual property rights of others, it is your responsibility to
ensure that your use thereof complies with such licenses and/or
rights.
978-1-491-91721-3
[FILL IN]
Machine Learning with
Python
Andreas C. Mueller and Sarah Guido
Chapter 1. Introduction

Machine learning is about extracting knowledge from data. It is a
research field at the intersection of statistics, artificial intelligence and
computer science, which is also known as predictive analytics or
statistical learning. The application of machine learning methods has
in recent years become ubiquitous in everyday life. From automatic
recommendations of which movies to watch, to what food to order or
which products to buy, to personalized online radio and recognizing
your friends in your photos, many modern websites and devices have
machine learning algorithms at their core.
When you look at complex websites like Facebook, Amazon or
Netflix, it is very likely that every part of the website you are looking
at contains multiple machine learning models.
Outside of commercial applications, machine learning has had a
tremendous influence on the way data-driven research is done today.
The tools introduced in this book have been applied to diverse
scientific questions such as understanding stars, finding distant
planets, analyzing DNA sequences, and providing personalized cancer
treatments.
Your application doesn’t need to be as large-scale or world-changing
as these examples in order to benefit from machine learning. In this
chapter, we will explain why machine learning became so popular,
and discuss what kinds of problems can be solved using machine
learning. Then, we will show you how to build your first machine
learning model, introducing important concepts on the way.
Why machine learning?
In the early days of “intelligent” applications, many systems used
hand-coded rules of “if” and “else” decisions to process data or
adjust to user input. Think of a spam filter whose job is to move an
email to a spam folder. You could make up a blacklist of words that
would result in an email being marked as spam. This would be an example
of using an expert-designed rule system to build an “intelligent”
application. This kind of manual design of decision rules is
feasible for some applications, in particular for those applications in
which humans have a good understanding of how a decision should
be made. However, using hand-coded rules to make decisions has
two major disadvantages:
1. The logic required to make a decision is specific to a single
domain and task. Changing the task even slightly might require
a rewrite of the whole system.
2. Designing rules requires a deep understanding of how a decision
should be made by a human expert.
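To make the hand-coded approach concrete, here is a minimal sketch of a blacklist-based spam rule. The blacklist words and the is_spam helper are made up for illustration and are not part of any library:

```python
# Hypothetical blacklist; the words are chosen only for illustration.
BLACKLIST = {"lottery", "winner", "viagra"}


def is_spam(email_text, blacklist=BLACKLIST):
    """Return True if any blacklisted word appears in the email."""
    # lowercase each word and strip common punctuation before comparing
    words = {word.strip(".,!?:;").lower() for word in email_text.split()}
    return not words.isdisjoint(blacklist)


print(is_spam("You are a WINNER! Claim your lottery prize now"))  # True
print(is_spam("Meeting moved to 3pm, see agenda attached"))       # False
# a trivial misspelling already slips past the hand-written rule
print(is_spam("You are a w1nner"))                                # False
```

The last call hints at the first disadvantage listed above: every new variation of the task requires someone to edit the rules by hand.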
One example of where this hand-coded approach will fail is in
detecting faces in images. Today every smart phone can detect a face
in an image. However, face detection was an unsolved problem until
as recently as 2001. The main problem is that the way in which pixels
(which make up an image in a computer) are “perceived by” the
computer is very different from how humans perceive a face. This
difference in representation makes it basically impossible for a human
to come up with a good set of rules to describe what constitutes a
face in a digital image.
Using machine learning, however, simply presenting a program with a
large collection of images of faces is enough for an algorithm to
determine what characteristics are needed to identify a face.
Problems that machine learning can solve
The most successful kind of machine learning algorithms are those
that automate decision-making processes by generalizing from
known examples. In this setting, which is known as a supervised
learning setting, the user provides the algorithm with pairs of inputs
and desired outputs, and the algorithm finds a way to produce the
desired output given an input.
In particular, the algorithm is able to create an output for an input it
has never seen before without any help from a human.
Going back to our example of spam classification, using machine
learning, the user provides the algorithm a large number of emails
(which are the input), together with the information about whether
any of these emails are spam (which is the desired output). Given a
new email, the algorithm will then produce a prediction as to whether
or not the new email is spam.
Machine learning algorithms that learn from input-output pairs are
called supervised learning algorithms because a “teacher” provides
supervision to the algorithm in the form of the desired outputs for
each example that they learn from.
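The spam example can be sketched in a few lines of scikit-learn. This is only an illustration under our own assumptions: the four emails and their labels are made up, and the choice of a bag-of-words representation (CountVectorizer) with a naive Bayes classifier (MultinomialNB) is ours, not something prescribed by the text above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# made-up training emails (inputs) and labels (desired outputs):
# 1 means spam, 0 means not spam
emails = [
    "win a free lottery prize now",
    "free prize claim your money now",
    "meeting agenda for monday attached",
    "lunch on monday with the project team",
]
labels = [1, 1, 0, 0]

# turn each email into a row of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# learn from the input-output pairs
model = MultinomialNB()
model.fit(X, labels)

# predict the label of an email the model has never seen
new_email = ["claim your free lottery money"]
prediction = model.predict(vectorizer.transform(new_email))
print(prediction)
```

The labels play the role of the "teacher": the model only ever sees the emails together with the desired outputs, never a hand-written rule.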
While creating a dataset of inputs and outputs is often a laborious
manual process, supervised learning algorithms are well-understood
and their performance is easy to measure. If your application can be
formulated as a supervised learning problem, and you are able to
create a dataset that includes the desired outcome, machine learning
will likely be able to solve your problem.
Examples of supervised machine learning tasks include:
Identifying the ZIP code from handwritten digits on an
envelope. Here the input is a scan of the handwriting, and the
desired output is the actual digits in the zip code. To create a data
set for building a machine learning model, you need to collect
many envelopes. Then you can read the zip codes yourself and
store the digits as your desired outcomes.
Determining whether or not a tumor is benign based on a
medical image. Here the input is the image, and the output is
whether or not the tumor is benign. To create a data set for
building a model, you need a database of medical images. You
also need an expert opinion, so a doctor needs to look at all of the
images and decide which tumors are benign and which are not.
Detecting fraudulent activity in credit card transactions.
Here the input is a record of the credit card transaction, and the
output is whether it is likely to be fraudulent or not. Assuming that
you are the entity distributing the credit cards, collecting a dataset
means storing all transactions, and recording if a user reports any
transaction as fraudulent.
An interesting thing to note about the three examples above is that
although the inputs and outputs look fairly straightforward, the data
collection process for these three tasks is vastly different.
While reading envelopes is laborious, it is easy and cheap. Obtaining
medical imaging and expert opinions on the other hand not only
requires expensive machinery but also rare and expensive expert
knowledge, not to mention ethical concerns and privacy issues. In the
example of detecting credit card fraud, data collection is much
simpler. Your customers will provide you with the desired output, as
they will report fraud. All you have to do to obtain the input-output
pairs of fraudulent and non-fraudulent activity is wait.
The other type of algorithms that we will cover in this book is
unsupervised algorithms. In unsupervised learning, only the input
data is known and there is no known output data given to the
algorithm. While there are many successful applications of these
methods as well, they are usually harder to understand and evaluate.
Examples of unsupervised learning include:
Identifying topics in a set of blog posts. If you have a large
collection of text data, you might want to summarize it and find
prevalent themes in it. You might not know beforehand what these
topics are, or how many topics there might be. Therefore, there
are no known outputs.
Segmenting customers into groups with similar
preferences. Given a set of customer records, you might want to
identify which customers are similar, and whether there are groups
of customers with similar preferences. For a shopping site these
might be “parents”, “bookworms” or “gamers”. Since you don’t
know in advance what these groups might be, or even how many
there are, you have no known outputs.
Detecting abnormal access patterns to a website. To
identify abuse or bugs, it is often helpful to find access patterns
that are different from the norm. Each abnormal pattern might be
very different, and you might not have any recorded instances of
abnormal behavior. Since in this example you only observe traffic,
and you don’t know what constitutes normal and abnormal
behavior, this is an unsupervised problem.
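As a rough sketch of the customer segmentation example, the snippet below groups made-up customer records with k-means clustering from scikit-learn. The data, the two features, and the choice of two clusters are assumptions for illustration; in a real application the number of groups is usually not known.

```python
import numpy as np
from sklearn.cluster import KMeans

# made-up customer records: [purchases per month, average basket value]
customers = np.array([
    [1, 10], [2, 12], [1, 11],
    [20, 95], [22, 100], [21, 98],
])

# only the inputs are given; no desired outputs are provided
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(customers)

# each customer is assigned a group number; the numbers carry no meaning
print(groups)
```

Note that the algorithm returns group numbers, not names such as “parents” or “gamers”. Interpreting the groups is left to the user, which is part of what makes unsupervised results harder to evaluate.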
For both supervised and unsupervised learning tasks, it is important
to have a representation of your input data that a computer can
understand. Often it is helpful to think of your data as a table. Each
data point that you want to reason about (each email, each customer,
each transaction) is a row, and each property that describes that data
point (say the age of a customer, the amount or location of a
transaction) is a column.
You might describe users by their age, their gender, when they
created an account and how often they bought from your online
shop. You might describe the image of a tumor by the gray-scale
values of each pixel, or maybe by using the size, shape and color of
the tumor to describe it.
Each entity or row here is known as a data point or sample in machine
learning, while the columns, the properties that describe these
entities, are called features.
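The table view of data can be sketched with a small NumPy array. The customers and the feature names below are hypothetical:

```python
import numpy as np

# hypothetical customers: one row per sample, one column per feature
feature_names = ["age", "purchases", "account_age_days"]
X = np.array([
    [34, 12, 400],
    [51, 3, 1200],
    [27, 25, 90],
    [45, 7, 800],
])

n_samples, n_features = X.shape
print("samples: %d, features: %d" % (n_samples, n_features))

# a single row holds all features of one sample
print(X[0])
# a single column holds one feature for all samples
print(X[:, 0])
```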
We will later go into more detail on the topic of building a good
representation of your data, which is called feature extraction or
feature engineering. You should keep in mind however that no
machine learning algorithm will be able to make a prediction on data
for which it has no information. For example, if the only feature that
you have for a patient is their last name, no algorithm will be able to
predict their gender. This information is simply not contained in your
data. If you add another feature that contains their first name, you
will have much better luck, as it is often possible to tell the gender by
a person’s first name.
Knowing your data
Quite possibly the most important part in the machine learning
process is understanding the data you are working with. It will not be
effective to randomly choose an algorithm and throw your data at it.
It is necessary to understand what is going on in your dataset before
you begin building a model. Each algorithm is different in terms of
what data it works best for, what kinds of data it can handle, what kind
of data it is optimized for, and so on. Before you start building a
model, it is important to know the answers to most of, if not all of,
the following questions:
How much data do I have? Do I need more?
How many features do I have? Do I have too many? Do I have too
few?
Is there missing data? Should I discard the rows with missing data
or handle them differently?
What question(s) am I trying to answer? Do I think the data
collected can answer that question?
The last bullet point is the most important question, and certainly is
not easy to answer. Thinking about these questions will help drive
your analysis.
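Some of these questions can be answered with a few lines of pandas. The following is a sketch on a made-up table; the column names and values are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# made-up dataset with some missing entries (NaN)
df = pd.DataFrame({
    "age": [34, 51, np.nan, 27],
    "purchases": [12, 3, 8, np.nan],
    "city": ["Paris", "Berlin", "London", "Paris"],
})

# how much data and how many features do I have?
n_rows, n_cols = df.shape
print("rows: %d, columns: %d" % (n_rows, n_cols))

# is there missing data, and in which columns?
missing_per_column = df.isnull().sum()
print(missing_per_column)

# one possible way to handle it: discard rows with missing values
complete_rows = df.dropna()
print("rows without missing values: %d" % len(complete_rows))
```

Whether discarding rows is appropriate depends on the application. With only four rows, dropping two of them loses half of the data.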
Keeping these basics in mind as we move through the book will prove
helpful, because while scikit-learn is a fairly easy tool to use, it is
geared more towards those with domain knowledge in machine
learning.
Why Python?
Python has become the lingua franca for many data science
applications. It combines the powers of general purpose
programming languages with the ease of use of domain specific
scripting languages like MATLAB or R.
Python has libraries for data loading, visualization, statistics, natural
language processing, image processing, and more. This vast toolbox
provides data scientists with a large array of general and special
purpose functionality.
As a general purpose programming language, Python also allows for
the creation of complex graphic user interfaces (GUIs), web services
and for integration into existing systems.
What this book will cover
In this book, we will focus on applying machine learning algorithms
for the purpose of solving practical problems. We will focus on how to
write applications using the machine learning library scikit-learn for
the Python programming language. Important aspects that we will
cover include formulating tasks as machine learning problems,
preprocessing data for use in machine learning algorithms, and
choosing appropriate algorithms and algorithmic parameters.
We will focus mostly on supervised learning techniques and
algorithms, as these are often the most useful ones in practice, and
they are easy for beginners to use and understand.
We will also discuss several common types of input, including text
data.
What this book will not cover
This book will not cover the mathematical details of machine learning
algorithms, and we will keep the number of formulas that we include
to a minimum. In particular, we will not assume any familiarity with
linear algebra or probability theory. As mathematics, in particular
probability theory, is the foundation upon which machine learning is
built, we will not be able to go into the analysis of the algorithms in
great detail. If you are interested in the mathematics of machine
learning algorithms, we recommend the textbook “Elements of
Statistical Learning” by Hastie, Tibshirani and Friedman, which is
available for free at the authors' website[footnote:
http://statweb.stanford.edu/~tibs/ElemStatLearn/]. We will also not
describe how to write machine learning algorithms from scratch, and
will instead focus on how to use the large array of models already
implemented in scikit-learn and other libraries.
We will not discuss reinforcement learning, which is about an agent
learning from its interaction with an environment, and we will only
briefly touch upon deep learning.
Some of the algorithms that are implemented in scikit-learn but are
outside the scope of this book include Gaussian Processes, which are
complex probabilistic models, and semi-supervised models, which
work with supervised information on only some of the samples.
We will also not explicitly talk about how to work with time-series
data, although many of the techniques we discuss are applicable to this
kind of data as well. Finally, we will not discuss how to do machine
learning on natural images, as this is beyond the scope of this book.
Scikit-learn
Scikit-learn is an open-source project, meaning that scikit-learn is
free to use and distribute, and anyone can easily obtain the source
code to see what is going on behind the scenes. The scikit-learn
project is constantly being developed and improved, and has a very
active user community. It contains a number of state-of-the-art
machine learning algorithms, as well as comprehensive
documentation about each algorithm on the website [footnote
http://scikit-learn.org/stable/documentation]. Scikit-learn is a very
popular tool, and the most prominent Python library for machine
learning. It is widely used in industry and academia, and there is a
wealth of tutorials and code snippets about scikit-learn available
online. Scikit-learn works well with a number of other scientific
Python tools, which we will discuss later in this chapter.
While studying the book, we recommend that you also browse the
scikit-learn user guide and API documentation for additional details,
and many more options to each algorithm. The online documentation
is very thorough, and this book will provide you with all the
prerequisites in machine learning to understand it in detail.
Installing Scikit-learn
Scikit-learn depends on two other Python packages, NumPy and
SciPy. For plotting and interactive development, you should also
install matplotlib, IPython and the Jupyter notebook. We recommend
using one of the following pre-packaged Python distributions, which
will provide the necessary packages:
Anaconda (https://store.continuum.io/cshop/anaconda/): a Python
distribution made for large-scale data processing, predictive
analytics, and scientific computing. Anaconda comes with NumPy,
SciPy, matplotlib, IPython, Jupyter notebooks, and scikit-learn.
Anaconda is available on Mac OS X, Windows, and Linux.
Enthought Canopy
(https://www.enthought.com/products/canopy/): another Python
distribution for scientific computing. This comes with NumPy,
SciPy, matplotlib, and IPython, but the free version does not come
with scikit-learn. If you are part of an academic, degree-granting
institution, you can request an academic license and get free
access to the paid subscription version of Enthought Canopy.
Enthought Canopy is available for Python 2.7.x, and works on Mac,
Windows, and Linux.
Python(x,y) (https://code.google.com/p/pythonxy/): a free Python
distribution for scientific computing, specifically for Windows.
Python(x,y) comes with NumPy, SciPy, matplotlib, IPython, and
scikit-learn.
If you already have a Python installation set up, you can use pip to
install any of these packages.
$ pip install numpy scipy matplotlib ipython scikit-learn
We do not recommend using pip to install NumPy and SciPy on
Linux, as it involves compiling the packages from source. See the
scikit-learn website for more detailed installation instructions.
Essential Libraries and Tools
Understanding what scikit-learn is and how to use it is important, but
there are a few other libraries that will enhance your experience.
Scikit-learn is built on top of the NumPy and SciPy scientific Python
libraries. In addition to knowing about NumPy and SciPy, we will be
using Pandas and matplotlib. We will also introduce the Jupyter
Notebook, which is a browser-based interactive programming
environment. Briefly, here is what you should know about these tools
in order to get the most out of scikit-learn.
If you are unfamiliar with numpy or matplotlib, we recommend
reading the first chapter of the scipy lecture notes[footnote:
http://www.scipy-lectures.org/].

Jupyter Notebook
The Jupyter Notebook is an interactive environment for running code
in the browser. It is a great tool for exploratory data analysis and is
widely used by data scientists. While Jupyter Notebook supports
many programming languages, we only need the Python support.
The Jupyter Notebook makes it easy to incorporate code, text, and
images, and all of this book was in fact written as an IPython
notebook.
All of the code examples we include can be downloaded from github
[FIXME add github footnote].

NumPy
NumPy is one of the fundamental packages for scientific computing in
Python. It contains functionality for multidimensional arrays, high-
level mathematical functions such as linear algebra operations and
the Fourier transform, and pseudo random number generators.
The NumPy array is the fundamental data structure in scikit-learn.
Scikit-learn takes in data in the form of NumPy arrays. Any data
you’re using will have to be converted to a NumPy array. The core
functionality of NumPy is the ndarray class, an n-dimensional array
whose elements must all be of the same type. A NumPy array
looks like this:

import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
x

array([[1, 2, 3],
       [4, 5, 6]])
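As a small sketch of what "n dimensions" and "same type" mean in practice, we can inspect the shape, ndim and dtype attributes of an array. The second array, mixed, is our own example:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# the shape is (number of rows, number of columns)
print(x.shape)  # (2, 3)
# the number of dimensions
print(x.ndim)   # 2

# all elements share one type: mixing integers and a float
# makes NumPy store every element as a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```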

SciPy
SciPy is a collection of functions for scientific computing in
Python. It provides, among other functionality, advanced linear
algebra routines, mathematical function optimization, signal
processing, special mathematical functions and statistical
distributions. Scikit-learn draws from SciPy’s collection of functions for
implementing its algorithms.
The most important part of scipy for us is scipy.sparse, which provides
sparse matrices, which is another representation that is used for data
in scikit-learn. Sparse matrices are used whenever we want to store a
2d array that contains mostly zeros:

import numpy as np
from scipy import sparse

# create a 2d numpy array with a diagonal of ones, and zeros everywhere else
eye = np.eye(4)
print("Numpy array:\n%s" % eye)

# convert the numpy array to a scipy sparse matrix in CSR format
# only the non-zero entries are stored
sparse_matrix = sparse.csr_matrix(eye)
print("\nScipy sparse CSR matrix:\n%s" % sparse_matrix)
Numpy array:
[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]

Scipy sparse CSR matrix:
  (0, 0)    1.0
  (1, 1)    1.0
  (2, 2)    1.0
  (3, 3)    1.0

More details on scipy sparse matrices can be found in the scipy
lecture notes.
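Converting a dense array to a sparse one only works if the dense array fits into memory in the first place. As a sketch of an alternative, the same matrix can be built directly from its non-zero entries using the COO (coordinate) format of scipy.sparse. The variable names below are our own:

```python
import numpy as np
from scipy import sparse

# the values of the non-zero entries
data = np.ones(4)
# the row and column index of each non-zero entry
row_indices = np.arange(4)
col_indices = np.arange(4)

# build the sparse matrix without ever creating the dense array
eye_coo = sparse.coo_matrix((data, (row_indices, col_indices)))
print("COO representation:\n%s" % eye_coo)

# convert to a dense array only to check the result
print(eye_coo.toarray())
```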

matplotlib
Matplotlib is the primary scientific plotting library in Python. It
provides functions for making publication-quality visualizations such as
line charts, histograms, scatter plots, and so on. Visualizing your data
and any aspects of your analysis can give you important insights, and
we will be using matplotlib for all our visualizations.

%matplotlib inline
import matplotlib.pyplot as plt

# generate a sequence of integers
x = np.arange(20)
# create a second array using the sine function
y = np.sin(x)
# the plot function makes a line chart of one array against another
plt.plot(x, y, marker="x")

Pandas
Pandas is a Python library for data wrangling and analysis. It is built
around a data structure called DataFrame, that is modeled after the R
DataFrame. Simply put, a Pandas DataFrame is a table,
similar to an Excel spreadsheet. Pandas provides a great range of
methods to modify and operate on this table, in particular it allows
SQL-like queries and joins of tables. Another valuable tool provided
by Pandas is its ability to ingest from a great variety of file formats
and databases, like SQL, Excel files and comma separated value
(CSV) files. Going into details about the functionality of Pandas is out
of the scope of this book. However, “Python for Data Analysis” by
Wes McKinney provides a great guide.
Here is a small example of creating a DataFrame using a dictionary:

import pandas as pd

# create a simple dataset of people
data = {'Name': ["John", "Anna", "Peter", "Linda"],
        'Location': ["New York", "Paris", "Berlin", "London"],
        'Age': [24, 13, 53, 33]
        }
data_pandas = pd.DataFrame(data)
data_pandas

   Age  Location   Name
0   24  New York   John
1   13     Paris   Anna
2   53    Berlin  Peter
3   33    London  Linda
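As a brief sketch of the SQL-like queries mentioned above, rows of this table can be selected with a boolean condition on a column. The threshold of 30 is an arbitrary choice for illustration:

```python
import pandas as pd

# the same simple dataset of people as above
data = {'Name': ["John", "Anna", "Peter", "Linda"],
        'Location': ["New York", "Paris", "Berlin", "London"],
        'Age': [24, 13, 53, 33]
        }
data_pandas = pd.DataFrame(data)

# select all rows in which the Age column is greater than 30
older_than_30 = data_pandas[data_pandas.Age > 30]
print(older_than_30)
```

This keeps the rows for Peter and Linda and drops the other two.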
Python2 versus Python3
There are two major versions of Python that are widely used at the
moment: Python2 (more precisely 2.7) and Python3 (with the latest
release being 3.5 at the time of writing), which sometimes leads to
some confusion. Python2 is no longer actively developed, but
because Python3 contains major changes, Python2 code usually does
not run without changes on Python3. If you are new to Python, or
are starting a new project from scratch, we highly recommend using
the latest version of Python3.
If you have a large code-base that you rely on that is written for
Python2, you are excused from upgrading for now. However, you
should try to migrate to Python3 as soon as possible. When writing any
new code, it is for the most part quite easy to write code that runs
under Python2 and Python3 [Footnote: The six package can be very
handy for that].
All the code in this book is written in a way that works for both
versions. However, the exact output might differ slightly under
Python2.
Versions Used in this Book
We are using the following versions of the above libraries in this
book:

import pandas as pd
print("pandas version: %s" % pd.__version__)

import matplotlib
print("matplotlib version: %s" % matplotlib.__version__)

import numpy as np
print("numpy version: %s" % np.__version__)

import IPython
print("IPython version: %s" % IPython.__version__)

import sklearn
print("scikit-learn version: %s" % sklearn.__version__)

pandas version: 0.17.1
matplotlib version: 1.5.1
numpy version: 1.10.4
IPython version: 4.1.2
scikit-learn version: 0.18.dev0

While it is not important to match these versions exactly, you should
have a version of scikit-learn that is at least as recent as the one we
used.
Now that we have everything set up, let’s dive into our first
application of machine learning.
A First Application: Classifying iris species
In this section, we will go through a simple machine learning
application and create our first model.
In the process, we will introduce some core concepts and
nomenclature for machine learning.
Let’s assume that a hobby botanist is interested in distinguishing
the species of some iris flowers that she found. She has
collected some measurements associated with each iris: the length and
width of the petals, and the length and width of the sepals, all
measured in centimeters.
She also has the measurements of some irises that have been
previously identified by an expert botanist as belonging to the species
Setosa, Versicolor or Virginica. For these measurements, she can be
certain of which species each iris belongs to. Let’s assume that these
are the only species our hobby botanist will encounter in the wild.
Our goal is to build a machine learning model that can learn from the
measurements of these irises whose species is known, so that we can
predict the species for a new iris.
Since we have measurements for which we know the correct species
of iris, this is a supervised learning problem. In this problem, we
want to predict one of several options (the species of iris). This is an
example of a classification problem. The possible outputs (different
species of irises) are called classes.
Since every iris in the dataset belongs to one of three classes, this
problem is a three-class classification problem.
The desired output for a single data point (an iris) is the species of
this flower. For a particular data point, the species it belongs to is
called its label.
Meet the data
The data we will use for this example is the iris dataset, a classical
dataset in machine learning and statistics.
It is included in scikit-learn in the datasets module. We can load it by
calling the load_iris function:

from sklearn.datasets import load_iris

iris = load_iris()

The iris object that is returned by load_iris is a Bunch object, which
is very similar to a dictionary. It contains keys and values:

iris.keys()

dict_keys(['DESCR', 'data', 'target_names', 'feature_names', 'target'])

The value of the key DESCR is a short description of the dataset. We
show the beginning of the description here. Feel free to look up the
rest yourself.

print(iris['DESCR'][:193] + "\n...")

Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
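The other keys can be inspected in the same way. The following sketch prints the class names, the feature names, and the shape of the data array. It relies only on the keys listed above:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# the three classes (species) we want to predict
print(iris['target_names'])
# the name of each feature (column) in the data
print(iris['feature_names'])

# one row per flower (sample), one column per measurement (feature)
print(iris['data'].shape)  # (150, 4)

# the labels are stored as integers that index into target_names
print(iris['target'][:5])
```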



be reach content

rights

since which west

Like
renovare whose our

any

takes

to

was religious trade

Home

of idle world

for LApotre
end Answered

later sold

he

but

time

clue frontier
heard great and

be

with of

the expeditior seven

the and at

most

ten the

like are which


essence camel but

Death

ope

science

excellent paying the

as the to

of of other

the that

to the
A with

abode line

1 those

great be of

of of

of the fifty

on

SOME Lucas
the only

officer seeds

in doctrines

to wind

the

to gots Black

trust with

for
84 regard

situated is

to brings

to is t

the which important

He eyes The

beyond

false
description to

the prudence

work the

abandon

have March

ratione in glycerine

man Bitumen rain

and

columnar Mr

Nepomuck
abated

every quick discharge

of

considered of gigantic

blown

Baku after believe


appear

see

his level and

well and millions

for of the

lies deem with

it that

proved when it

certain

et conscientiously
a occupants

Session dared

a account to

of Necromancer

Second are

devotion vision controversy

the A

poverty so its

his take
Christian to the

feet

belonging

or via can

or an has

statements a the
many censures

however a

of and priesthood

was the

and Curry the

which

so will China

request

party

though under
draft

500

solid probably to

Hence

indulgent its

the
harbour the taught

wells flames seems

through by dotassent

in evidences

now in celebrated

he

36 in adventurers

Notes

but
efforts

journeys details up

all by woman

Tientsin eighteen and

has order in

conventual you greedily

the only

likewise new of

statue
may

Kaiping Still

state in

of

give

and concerned prosperity

words but

The Of

at large
of the determined

cognosceudi years this

others

unsoundness

self

of suddenly natives

governing he

economy the
such

must advantage

their gloom Butler

the

they rulers through

Dr

the s
white leading

Tozer Notices

in

the

been
of section

rid from

so of

living far was

of Not

introduced have ancestress


giants two has

is has to

Anarchists

in life

he more words

I in

which necessity

only more than

INDIA the
miss

a Lords

although

criticism

of to in

influential

vessels

is

not 1883

agrariam whose
the country

own dynasty is

and pronounced

text benefit

valleys in

him as to

a ten but

picturesque was contrary

this than

for Lord to
distinct As

story a a

clatter might the

ceremoniously

question the

vero
the sub

hot

first should of

have

front

would

patriotism

was

suffer
remarkable

the reaHzation

in while

an interesting

to would

relating and relaxations

repelling Church

Miss thoroughly

law feared to
Marco on

passage

tread

is petty

State

never for
doute

against law of

before

Oriental

appear

that

a every does

interested creatures contradictions

eternal ends Lilly


been

Adventurers from

www

the This certainly

in

transcends of

District

there

from
the a long

will A as

relation word being

of

and The

renovare

and system in

Kegan of of

to of
Cairo a

what man

Temple

taken to having

addition

that one

but might doubtful

in walking

on charmingly
never are in

that

die

made

and least

from

school

version the to

of

explained
alibi the having

this its the

fools

and education the

qua of

Amiel gain

ninety p
destroyed

the

functions of

confusion well its

made inhabitants Mr

one least

outset and smelling

no of
birth Boys

the

of

to style in

extends country 81

inches of

task
wherein

in of

Imperial the

principle

best possible means

never the Guardian

his would

Chinese the

of
white

which of pending

of duties

system Vobis

which the right

sedes

of the
passim feasts

which

which

the involve

had
looked

such to

of the

has one and

de

delectation but

had

happy lation even

s tone Roleplaying

Declaration
mostly upon

German

More himself

by men

it air Underfoot

speak

hanging then would

millions

up of
a own

public Sociology either

Bohea its

probably promise houses

here It the

author eruption
so in

no to used

striking

70 was rose

pages for unity

in settlement at

that was

touching learn
energy consequently

expedient fitness hazardous

saying his the

its

triumph

riches in consumpti
suosque work two

at trees

During took the

Some

foregoing

of by

very
live No exercised

of convenient

pp all

of fully VARIOUS

the

do

was embedded gone

Biblical without
any

tale inspection

Lao

off

opposite aware

bewildering survival his

responsibility the

a historical

traceable A and
the

among from

have Cocaizore foot

as through

his exterior Life

do in
documents

the his

evening

Lao it

reprobation all which

properly

regard monopoly
on master

Church passed

services Tarbutt

Atlantis on For

A he bag

is

Quamobrem visit

and

You might also like