Commit 288b26c

initial commit of tutorial materials
1 parent 3badfb6 commit 288b26c

24 files changed: 6957 additions, 2 deletions

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -55,3 +55,10 @@ docs/_build/
 
 # PyBuilder
 target/
+
+# IPython
+.ipynb_checkpoints
+notebooks/.ipynb_checkpoints
+
+# Emacs
+*~

README.md

Lines changed: 56 additions & 2 deletions
@@ -1,2 +1,56 @@
-# sklearn_tutorial
-Materials for my scikit-learn tutorial
+# Scikit-learn Tutorial
+
+*Jake VanderPlas*
+
+
+- twitter: [@jakevdp](https://twitter.com/jakevdp)
+- github: [jakevdp](http://github.com/jakevdp)
+
+This repository contains notebooks and other files associated with my
+[Scikit-learn](http://scikit-learn.org) tutorial.
+
+## Installation Notes
+This tutorial requires the following packages:
+
+- Python version 2.6-2.7 or 3.3+
+- `numpy` version 1.5 or later: http://www.numpy.org/
+- `scipy` version 0.10 or later: http://www.scipy.org/
+- `matplotlib` version 1.3 or later: http://matplotlib.org/
+- `scikit-learn` version 0.14 or later: http://scikit-learn.org
+- `ipython` version 2.0 or later, with notebook support: http://ipython.org
+- `seaborn` version 0.5 or later
+
+The easiest way to get these is to use the [conda](https://store.continuum.io/) environment manager.
+I suggest downloading and installing [miniconda](http://conda.pydata.org/miniconda.html).
+
+Once this is installed, the following command will install all required packages in your Python environment:
+```
+$ conda install numpy scipy matplotlib scikit-learn ipython-notebook seaborn
+```
+
+Alternatively, you can download and install the (very large) Anaconda software distribution, found at https://store.continuum.io/.
+
+## Downloading the Tutorial Materials
+I would highly recommend using git, not only for this tutorial, but for the
+general betterment of your life. Once git is installed, you can clone the
+material in this tutorial by using the git address shown above:
+
+    git clone git://github.com/jakevdp/sklearn_tutorial.git
+
+If you can't or don't want to install git, there is a link above to download
+the contents of this repository as a zip file. I may make minor changes to
+the repository in the days before the tutorial, however, so cloning the
+repository is a much better option.
+
+
+## Notebook Listing
+You can [view the tutorial materials](http://nbviewer.ipython.org/github/jakevdp/sklearn_tutorial/blob/master/notebooks/Index.ipynb) using the excellent nbviewer service.
+
+Note, however, that you cannot modify or run the contents within nbviewer.
+To modify them, first download the tutorial repository, change to the notebooks directory, and run ``ipython notebook``.
+You should see the list of notebooks on the IPython notebook launch page in your web browser.
+For more information on the IPython notebook, see http://ipython.org/notebook.html
+
+Note also that some of the code in these notebooks will not work outside the
+directory structure of this tutorial, so it is important to clone the full
+repository if possible.
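
The installation notes in this README list minimum versions for each package. As a rough illustration (not part of the commit), the check could be automated along the following lines. The names `MINIMUM_VERSIONS`, `version_tuple`, `meets_minimum`, and `check_requirements` are hypothetical helpers invented for this sketch, and the version parsing is deliberately simple: it keeps only the leading digits of each dotted component.

```python
import importlib

# Minimum versions as listed in the README's installation notes.
# Keys are import names (scikit-learn is imported as "sklearn").
MINIMUM_VERSIONS = {
    "numpy": "1.5",
    "scipy": "0.10",
    "matplotlib": "1.3",
    "sklearn": "0.14",
    "IPython": "2.0",
    "seaborn": "0.5",
}


def version_tuple(version):
    """Turn '0.15.2' into (0, 15, 2), keeping only the leading digits
    of each dotted piece and stopping at a piece with no leading digits."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {import_name: (installed_version_or_None, ok)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (None, False)
            continue
        installed = getattr(module, "__version__", "0")
        report[name] = (installed, meets_minimum(installed, minimum))
    return report
```

Calling `check_requirements()` and printing the result gives a quick per-package pass/fail summary. Packages that cannot be imported are reported as `(None, False)` rather than raising.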

notebooks/01-Preliminaries.ipynb

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
+{
+"metadata": {
+"name": "",
+"signature": "sha256:e15002059f80bd12a6b29c25b2179198b517b5b98da80bcfa018729797f25aea"
+},
+"nbformat": 3,
+"nbformat_minor": 0,
+"worksheets": [
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"<small><i>This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com). Source and license info is on [GitHub](https://github.com/jakevdp/sklearn_tutorial/).</i></small>"
+]
+},
+{
+"cell_type": "heading",
+"level": 1,
+"metadata": {},
+"source": [
+"An Introduction to scikit-learn: Machine Learning in Python"
+]
+},
+{
+"cell_type": "heading",
+"level": 2,
+"metadata": {},
+"source": [
+"Goals of this Tutorial"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"- **Introduce the basics of Machine Learning**, and some skills useful in practice.\n",
+"- **Introduce the syntax of scikit-learn**, so that you can make use of the rich toolset available."
+]
+},
+{
+"cell_type": "heading",
+"level": 2,
+"metadata": {},
+"source": [
+"Schedule:"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**10:00 - 10:15** Preliminaries: Setup & introduction\n",
+"* Making sure your computer is set up\n",
+"\n",
+"**10:15 - 11:00** Basic Principles of Machine Learning and the Scikit-learn Interface\n",
+"* What is Machine Learning?\n",
+"* Machine learning data layout\n",
+"* Supervised Learning\n",
+" - Classification\n",
+" - Regression\n",
+" - Measuring performance\n",
+"* Unsupervised Learning\n",
+" - Clustering\n",
+" - Dimensionality Reduction\n",
+" - Density Estimation\n",
+"* Evaluation of Learning Models\n",
+"* Choosing the right algorithm for your dataset\n",
+"\n",
+"**11:00 - 12:00** Supervised learning in-depth\n",
+"* Support Vector Machines\n",
+"* Decision Trees and Random Forests\n",
+"\n",
+"*The tutorial repository contains additional material which we will not cover here. My hope is that you will find it useful to read through on your own if you want to go deeper!*"
+]
+},
+{
+"cell_type": "heading",
+"level": 2,
+"metadata": {},
+"source": [
+"Preliminaries"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"This tutorial requires the following packages:\n",
+"\n",
+"- Python version 2.6-2.7 or 3.3-3.4\n",
+"- `numpy` version 1.5 or later: http://www.numpy.org/\n",
+"- `scipy` version 0.10 or later: http://www.scipy.org/\n",
+"- `matplotlib` version 1.3 or later: http://matplotlib.org/\n",
+"- `scikit-learn` version 0.14 or later: http://scikit-learn.org\n",
+"- `ipython` version 2.0 or later, with notebook support: http://ipython.org\n",
+"- `seaborn`: version 0.5 or later, used mainly for plot styling\n",
+"\n",
+"The easiest way to get these is to use the [conda](http://store.continuum.io/) environment manager.\n",
+"I suggest downloading and installing [miniconda](http://conda.pydata.org/miniconda.html).\n",
+"\n",
+"The following command will install all required packages:\n",
+"```\n",
+"$ conda install numpy scipy matplotlib scikit-learn ipython-notebook seaborn\n",
+"```\n",
+"\n",
+"Alternatively, you can download and install the (very large) Anaconda software distribution, found at https://store.continuum.io/."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Checking your installation\n",
+"\n",
+"You can run the following code to check the versions of the packages on your system:\n",
+"\n",
+"(in IPython notebook, press `shift` and `return` together to execute the contents of a cell)"
+]
+},
+{
+"cell_type": "code",
+"collapsed": false,
+"input": [
+"from __future__ import print_function\n",
+"\n",
+"import IPython\n",
+"print('IPython:', IPython.__version__)\n",
+"\n",
+"import numpy\n",
+"print('numpy:', numpy.__version__)\n",
+"\n",
+"import scipy\n",
+"print('scipy:', scipy.__version__)\n",
+"\n",
+"import matplotlib\n",
+"print('matplotlib:', matplotlib.__version__)\n",
+"\n",
+"import sklearn\n",
+"print('scikit-learn:', sklearn.__version__)\n",
+"\n",
+"import seaborn\n",
+"print('seaborn', seaborn.__version__)"
+],
+"language": "python",
+"metadata": {},
+"outputs": [
+{
+"output_type": "stream",
+"stream": "stdout",
+"text": [
+"IPython: 2.4.1\n",
+"numpy:"
+]
+},
+{
+"output_type": "stream",
+"stream": "stdout",
+"text": [
+" 1.9.2\n",
+"scipy: 0.15.1\n",
+"matplotlib: 1.4.3\n",
+"scikit-learn:"
+]
+},
+{
+"output_type": "stream",
+"stream": "stdout",
+"text": [
+" 0.15.2\n",
+"seaborn"
+]
+},
+{
+"output_type": "stream",
+"stream": "stdout",
+"text": [
+" 0.5.1\n"
+]
+}
+],
+"prompt_number": 1
+},
+{
+"cell_type": "heading",
+"level": 2,
+"metadata": {},
+"source": [
+"Useful Resources"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"- **scikit-learn:** http://scikit-learn.org (see especially the narrative documentation)\n",
+"- **matplotlib:** http://matplotlib.org (see especially the gallery section)\n",
+"- **IPython:** http://ipython.org (also check out http://nbviewer.ipython.org)"
+]
+}
+],
+"metadata": {}
+}
+]
+}
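
The notebook added here is stored in nbformat 3, where cells sit under `worksheets[*].cells` and a code cell keeps its source in an `input` list. For readers who want to pull the code out of such a file without launching IPython, a minimal sketch follows. The function name `code_cells_v3` and the stand-in notebook are made up for illustration; only the key names (`nbformat`, `worksheets`, `cells`, `cell_type`, `input`) come from the diff above.

```python
import json


def code_cells_v3(notebook):
    """Yield the source text of each code cell in an nbformat-3 notebook dict."""
    if notebook.get("nbformat") != 3:
        raise ValueError("this sketch only handles nbformat 3")
    for worksheet in notebook.get("worksheets", []):
        for cell in worksheet.get("cells", []):
            if cell.get("cell_type") == "code":
                # "input" is normally a list of lines; tolerate a plain string too.
                source = cell.get("input", [])
                yield source if isinstance(source, str) else "".join(source)


# A tiny stand-in notebook with the same layout as the file in this diff
# (illustrative data, not the real notebook contents).
example = json.loads("""
{
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["Some text"]},
    {"cell_type": "code", "input": ["import numpy\\n", "print(numpy.__version__)"]}
  ]}
 ]
}
""")

sources = list(code_cells_v3(example))
```

To use it on a real file, load the notebook with `json.load` and pass the resulting dict to `code_cells_v3`. Later notebook formats (nbformat 4) move cells to a top-level `cells` list and rename `input` to `source`, so this sketch would need adjusting for them.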

notebooks/02.1-Machine-Learning-Intro.ipynb

Lines changed: 587 additions & 0 deletions
Large diffs are not rendered by default.
