Commit aaa20ca

recommit joel comments
1 parent 4cf9865 commit aaa20ca

2 files changed: +51 -46 lines changed

doc/documentation.rst

Lines changed: 31 additions & 36 deletions
@@ -38,10 +38,10 @@ Documentation of scikit-learn 0.15
 <!-- Documentation overview -->
 <div class="row-fluid">
 <div class="span4 box">
-<h2><a href="tutorial/statistical_inference/index.html">Statistical Learning Tutorial</a></h2>
-<blockquote>A tutorial on statistical learning for
-data analysis. Contains a more in-depth discussion
-of important concepts.
+<h2><a href="tutorial/index.html">Tutorials</a></h2>
+<blockquote>Useful tutorials for developing a feel
+for some of scikit-learn's applications in the
+machine learning field.
 </blockquote>
 </div>
 <div class="span4 box">
@@ -52,9 +52,29 @@ Documentation of scikit-learn 0.15
 </blockquote>
 </div>
 <div class="span4 box">
-<h2><a href="presentations.html">Additional Resources</a></h2>
-<blockquote>Talks given, slide-sets and other information relevant to scikit-learn.
-</blockquote>
+<!-- doc versions -->
+<h2>Other Versions</h2>
+<ul>
+<li><a href="http://scikit-learn.org/stable/user_guide.html">scikit-learn 0.14 (stable)</a></li>
+<li>scikit-learn 0.15 (development)</li>
+<li><a href="http://scikit-learn.org/0.13/user_guide.html">scikit-learn 0.13</a></li>
+<li><a href="http://scikit-learn.org/0.12/user_guide.html">scikit-learn 0.12</a></li>
+<li><a href="http://scikit-learn.org/0.11/user_guide.html">scikit-learn 0.11</a></li>
+<li id="other-versions">Older versions
+<a class="btn dropdown-toggle" data-toggle="dropdown">
+<span class="caret"></span>
+</a>
+<ul class="dropdown-menu">
+<li><a href="http://scikit-learn.org/0.10/user_guide.html">scikit-learn 0.10</a></li>
+<li><a href="http://scikit-learn.org/0.9/user_guide.html">scikit-learn 0.9</a></li>
+<li><a href="http://scikit-learn.org/0.8/user_guide.html">scikit-learn 0.8</a></li>
+<li><a href="http://scikit-learn.org/0.7/user_guide.html">scikit-learn 0.7</a></li>
+<li><a href="http://scikit-learn.org/0.6/user_guide.html">scikit-learn 0.6</a></li>
+<li><a href="http://scikit-learn.org/0.5/user_guide.html">scikit-learn 0.5</a></li>
+</ul>
+</li>
+</ul>
+
 </div>
 <!-- row -->
 </div>
@@ -63,11 +83,9 @@ Documentation of scikit-learn 0.15
 <!-- row -->
 <div class="row-fluid">
 <div class="span4 box">
-<h2><a href="tutorial/text_analytics/working_with_text_data.html">Text Analysis Tutorial</a></h2>
-<blockquote>This tutorial explores an application of machine learning, namely, Text-analysis.
-The goal of this guide is to utilize some of the main scikit-learn tools in order to analyze
-a collection of text documents.
-</blockquote>
+<h2><a href="presentations.html">Additional Resources</a></h2>
+<blockquote>Talks given, slide-sets and other information relevant to scikit-learn.
+</blockquote>
 </div>
 <div class="span4 box">
 <h2><a href="developers/index.html">Contributing</a></h2>
@@ -76,29 +94,6 @@ Documentation of scikit-learn 0.15
 how to build their own estimators.
 </blockquote>
 </div>
-<!-- doc versions -->
-<div class="span4 box">
-<h2>Other Versions</h2>
-<ul>
-<li><a href="http://scikit-learn.org/stable/user_guide.html">scikit-learn 0.14 (stable)</a></li>
-<li>scikit-learn 0.15 (development)</li>
-<li><a href="http://scikit-learn.org/0.13/user_guide.html">scikit-learn 0.13</a></li>
-<li><a href="http://scikit-learn.org/0.12/user_guide.html">scikit-learn 0.12</a></li>
-<li><a href="http://scikit-learn.org/0.11/user_guide.html">scikit-learn 0.11</a></li>
-<li id="other-versions">Older versions
-<a class="btn dropdown-toggle" data-toggle="dropdown">
-<span class="caret"></span>
-</a>
-<ul class="dropdown-menu">
-<li><a href="http://scikit-learn.org/0.10/user_guide.html">scikit-learn 0.10</a></li>
-<li><a href="http://scikit-learn.org/0.9/user_guide.html">scikit-learn 0.9</a></li>
-<li><a href="http://scikit-learn.org/0.8/user_guide.html">scikit-learn 0.8</a></li>
-<li><a href="http://scikit-learn.org/0.7/user_guide.html">scikit-learn 0.7</a></li>
-<li><a href="http://scikit-learn.org/0.6/user_guide.html">scikit-learn 0.6</a></li>
-<li><a href="http://scikit-learn.org/0.5/user_guide.html">scikit-learn 0.5</a></li>
-</ul>
-</li>
-</ul>
-</div>
+
 </div>
 

doc/tutorial/text_analytics/working_with_text_data.rst

Lines changed: 20 additions & 10 deletions
@@ -119,12 +119,12 @@ reference the filenames are also available::
 
 Let's print the first lines of the first loaded file::
 
->>> print "\n".join(twenty_train.data[0].split("\n")[:3])
+>>> print ("\n".join(twenty_train.data[0].split("\n")[:3]))
 From: [email protected] (Michael Collier)
 Subject: Converting images to HP LaserJet III?
 Nntp-Posting-Host: hampton
 
->>> print twenty_train.target_names[twenty_train.target[0]]
+>>> print (twenty_train.target_names[twenty_train.target[0]])
 comp.graphics
 
 Supervised learning algorithms will require a category label for each
@@ -143,7 +143,7 @@ integer id of each sample is stored in the ``target`` attribute::
 It is possible to get back the category names as follows::
 
 >>> for t in twenty_train.target[:10]:
-... print twenty_train.target_names[t]
+... print (twenty_train.target_names[t])
 ...
 comp.graphics
 comp.graphics
@@ -168,6 +168,8 @@ Extracting features from text files
 In order to perform machine learning on text documents, we first need to
 turn the text content into numerical feature vectors.
 
+.. currentmodule:: sklearn.feature_extraction.text
+
 
 Bags of words
 ~~~~~~~~~~~~~
@@ -212,7 +214,7 @@ dictionary of features and transform documents to feature vectors::
 >>> X_train_counts.shape
 (2257, 35788)
 
-``CountVectorizer`` supports counts of N-grams of words or consequective characters.
+:class:`CountVectorizer` supports counts of N-grams of words or consequective characters.
 Once fitted, the vectorizer has built a dictionary of feature indices::
 
 >>> count_vect.vocabulary_.get(u'algorithm')
@@ -262,6 +264,14 @@ Both **tf** and **tf–idf** can be computed as follows::
 >>> X_train_tf.shape
 (2257, 35788)
 
+In the above example-code, we firstly use the ``fit(..)`` method to fit our
+estimator to the data and secondly the ``transform(..)`` method to transform
+our count-matrix to a tf-idf representation.
+These two steps can be combined to achieve the same end result faster
+by skipping redundant processing. This is done through using the
+``fit_transform(..)`` method as shown below, and as mentioned in the note
+in the previous section.
+
 >>> tfidf_transformer = TfidfTransformer()
 >>> X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
 >>> X_train_tfidf.shape
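
Note on the hunk above: the added paragraph says that ``fit(..)`` followed by ``transform(..)`` gives the same end result as ``fit_transform(..)``. A minimal sketch of that equivalence, using only the standard library. ``ToyTfidf`` and its smoothed idf formula are hypothetical stand-ins chosen for illustration, not scikit-learn's ``TfidfTransformer``, and the toy version shows only the equivalence, not the speed-up.

```python
import math


class ToyTfidf:
    """Hypothetical stand-in for a tf-idf transformer (not scikit-learn's)."""

    def fit(self, counts):
        # Learn one idf weight per column from the document frequencies.
        n_docs = len(counts)
        n_terms = len(counts[0])
        self.idf_ = []
        for j in range(n_terms):
            df = sum(1 for row in counts if row[j] > 0)
            # Smoothed idf; an assumed formula, used here for illustration only.
            self.idf_.append(math.log((1 + n_docs) / (1 + df)) + 1.0)
        return self

    def transform(self, counts):
        # Reweight each raw count by the idf weight learned in fit().
        return [[c * w for c, w in zip(row, self.idf_)] for row in counts]

    def fit_transform(self, counts):
        # One call that gives the same result as fit(counts).transform(counts).
        return self.fit(counts).transform(counts)


# Hypothetical count matrix: 3 documents, 3 terms.
counts = [[2, 0, 1], [0, 1, 1], [1, 1, 0]]

two_step = ToyTfidf().fit(counts).transform(counts)
one_step = ToyTfidf().fit_transform(counts)
print(two_step == one_step)
```

Returning ``self`` from ``fit`` is what allows the chained two-step form, which is also why the later hunk can write ``text_clf = text_clf.fit(...)``.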
@@ -271,7 +281,7 @@ Both **tf** and **tf–idf** can be computed as follows::
 Training a classifier
 ---------------------
 
-Now that we have our feature, we can train a classifier to try to predict
+Now that we have our features, we can train a classifier to try to predict
 the category of a post. Let's start with a :ref:`naïve Bayes <naive_bayes>`
 classifier, which
 provides a nice baseline for this task. ``scikit-learn`` includes several
@@ -293,7 +303,7 @@ on the transformers, since they have already been fit to the training set::
 >>> predicted = clf.predict(X_new_tfidf)
 
 >>> for doc, category in zip(docs_new, predicted):
-... print '%r => %s' % (doc, twenty_train.target_names[category])
+... print ('%r => %s' % (doc, twenty_train.target_names[category]))
 ...
 'God is love' => soc.religion.christian
 'OpenGL on the GPU is fast' => comp.graphics
@@ -316,7 +326,7 @@ The names ``vect``, ``tfidf`` and ``clf`` (classifier) are arbitrary.
 We shall see their use in the section on grid search, below.
 We can now train the model with a single command::
 
->>> _ = text_clf.fit(twenty_train.data, twenty_train.target)
+>>> text_clf = text_clf.fit(twenty_train.data, twenty_train.target)
 
 
 Evaluation of the performance on the test set
@@ -354,8 +364,8 @@ classifier object into our pipeline::
 analysis of the results::
 
 >>> from sklearn import metrics
->>> print metrics.classification_report(twenty_test.target, predicted,
-... target_names=twenty_test.target_names)
+>>> print (metrics.classification_report(twenty_test.target, predicted,
+... target_names=twenty_test.target_names))
 ... # doctest: +NORMALIZE_WHITESPACE
 precision recall f1-score support
 <BLANKLINE>
@@ -444,7 +454,7 @@ we can do::
 
 >>> best_parameters, score, _ = max(gs_clf.grid_scores_, key=lambda x: x[1])
 >>> for param_name in sorted(parameters.keys()):
-... print "%s: %r" % (param_name, best_parameters[param_name])
+... print ("%s: %r" % (param_name, best_parameters[param_name]))
 ...
 clf__alpha: 0.001
 tfidf__use_idf: True
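
Note on the ``print`` changes in this file: every rewritten call passes exactly one parenthesised argument, so the doctest output should be the same whether ``print`` is the Python 2 statement or the Python 3 function. A minimal sketch of the pattern follows. The ``format_param`` helper and the sample values are hypothetical, introduced only for illustration.

```python
def format_param(name, value):
    # Build the whole line first so that print() receives a single string.
    return "%s: %r" % (name, value)


# Hypothetical stand-in for the best_parameters dict used in the tutorial.
best_parameters = {"clf__alpha": 0.001, "tfidf__use_idf": True}

lines = [format_param(name, best_parameters[name])
         for name in sorted(best_parameters.keys())]

for line in lines:
    # One parenthesised argument: valid as a Python 2 statement and as a
    # Python 3 function call, with identical output.
    print(line)
```

The single-argument form matters: under Python 2 without ``from __future__ import print_function``, ``print ("a", "b")`` prints the tuple ``('a', 'b')`` rather than ``a b``, so calls with several arguments would need the string built first, as above.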
