Commit 46c4bbb

where to from here section
1 parent aaa20ca commit 46c4bbb

1 file changed: +29 -0 lines changed

doc/tutorial/text_analytics/working_with_text_data.rst

Lines changed: 29 additions & 0 deletions
@@ -536,3 +536,32 @@ English.
 Bonus point if the utility is able to give a confidence level for its
 predictions.
 
+
+Where to From Here
+------------------
+
+Here are a few suggestions to help further your scikit-learn intuition
+upon the completion of this tutorial:
+
+
+- Try playing around with the `analyzer` and `token normalisation` under
+  :class:`CountVectorizer`
+
+- If you don't have labels, try using
+  :ref:`Clustering <example_document_clustering.py>`
+  on your problem.
+
+- If you have multiple labels per document, e.g. categories, have a look
+  at the :ref:`Multiclass and multilabel section <multiclass>`
+
+- Try using :ref:`PCA (Principal Component Analysis) <decompositions>` for
+  `latent semantic analysis <http://en.wikipedia.org/wiki/Latent_semantic_analysis>`_.
+
+- Have a look at using
+  :ref:`Out-of-core Classification
+  <example_applications_plot_out_of_core_classification.py>` to
+  learn from data that would not fit into the computer's main memory.
+
+- If you have too many sparse features, try using the :ref:`Hashing Vectorizer
+  <hashing_vectorizer>`.
+
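The last two suggestions in the added section (out-of-core learning and the Hashing Vectorizer) both rest on the hashing trick: tokens are mapped straight to column indices by a hash function, so no vocabulary has to be held in memory. The following is a minimal standard-library sketch of that idea, not scikit-learn's implementation. The function names, the use of CRC32 as the hash, and the tiny `N_FEATURES` value are all assumptions made for illustration.

```python
import re
import zlib
from collections import Counter

# Hypothetical, deliberately tiny feature space so collisions are easy to see.
N_FEATURES = 16


def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def bucket(token, n_features=N_FEATURES):
    """Map a token to a column index with a stable hash.

    CRC32 is an assumption of this sketch; it is used only because it is
    deterministic across runs, unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(token.encode("utf-8")) % n_features


def hash_vectorize(text, n_features=N_FEATURES):
    """Return a sparse {column index: count} mapping for one document."""
    return dict(Counter(bucket(tok, n_features) for tok in tokenize(text)))


def stream_vectors(documents, n_features=N_FEATURES):
    """Vectorize documents one at a time; no vocabulary is stored."""
    for doc in documents:
        yield hash_vectorize(doc, n_features)


docs = ["the cat sat on the mat", "the dog sat"]
vectors = list(stream_vectors(docs))
```

Because the mapping is stateless, each document can be vectorized as it is read from disk, which is what makes the approach suitable for data that does not fit in memory. The trade-off is that distinct tokens may collide in the same column and that column indices cannot be mapped back to tokens.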
