Add doc to clarify the use of DictVectorizer when categorical features are represented as numeric values

yenchenlin · yenchenlin · commit aec8dd08e8e7 · 2016-02-05T15:34:50.000+08:00
diff --git a/sklearn/feature_extraction/dict_vectorizer.py b/sklearn/feature_extraction/dict_vectorizer.py
@@ -37,6 +37,11 @@ class DictVectorizer(BaseEstimator, TransformerMixin):
     a feature "f" that can take on the values "ham" and "spam" will become two
     features in the output, one signifying "f=ham", the other "f=spam".
 
+    However, note that this transformer will only do a binary one-hot encoding
+    when feature values are of type string. If categorical features are
+    represented as numeric values such as int, the DictVectorizer can be
+    followed by OneHotEncoder to complete binary one-hot encoding.
+
     Features that do not occur in a sample (mapping) will have a zero value
     in the resulting array/matrix.