Commit a8e7e48

Merge pull request scikit-learn#6285 from yenchenlin1994/update-DictVectorizer-doc-about-one-hot-encoding
[MRG+1] Doc Add doc in DictVectorizer when categorical features are numeric values (fixes scikit-learn#4413)
2 parents b7998c2 + aec8dd0 commit a8e7e48

1 file changed: +5 -0 lines changed

sklearn/feature_extraction/dict_vectorizer.py

Lines changed: 5 additions & 0 deletions
@@ -37,6 +37,11 @@ class DictVectorizer(BaseEstimator, TransformerMixin):
 a feature "f" that can take on the values "ham" and "spam" will become two
 features in the output, one signifying "f=ham", the other "f=spam".
 
+However, note that this transformer will only do a binary one-hot encoding
+when feature values are of type string. If categorical features are
+represented as numeric values such as int, the DictVectorizer can be
+followed by OneHotEncoder to complete binary one-hot encoding.
+
 Features that do not occur in a sample (mapping) will have a zero value
 in the resulting array/matrix.
 
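The distinction the added docstring paragraph draws can be illustrated with a small pure-Python sketch. This is a simplified stand-in for the encoding rule, not scikit-learn's implementation: the functions `vectorize` and `one_hot_column` are hypothetical helpers written for this illustration, and the second one only imitates the idea of a follow-up one-hot step.

```python
# Simplified stand-in for the rule described in the docstring.
# This is NOT scikit-learn's code; both helpers are hypothetical.

def vectorize(samples):
    """Map a list of feature dicts to (feature_names, dense rows)."""
    names = set()
    for sample in samples:
        for key, value in sample.items():
            # String values get one column per (feature, value) pair;
            # anything else keeps a single column named after the feature.
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    names = sorted(names)
    index = {name: i for i, name in enumerate(names)}

    rows = []
    for sample in samples:
        row = [0.0] * len(names)  # features absent from a sample stay zero
        for key, value in sample.items():
            if isinstance(value, str):
                row[index[f"{key}={value}"]] = 1.0
            else:
                row[index[key]] = float(value)  # numeric value passed through
        rows.append(row)
    return names, rows


def one_hot_column(rows, col):
    """Expand one numeric column into indicator columns."""
    categories = sorted({row[col] for row in rows})
    encoded = [[1.0 if row[col] == c else 0.0 for c in categories] for row in rows]
    return categories, encoded


string_names, string_rows = vectorize([{"f": "ham"}, {"f": "spam"}])
numeric_names, numeric_rows = vectorize([{"f": 1}, {"f": 3}])
categories, encoded = one_hot_column(numeric_rows, numeric_names.index("f"))

print(string_names, string_rows)    # two indicator columns
print(numeric_names, numeric_rows)  # one column holding the raw numbers
print(categories, encoded)          # indicator columns after the extra step
```

With string values the sketch yields one indicator column per value, while the integer-coded feature stays a single numeric column until the separate one-hot step expands it.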
