Commit aec8dd0

Add doc to clarify the use of DictVectorizer when categorical features are represented as numeric values
1 parent 070ff21 commit aec8dd0

1 file changed: +5 -0 lines changed

sklearn/feature_extraction/dict_vectorizer.py

Lines changed: 5 additions & 0 deletions
@@ -37,6 +37,11 @@ class DictVectorizer(BaseEstimator, TransformerMixin):
     a feature "f" that can take on the values "ham" and "spam" will become two
     features in the output, one signifying "f=ham", the other "f=spam".
 
+    However, note that this transformer will only do a binary one-hot encoding
+    when feature values are of type string. If categorical features are
+    represented as numeric values such as int, the DictVectorizer can be
+    followed by OneHotEncoder to complete binary one-hot encoding.
+
     Features that do not occur in a sample (mapping) will have a zero value
     in the resulting array/matrix.
 
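
The behaviour described by the added docstring lines can be illustrated with a minimal sketch. It assumes a recent scikit-learn release in which OneHotEncoder accepts the float array produced by DictVectorizer. The sample dictionaries and variable names are made up for illustration and are not part of the commit.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import OneHotEncoder

# String-valued feature: DictVectorizer one-hot encodes it directly.
vec_str = DictVectorizer(sparse=False)
X_str = vec_str.fit_transform([{"f": "ham"}, {"f": "spam"}])
print(vec_str.feature_names_)  # ['f=ham', 'f=spam']
print(X_str)                   # one column per (feature, value) pair

# Integer-valued categorical feature (hypothetical category codes 1 and 3):
# DictVectorizer passes the numbers through as a single numeric column.
vec_num = DictVectorizer(sparse=False)
X_num = vec_num.fit_transform([{"f": 1}, {"f": 3}])
print(vec_num.feature_names_)  # ['f']
print(X_num)                   # a single column holding 1.0 and 3.0

# Following DictVectorizer with OneHotEncoder completes the encoding.
# The encoder returns a sparse matrix by default, so convert it for display.
encoder = OneHotEncoder()
X_onehot = encoder.fit_transform(X_num).toarray()
print(X_onehot)                # one column per distinct category code
```

In this sketch the numeric column is only treated as categorical once OneHotEncoder is applied. Without that second step, a downstream estimator would see the codes 1 and 3 as ordinary magnitudes.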
