@@ -508,15 +508,13 @@ Such features can be efficiently coded as integers, for instance
 ``[1, 2, 1]``.
 
 To convert categorical features to such integer codes, we can use the
-:class:`CategoricalEncoder`. When specifying that we want to perform an
-ordinal encoding, the estimator transforms each categorical feature to one
+:class:`OrdinalEncoder`. This estimator transforms each categorical feature to one
 new feature of integers (0 to n_categories - 1)::
 
-    >>> enc = preprocessing.CategoricalEncoder(encoding='ordinal')
+    >>> enc = preprocessing.OrdinalEncoder()
     >>> X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
     >>> enc.fit(X) # doctest: +ELLIPSIS
-    CategoricalEncoder(categories='auto', dtype=<... 'numpy.float64'>,
-              encoding='ordinal', handle_unknown='error')
+    OrdinalEncoder(categories='auto', dtype=<... 'numpy.float64'>)
     >>> enc.transform([['female', 'from US', 'uses Safari']])
     array([[0., 1., 1.]])
 
@@ -528,18 +526,19 @@ browsers was ordered arbitrarily).
 Another possibility to convert categorical features to features that can be used
 with scikit-learn estimators is to use a one-of-K, also known as one-hot or
 dummy encoding.
-This type of encoding is the default behaviour of the :class:`CategoricalEncoder`.
-The :class:`CategoricalEncoder` then transforms each categorical feature with
+This type of encoding can be obtained with the :class:`OneHotEncoder`,
+which transforms each categorical feature with
 ``n_categories`` possible values into ``n_categories`` binary features, with
 one of them 1, and all others 0.
 
 Continuing the example above::
 
-    >>> enc = preprocessing.CategoricalEncoder()
+    >>> enc = preprocessing.OneHotEncoder()
     >>> X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
     >>> enc.fit(X) # doctest: +ELLIPSIS
-    CategoricalEncoder(categories='auto', dtype=<... 'numpy.float64'>,
-              encoding='onehot', handle_unknown='error')
+    OneHotEncoder(categorical_features=None, categories=None,
+           dtype=<... 'numpy.float64'>, handle_unknown='error',
+           n_values=None, sparse=True)
     >>> enc.transform([['female', 'from US', 'uses Safari'],
     ...                ['male', 'from Europe', 'uses Safari']]).toarray()
     array([[1., 0., 0., 1., 0., 1.],
@@ -558,14 +557,15 @@ dataset::
     >>> genders = ['female', 'male']
     >>> locations = ['from Africa', 'from Asia', 'from Europe', 'from US']
     >>> browsers = ['uses Chrome', 'uses Firefox', 'uses IE', 'uses Safari']
-    >>> enc = preprocessing.CategoricalEncoder(categories=[genders, locations, browsers])
+    >>> enc = preprocessing.OneHotEncoder(categories=[genders, locations, browsers])
     >>> # Note that for there are missing categorical values for the 2nd and 3rd
     >>> # feature
     >>> X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
     >>> enc.fit(X) # doctest: +ELLIPSIS
-    CategoricalEncoder(categories=[...],
-              dtype=<... 'numpy.float64'>, encoding='onehot',
-              handle_unknown='error')
+    OneHotEncoder(categorical_features=None,
+           categories=[...],
+           dtype=<... 'numpy.float64'>, handle_unknown='error',
+           n_values=None, sparse=True)
     >>> enc.transform([['female', 'from Asia', 'uses Chrome']]).toarray()
     array([[1., 0., 0., 1., 0., 0., 1., 0., 0., 0.]])
 
@@ -577,11 +577,12 @@ during transform, no error will be raised but the resulting one-hot encoded
 columns for this feature will be all zeros
 (``handle_unknown='ignore'`` is only supported for one-hot encoding)::
 
-    >>> enc = preprocessing.CategoricalEncoder(handle_unknown='ignore')
+    >>> enc = preprocessing.OneHotEncoder(handle_unknown='ignore')
     >>> X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
     >>> enc.fit(X) # doctest: +ELLIPSIS
-    CategoricalEncoder(categories='auto', dtype=<... 'numpy.float64'>,
-              encoding='onehot', handle_unknown='ignore')
+    OneHotEncoder(categorical_features=None, categories=None,
+           dtype=<... 'numpy.float64'>, handle_unknown='ignore',
+           n_values=None, sparse=True)
     >>> enc.transform([['female', 'from Asia', 'uses Chrome']]).toarray()
     array([[1., 0., 0., 0., 0., 0.]])
 
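To make the ordinal encoding documented in the first hunk concrete, here is a minimal pure-Python sketch of the mapping it describes. It is not scikit-learn's implementation: the helpers `fit_categories` and `ordinal_transform` are hypothetical, and the sketch assumes that `categories='auto'` means the sorted unique values of each column. Under that assumption it reproduces the doctest result `array([[0., 1., 1.]])` as a plain list.

```python
from typing import List, Sequence


def fit_categories(X: Sequence[Sequence[str]]) -> List[List[str]]:
    """Return the sorted unique values of each column (assumed 'auto' behaviour)."""
    n_features = len(X[0])
    return [sorted({row[j] for row in X}) for j in range(n_features)]


def ordinal_transform(X: Sequence[Sequence[str]],
                      categories: List[List[str]]) -> List[List[float]]:
    """Replace each value by its index (0 to n_categories - 1) in its column."""
    encoded = []
    for row in X:
        codes = []
        for j, value in enumerate(row):
            if value not in categories[j]:
                # This sketch refuses values that were not seen during fitting.
                raise ValueError(f"unknown category {value!r} in column {j}")
            codes.append(float(categories[j].index(value)))
        encoded.append(codes)
    return encoded


X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
categories = fit_categories(X)
print(categories)
print(ordinal_transform([['female', 'from US', 'uses Safari']], categories))
```

Because the codes follow sorted order, an estimator would read them as ordered, even though categories such as the browsers have no natural order.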
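The one-hot hunks can be sketched the same way. The function below is a hypothetical pure-Python stand-in, not `OneHotEncoder` itself: each feature expands into one 0/1 column per category, and with `handle_unknown='ignore'` an unseen value yields an all-zero block instead of an error. Categories are either passed explicitly or, assuming `categories='auto'` means the sorted unique values seen during fitting, inferred from the data. The dense list output stands in for the `.toarray()` calls in the doctests.

```python
from typing import List, Sequence


def fit_categories(X: Sequence[Sequence[str]]) -> List[List[str]]:
    """Return the sorted unique values of each column (assumed 'auto' behaviour)."""
    return [sorted({row[j] for row in X}) for j in range(len(X[0]))]


def one_hot_transform(X: Sequence[Sequence[str]],
                      categories: List[List[str]],
                      handle_unknown: str = "error") -> List[List[float]]:
    """Expand feature j into len(categories[j]) binary columns."""
    if handle_unknown not in ("error", "ignore"):
        raise ValueError("handle_unknown must be 'error' or 'ignore'")
    encoded = []
    for row in X:
        out_row: List[float] = []
        for j, value in enumerate(row):
            block = [0.0] * len(categories[j])
            if value in categories[j]:
                block[categories[j].index(value)] = 1.0
            elif handle_unknown == "error":
                raise ValueError(f"unknown category {value!r} in column {j}")
            # With handle_unknown='ignore', an unseen value leaves the block all zeros.
            out_row.extend(block)
        encoded.append(out_row)
    return encoded


X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]

# Categories inferred from X.
auto = fit_categories(X)
print(one_hot_transform([['female', 'from US', 'uses Safari'],
                         ['male', 'from Europe', 'uses Safari']], auto))

# Categories given explicitly, including values absent from X.
genders = ['female', 'male']
locations = ['from Africa', 'from Asia', 'from Europe', 'from US']
browsers = ['uses Chrome', 'uses Firefox', 'uses IE', 'uses Safari']
print(one_hot_transform([['female', 'from Asia', 'uses Chrome']],
                        [genders, locations, browsers]))

# 'from Asia' and 'uses Chrome' were never seen in X, so their blocks stay zero.
print(one_hot_transform([['female', 'from Asia', 'uses Chrome']], auto,
                        handle_unknown='ignore'))
```

Under these assumptions the three calls reproduce, as lists of floats, the rows visible in the corresponding doctests.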