OneHotEncoder(
    drop: typing.Optional[typing.Literal["most_frequent"]] = None,
    min_frequency: typing.Optional[int] = None,
    max_categories: typing.Optional[int] = None,
)
Encode categorical features as a one-hot format.
The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka 'one-of-K' or 'dummy') encoding scheme.
Note that this transformer deviates from scikit-learn: instead of producing sparse binary columns, it encodes each feature as a single column of STRUCT<index INT64, value DOUBLE>.
Examples:
Given a dataset with two features, we let the encoder find the unique
values per feature and transform the data to a binary one-hot encoding.
>>> from bigframes.ml.preprocessing import OneHotEncoder
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> enc = OneHotEncoder()
>>> X = bpd.DataFrame({"a": ["Male", "Female", "Female"], "b": ["1", "3", "2"]})
>>> enc.fit(X)
OneHotEncoder()
>>> print(enc.transform(bpd.DataFrame({"a": ["Female", "Male"], "b": ["1", "4"]})))
                onehotencoded_a               onehotencoded_b
0  [{'index': 1, 'value': 1.0}]  [{'index': 1, 'value': 1.0}]
1  [{'index': 2, 'value': 1.0}]  [{'index': 0, 'value': 1.0}]
<BLANKLINE>
[2 rows x 2 columns]
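The output above can be reproduced with a small pure-Python sketch, which may help when reading the struct format. It is not the bigframes implementation. It assumes (inferred only from the example output) that categories are indexed from 1 in sorted order and that index 0 is used for values not seen during fitting. The helper names `fit_categories` and `encode` are hypothetical.

```python
def fit_categories(values):
    """Map each distinct value to a 1-based index in sorted order (assumed ordering)."""
    return {cat: i for i, cat in enumerate(sorted(set(values)), start=1)}


def encode(value, categories):
    """Return a struct-style encoding; unseen values fall back to index 0 (assumed)."""
    return [{"index": categories.get(value, 0), "value": 1.0}]


# "Fit" on the same training columns as the example above.
cats_a = fit_categories(["Male", "Female", "Female"])
cats_b = fit_categories(["1", "3", "2"])

# "Transform" the same rows as the example above.
rows = [("Female", "1"), ("Male", "4")]
encoded = [(encode(a, cats_a), encode(b, cats_b)) for a, b in rows]
for row in encoded:
    print(row)
```

Under these assumptions, "Female" maps to index 1, "Male" to index 2, and the unseen value "4" maps to index 0, which agrees with the printed DataFrame.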
| Parameters | |
|---|---|
| Name | Description | 
| drop | Optional[Literal["most_frequent"]], default None. Specifies a methodology to use to drop one of the categories per feature. This is useful in situations where perfectly collinear features cause problems, such as when feeding the resulting data into an unregularized linear regression model. However, dropping one category breaks the symmetry of the original representation and can therefore induce a bias in downstream models, for instance for penalized linear classification or regression models. None (default): retain all the categories. "most_frequent": drop the most frequent category found in the string expression; selecting this value causes the function to use dummy encoding. |
| min_frequency | Optional[int], default None. Specifies the minimum frequency below which a category is considered infrequent. int: categories that occur fewer times than this value are treated as infrequent and encoded with index 0. |
| max_categories | Optional[int], default None. Specifies an upper limit on the number of output features for each input feature when considering infrequent categories. If there are infrequent categories, max_categories includes the category representing the infrequent categories along with the frequent categories. None (default): the limit is set to 1,000,000. |
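As a rough illustration of how min_frequency and max_categories could interact, the following pure-Python sketch groups rare categories under index 0. It is not the bigframes implementation. The helper `fit_with_limits` is hypothetical, and the sketch assumes that one slot is always reserved for the infrequent bucket, that ties are broken alphabetically, and that kept categories are indexed from 1 in sorted order. None of these details are confirmed by the text above.

```python
from collections import Counter


def fit_with_limits(values, min_frequency=None, max_categories=None):
    """Return {category: index} for frequent categories; all others share index 0."""
    counts = Counter(values)
    # Keep only categories that occur at least min_frequency times.
    kept = [c for c, n in counts.items() if min_frequency is None or n >= min_frequency]
    if max_categories is not None:
        # Assumption: one output slot is reserved for the infrequent bucket,
        # so at most max_categories - 1 frequent categories survive.
        kept = sorted(kept, key=lambda c: (-counts[c], c))[: max_categories - 1]
    return {cat: i for i, cat in enumerate(sorted(kept), start=1)}


colors = ["red", "red", "red", "blue", "blue", "green"]

by_frequency = fit_with_limits(colors, min_frequency=2)
by_limit = fit_with_limits(colors, max_categories=2)

# Categories missing from the mapping would be encoded with index 0.
print(by_frequency.get("green", 0), by_limit.get("blue", 0))
```

Here "green" occurs once and falls below min_frequency=2, while max_categories=2 leaves room for only the single most frequent category next to the infrequent bucket.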
Methods
__repr__
__repr__()
Print the estimator's constructor with all non-default parameter values.
fit
fit(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series], y=None
) -> bigframes.ml.preprocessing.OneHotEncoder
Fit OneHotEncoder to X.
| Parameters | |
|---|---|
| Name | Description | 
| X | bigframes.dataframe.DataFrame or bigframes.series.Series. The DataFrame or Series with training data. |
| y | default None. Ignored. |
| Returns | |
|---|---|
| Type | Description | 
| OneHotEncoder | Fitted encoder. | 
fit_transform
fit_transform(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y: typing.Optional[
        typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
    ] = None,
) -> bigframes.dataframe.DataFrame
API documentation for fit_transform method.
get_params
get_params(deep: bool = True) -> typing.Dict[str, typing.Any]
Get parameters for this estimator.
| Parameter | |
|---|---|
| Name | Description | 
| deep | bool, default True. |
| Returns | |
|---|---|
| Type | Description | 
| Dictionary | A dictionary of parameter names mapped to their values. | 
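The relationship between get_params and __repr__ can be sketched with a stand-in class. `SketchEncoder` is hypothetical and only mirrors the constructor parameters documented above. It follows the common scikit-learn-style convention of reading parameter names from the constructor signature, and is not the bigframes implementation.

```python
import inspect


class SketchEncoder:
    """Hypothetical stand-in that mirrors the documented constructor parameters."""

    def __init__(self, drop=None, min_frequency=None, max_categories=None):
        self.drop = drop
        self.min_frequency = min_frequency
        self.max_categories = max_categories

    def get_params(self, deep=True):
        # `deep` is accepted for signature parity; this sketch has no nested estimators.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}

    def __repr__(self):
        # Show only the parameters whose values differ from the constructor defaults.
        defaults = inspect.signature(self.__init__).parameters
        shown = [
            f"{name}={value!r}"
            for name, value in self.get_params().items()
            if value != defaults[name].default
        ]
        return f"{type(self).__name__}({', '.join(shown)})"


enc = SketchEncoder(min_frequency=2)
params = enc.get_params()
print(params)
print(repr(enc))
```

With only min_frequency changed, the dictionary still lists all three parameters, while the repr shows just the non-default one.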
to_gbq
to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.base._T
Save the transformer as a BigQuery model.
| Parameters | |
|---|---|
| Name | Description | 
| model_name | str. The name of the model. |
| replace | bool, default False. Determines whether to replace the model if it already exists. |
transform
transform(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
) -> bigframes.dataframe.DataFrame
Transform X using one-hot encoding.
| Parameter | |
|---|---|
| Name | Description | 
| X | bigframes.dataframe.DataFrame or bigframes.series.Series. The DataFrame or Series to be transformed. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Each encoded entry has the form index: number, value: number, where index is the position of the category in the dictionary of categories seen during fitting, and value is 0 or 1. |
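If a downstream step expects the dense binary layout that scikit-learn users may be used to, the index/value entries can be expanded by hand. The sketch below works on plain Python lists and dicts shaped like the printed output of transform. The helper `to_dense` and the `n_slots` argument (number of fitted categories plus one for index 0) are assumptions for illustration, not part of the bigframes API.

```python
def to_dense(encoded_cell, n_slots):
    """Expand a list of {'index', 'value'} entries into a dense list of floats."""
    dense = [0.0] * n_slots
    for entry in encoded_cell:
        # Each entry names the position to set and the value to place there.
        dense[entry["index"]] = entry["value"]
    return dense


# Two fitted categories plus the slot for index 0 gives three positions.
cell = [{"index": 2, "value": 1.0}]
dense_row = to_dense(cell, n_slots=3)
print(dense_row)
```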