@@ -121,7 +121,31 @@ The names of the classes is stored in the last attribute, namely
121121Handling categorical features
122122-----------------------------
123123
124- TODO
124+ Sometimes samples are described by categorical descriptors that
125+ have no obvious numerical representation. For instance, assume that
126+ each flower is further described by a color name drawn from a fixed
127+ list of color names::
128+
129+ color in ['purple', 'blue', 'red']
130+
131+ A simple way to turn this categorical feature into numerical
132+ features suitable for machine learning is to create one new feature
133+ for each distinct color name, valued ``1.0`` if the sample belongs
134+ to that category and ``0.0`` otherwise.
135+
136+ In this case, the enriched iris feature set would be:
137+
138+
139+ :Features in the extended Iris dataset:
140+
141+ 0. sepal length in cm
142+ 1. sepal width in cm
143+ 2. petal length in cm
144+ 3. petal width in cm
145+ 4. color#purple (1.0 or 0.0)
146+ 5. color#blue (1.0 or 0.0)
147+ 6. color#red (1.0 or 0.0)
148+
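The encoding described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the tutorial's own implementation: the helper names ``one_hot_color`` and ``extend_features`` are hypothetical, and the colors attached to the two sample flowers are made up for the example.

```python
import numpy as np

# Fixed list of color names, in the order used for features 4 to 6.
COLORS = ['purple', 'blue', 'red']


def one_hot_color(color, categories=COLORS):
    """Return one 1.0/0.0 value per known category (hypothetical helper)."""
    if color not in categories:
        raise ValueError("unknown color: %r" % (color,))
    return [1.0 if color == category else 0.0 for category in categories]


def extend_features(measurements, colors):
    """Append the encoded color columns to the four numerical features."""
    measurements = np.asarray(measurements, dtype=float)
    encoded = np.array([one_hot_color(color) for color in colors])
    return np.hstack([measurements, encoded])


# Two flowers: sepal length, sepal width, petal length, petal width (cm).
# The color assigned to each flower is invented for illustration.
X = extend_features(
    [[5.1, 3.5, 1.4, 0.2],
     [6.3, 3.3, 6.0, 2.5]],
    ['purple', 'red'],
)
print(X.shape)
print(X)
```

Each row of ``X`` now has seven columns: the four measurements followed by one column per color, with exactly one of the color columns set to ``1.0``.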
125149
126150Extracting features from unstructured data
127151------------------------------------------
@@ -184,8 +208,7 @@ How to evaluate the quality of feature extraction strategy
184208----------------------------------------------------------
185209
186210The rule of thumb is two samples that seem close or related to
187-
188- And conversely, samples that seem close in
211+ TODO
189212
190213
191214Supervised Learning: ``model.fit(X, y)``