@@ -227,67 +227,6 @@ alpha parameter, the fewer features selected.
 Processing Magazine [120] July 2007
 http://dsp.rice.edu/sites/dsp.rice.edu/files/cs/baraniukCSlecture07.pdf
 
-.. _randomized_l1:
-
-Randomized sparse models
--------------------------
-
-.. currentmodule:: sklearn.linear_model
-
-In terms of feature selection, there are some well-known limitations of
-L1-penalized models for regression and classification. For example, it is
-known that the Lasso will tend to select an individual variable out of a group
-of highly correlated features. Furthermore, even when the correlation between
-features is not too high, the conditions under which L1-penalized methods
-consistently select "good" features can be restrictive in general.
-
-To mitigate this problem, it is possible to use randomization techniques such
-as those presented in [B2009]_ and [M2010]_. The latter technique, known as
-stability selection, is implemented in the module :mod:`sklearn.linear_model`.
-In the stability selection method, a subsample of the data is fit to an
-L1-penalized model where the penalty of a random subset of coefficients has
-been scaled. Specifically, given a subsample of the data
-:math:`(x_i, y_i), i \in I`, where :math:`I \subset \{1, 2, \ldots, n\}` is a
-random subset of the data of size :math:`n_I`, the following modified Lasso
-fit is obtained:
-
-.. math:: \hat{w}_I = \mathrm{arg}\min_{w} \frac{1}{2n_I} \sum_{i \in I} (y_i - x_i^T w)^2 + \alpha \sum_{j=1}^p \frac{\vert w_j \vert}{s_j},
-
-where :math:`s_j \in \{s, 1\}` are independent trials of a fair Bernoulli
-random variable, and :math:`0 < s < 1` is the scaling factor. By repeating this
-procedure across different random subsamples and Bernoulli trials, one can
-count the fraction of times the randomized procedure selected each feature,
-and use these fractions as scores for feature selection.
-
-:class:`RandomizedLasso` implements this strategy for regression
-settings, using the Lasso, while :class:`RandomizedLogisticRegression` uses
-logistic regression and is suitable for classification tasks. To get a full
-path of stability scores you can use :func:`lasso_stability_path`.
-
-.. figure:: ../auto_examples/linear_model/images/sphx_glr_plot_sparse_recovery_003.png
-   :target: ../auto_examples/linear_model/plot_sparse_recovery.html
-   :align: center
-   :scale: 60
-
-Note that for randomized sparse models to be more powerful than standard
-F statistics at detecting non-zero features, the ground truth model
-should be sparse; in other words, only a small fraction of the features
-should be non-zero.
-
-.. topic:: Examples:
-
-   * :ref:`sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py`: An example
-     comparing different feature selection approaches and discussing in
-     which situation each approach is to be favored.
-
-.. topic:: References:
-
-   .. [B2009] F. Bach, "Model-Consistent Sparse Estimation through the
-      Bootstrap." https://hal.inria.fr/hal-00354771/
-
-   .. [M2010] N. Meinshausen, P. Buhlmann, "Stability selection",
-      Journal of the Royal Statistical Society, 72 (2010)
-      http://arxiv.org/pdf/0809.2932.pdf
 
 Tree-based feature selection
 ----------------------------
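
The stability-selection procedure described in the removed section can still be sketched with estimators that remain in scikit-learn: dividing the penalty on coefficient ``w_j`` by ``s_j`` is equivalent to multiplying column ``j`` of ``X`` by ``s_j`` before fitting an ordinary ``Lasso``. The snippet below is a minimal illustration of that idea, not the removed ``RandomizedLasso`` implementation; the helper name ``stability_scores`` and its parameters (``scaling``, ``sample_fraction``, ``n_resampling``) are hypothetical choices made here for readability::

    import numpy as np
    from sklearn.linear_model import Lasso

    def stability_scores(X, y, alpha=0.1, scaling=0.5, sample_fraction=0.75,
                         n_resampling=200, random_state=0):
        # Hypothetical helper: fraction of resamplings in which each feature
        # receives a non-zero coefficient under the randomized-penalty Lasso.
        rng = np.random.RandomState(random_state)
        n_samples, n_features = X.shape
        n_subsample = int(sample_fraction * n_samples)
        selected = np.zeros(n_features)
        for _ in range(n_resampling):
            # draw the random subsample I of size n_I
            idx = rng.choice(n_samples, size=n_subsample, replace=False)
            # s_j in {s, 1}: rescaling the penalty of coefficient j is the same
            # as rescaling column j of X before an ordinary Lasso fit
            s = np.where(rng.rand(n_features) < 0.5, scaling, 1.0)
            lasso = Lasso(alpha=alpha, max_iter=10000).fit(X[idx] * s, y[idx])
            selected += lasso.coef_ != 0
        # fraction of resamplings in which each feature was selected
        return selected / n_resampling

Features whose returned fraction stays close to one across resamplings are the ones stability selection would keep.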