L1-norm and L2-norm regularization doc #3586
Conversation
This class use [empricial risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization) to formulate the optimized problem built upon collected data.
If the training data does not contain enough data points (for example, to train a linear model in $n$-dimensional space, we at least need $n$ data points),
(overfitting)(https://en.wikipedia.org/wiki/Overfitting) may happen so the trained model is good at describing training data but may fail to predict correct results in unseen events.
[Regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)) is a common technique to alleviate such a phenomenon by penalizing the magnitude (usually measureed by [norm function](https://en.wikipedia.org/wiki/Norm_(mathematics))) of model parameters.
measureed [](start = 167, length = 9)
typo: measured #Resolved
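As a side note for readers of this thread, the penalized objective that the quoted doc text describes can be sketched as follows; the loss $L$, the sample count $N$, and the coefficients $\lambda_1, \lambda_2$ are notation introduced only for this illustration, not symbols from the doc itself:

$$\min_{\textbf{w}_1,\dots,\textbf{w}_m} \; \frac{1}{N}\sum_{i=1}^{N} L\big(f(\textbf{x}_i; \textbf{w}_1,\dots,\textbf{w}_m),\, y_i\big) \;+\; \lambda_1 \sum_{c=1}^{m} \|\textbf{w}_c\|_1 \;+\; \lambda_2 \sum_{c=1}^{m} \|\textbf{w}_c\|_2^2$$

With $\lambda_1 = \lambda_2 = 0$ this is plain empirical risk minimization; increasing either coefficient trades fit on the training data for smaller (L2) or sparser (L1) weights.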
If the training data does not contain enough data points (for example, to train a linear model in $n$-dimensional space, we at least need $n$ data points),
(overfitting)(https://en.wikipedia.org/wiki/Overfitting) may happen so the trained model is good at describing training data but may fail to predict correct results in unseen events.
[Regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)) is a common technique to alleviate such a phenomenon by penalizing the magnitude (usually measureed by [norm function](https://en.wikipedia.org/wiki/Norm_(mathematics))) of model parameters.
This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization) which penalizing a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations.
penalizing [](start = 115, length = 10)
penalizes #Resolved
/// Using L1-norm can increase sparsity of the trained $\textbf{w}_c$.
/// When working with high-dimensional data, it shrinks small weights of irrelevant features to 0 and therefore no resource will be spent on those bad features when making prediction.
/// L2-norm regularization is preferable for data that is not sparse and it largely penalizes the existence of large weights.
/// Togehter with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the $\textbf{w}_1,\dots,\textbf{w}_m$.
Togehter [](start = 8, length = 8)
typo #Resolved
This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization) which penalizing a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations.
L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.
Togehter with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the $\textbf{w}_1,\dots,\textbf{w}_m$.
For high-dimention and sparse data set, if user carefully select the coefficient of L1-norm, it is possible to achieve a good prediction quality with a model with a few of non-zeros (e.g., 1% values) in $\textbf{w}_1,\dots,\textbf{w}_m$ without affecting its .
. [](start = 258, length = 2)
missing a word #Resolved
/// Using L1-norm can increase sparsity of the trained $\textbf{w}_c$.
/// When working with high-dimensional data, it shrinks small weights of irrelevant features to 0 and therefore no resource will be spent on those bad features when making prediction.
/// L2-norm regularization is preferable for data that is not sparse and it largely penalizes the existence of large weights.
/// Togehter with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the $\textbf{w}_1,\dots,\textbf{w}_m$.
Togehter [](start = 8, length = 8)
uses #Pending
/// An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less.
///
/// This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization): a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations.
/// This class use [empricial risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization) to formulate the optimized problem built upon collected data.
optimized [](start = 129, length = 9)
optimization ? #Resolved
/// This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization): a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations.
/// This class use [empricial risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization) to formulate the optimized problem built upon collected data.
/// If the training data does not contain enough data points (for example, to train a linear model in $n$-dimensional space, we at least need $n$ data points),
/// (overfitting)(https://en.wikipedia.org/wiki/Overfitting) may happen so the trained model is good at describing training data but may fail to predict correct results in unseen events.
so [](start = 76, length = 2)
when #Resolved
This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization) which penalizing a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations.
L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.
Togehter with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the $\textbf{w}_1,\dots,\textbf{w}_m$.
For high-dimention and sparse data set, if user carefully select the coefficient of L1-norm, it is possible to achieve a good prediction quality with a model with a few of non-zeros (e.g., 1% values) in $\textbf{w}_1,\dots,\textbf{w}_m$ without affecting its .
dimention [](start = 9, length = 9)
dimension
please use spell checker plugin #Resolved
It doesn't show anything.. It was ok yesterday. Let me try Vim's.
In reply to: 278724688 [](ancestors = 278724688)
You're right, the plugin is useless in markdown sections. Not sure if there's an option to make it look at those sections. You might be able to temporarily remove the CDATA tags and see if typos show up the way they do in
#Resolved
This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization) which penalizing a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations.
L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.
Togehter with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the $\textbf{w}_1,\dots,\textbf{w}_m$.
For high-dimention and sparse data set, if user carefully select the coefficient of L1-norm, it is possible to achieve a good prediction quality with a model with a few of non-zeros (e.g., 1% values) in $\textbf{w}_1,\dots,\textbf{w}_m$ without affecting its .
with a few of non-zeros [](start = 158, length = 23)
that has only a few non-zero weights #Resolved
This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization) which penalizing a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations.
L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.
Togehter with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the $\textbf{w}_1,\dots,\textbf{w}_m$.
For high-dimention and sparse data set, if user carefully select the coefficient of L1-norm, it is possible to achieve a good prediction quality with a model with a few of non-zeros (e.g., 1% values) in $\textbf{w}_1,\dots,\textbf{w}_m$ without affecting its .
values [](start = 192, length = 6)
1% of weights #Resolved
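To ground the discussion of picking the L1-norm and L2-norm coefficients, here is a minimal ML.NET sketch that sets the two coefficients on the SDCA binary trainer. It is not the doc's own sample: the data class, column names, and the values 0.01/0.1 are assumptions made for illustration, and the exact trainer overload may vary across ML.NET versions.

```csharp
// Minimal sketch: where the elastic-net coefficients discussed above are set on SDCA.
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

public class DataPoint
{
    public bool Label { get; set; }

    [VectorType(3)]
    public float[] Features { get; set; }
}

public static class SdcaRegularizationSketch
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Tiny in-memory dataset, purely for illustration.
        var data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new DataPoint { Label = true,  Features = new float[] { 1f, 0f, 2f } },
            new DataPoint { Label = false, Features = new float[] { 0f, 2f, 1f } },
            new DataPoint { Label = true,  Features = new float[] { 2f, 1f, 3f } },
            new DataPoint { Label = false, Features = new float[] { 0f, 3f, 0f } },
        });

        // l1Regularization encourages exact zeros in the weights (sparsity);
        // l2Regularization shrinks large weights without zeroing them out.
        var trainer = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
            labelColumnName: "Label",
            featureColumnName: "Features",
            l1Regularization: 0.01f,
            l2Regularization: 0.1f);

        var model = trainer.Fit(data);
        Console.WriteLine("Trained SDCA model with elastic-net regularization.");
    }
}
```

In practice both coefficients are tuned, for example by cross-validation; making either one too large is the "aggressive regularization" warned about elsewhere in this review.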
(overfitting)(https://en.wikipedia.org/wiki/Overfitting) may happen so the trained model is good at describing training data but may fail to predict correct results in unseen events.
[Regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)) is a common technique to alleviate such a phenomenon by penalizing the magnitude (usually measureed by [norm function](https://en.wikipedia.org/wiki/Norm_(mathematics))) of model parameters.
This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization) which penalizing a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations.
L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.
add blank line for new paragraph. #Resolved
[Regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)) is a common technique to alleviate such a phenomenon by penalizing the magnitude (usually measureed by [norm function](https://en.wikipedia.org/wiki/Norm_(mathematics))) of model parameters.
This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization) which penalizing a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations.
L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.
Togehter with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the $\textbf{w}_1,\dots,\textbf{w}_m$.
$\textbf{w}_1,\dots,\textbf{w}_m$ [](start = 110, length = 33)
let's call this something like, model weights, or model parameters, and not repeat it below. #Resolved
values to the error of the hypothesis. An accurate model with extreme
coefficient values would be penalized more, but a less accurate model with more
conservative values would be penalized less.
This class use [empricial risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization) to formulate the optimized problem built upon collected data.
This class use empricial risk minimization to formulat [](start = 0, length = 115)
please limit the line width, so that it's easy to review without going left and right. it's also a good practice for viewing the file on github. #Resolved
This class use [empricial risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization)
to formulate the optimization problem built upon collected data.
If the training data does not contain enough data points
(for example, to train a linear model in $n$-dimensional space, we at least need $n$ data points),
at least need $n$ [](start = 67, length = 17)
need at least
[Regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)) is a common technique to alleviate
such a phenomenon by penalizing the magnitude (usually measured by
[norm function](https://en.wikipedia.org/wiki/Norm_(mathematics))) of model parameters.
This trainer supports [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization)
) [](start = 107, length = 1)
Add comma after ) (in general whenever the next word is "which") #Resolved
Sometimes, using L2-norm leads to a better prediction quality, so users may still want to try it and fine tune the coefficients of L1-norm and L2-norm.
Note that conceptually, using L1-norm implies that the distribution of all model parameters is a
[Laplace distribution](https://en.wikipedia.org/wiki/Laplace_distribution) while
L2-norm means that a [Gaussian distribution](https://en.wikipedia.org/wiki/Normal_distribution) for them.
means that a [](start = 8, length = 12)
"assumes a Gaussian distribution" or "implies a Gaussian distribution" #Resolved
L2-norm means that a [Gaussian distribution](https://en.wikipedia.org/wiki/Normal_distribution) for them.

An aggressive regularization (that is, assigning large coefficients to L1-norm or L2-norm regularization terms)
can harm predictive capacity by excluding important variables out of the model.
out of the model [](start = 62, length = 16)
from the model #Resolved
@@ -59,22 +59,12 @@ namespace Microsoft.ML.Trainers
/// In other cases, the output score vector is just $[\hat{y}^1, \dots, \hat{y}^m]$.
///
/// ### Training Algorithm Details
/// The optimization algorithm is an extension of (http://jmlr.org/papers/volume14/shalev-shwartz13a/shalev-shwartz13a.pdf) following a similar path proposed in an earlier [paper](https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf).
/// It is usually much faster than [L-BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS) and [truncated Newton methods](https://en.wikipedia.org/wiki/Truncated_Newton_method) for large-scale and sparse data set.
/// The optimization algorithm is an extension of (http://jmlr.org/papers/volume14/shalev-shwartz13a/shalev-shwartz13a.pdf)
(http://jmlr.org/papers/volume14/shalev-shwartz13a/shalev-shwartz13a.pdf) [](start = 54, length = 73)
Maybe give this link some display text [like this] ? #Resolved
/// The optimization algorithm is an extension of (http://jmlr.org/papers/volume14/shalev-shwartz13a/shalev-shwartz13a.pdf)
/// following a similar path proposed in an earlier [paper](https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf).
/// It is usually much faster than [L-BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS) and
/// [truncated Newton methods](https://en.wikipedia.org/wiki/Truncated_Newton_method) for large-scale and sparse data set.
data set [](start = 117, length = 8)
data sets #Resolved
to formulate the optimization problem built upon collected data.
If the training data does not contain enough data points
(for example, to train a linear model in $n$-dimensional space, we at least need $n$ data points),
(overfitting)(https://en.wikipedia.org/wiki/Overfitting) may happen so that
(overfitting) [](start = 0, length = 13)
square brackets [overfitting] #Resolved
L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.

Together with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the model weights, $\textbf{w}_1,\dots,\textbf{w}_m$.
For high-dimension and sparse data set, if users carefully select the coefficient of L1-norm,
high-dimension [](start = 4, length = 14)
"high-dimensional" perhaps? #Resolved
L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.

Together with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the model weights, $\textbf{w}_1,\dots,\textbf{w}_m$.
For high-dimension and sparse data set, if users carefully select the coefficient of L1-norm,
data set [](start = 30, length = 8)
data sets #Resolved
@@ -0,0 +1,27 @@
This class use [empricial risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization)
Why is this in the l1-norm and l2-norm regularization include? #Resolved
We are optimizing regularized ERM. ER is also known as the loss function. #Resolved
I will add "Note that empirical risk is usually measured by applying a loss function on the model's predictions on collected data points."
In reply to: 278778511 [](ancestors = 278778511)
///
/// An aggressive regularization (that is, assigning large coefficients to L1-norm or L2-norm regularization terms) can harm predictive capacity by excluding important variables out of the model.
/// Therefore, choosing the right regularization coefficients is important in practice.
/// [!include[regularization](~/../docs/samples/docs/api-reference/regularization-l1-l2.md)]
Why are we including this in the base class definition for multiclass and in the derived classes for binary classification? #ByDesign
It's a common behavior shared by all derived classes. #Resolved
But it is inconsistent between the multiclass and binary ... #Resolved
The reason is that they are written by different persons. Multiclass' XML doc is referenced in derived classes' XML docs, so there is no difference actually. I honestly don't have much time for writing style.
In reply to: 278777687 [](ancestors = 278777687)
Co-Authored-By: wschin <[email protected]>
Co-Authored-By: wschin <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #3586 +/- ##
=========================================
Coverage ? 72.76%
=========================================
Files ? 808
Lines ? 145458
Branches ? 16244
=========================================
Hits ? 105844
Misses ? 35191
Partials ? 4423
@@ -0,0 +1,27 @@
This class uses [empirical risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization)
It's not clear how this connects with l1-norm and l2-norm regularization. Let me know if you want to discuss this offline. #ByDesign
ERM ---> without enough data ---> overfit ---> use regularization to overcome overfit.
In reply to: 279009294 [](ancestors = 279009294)
@@ -0,0 +1,27 @@
This class uses [empricial risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization) (i.e., ERM)
This [](start = 0, length = 5)
Please add a header '### Regularization' so that the following text becomes a separate section. Also move it after all the algo details.
#ByDesign
No. I don't only mean regularization. It is a brief introduction to the whole optimization problem.
In reply to: 284914354 [](ancestors = 284914354)
Minimization.](http://www.jmlr.org/papers/volume14/shalev-shwartz13a/shalev-shwartz13a.pdf)

Check the See Also section for links to examples of the usage.
you don't need this file. you can just move it to the end of algo-details.sdca.md. It's ok that regularization details come after this, b/c it will be a separate section. #ByDesign
It looks super strange... Regularization is the reason why SDCA can exist. As you may know, SDCA solves the "dual form" of the original optimization problem. Without regularization, that dual form may not exist. Consequently, users should learn about regularization before jumping into the details of SDCA.
In reply to: 284914930 [](ancestors = 284914930)
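To make the dual-form point concrete, here is a minimal restatement of the SDCA setup from the linked Shalev-Shwartz and Zhang paper, for the L2-regularized case; $\phi_i$ is the per-example loss, $\phi_i^*$ its convex conjugate, and the notation is restated here only for illustration:

$$P(\textbf{w}) = \frac{1}{n}\sum_{i=1}^{n} \phi_i(\textbf{w}^\top \textbf{x}_i) + \frac{\lambda}{2}\|\textbf{w}\|_2^2, \qquad D(\boldsymbol{\alpha}) = \frac{1}{n}\sum_{i=1}^{n} -\phi_i^*(-\alpha_i) - \frac{\lambda}{2}\Big\|\frac{1}{\lambda n}\sum_{i=1}^{n} \alpha_i \textbf{x}_i\Big\|_2^2.$$

The strongly convex $\frac{\lambda}{2}\|\textbf{w}\|_2^2$ term is what makes the dual well defined and lets SDCA optimize one dual variable $\alpha_i$ at a time, which is why regularization is introduced before the algorithm details.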
Fix #3356.