tensorflow_addons/optimizers: 1 file changed, +10 -10 lines changed

@tf.keras.utils.register_keras_serializable(package="Addons")
class NovoGrad(tf.keras.optimizers.Optimizer):
-    """The NovoGrad Optimizer was first proposed in [Stochastic Gradient
-    Methods with Layerwise Adaptvie Moments for training of Deep
-    Networks](https://arxiv.org/pdf/1905.11286.pdf)
-
-    NovoGrad is a first-order SGD-based algorithm, which computes second
-    moments per layer instead of per weight as in Adam. Compared to Adam,
-    NovoGrad takes less memory, and has been found to be more numerically
-    stable. More specifically we compute (for more information on the
-    computation please refer to this
-    [link](https://nvidia.github.io/OpenSeq2Seq/html/optimizers.html):
+    """Optimizer that implements NovoGrad.
+
+    The NovoGrad Optimizer was first proposed in [Stochastic Gradient
+    Methods with Layerwise Adaptive Moments for training of Deep
+    Networks](https://arxiv.org/pdf/1905.11286.pdf). NovoGrad is a
+    first-order SGD-based algorithm, which computes second moments per
+    layer instead of per weight as in Adam. Compared to Adam, NovoGrad
+    takes less memory, and has been found to be more numerically stable.
+    (For more information on the computation please refer to this
+    [link](https://nvidia.github.io/OpenSeq2Seq/html/optimizers.html).)

    Second order moment = exponential moving average of Layer-wise square
    of grads:
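The docstring excerpt breaks off at the colon. As a hedged sketch of the computation it introduces, reconstructed from the linked paper rather than from this diff, the layer-wise moments and update can be written as follows, with $g_t^l$ the gradient of layer $l$ at step $t$, $w_t$ the weights, $\beta_1, \beta_2, \epsilon$ the usual moment hyperparameters, $d$ the weight decay, and $\alpha_t$ the learning rate:

$$v_t^l = \beta_2 \, v_{t-1}^l + (1 - \beta_2) \, \lVert g_t^l \rVert^2$$

$$m_t^l = \beta_1 \, m_{t-1}^l + \left( \frac{g_t^l}{\sqrt{v_t^l} + \epsilon} + d \, w_t \right)$$

$$w_{t+1} = w_t - \alpha_t \, m_t^l$$

Note that the second moment $v_t^l$ is a single scalar per layer (a moving average of the squared gradient norm), which is why NovoGrad needs less state than Adam's per-weight second moments.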
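For context on how the class touched by this diff is used, here is a minimal usage sketch. It assumes the tensorflow-addons package is installed and exposes this class as `tfa.optimizers.NovoGrad`; the hyperparameter values shown are illustrative choices, not necessarily the defaults.

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Small Keras model to demonstrate plugging in the optimizer.
model = tf.keras.Sequential(
    [
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(10),
    ]
)

# Hedged example configuration: values here are illustrative.
optimizer = tfa.optimizers.NovoGrad(
    learning_rate=1e-3,    # step size
    weight_decay=1e-4,     # layer-wise weight decay term from the paper
    grad_averaging=False,  # whether to average the normalized gradient into the first moment
)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Train on random data just to show the optimizer stepping.
x = tf.random.normal((256, 32))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(x, y, epochs=1, batch_size=32)
```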