
LSTM language model baseline gap #95

@Stonesjtu

The test perplexity (ppl) did not reach the documented value of 113; my run ended at 128.44.
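For reference, the gap can also be expressed in loss units. This is a minimal sketch that assumes the script reports perplexity as exp of the mean per-word cross-entropy in nats. That assumption is consistent with the log's final line, where exp(4.86) ≈ 129.0, matching ppl 128.44 up to the two-decimal rounding of the loss. The helper names are illustrative only:

```python
import math

# Assumption: reported ppl = exp(mean per-word cross-entropy, in nats).
documented_ppl = 113.0
observed_test_ppl = 128.44  # from the "End of training" line of the log


def ppl_to_loss(ppl: float) -> float:
    """Mean cross-entropy (nats per word) corresponding to a perplexity."""
    return math.log(ppl)


def loss_to_ppl(loss: float) -> float:
    """Perplexity corresponding to a mean cross-entropy in nats per word."""
    return math.exp(loss)


gap_nats = ppl_to_loss(observed_test_ppl) - ppl_to_loss(documented_ppl)
relative_gap = observed_test_ppl / documented_ppl - 1.0

print(f"documented loss {ppl_to_loss(documented_ppl):.3f}, "
      f"observed loss {ppl_to_loss(observed_test_ppl):.3f}")
print(f"gap: {gap_nats:.3f} nats/word ({relative_gap:.1%} higher perplexity)")
```

In loss terms the gap is about 0.13 nats per word (4.727 vs 4.855), which is roughly a 13.7% higher perplexity.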

System

GPU: GTX 1070
NVIDIA driver: 367.57
CUDA: 8.0
cuDNN: 5
CPU: Intel i7 3770

| epoch   1 |   200/ 2323 batches | lr 20.00 | ms/batch 15.86 | loss  6.78 | ppl   883.54
| epoch   1 |   400/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  6.11 | ppl   451.70
| epoch   1 |   600/ 2323 batches | lr 20.00 | ms/batch  9.42 | loss  5.81 | ppl   332.98
| epoch   1 |   800/ 2323 batches | lr 20.00 | ms/batch  9.46 | loss  5.65 | ppl   283.32
| epoch   1 |  1000/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  5.53 | ppl   252.06
| epoch   1 |  1200/ 2323 batches | lr 20.00 | ms/batch  9.47 | loss  5.45 | ppl   232.68
| epoch   1 |  1400/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  5.29 | ppl   197.84
| epoch   1 |  1600/ 2323 batches | lr 20.00 | ms/batch  9.40 | loss  5.27 | ppl   193.50
| epoch   1 |  1800/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  5.26 | ppl   192.84
| epoch   1 |  2000/ 2323 batches | lr 20.00 | ms/batch  9.52 | loss  5.11 | ppl   165.52
| epoch   1 |  2200/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  5.00 | ppl   149.01
-----------------------------------------------------------------------------------------
| end of epoch   1 | time: 24.19s | valid loss  5.15 | valid ppl   172.34
-----------------------------------------------------------------------------------------
| epoch   2 |   200/ 2323 batches | lr 20.00 | ms/batch  9.50 | loss  5.01 | ppl   150.18
| epoch   2 |   400/ 2323 batches | lr 20.00 | ms/batch  9.46 | loss  5.07 | ppl   159.75
| epoch   2 |   600/ 2323 batches | lr 20.00 | ms/batch  9.48 | loss  4.97 | ppl   143.50
| epoch   2 |   800/ 2323 batches | lr 20.00 | ms/batch  9.71 | loss  4.92 | ppl   137.16
| epoch   2 |  1000/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.92 | ppl   136.96
| epoch   2 |  1200/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.89 | ppl   133.62
| epoch   2 |  1400/ 2323 batches | lr 20.00 | ms/batch  9.42 | loss  4.78 | ppl   118.79
| epoch   2 |  1600/ 2323 batches | lr 20.00 | ms/batch  9.45 | loss  4.83 | ppl   125.03
| epoch   2 |  1800/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.87 | ppl   130.80
| epoch   2 |  2000/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.69 | ppl   109.35
| epoch   2 |  2200/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.64 | ppl   103.29
-----------------------------------------------------------------------------------------
| end of epoch   2 | time: 22.96s | valid loss  4.96 | valid ppl   142.18
-----------------------------------------------------------------------------------------
| epoch   3 |   200/ 2323 batches | lr 20.00 | ms/batch  9.49 | loss  4.67 | ppl   106.62
| epoch   3 |   400/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.79 | ppl   120.30
| epoch   3 |   600/ 2323 batches | lr 20.00 | ms/batch  9.45 | loss  4.68 | ppl   107.72
| epoch   3 |   800/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.65 | ppl   104.60
| epoch   3 |  1000/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.67 | ppl   106.95
| epoch   3 |  1200/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.66 | ppl   105.12
| epoch   3 |  1400/ 2323 batches | lr 20.00 | ms/batch  9.42 | loss  4.55 | ppl    94.70
| epoch   3 |  1600/ 2323 batches | lr 20.00 | ms/batch  9.45 | loss  4.62 | ppl   101.98
| epoch   3 |  1800/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.68 | ppl   108.26
| epoch   3 |  2000/ 2323 batches | lr 20.00 | ms/batch  9.42 | loss  4.48 | ppl    88.55
| epoch   3 |  2200/ 2323 batches | lr 20.00 | ms/batch  9.42 | loss  4.45 | ppl    85.87
-----------------------------------------------------------------------------------------
| end of epoch   3 | time: 22.89s | valid loss  4.90 | valid ppl   133.71
-----------------------------------------------------------------------------------------
| epoch   4 |   200/ 2323 batches | lr 20.00 | ms/batch  9.49 | loss  4.48 | ppl    88.58
| epoch   4 |   400/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.63 | ppl   102.72
| epoch   4 |   600/ 2323 batches | lr 20.00 | ms/batch  9.48 | loss  4.52 | ppl    91.82
| epoch   4 |   800/ 2323 batches | lr 20.00 | ms/batch  9.58 | loss  4.50 | ppl    89.90
| epoch   4 |  1000/ 2323 batches | lr 20.00 | ms/batch  9.57 | loss  4.53 | ppl    92.52
| epoch   4 |  1200/ 2323 batches | lr 20.00 | ms/batch  9.59 | loss  4.52 | ppl    91.63
| epoch   4 |  1400/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.42 | ppl    82.96
| epoch   4 |  1600/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.50 | ppl    90.31
| epoch   4 |  1800/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.57 | ppl    96.44
| epoch   4 |  2000/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.37 | ppl    78.93
| epoch   4 |  2200/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.34 | ppl    77.00
-----------------------------------------------------------------------------------------
| end of epoch   4 | time: 23.00s | valid loss  4.89 | valid ppl   133.30
-----------------------------------------------------------------------------------------
| epoch   5 |   200/ 2323 batches | lr 20.00 | ms/batch  9.47 | loss  4.38 | ppl    79.91
| epoch   5 |   400/ 2323 batches | lr 20.00 | ms/batch  9.42 | loss  4.53 | ppl    92.42
| epoch   5 |   600/ 2323 batches | lr 20.00 | ms/batch  9.42 | loss  4.42 | ppl    83.08
| epoch   5 |   800/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.40 | ppl    81.46
| epoch   5 |  1000/ 2323 batches | lr 20.00 | ms/batch  9.45 | loss  4.44 | ppl    84.81
| epoch   5 |  1200/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.44 | ppl    84.47
| epoch   5 |  1400/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.34 | ppl    76.87
| epoch   5 |  1600/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.42 | ppl    83.43
| epoch   5 |  1800/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.49 | ppl    89.41
| epoch   5 |  2000/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.30 | ppl    73.41
| epoch   5 |  2200/ 2323 batches | lr 20.00 | ms/batch  9.46 | loss  4.28 | ppl    71.96
-----------------------------------------------------------------------------------------
| end of epoch   5 | time: 22.90s | valid loss  4.89 | valid ppl   132.54
-----------------------------------------------------------------------------------------
| epoch   6 |   200/ 2323 batches | lr 20.00 | ms/batch  9.49 | loss  4.32 | ppl    74.99
| epoch   6 |   400/ 2323 batches | lr 20.00 | ms/batch  9.46 | loss  4.47 | ppl    87.01
| epoch   6 |   600/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.36 | ppl    77.89
| epoch   6 |   800/ 2323 batches | lr 20.00 | ms/batch  9.45 | loss  4.34 | ppl    76.46
| epoch   6 |  1000/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.38 | ppl    79.95
| epoch   6 |  1200/ 2323 batches | lr 20.00 | ms/batch  9.53 | loss  4.37 | ppl    79.05
| epoch   6 |  1400/ 2323 batches | lr 20.00 | ms/batch  9.44 | loss  4.29 | ppl    72.78
| epoch   6 |  1600/ 2323 batches | lr 20.00 | ms/batch  9.43 | loss  4.37 | ppl    79.35
| epoch   6 |  1800/ 2323 batches | lr 20.00 | ms/batch  9.45 | loss  4.44 | ppl    84.42
| epoch   6 |  2000/ 2323 batches | lr 20.00 | ms/batch  9.45 | loss  4.24 | ppl    69.63
| epoch   6 |  2200/ 2323 batches | lr 20.00 | ms/batch  9.42 | loss  4.23 | ppl    68.58
-----------------------------------------------------------------------------------------
| end of epoch   6 | time: 22.92s | valid loss  4.89 | valid ppl   132.85
-----------------------------------------------------------------------------------------
=========================================================================================
| End of training | test loss  4.86 | test ppl   128.44
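Every batch line in the log shows lr 20.00, so this run never trained at a reduced learning rate. The sketch below replays the logged epoch-end validation perplexities through a hypothetical annealing rule: divide lr by 4 after any epoch whose validation result does not beat the best so far. This rule is an assumption about `main.py`, so check it against the script you actually ran. `parse_valid_ppl` and `simulate_anneal` are illustrative helpers, not part of the example.

```python
import re
from typing import List, Tuple

# Epoch-end lines copied verbatim from the log above.
LOG = """\
| end of epoch   1 | time: 24.19s | valid loss  5.15 | valid ppl   172.34
| end of epoch   2 | time: 22.96s | valid loss  4.96 | valid ppl   142.18
| end of epoch   3 | time: 22.89s | valid loss  4.90 | valid ppl   133.71
| end of epoch   4 | time: 23.00s | valid loss  4.89 | valid ppl   133.30
| end of epoch   5 | time: 22.90s | valid loss  4.89 | valid ppl   132.54
| end of epoch   6 | time: 22.92s | valid loss  4.89 | valid ppl   132.85
"""

EPOCH_END = re.compile(r"end of epoch\s+(\d+).*valid ppl\s+([\d.]+)")


def parse_valid_ppl(log: str) -> List[Tuple[int, float]]:
    """Extract (epoch, valid ppl) pairs from epoch-end log lines."""
    return [(int(m.group(1)), float(m.group(2))) for m in EPOCH_END.finditer(log)]


def simulate_anneal(valid: List[Tuple[int, float]], lr: float = 20.0,
                    factor: float = 4.0) -> Tuple[List[float], float]:
    """Hypothetical schedule: divide lr by `factor` after any epoch that does
    not improve on the best validation ppl so far. Returns the lr used in each
    epoch and the lr that the next epoch would use."""
    best = None
    lrs = []
    for _epoch, ppl in valid:
        lrs.append(lr)
        if best is None or ppl < best:
            best = ppl
        else:
            lr /= factor
    return lrs, lr


valid = parse_valid_ppl(LOG)
lrs, next_lr = simulate_anneal(valid)
print(lrs, next_lr)
```

Under this assumed rule, epochs 1 to 5 all improve, and only epoch 6 (132.85 vs 132.54) fails to improve. The first reduction to lr 5.0 would therefore only apply to a seventh epoch, which this run never reached.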
