Closed
Description
The test perplexity (ppl) did not reach the documented value of 113: after 6 epochs the run below ends at a test ppl of 128.44.
System
GTX 1070
Driver Version: 367.57
cuDNN: 5
CUDA: 8.0
Intel i7 3770
| epoch 1 | 200/ 2323 batches | lr 20.00 | ms/batch 15.86 | loss 6.78 | ppl 883.54
| epoch 1 | 400/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 6.11 | ppl 451.70
| epoch 1 | 600/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 5.81 | ppl 332.98
| epoch 1 | 800/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 5.65 | ppl 283.32
| epoch 1 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 5.53 | ppl 252.06
| epoch 1 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.47 | loss 5.45 | ppl 232.68
| epoch 1 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 5.29 | ppl 197.84
| epoch 1 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.40 | loss 5.27 | ppl 193.50
| epoch 1 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 5.26 | ppl 192.84
| epoch 1 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.52 | loss 5.11 | ppl 165.52
| epoch 1 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 5.00 | ppl 149.01
-----------------------------------------------------------------------------------------
| end of epoch 1 | time: 24.19s | valid loss 5.15 | valid ppl 172.34
-----------------------------------------------------------------------------------------
| epoch 2 | 200/ 2323 batches | lr 20.00 | ms/batch 9.50 | loss 5.01 | ppl 150.18
| epoch 2 | 400/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 5.07 | ppl 159.75
| epoch 2 | 600/ 2323 batches | lr 20.00 | ms/batch 9.48 | loss 4.97 | ppl 143.50
| epoch 2 | 800/ 2323 batches | lr 20.00 | ms/batch 9.71 | loss 4.92 | ppl 137.16
| epoch 2 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.92 | ppl 136.96
| epoch 2 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.89 | ppl 133.62
| epoch 2 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.78 | ppl 118.79
| epoch 2 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.83 | ppl 125.03
| epoch 2 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.87 | ppl 130.80
| epoch 2 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.69 | ppl 109.35
| epoch 2 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.64 | ppl 103.29
-----------------------------------------------------------------------------------------
| end of epoch 2 | time: 22.96s | valid loss 4.96 | valid ppl 142.18
-----------------------------------------------------------------------------------------
| epoch 3 | 200/ 2323 batches | lr 20.00 | ms/batch 9.49 | loss 4.67 | ppl 106.62
| epoch 3 | 400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.79 | ppl 120.30
| epoch 3 | 600/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.68 | ppl 107.72
| epoch 3 | 800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.65 | ppl 104.60
| epoch 3 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.67 | ppl 106.95
| epoch 3 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.66 | ppl 105.12
| epoch 3 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.55 | ppl 94.70
| epoch 3 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.62 | ppl 101.98
| epoch 3 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.68 | ppl 108.26
| epoch 3 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.48 | ppl 88.55
| epoch 3 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.45 | ppl 85.87
-----------------------------------------------------------------------------------------
| end of epoch 3 | time: 22.89s | valid loss 4.90 | valid ppl 133.71
-----------------------------------------------------------------------------------------
| epoch 4 | 200/ 2323 batches | lr 20.00 | ms/batch 9.49 | loss 4.48 | ppl 88.58
| epoch 4 | 400/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.63 | ppl 102.72
| epoch 4 | 600/ 2323 batches | lr 20.00 | ms/batch 9.48 | loss 4.52 | ppl 91.82
| epoch 4 | 800/ 2323 batches | lr 20.00 | ms/batch 9.58 | loss 4.50 | ppl 89.90
| epoch 4 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.57 | loss 4.53 | ppl 92.52
| epoch 4 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.59 | loss 4.52 | ppl 91.63
| epoch 4 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.42 | ppl 82.96
| epoch 4 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.50 | ppl 90.31
| epoch 4 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.57 | ppl 96.44
| epoch 4 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.37 | ppl 78.93
| epoch 4 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.34 | ppl 77.00
-----------------------------------------------------------------------------------------
| end of epoch 4 | time: 23.00s | valid loss 4.89 | valid ppl 133.30
-----------------------------------------------------------------------------------------
| epoch 5 | 200/ 2323 batches | lr 20.00 | ms/batch 9.47 | loss 4.38 | ppl 79.91
| epoch 5 | 400/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.53 | ppl 92.42
| epoch 5 | 600/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.42 | ppl 83.08
| epoch 5 | 800/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.40 | ppl 81.46
| epoch 5 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.44 | ppl 84.81
| epoch 5 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.44 | ppl 84.47
| epoch 5 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.34 | ppl 76.87
| epoch 5 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.42 | ppl 83.43
| epoch 5 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.49 | ppl 89.41
| epoch 5 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.30 | ppl 73.41
| epoch 5 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 4.28 | ppl 71.96
-----------------------------------------------------------------------------------------
| end of epoch 5 | time: 22.90s | valid loss 4.89 | valid ppl 132.54
-----------------------------------------------------------------------------------------
| epoch 6 | 200/ 2323 batches | lr 20.00 | ms/batch 9.49 | loss 4.32 | ppl 74.99
| epoch 6 | 400/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 4.47 | ppl 87.01
| epoch 6 | 600/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.36 | ppl 77.89
| epoch 6 | 800/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.34 | ppl 76.46
| epoch 6 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.38 | ppl 79.95
| epoch 6 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.53 | loss 4.37 | ppl 79.05
| epoch 6 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.29 | ppl 72.78
| epoch 6 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.37 | ppl 79.35
| epoch 6 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.44 | ppl 84.42
| epoch 6 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.24 | ppl 69.63
| epoch 6 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.23 | ppl 68.58
-----------------------------------------------------------------------------------------
| end of epoch 6 | time: 22.92s | valid loss 4.89 | valid ppl 132.85
-----------------------------------------------------------------------------------------
=========================================================================================
| End of training | test loss 4.86 | test ppl 128.44
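To make the plateau easier to see, here is a small sketch that pulls the validation numbers out of the log above. It assumes the printed ppl is `exp(loss)`, the usual definition of perplexity, which the logged values are consistent with up to rounding of the two-decimal loss. The names `LOG`, `PATTERN` and `parse_validation` are made up for this illustration and are not part of the example script.

```python
import math
import re

# Validation summary lines copied verbatim from the log above.
LOG = """\
| end of epoch 1 | time: 24.19s | valid loss 5.15 | valid ppl 172.34
| end of epoch 2 | time: 22.96s | valid loss 4.96 | valid ppl 142.18
| end of epoch 3 | time: 22.89s | valid loss 4.90 | valid ppl 133.71
| end of epoch 4 | time: 23.00s | valid loss 4.89 | valid ppl 133.30
| end of epoch 5 | time: 22.90s | valid loss 4.89 | valid ppl 132.54
| end of epoch 6 | time: 22.92s | valid loss 4.89 | valid ppl 132.85
"""

# Hypothetical helper pattern: epoch number, validation loss, validation ppl.
PATTERN = re.compile(
    r"end of epoch\s+(\d+).*?valid loss\s+([\d.]+).*?valid ppl\s+([\d.]+)"
)


def parse_validation(log):
    """Return a list of (epoch, valid_loss, valid_ppl) tuples found in the log."""
    rows = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            rows.append(
                (int(match.group(1)), float(match.group(2)), float(match.group(3)))
            )
    return rows


rows = parse_validation(LOG)
for epoch, loss, ppl in rows:
    # The loss is printed with two decimals, so exp(loss) only
    # approximates the printed perplexity.
    print(f"epoch {epoch}: valid ppl {ppl:.2f}, exp(loss) = {math.exp(loss):.2f}")

# Epoch with the lowest validation perplexity.
best = min(rows, key=lambda row: row[2])
print(f"best validation ppl {best[2]:.2f} at epoch {best[0]}")
```

On this log the validation ppl barely moves from epoch 3 onward (133.71 down to about 132.5), while the training ppl keeps falling and the learning rate stays at 20.00 for the whole run.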