If you use these models in your research, please cite:

    @article{He2015,
      author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
      title = {Deep Residual Learning for Image Recognition},
      journal = {arXiv preprint arXiv:1512.03385},
      year = {2015}
    }
### Disclaimer and known issues
0. These models are converted from our own implementation to a recent version of Caffe (2016/2/3, b590f1d). The numerical results obtained with this code are as in the tables below.
0. These models are provided for testing or fine-tuning.
0. These models were **not** trained using this version of Caffe.
0. If you want to train these models using this version of Caffe without modifications, please note that:
    - GPU memory might be insufficient for extremely deep models.
    - Changes to the mini-batch size should impact accuracy (we use a mini-batch of 256 images on 8 GPUs, that is, 32 images per GPU).
    - The implementation of data augmentation might be different (see our paper for the data augmentation we used).
    - There might be other untested issues.
0. In our BN layers, the provided mean and variance are strictly computed using the average (**not** the moving average) over a sufficiently large training batch after the training procedure. The numerical results are very stable (variation of val error < 0.1%). Using a moving average might lead to different results; the first sketch after this list illustrates the difference.
0. In the BN paper, the BN layer learns gamma/beta. To implement BN in this version of Caffe, we use its provided "batch_norm_layer" (which learns no gamma/beta) followed by "scale_layer" (which learns gamma/beta); see the layer-definition sketch after this list.
0. We use Caffe's implementation of SGD with momentum: V := momentum\*V + lr\*g, then W := W - V. **If you want to port these models to other libraries (e.g., Torch), please pay careful attention to the possibly different implementation of SGD**: V := momentum\*V + (1-momentum)\*lr\*g, which changes the effective learning rates (see the last sketch after this list).
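To make the BN-statistics point concrete, here is a minimal NumPy sketch (synthetic data; the shapes, momentum value, and all numbers are illustrative assumptions, not taken from our training) contrasting a strict average over one large post-training batch with an exponential moving average accumulated across mini-batches:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic activations feeding one BN layer: 50 mini-batches of 32 samples
# with 64 channels each (all values illustrative).
batches = [rng.normal(loc=0.5, scale=2.0, size=(32, 64)) for _ in range(50)]

# Strict average, as provided with these models: pool one sufficiently large
# training batch after training and compute mean/variance in one shot.
pooled = np.concatenate(batches, axis=0)          # shape (1600, 64)
mean_strict = pooled.mean(axis=0)
var_strict = pooled.var(axis=0)

# Moving average, as many frameworks accumulate during training.
momentum = 0.9                                    # assumed smoothing factor
mean_ma = np.zeros(64)
var_ma = np.zeros(64)
for b in batches:
    mean_ma = momentum * mean_ma + (1 - momentum) * b.mean(axis=0)
    var_ma = momentum * var_ma + (1 - momentum) * b.var(axis=0)

# The estimates differ: the moving average is biased toward recent batches
# and toward its zero initialization, which shifts BN outputs at test time.
print(abs(mean_strict - mean_ma).max(), abs(var_strict - var_ma).max())
```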
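For the BatchNorm-plus-Scale pairing, the sketch below uses pycaffe's NetSpec to emit the layer definitions; the layer names and the input/convolution settings are illustrative placeholders, not the definitions shipped in this repository's prototxt files:

```python
from caffe import NetSpec
from caffe import layers as L

n = NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 224, 224]))
n.conv1 = L.Convolution(n.data, num_output=64, kernel_size=7, stride=2, pad=3)
# "BatchNorm" only normalizes; in this version of Caffe it learns no
# gamma/beta (use_global_stats=True makes it apply the stored statistics).
n.bn_conv1 = L.BatchNorm(n.conv1, use_global_stats=True)
# "Scale" with bias_term=True supplies the learned gamma (scale) and
# beta (bias) that the BN paper folds into the BN layer itself.
n.scale_conv1 = L.Scale(n.bn_conv1, bias_term=True)

print(n.to_proto())   # prints the equivalent prototxt layer definitions
```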
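Finally, a small self-contained sketch of the SGD point, comparing the two momentum updates on a single weight with a constant gradient (the learning rate, momentum, and gradient values are made up for illustration):

```python
momentum, lr, g = 0.9, 0.1, 1.0   # illustrative values

w_caffe = w_torch = 0.0
v_caffe = v_torch = 0.0
for _ in range(100):
    # Caffe: V := momentum*V + lr*g, then W := W - V
    v_caffe = momentum * v_caffe + lr * g
    w_caffe -= v_caffe
    # Torch-style dampened momentum: V := momentum*V + (1-momentum)*lr*g
    v_torch = momentum * v_torch + (1 - momentum) * lr * g
    w_torch -= v_torch

# Steady-state step sizes: the Caffe update moves lr*g/(1-momentum) = 1.0 per
# iteration, while the dampened variant moves lr*g = 0.1, i.e. the same
# (lr, momentum) pair yields a 1/(1-momentum) = 10x smaller effective lr.
print(w_caffe, w_torch)   # ~ -91.0 vs ~ -9.1
```

Under these definitions, reproducing Caffe's behavior in the dampened variant amounts to scaling the learning rate by 1/(1-momentum).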