Commit 872cc54

Author: Kaiming He
prototxt
1 parent b4b9c44 commit 872cc54

7 files changed: +21 -16 lines changed

README.md

Lines changed: 21 additions & 13 deletions
@@ -6,10 +6,10 @@ Microsoft Research Asia (MSRA).
 
 ### Table of Contents
 0. [Introduction](#introduction)
-0. [Disclaimer and Known Issues](#disclaimer-and-known-issues)
+0. [Disclaimer and known issues](#disclaimer-and-known-issues)
+0. [Models](#models)
 0. [Results](#results)
-0. [Download](#downloads)
-0. [Third-party Re-implementations](#third-party-re-implementations)
+0. [Third-party re-implementations](#third-party-re-implementations)
 
 ### Introduction
 
@@ -24,19 +24,32 @@ If you use these models in your research, please cite:
 year = {2015}
 }
 
-### Disclaimer and Known Issues
+### Disclaimer and known issues
 
-0. These models are converted from our own implementation to a recent version of Caffe. There might be numerical differences.
+0. These models are converted from our own implementation to a recent version of Caffe (2016/2/3, b590f1d). The numerical results using this code are as in the tables below.
 0. These models are for the usage of testing or fine-tuning.
 0. These models were **not** trained using this version of Caffe.
 0. If you want to train these models using this version of Caffe without modifications, please notice that:
 - GPU memory might be insufficient for extremely deep models.
-- Implementation of data augmentation might be different (see our paper about the data augmentation we used).
 - Changes of mini-batch size should impact accuracy (we use a mini-batch of 256 images on 8 GPUs, that is, 32 images per GPU).
-0. In our BN layers, the provided mean and variance are strictly computed using average (**not** moving average) on a sufficiently large training batch after the training procedure. Using moving average might lead to different results.
+- Implementation of data augmentation might be different (see our paper about the data augmentation we used).
+- There might be some other untested issues.
+0. In our BN layers, the provided mean and variance are strictly computed using average (**not** moving average) on a sufficiently large training batch after the training procedure. The numerical results are very stable (variation of val error < 0.1%). Using moving average might lead to different results.
 0. In the BN paper, the BN layer learns gamma/beta. To implement BN in this version of Caffe, we use its provided "batch_norm_layer" (which has no gamma/beta learned) followed by "scale_layer" (which learns gamma/beta).
 0. We use Caffe's implementation of SGD: W := momentum\*W + lr\*g. **If you want to port these models to other libraries (e.g., Torch), please pay careful attention to the possibly different implementation of SGD**: W := momentum\*W + (1-momentum)\*lr\*g, which changes the effective learning rates.
+
 
+### Models
+
+0. Visualizations of network structures:
+- [ResNet-50] (http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006)
+- [ResNet-101] (http://ethereon.github.io/netscope/#/gist/b21e2aae116dc1ac7b50)
+- [ResNet-152] (http://ethereon.github.io/netscope/#/gist/d38f3e6091952b45198b)
+
+0. Model files:
+- MSR download: [link] (http://research.microsoft.com/en-us/um/people/kahe/resnet/models.zip)
+- OneDrive download: [link](https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777)
+
 ### Results
 
 0. 1-crop validation error on ImageNet (center 224x224 crop from resized image with shorter side=256):
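
The 1-crop protocol named in the last line of this hunk (resize so the shorter side is 256, then take the center 224x224 crop) comes down to a little arithmetic. The following is a minimal sketch, not code from the repository: the function name `center_crop_box`, the rounding of the longer side, and the floor division used to center the crop are all assumptions, since the README does not specify them.

```python
def center_crop_box(width, height, shorter_side=256, crop=224):
    """Return the resized (width, height) and the (left, top, right, bottom) crop box."""
    # Scale factor that brings the shorter side to `shorter_side` pixels.
    scale = shorter_side / min(width, height)
    # Assumption: each side is rounded to the nearest whole pixel.
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Assumption: floor division when the margin is an odd number of pixels.
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)


if __name__ == "__main__":
    print(center_crop_box(500, 375))
```

Under these assumptions, a 500x375 input is resized to 341x256 and cropped to the box (58, 16, 282, 240).
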
@@ -56,12 +69,7 @@ If you use these models in your research, please cite:
 ResNet-101|21.8%|6.1%
 ResNet-152|21.4%|5.7%
 
-### Downloads
-
-- [OneDrive](https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777)
-- [BaiduYun](http://pan.baidu.com/s/1o7xQ8Ka)
-
-### Third-party Re-implementations
+### Third-party re-implementations
 
 Deep residual networks are very easy to implement and train. We recommend to see also the following third-party re-implementations and extensions:
 
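
The SGD item in the disclaimer list of this README contrasts two momentum updates and says they change the effective learning rate. The sketch below illustrates why. It reads the quantity being updated as a scalar momentum buffer `v` (the README writes it as `W`), and the function names, the constant gradient, and the hyperparameter values are illustrative assumptions rather than code from Caffe, Torch, or this repository.

```python
def caffe_style_step(v, g, lr, momentum):
    # Form quoted in the README for Caffe: momentum*v + lr*g
    return momentum * v + lr * g


def dampened_step(v, g, lr, momentum):
    # Alternative form quoted in the README: momentum*v + (1-momentum)*lr*g
    return momentum * v + (1.0 - momentum) * lr * g


def steady_state(step, g=1.0, lr=0.1, momentum=0.9, iters=500):
    """Iterate one update rule on a constant gradient until the buffer settles."""
    v = 0.0
    for _ in range(iters):
        v = step(v, g, lr, momentum)
    return v


if __name__ == "__main__":
    a = steady_state(caffe_style_step)
    b = steady_state(dampened_step)
    print(a, b, a / b)
```

With a constant gradient, the first form settles at lr\*g/(1-momentum) and the second at lr\*g, so at momentum 0.9 the same nominal learning rate produces steps ten times larger under the first form. A port to a library that uses the second form would need its learning rate rescaled accordingly.
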
ResNet-101/README.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

ResNet-152/README.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

ResNet-50/README.md

Lines changed: 0 additions & 1 deletion
This file was deleted.
File renamed without changes.
File renamed without changes.
File renamed without changes.
