Commit 9c1458e

authored
[200708] Edit invisible equation
1 parent 74377c6 commit 9c1458e

1 file changed

+8
-8
lines changed

DETR/README.md

Lines changed: 8 additions & 8 deletions
@@ -17,27 +17,27 @@
1717

1818
- <img src="https://latex.codecogs.com/svg.latex?\;\mathcal{L}_{match}(y_i, \hat{y}_{\sigma(i)})" title="\mathcal{L}_{match}(y_i, \hat{y}_{\sigma(i)})" /> : a pair-wise *matching cost* between ground truth <img src="https://latex.codecogs.com/svg.latex?\;y_i" title="y_i" /> and a prediction with index <img src="https://latex.codecogs.com/svg.latex?\;\sigma(i)" title="\sigma(i)" />
1919
- Hungarian algorithm [[arxiv]](https://arxiv.org/abs/1506.04878)
20-
- <img src="https://latex.codecogs.com/svg.latex?\;\mathcal{L}_{match}(y_i, \hat{y}_{\sigma(i)}) = -\mathbb{1}_{\{c_{i} \neq \emptyset\}} \hat{p}_{\sigma(i)}(c_i) + \mathbb{1}_{\{c_{i} \neq \emptyset\}} \mathcal{L}_{box}(b_i, \hat{b}_{\sigma(i)})" title="\mathcal{L}_{match}(y_i, \hat{y}_{\sigma(i)}) = -\mathbb{1}_{\{c_{i} \neq \emptyset\}} \hat{p}_{\sigma(i)}(c_i) + \mathbb{1}_{\{c_{i} \neq \emptyset\}} \mathcal{L}_{box}(b_i, \hat{b}_{\sigma(i)})" />
20+
- <img src="https://latex.codecogs.com/svg.latex?\;\mathcal{L}_{match}(y_i,\hat{y}_{\sigma(i)})=-\mathbb{1}_{\{c_{i}\neq\emptyset\}}\hat{p}_{\sigma(i)}(c_i)+\mathbb{1}_{\{c_{i}\neq\emptyset\}}\mathcal{L}_{box}(b_i,\hat{b}_{\sigma(i)})" title="\mathcal{L}_{match}(y_i,\hat{y}_{\sigma(i)})=-\mathbb{1}_{\{c_{i}\neq\emptyset\}}\hat{p}_{\sigma(i)}(c_i)+\mathbb{1}_{\{c_{i}\neq\emptyset\}}\mathcal{L}_{box}(b_i,\hat{b}_{\sigma(i)})" />
2121
- *Hungarian loss*
2222
<p align="center"><img width="100%" src="img/eq2.PNG" /></p>
2323

2424
- <img src="https://latex.codecogs.com/svg.latex?\;\hat{\sigma}" title="\hat{\sigma}" /> : the optimal assignment computed in the first step (eq.1)
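
The matching step can be illustrated with a minimal sketch. It assumes the class-probability term enters the matching cost with a negative sign (a likelier class lowers the cost) and that ground truths are already padded to the same length as the predictions. The helper names `match_cost` and `optimal_assignment` are hypothetical, and the search is a brute-force scan over permutations rather than the Hungarian algorithm, so it is only practical for tiny examples.

```python
from itertools import permutations


def match_cost(p_hat, c, box_cost):
    """Pair-wise matching cost between one ground truth and one prediction.

    p_hat    : predicted class probabilities of the candidate prediction
    c        : ground-truth class index, or None for the "no object" label
    box_cost : precomputed box loss between the two boxes
    """
    if c is None:
        # The indicator 1{c != no-object} zeroes both terms.
        return 0.0
    return -p_hat[c] + box_cost


def optimal_assignment(cost):
    """Return sigma with sigma[i] = prediction index assigned to ground truth i.

    cost is a square N x N table where cost[i][j] is the matching cost of
    ground truth i with prediction j. Brute force, O(N!): illustration only.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda sigma: sum(cost[i][sigma[i]] for i in range(n)),
    )
    return list(best)


cost = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.7],
    [0.6, 0.4, 0.3],
]
sigma_hat = optimal_assignment(cost)  # total cost 0.1 + 0.2 + 0.3
```

For realistic values of N, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the brute-force search.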
2525
### Bounding box loss
2626
- A linear combination of the <img src="https://latex.codecogs.com/svg.latex?\;l_1" title="l_1" /> loss
2727
- The generalized IoU loss
28-
- <img src="https://latex.codecogs.com/svg.latex?\;\mathcal{L}_{box}(b_i, \hat{b}_{\sigma(i)}) = \lambda_{iou} \mathcal{L}_{iou} (b_i, \hat{b}_{\sigma(i)}) + \lambda_{L1} ||b_i - \hat{b}_{\sigma(i)}||_1" title="\mathcal{L}_{box}(b_i, \hat{b}_{\sigma(i)}) = \lambda_{iou} \mathcal{L}_{iou} (b_i, \hat{b}_{\sigma(i)}) + \lambda_{L1} ||b_i - \hat{b}_{\sigma(i)}||_1" />
28+
- <img src="https://latex.codecogs.com/svg.latex?\;\mathcal{L}_{box}(b_i,\hat{b}_{\sigma(i)})=\lambda_{iou}\mathcal{L}_{iou}(b_i,\hat{b}_{\sigma(i)})+\lambda_{L1}||b_i-\hat{b}_{\sigma(i)}||_1" title="\mathcal{L}_{box}(b_i,\hat{b}_{\sigma(i)})=\lambda_{iou}\mathcal{L}_{iou}(b_i,\hat{b}_{\sigma(i)})+\lambda_{L1}||b_i-\hat{b}_{\sigma(i)}||_1" />
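
As a rough sketch of the box loss above, the following computes the generalized IoU term and the l1 term for a single pair of boxes. It assumes boxes are given as `(x1, y1, x2, y2)` corners with positive area, and takes the IoU loss to be `1 - GIoU`. The function names and the default weights are illustrative assumptions, not values taken from this README.

```python
def generalized_iou(a, b):
    """Generalized IoU of two corner-format boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    return inter / union - (hull - union) / hull


def box_loss(b, b_hat, lambda_iou=2.0, lambda_l1=5.0):
    """lambda_iou * L_iou(b, b_hat) + lambda_l1 * ||b - b_hat||_1."""
    l_iou = 1.0 - generalized_iou(b, b_hat)
    l1 = sum(abs(x - y) for x, y in zip(b, b_hat))
    return lambda_iou * l_iou + lambda_l1 * l1
```

Unlike plain IoU, the generalized variant stays informative for non-overlapping boxes, because the enclosing-box penalty still changes as the boxes move apart.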
2929

3030
## DETR architecture
3131
<p align="center"><img width="100%" src="img/fig2.PNG" /></p>
3232

3333
### Backbone
34-
- <img src="https://latex.codecogs.com/svg.latex?\;x_{img} \in \mathbb{R}^{3 \times H_0 \times W_0}" title="x_{img} \in \mathbb{R}^{3 \times H_0 \times W_0}" /> : the initial image
35-
- <img src="https://latex.codecogs.com/svg.latex?\;f \in \mathbb{R}^{C \times H \times W}" title="f \in \mathbb{R}^{C \times H \times W}" /> : a lower-resolution activation map
36-
- <img src="https://latex.codecogs.com/svg.latex?\;C=2048\ \text{and}\ H,W = \frac{H_0}{32}, \frac{W_0}{32}" title="C=2048\ \text{and}\ H,W = \frac{H_0}{32}, \frac{W_0}{32}" />
34+
- <img src="https://latex.codecogs.com/svg.latex?\;x_{img}\in\mathbb{R}^{3\times{H_0}\times{W_0}}" title="x_{img}\in\mathbb{R}^{3\times{H_0}\times{W_0}}" /> : the initial image
35+
- <img src="https://latex.codecogs.com/svg.latex?\;f\in\mathbb{R}^{C\times{H}\times{W}}" title="f\in\mathbb{R}^{C\times{H}\times{W}}" /> : a lower-resolution activation map
36+
- <img src="https://latex.codecogs.com/svg.latex?\;C=2048" title="C=2048" /> and <img src="https://latex.codecogs.com/svg.latex?\;{H},{W}=\frac{H_0}{32},\frac{W_0}{32}" title="{H},{W}=\frac{H_0}{32},\frac{W_0}{32}" />
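
The shape arithmetic above can be checked with a small helper. This is a sketch that assumes the input height and width are divisible by 32; `backbone_output_shape` is a hypothetical name used only for illustration.

```python
def backbone_output_shape(h0, w0, channels=2048, stride=32):
    """Shape (C, H, W) of the activation map for an H0 x W0 input image."""
    # Integer division assumes h0 and w0 are multiples of the stride.
    return (channels, h0 // stride, w0 // stride)


shape = backbone_output_shape(800, 1216)  # 800 / 32 = 25, 1216 / 32 = 38
```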
3737

3838
### Transformer encoder
39-
- A 1x1 convolution reduces the channel dimension of the high-level activation map <img src="https://latex.codecogs.com/svg.latex?\;f" title="f" /> from <img src="https://latex.codecogs.com/svg.latex?\;C" title="C" /> to a smaller dimension <img src="https://latex.codecogs.com/svg.latex?\;d" title="d" />, creating a new feature map <img src="https://latex.codecogs.com/svg.latex?\;z_0 \in \mathbb{R}^{d \times H \times W}" title="z_0 \in \mathbb{R}^{d \times H \times W}" />.
40-
- The spatial dimensions of <img src="https://latex.codecogs.com/svg.latex?\;z_0" title="z_0" /> are collapsed into one dimension, resulting in a <img src="https://latex.codecogs.com/svg.latex?\;d \times HW" title="d \times HW" /> feature map.
39+
- A 1x1 convolution reduces the channel dimension of the high-level activation map <img src="https://latex.codecogs.com/svg.latex?\;f" title="f" /> from <img src="https://latex.codecogs.com/svg.latex?\;C" title="C" /> to a smaller dimension <img src="https://latex.codecogs.com/svg.latex?\;d" title="d" />, creating a new feature map <img src="https://latex.codecogs.com/svg.latex?\;z_0\in\mathbb{R}^{d\times{H}\times{W}}" title="z_0\in\mathbb{R}^{d\times{H}\times{W}}" />.
40+
- The spatial dimensions of <img src="https://latex.codecogs.com/svg.latex?\;z_0" title="z_0" /> are collapsed into one dimension, resulting in a <img src="https://latex.codecogs.com/svg.latex?\;d\times{HW}" title="d\times{HW}" /> feature map.
4141
- Each encoder layer has a standard architecture and consists of a multi-head self-attention module and a feed forward network (FFN).
4242
- Each encoder layer is supplemented with fixed positional encodings that are added to the input of each attention layer.
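
The first two encoder steps (channel reduction and spatial flattening) can be sketched in NumPy. A 1x1 convolution is a per-pixel linear map over channels, so it is written here as an `einsum`. The names `conv1x1` and `flatten_spatial`, the random weights, and the small sizes are assumptions chosen to keep the example quick to run; the attention and FFN layers are not shown.

```python
import numpy as np


def conv1x1(f, weight, bias):
    """Apply a 1x1 convolution to f of shape (C, H, W).

    weight has shape (d, C) and bias has shape (d,); the result z0 has
    shape (d, H, W).
    """
    return np.einsum("dc,chw->dhw", weight, f) + bias[:, None, None]


def flatten_spatial(z0):
    """Collapse (d, H, W) into a (d, H*W) feature map."""
    d, h, w = z0.shape
    return z0.reshape(d, h * w)


rng = np.random.default_rng(0)
C, d, H, W = 8, 4, 2, 3          # toy sizes; the text above uses C = 2048
f = rng.standard_normal((C, H, W))
z0 = conv1x1(f, rng.standard_normal((d, C)), np.zeros(d))
sequence = flatten_spatial(z0)   # shape (d, H*W)
```

Each of the `H*W` columns of `sequence` is one spatial position, which is the form a sequence model such as the encoder consumes.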
4343

@@ -52,4 +52,4 @@
5252
- Because of a fixed size set of <img src="https://latex.codecogs.com/svg.latex?\;N" title="N" /> bounding boxes, an additional special class label <img src="https://latex.codecogs.com/svg.latex?\;\emptyset" title="\emptyset" /> is used to represent that no object is detected within a slot. This class plays a similar role to the "background" class in the standard object detection approaches.
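
One way to picture the fixed-size set is to pad the ground-truth labels up to N slots with the special label. The sketch below uses `None` as a stand-in for the no-object class; `NO_OBJECT` and `pad_targets` are hypothetical names for illustration.

```python
NO_OBJECT = None  # stand-in for the special "no object" class label


def pad_targets(labels, n_slots):
    """Pad a list of ground-truth class labels to exactly n_slots entries."""
    if len(labels) > n_slots:
        raise ValueError("more ground-truth objects than prediction slots")
    return list(labels) + [NO_OBJECT] * (n_slots - len(labels))


padded = pad_targets([3, 17], n_slots=5)
```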
5353

5454
### Auxiliary decoding losses
55-
- To help the model output the correct number of objects of each class
55+
- To help the model output the correct number of objects of each class
