hye-log

[Boostcamp AI Tech] WEEK 09_DAY 43

iihye_ 2022. 11. 18. 19:02

🌿 Individual Study


[5] 1 Stage Detectors

1. 1 Stage Detectors

1) Background

(1) 2 Stage Detectors

- RCNN, FastRCNN, SPPNet, FasterRCNN, ...

- Localization (finding candidate regions) -> Classification (classifying each candidate region)

- Slow inference -> hard to use in real-world applications

(2) 1 Stage Detectors

- Yolo, SSD, RetinaNet, ...

- Localization and classification are performed at the same time

- Feature extraction and object detection run over the whole image -> better contextual understanding of objects

- Fast (real-time detection)

2) History

 

2. YOLO v1

1) Overview

- Predicts bounding boxes and classes simultaneously from the whole image

2) Pipeline

(1) A variant of GoogLeNet

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In  Proceedings of the IEEE conference on computer vision and pattern recognition  (pp. 779-788).

- 24 conv layers extract features

- 2 fully connected layers compute box coordinates and probabilities

(2) Steps

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In  Proceedings of the IEEE conference on computer vision and pattern recognition  (pp. 779-788).

- Divide the input image into an S×S grid

- For each grid cell, compute B bounding boxes and their confidence scores

- For each grid cell, compute class probabilities over C classes
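
The three steps above fix the size of the network output. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the defaults S=7, B=2, C=20 are the PASCAL VOC setting from the YOLO paper rather than values stated in these notes. Each box carries 5 numbers (x, y, w, h, confidence), and multiplying a box confidence by the cell's class probabilities gives class-specific scores.

```python
def yolo_v1_output_shape(S=7, B=2, C=20):
    """Shape of the YOLO v1 prediction tensor.

    Each of the S*S cells predicts B boxes with 5 values each
    (x, y, w, h, confidence) plus C class probabilities shared by the cell.
    """
    return (S, S, B * 5 + C)


def class_specific_scores(box_confidence, class_probs):
    """Combine one box confidence with the cell's class probabilities."""
    return [box_confidence * p for p in class_probs]


if __name__ == "__main__":
    print(yolo_v1_output_shape())                  # tensor shape for the assumed defaults
    print(class_specific_scores(0.5, [0.2, 0.8]))  # per-class scores for one box
```

With the assumed defaults this gives a 7×7×30 tensor, which is what the last fully connected layer has to produce.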

3) Result

- About 6 times faster than Faster R-CNN

- About twice the accuracy (mAP) of other real-time detectors

- Sees the whole image, so predictions include contextual information about the classes and the scene

- Learns generalized representations of objects -> performs well on new domains and datasets

 

3. SSD

1) Overview

(1) Drawbacks of YOLO

- The image is divided into a 7×7 grid, so objects smaller than a grid cell are hard to detect

- Only the last feature map is used, which lowers accuracy

(2) YOLO vs SSD

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham.

- Input size: YOLO 448×448 / SSD 300×300

- YOLO: slowed down by the FC layers / SSD: uses conv layers (including 1×1 conv) instead, so it is faster

- YOLO: detects from the last feature map only / SSD: appends extra feature layers after the last feature map and detects from them

(3) Characteristics of SSD

- Runs detection on every feature map produced by the extra conv layers

-> Large feature maps detect small objects; small feature maps detect large objects

- Uses conv layers instead of FC layers for speed

- Uses default boxes (anchor boxes)

2) Pipeline

- VGG-16(backbone) + Extra Conv Layers

- Input image: 300×300

- Uses multi-scale feature maps
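
To make the multi-scale idea concrete, here is a small sketch that counts the default boxes of SSD300. The feature map sizes (38, 19, 10, 5, 3, 1) and the number of default boxes per location (4, 6, 6, 6, 4, 4) are taken from the SSD paper, not from these notes, and the function name is hypothetical.

```python
# Feature map side lengths used for detection in SSD300 (from the SSD paper).
FEATURE_MAP_SIZES = [38, 19, 10, 5, 3, 1]
# Default boxes predicted at every location of each feature map.
BOXES_PER_LOCATION = [4, 6, 6, 6, 4, 4]


def count_default_boxes(sizes, boxes_per_location):
    """Total default boxes = sum over feature maps of (side * side * boxes per location)."""
    return sum(s * s * k for s, k in zip(sizes, boxes_per_location))


if __name__ == "__main__":
    for s, k in zip(FEATURE_MAP_SIZES, BOXES_PER_LOCATION):
        print(f"{s}x{s} map, {k} boxes per location -> {s * s * k} boxes")
    print("total:", count_default_boxes(FEATURE_MAP_SIZES, BOXES_PER_LOCATION))
```

Most of the boxes come from the largest (38×38) map, which is the one responsible for small objects.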

 

4. YOLO Follow-up

1) YOLO v2

(1) Concepts

- Improves accuracy (better), speed (faster), and the number of predictable classes (stronger)

(2) Accuracy

- batch normalization

- high resolution classifier : the classifier is fine-tuned again on 448×448 images

- convolution with anchor boxes : removes the FC layers and introduces anchor boxes

- fine-grained features : a passthrough layer that merges an early feature map into a late feature map

- multi-scale training : trains on input images of various sizes
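
Two of the items above come down to simple shape arithmetic, sketched below under stated assumptions. The 320 to 608 range in steps of 32 and the 26×26×512 -> 13×13×2048 passthrough shape are the values described in the YOLO v2 paper, not in these notes. The function names are hypothetical, and only shapes are computed here, not the actual tensor rearrangement.

```python
import random

STRIDE = 32  # total downsampling factor of the network, so input sizes are multiples of 32


def candidate_input_sizes(min_size=320, max_size=608, stride=STRIDE):
    """Input resolutions that multi-scale training can choose from."""
    return list(range(min_size, max_size + 1, stride))


def pick_input_size(rng):
    """Randomly pick one resolution (the paper re-picks every few batches)."""
    return rng.choice(candidate_input_sizes())


def passthrough_shape(channels, height, width, block=2):
    """Shape after stacking each block x block spatial neighborhood into channels."""
    return (channels * block * block, height // block, width // block)


if __name__ == "__main__":
    rng = random.Random(0)
    print(candidate_input_sizes())
    print(pick_input_size(rng))
    print(passthrough_shape(512, 26, 26))
```

After the passthrough reshape, the early feature map has the same 13×13 spatial size as the late one, so the two can be concatenated along the channel axis.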

(3) Speed

- Uses Darknet-19

(4) Predicting more classes

- Uses ImageNet and COCO together

- Builds a WordTree (a hierarchical tree)

2) YOLO v3

(1) Darknet-53

- Applies skip connections

- No max pooling; uses conv with stride 2 instead

(2) Multi-scale Feature maps

- Uses 3 different scales

- Uses FPN (Feature Pyramid Network)
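
As a rough illustration of the 3 scales, the sketch below computes the grid size at each scale and the total number of predicted boxes. The 416×416 input, the strides (32, 16, 8), and 3 anchors per scale are the common YOLO v3 setting from the paper, not values given in these notes, and the function names are hypothetical.

```python
def yolo_v3_grid_sizes(input_size=416, strides=(32, 16, 8)):
    """Side length of the prediction grid at each of the 3 scales."""
    return [input_size // s for s in strides]


def yolo_v3_total_boxes(input_size=416, anchors_per_scale=3):
    """Total boxes predicted across all scales before any filtering."""
    return sum(g * g * anchors_per_scale for g in yolo_v3_grid_sizes(input_size))


if __name__ == "__main__":
    print(yolo_v3_grid_sizes())   # coarse grid for large objects, fine grid for small ones
    print(yolo_v3_total_boxes())
```

The coarsest grid handles large objects and the finest grid handles small ones, the same division of labor noted for SSD above.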

 

5. RetinaNet

1) Overview

(1) 1 Stage Detector Problems

- Class imbalance : background regions far outnumber objects

- Most anchor boxes are negative samples (background)

2) Concept

(1) Proposes a new loss function : cross entropy loss + scaling factor -> small weight for easy examples, large weight for hard examples

(2) Focal loss

- Greatly improves accuracy, which had been the weak point of 1 stage methods

- Used when training on datasets with severe class imbalance
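
The "cross entropy loss + scaling factor" idea can be written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). Below is a minimal sketch for a single binary prediction. The defaults gamma=2 and alpha=0.25 are the values reported in the RetinaNet paper, not stated in these notes, and the function names are hypothetical. The predicted probability p is assumed to lie strictly between 0 and 1.

```python
import math


def cross_entropy(p, y):
    """Binary cross entropy for one example; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Cross entropy scaled by alpha_t * (1 - p_t)^gamma.

    Easy examples (p_t close to 1) get a scaling factor close to 0,
    so hard examples dominate the total loss.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return alpha_t * (1.0 - p_t) ** gamma * cross_entropy(p, y)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):  # easy, medium, hard positive example
        print(p, round(cross_entropy(p, 1), 4), round(focal_loss(p, 1), 6))
```

With gamma set to 0 the scaling term disappears and the loss reduces to alpha-weighted cross entropy, which is a quick way to sanity-check an implementation.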



🌿 Today's Retrospective

Started the day with the 1 Stage Detector lecture! After looking only at 2 stage models, what stands out in 1 stage models is that the detection process is clearly simpler, which brought a performance improvement. For the competition there are several libraries such as mmdetection and detectron, but we decided to use mmdetection because experiments can be run just by changing the config file, which is simple, and it lets us try many different models. I hit an error while writing the code to connect wandb to mmdetection.. and fixing it was a bit hard.. (a bit.. let's say it was a bit hard..) Still, seeing the logs show up in wandb made me feel how powerful visualization really is! Now that wandb is connected, all that's left is to build the dataset and figure out how to run the models! In the peer session we wrote this week's team retrospective and discussed augmentations that can be applied to the detection task. Since it is detection, how to handle the bounding boxes is the key issue...(!!!) A week of level 2 has already passed, and it definitely feels like it went by faster than a week of level 1 (maybe because I've adapted) 🤔
