hye-log

[Boostcamp AI Tech]WEEK 04_DAY 18

Boostcourse/AI Tech 4th cohort

iihye_ 2022. 10. 14. 18:35

🚀 Individual study


[5] Object Detection

1. Object Detection

1) classification + box localization (bounding box)

2) Used in autonomous driving, OCR, and similar applications

 

2. Two-stage detector

1) R-CNN

- ์˜์ƒ์—์„œ region proposal์„ ์ œ์‹œํ•œ ํ›„ CNN ๋ชจ๋ธ์„ ํ†ตํ•ด object detection ์ˆ˜ํ–‰

Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).

- Input์œผ๋กœ image๋ฅผ ๋„ฃ์Œ

- ์•ฝ 2000๊ฐœ ์ดํ•˜๋กœ region proposal์„ ๊ตฌํ•จ

- ๊ฐ region proposal์„ ์‚ฌ์ด์ฆˆ(224×224)์— ๋งž๊ฒŒ ์ž˜๋ผ์„œ CNN์„ ํƒœ์›€

- classify(๋ถ„๋ฅ˜)ํ•จ

- ํ•œ๊ณ„ : region proposal ํ•˜๋‚˜์”ฉ CNN์— ํƒœ์šฐ๋‹ค๋ณด๋‹ˆ ์†๋„๊ฐ€ ๋Š๋ฆฌ๊ณ  ์„ฑ๋Šฅ ํ–ฅ์ƒ์— ํ•œ๊ณ„๊ฐ€ ์žˆ์Œ
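
The steps above can be sketched as a loop that runs a classifier once per proposal. This is only a minimal illustration in plain Python: `crop_and_resize`, `rcnn_detect`, and `toy_classifier` are hypothetical names, the "CNN" is a brightness threshold standing in for a real network, and the proposals are given by hand rather than produced by a proposal algorithm.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # (x1, y1, x2, y2), x2/y2 exclusive


def crop_and_resize(image: Image, box: Box, size: int) -> Image:
    """Crop `box` from `image` and warp it to size x size (nearest neighbour)."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    return [
        [image[y1 + (i * h) // size][x1 + (j * w) // size] for j in range(size)]
        for i in range(size)
    ]


def rcnn_detect(
    image: Image,
    proposals: List[Box],
    classify: Callable[[Image], Tuple[str, float]],
    size: int = 224,
) -> List[Tuple[Box, str, float]]:
    """Run the classifier once per proposal: the step that makes R-CNN slow."""
    results = []
    for box in proposals:              # one forward pass per proposal
        patch = crop_and_resize(image, box, size)
        label, score = classify(patch)
        results.append((box, label, score))
    return results


def toy_classifier(patch: Image) -> Tuple[str, float]:
    """Hypothetical stand-in for the CNN: bright patches count as 'object'."""
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return ("object" if mean > 0.5 else "background", mean)


# 8x8 image whose right half is bright, with two hand-written proposals.
image = [[1.0 if x >= 4 else 0.0 for x in range(8)] for _ in range(8)]
detections = rcnn_detect(image, [(0, 0, 4, 4), (4, 0, 8, 8)], toy_classifier, size=4)
```

With around 2,000 proposals the loop body runs around 2,000 times per image, which is the cost the limitation above refers to.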

2) Fast R-CNN

- ์˜์ƒ ์ „์ฒด์— ๋Œ€ํ•œ feature ์ถ”์ถœ ํ›„ ์žฌํ™œ์šฉํ•ด์„œ ์—ฌ๋Ÿฌ object detection ์ˆ˜ํ–‰

Girshick, R. (2015). Fast r-cnn. In  Proceedings of the IEEE international conference on computer vision  (pp. 1440-1448).

- input image์—์„œ conv feature map์„ ์ถ”์ถœ

- feature map์„ ํ†ตํ•ด ROI(Region of Interest)์— ํ•ด๋‹นํ•˜๋Š” feature ์ถ”์ถœ

- ๊ฐ๊ฐ์˜ ROI์—์„œ class์™€ box ์˜ˆ์ธก

- ์žฅ์  : feature map์œผ๋กœ R-CNN์— ๋น„ํ•ด ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์ด๋ฃธ

- ํ•œ๊ณ„ : ๋ฐ์ดํ„ฐ๋งŒ์œผ๋กœ ์„ฑ๋Šฅ์„ ๋†’์ด๋Š” ๋ฐ์—๋Š” ํ•œ๊ณ„๊ฐ€ ์žˆ์Œ

3) Faster R-CNN

- Performs end-to-end object detection by making region proposal itself a neural network

Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28.

- A Region Proposal Network (RPN) proposes multiple regions from the feature map, and ROI pooling is then applied to them

4) R-CNN family summary

https://lilianweng.github.io/posts/2017-12-31-object-recognition-part-3/

 

3. Single-stage detector

0) one-stage vs. two-stage

- one-stage detector: lower accuracy, but faster (useful for real-time processing)

- two-stage detector: higher accuracy, but slower (as in the R-CNN family, detection runs only on a selected set of sampled regions)

1) YOLO(You Only Look Once)

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).

- Divides the input image into an S×S grid, then finds bounding boxes and computes class scores

- Finally outputs the class score for each bounding box region

- Limitation: classification is performed only at the final layer, so localization accuracy is lower
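
As a rough illustration of the S×S grid output, the sketch below follows the layout described in the YOLO paper: each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities shared by the cell. The function names are hypothetical, and the example vector is made up rather than taken from a trained model.

```python
from typing import List, Tuple

Detection = Tuple[float, float, float, float, int, float]  # cx, cy, w, h, class, score


def yolo_output_shape(S: int = 7, B: int = 2, C: int = 20) -> Tuple[int, int, int]:
    """Each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


def decode_cell(cell: List[float], row: int, col: int, S: int, B: int) -> List[Detection]:
    """Turn one cell's vector into boxes expressed relative to the whole image."""
    class_probs = cell[B * 5:]
    best = max(range(len(class_probs)), key=class_probs.__getitem__)
    boxes = []
    for b in range(B):
        x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
        # (x, y) are offsets inside the cell; (w, h) are already image-relative.
        score = conf * class_probs[best]   # class score for this box
        boxes.append(((col + x) / S, (row + y) / S, w, h, best, score))
    return boxes


shape = yolo_output_shape()  # (7, 7, 30) with the paper's S=7, B=2, C=20
# Made-up cell vector for a tiny S=2, B=1, C=2 grid.
boxes = decode_cell([0.5, 0.5, 0.4, 0.2, 0.8, 0.1, 0.9], row=1, col=0, S=2, B=1)
```

Every cell is decoded in one pass over a single output tensor, which is why no separate proposal stage is needed.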

2) SSD(Single Shot multibox Detector)

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham.

- To handle multi-scale objects well, produces outputs from intermediate feature maps, each at its own resolution
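
A quick way to see what "outputs from intermediate feature maps" adds up to is to count the candidate boxes. The feature-map sizes and boxes-per-location below are the SSD300 configuration from the SSD paper, not something stated in these notes, and `count_default_boxes` is a hypothetical helper.

```python
from typing import List, Tuple

# (feature-map side length, default boxes per location) for SSD300, per the SSD paper.
SSD300_LAYERS: List[Tuple[int, int]] = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(layers: List[Tuple[int, int]]) -> int:
    """Total boxes predicted when every listed feature map emits its own outputs."""
    return sum(side * side * k for side, k in layers)


per_layer = [side * side * k for side, k in SSD300_LAYERS]
total = count_default_boxes(SSD300_LAYERS)  # 8732
```

The high-resolution 38×38 map contributes most of the boxes and covers small objects, while the coarse 3×3 and 1×1 maps cover large ones.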

 

4. Two-stage detector vs. One-stage detector

1) Focal loss

- A one-stage detector computes the loss over every region of the image

- Background usually covers far more of an image than objects do -> class imbalance

Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).

- An extension of cross entropy that gives a larger weight the further a prediction is from the correct answer and a smaller weight the closer it is
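
The weighting described above can be written as FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class. Below is a minimal binary sketch; gamma = 2 and alpha = 0.25 are the defaults reported in the focal loss paper, and the function names are just for illustration.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Binary cross entropy for predicted foreground probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Cross entropy scaled by (1 - p_t)^gamma, so easy examples count for less."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # close to the correct answer: heavily down-weighted
hard = focal_loss(0.1, 1)   # far from the correct answer: keeps most of its loss
easy_ratio = easy / cross_entropy(0.9, 1)   # about 0.25 * 0.1**2
hard_ratio = hard / cross_entropy(0.1, 1)   # about 0.25 * 0.9**2
```

With gamma = 0 the modulating factor disappears and the loss reduces to alpha-weighted cross entropy, which is why focal loss is described as an extension of it.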

2) RetinaNet

Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).

- one-stage network

- Uses Feature Pyramid Networks (FPN) to carry both low-level and high-level features

- Adopts separate class and box prediction branches

 

5. Detection with Transformer

0) Transformer

- There is a movement to apply the transformer, previously used only in NLP, to computer vision tasks as well

- ViT(Vision Transformer) by Google

- DeiT(Data-efficient image Transformer) by Facebook

- DETR(DEtection TRansformer) by Facebook

1) DETR

Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020, August). End-to-end object detection with transformers. In European conference on computer vision (pp. 213-229). Springer, Cham.

- An example of applying the transformer to object detection

- Feeds CNN features with positional encoding as input tokens, then uses object queries to predict whether an object is present (class/no object) and, if so, where it is (box)
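
The "class/no object" prediction per object query can be sketched as follows. This covers only the post-processing of the outputs, under the assumption that each query yields logits over the real classes plus one trailing "no object" slot and one box. `keep_objects` and the 0.5 threshold are illustrative choices, not part of the paper's definition.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalised to [0, 1]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def keep_objects(
    class_logits: List[List[float]], boxes: List[Box], threshold: float = 0.5
) -> List[Tuple[int, float, Box]]:
    """Keep queries whose top class is a real class (not the last 'no object' slot)."""
    kept = []
    for logits, box in zip(class_logits, boxes):
        probs = softmax(logits)
        no_object = len(probs) - 1
        cls = max(range(len(probs)), key=probs.__getitem__)
        if cls != no_object and probs[cls] >= threshold:
            kept.append((cls, probs[cls], box))
    return kept


# Three queries, two real classes plus 'no object' (made-up numbers).
logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 3.0, 0.0]]
query_boxes = [(0.3, 0.3, 0.2, 0.2), (0.5, 0.5, 0.1, 0.1), (0.7, 0.6, 0.3, 0.4)]
kept = keep_objects(logits, query_boxes)  # the second query predicts 'no object'
```

Because the number of queries is fixed, most of them are expected to land on "no object" for a typical image, and only the remainder are reported as detections.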



🚀 Today's retrospective

In the daily scrum we set our goals for the day and talked about a question I had posted on Slack yesterday. It was an assignment problem about training the model further, and because I got a different result every time I changed a value, I couldn't tell what the problem was actually asking for, so I asked. I listened to one lecture, had lunch, then came back and wrote up the lecture notes. All week I have been studying as if something were chasing me, so I hadn't been writing notes right after each lecture, but it really is easier to organize the material right after listening. Friday is the day of three straight hours on Zoom...! Starting with the special peer session, I made it through the peer session and the master class! Since team building is everyone's biggest concern these days, the special peer session was about how people are forming teams and which fields they are interested in. The biggest worries are probably getting a job and dropping out.. (lol) In the peer session we shared what we had talked about in the special peer session and wrote the weekly retrospective. The master class went over the solutions to the assignment problems, and the clear summary of what each problem was meant to teach helped a lot! It has already been four weeks since Boostcamp started. I can't rest this weekend (...), but after a weekend with no Boostcamp and full of coding tests, let's run hard through week 5 too 🚀

 

+ TMI

Thanks to the Boostcamp retrospectives, visits to this humble blog have gone up. 300, wow. Thank you for visiting a blog with nothing much on it (bow)
