hye-log

[Boostcamp AI Tech] WEEK 05_DAY 21

Boostcourse/AI Tech 4th cohort

iihye_ 2022. 10. 20. 02:57

🥔 Individual study


[8] Conditional Generative Model

1. Conditional generative model

1) Generates an image that corresponds to a given 'condition'

2) Generative model vs. Conditional generative model

- Generative model : simply generates random samples

- Conditional generative model : generates random samples that match the condition

3) Examples of conditional generative models

- audio super resolution : converts low-quality audio into high-quality audio

- machine translation : a translator

- article generation with the title : given a title and subtitle, writes the related content (article)

4) Image-to-Image

- translating an image into another image

- Style Transfer, Super resolution, Colorization, etc.

5) Example: Super resolution

- Input : low-resolution (LR) image -> Output : high-resolution (HR) image

- Naive Regression model : uses MAE (L1) or MSE (L2) loss

- Super Resolution GAN : a discriminator tells the real HR image apart from the fake HR image

- MAE/MSE : the error is measured between the generated patch and the other plausible patches, so the output settles on a safe average image, which looks blurry

- GAN : the goal is to be indistinguishable from real data, so it resolves the blurriness of the regression approach
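
Why a pixel-wise loss leads to an "average" image can be checked with a tiny example. The sketch below is only an illustration: the two candidate patches and their pixel values are made up, and it assumes that both are equally plausible HR answers for the same LR input.

```python
# Two equally plausible sharp HR patches for one LR input (hypothetical values).
candidates = [
    [0.0, 1.0, 0.0, 1.0],  # edge pattern A
    [1.0, 0.0, 1.0, 0.0],  # edge pattern B (same edges, shifted by one pixel)
]

def mse(pred, target):
    """Mean squared error between two patches of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def expected_mse(pred, targets):
    """Average MSE of one prediction against every plausible target."""
    return sum(mse(pred, t) for t in targets) / len(targets)

# Pixel-wise mean of the candidates: a flat gray patch with no edges.
mean_patch = [sum(px) / len(px) for px in zip(*candidates)]

loss_mean = expected_mse(mean_patch, candidates)      # blurry prediction
loss_sharp = expected_mse(candidates[0], candidates)  # sharp prediction

print(mean_patch, loss_mean, loss_sharp)
```

The flat gray patch gets the lower expected MSE even though it matches neither sharp candidate, which is the blurriness the notes describe.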

 

2. Image translation GANs

1) Pix2Pix

(1) Problem definition

- Translating an image into another domain, such as a different style

- e.g. label to street scene, label to facade, bw to color, aerial to map, day to night, edges to photo

Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125-1134).

(2) Loss function

- GAN loss : pushes the output toward realistic results

- L1 loss : produces blurry images, but provides a reasonable guide toward the target

(3) Role of the GAN loss

- With L1 loss only, the generated image is blurry

- With GAN loss only, the image is sharp but the style is not preserved

- Both L1 loss and GAN loss are needed to get an image that keeps the style and is also sharp
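
For reference, the combined objective can be written out as below. This follows the formulation in the cited Pix2Pix paper as far as I recall it; the symbols are that paper's notation rather than anything defined in these notes: x is the input image, y the target image, z a noise vector, and \lambda the weight of the L1 term.

```latex
\begin{aligned}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big] \\
G^{*} &= \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
\end{aligned}
```

The first term is the GAN loss that rewards realistic output, and the second is the L1 guide toward the ground truth.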

2) CycleGAN

(1) Motivation

- Pix2Pix requires pairwise data

- Can images also be generated from unpaired data?

(2) Characteristics of CycleGAN

- Image translation is possible even with a non-pairwise dataset

Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223-2232).

(3) Loss function - GAN Loss

- G : generator that takes input X and produces output Y

- F : generator that takes input Y and produces output X

- Dx : discriminator that judges whether the result has moved into the X style

- Dy : discriminator that judges whether the result has moved into the Y style
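
With these four networks, the GAN loss has one term per direction. The block below writes it in the notation of the cited CycleGAN paper as I recall it; p_data(x) and p_data(y) denote the data distributions of the two domains, which these notes do not define explicitly.

```latex
\begin{aligned}
\mathcal{L}_{GAN}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_{data}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{GAN}(F, D_X, Y, X) &= \mathbb{E}_{x \sim p_{data}(x)}\big[\log D_X(x)\big]
  + \mathbb{E}_{y \sim p_{data}(y)}\big[\log\big(1 - D_X(F(y))\big)\big]
\end{aligned}
```

Neither term compares G(x) with the particular x it came from, so nothing here ties the output to the input.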

(4) Mode collapse

- The problem that appears when only the GAN loss is used

- The generator produces a single output regardless of the input

- Whatever X (input image) is fed in, the same Y (realistic image) comes out -> Dy looks at Y and always judges it realistic -> G is judged to be doing well

- Likewise, whatever Y is fed in, the same X comes out -> Dx looks only at the style of X and judges it correct -> F is judged to be doing well

(5) Cycle-consistency loss

- After a round trip X -> Y -> X (or Y -> X -> Y), the result must not differ from the original image (the content must be preserved)
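
A minimal sketch of this round-trip check, using an L1 distance between the original and the reconstructed image. The "images" are two-pixel lists and the generators are hand-written functions, all hypothetical stand-ins for real networks. It compares a consistent pair of generators against a collapsed pair that ignores its input.

```python
def l1(a, b):
    """Mean absolute difference between two images (flat pixel lists)."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_loss(G, F, xs, ys):
    """Average of |F(G(x)) - x| over xs plus average of |G(F(y)) - y| over ys."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

# Toy domains (assumption): Y images are X images brightened by 0.5.
xs = [[0.0, 0.25], [0.5, 0.25]]
ys = [[0.5, 1.0], [0.75, 0.5]]

def G_good(x):       # X -> Y, keeps the content
    return [v + 0.5 for v in x]

def F_good(y):       # Y -> X, inverse of G_good
    return [v - 0.5 for v in y]

def G_collapsed(x):  # always returns the same Y image, whatever x is
    return [0.5, 1.0]

def F_collapsed(y):  # always returns the same X image, whatever y is
    return [0.0, 0.25]

good = cycle_loss(G_good, F_good, xs, ys)
collapsed = cycle_loss(G_collapsed, F_collapsed, xs, ys)
print(good, collapsed)
```

The collapsed pair can still fool a discriminator, because its fixed outputs are valid-looking members of each domain, but it cannot reconstruct every input, so its cycle loss stays above zero.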

3) Perceptual loss

(1) Difficulty of training GANs

- GANs are hard to train because the generator and the discriminator compete against each other

- Can a high-quality image be obtained without a GAN?

(2) GAN loss vs. Perceptual loss

- GAN loss : hard to train and to code. No pre-trained network required. No restriction on the application

- Perceptual loss : easy to train and to code. Requires a pre-trained network

(3) Observation

- A pre-trained classifier responds in a way similar to human visual perception

- It maps an image into a perceptual space

(4) Perceptual loss

- Feature reconstruction loss : computes the loss between feature maps to check whether the content is preserved

- Style reconstruction loss : builds a Gram matrix, which captures the statistical properties of the feature maps, from multiple feature maps and compares the Gram matrices to check whether the style is preserved
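
The two losses can be sketched on small hand-made feature maps. In practice the features would come from a pre-trained classifier; here they are hypothetical nested lists of shape C x H x W, and the normalization by C*H*W is one common convention that I am assuming, not something stated in these notes.

```python
def flatten(feat):
    """Turn a C x H x W feature map into C rows of length H*W."""
    return [[v for row in channel for v in row] for channel in feat]

def gram(feat):
    """Gram matrix: channel-to-channel inner products, divided by C*H*W."""
    rows = flatten(feat)
    n = len(rows) * len(rows[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / n for rj in rows] for ri in rows]

def feature_loss(fa, fb):
    """Squared difference between feature maps, averaged over C*H*W."""
    a, b = flatten(fa), flatten(fb)
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

def style_loss(fa, fb):
    """Squared difference between the two Gram matrices."""
    ga, gb = gram(fa), gram(fb)
    return sum((p - q) ** 2 for ra, rb in zip(ga, gb) for p, q in zip(ra, rb))

# f: two channels of 2x2 features; g: the same features flipped left-right.
f = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 2.0], [0.0, 0.0]]]
g = [[[0.0, 1.0], [0.0, 0.0]], [[2.0, 0.0], [0.0, 0.0]]]

print(gram(f), feature_loss(f, g), style_loss(f, g))
```

Flipping the features moves the content, so the feature loss is nonzero, while the Gram matrix is unchanged because it sums over spatial positions. That is why the Gram matrix works as a description of style rather than layout.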

 

3. Various GAN applications

1) Deepfake

- Replaces a person's face or voice with another face or voice

2) Face de-identification

- Changes a person's face into a similar-looking one so that the person cannot be identified

3) Video translation



🥔 Today's retrospective

Today I took the generative model lectures. Generative models are an interesting topic, but they seem hard once you dig into them academically. Generative model sites like DALL-E have become popular recently. They can work simply as fun applications, but they also play an important role in generating data, so this is a topic worth continuing to research. In the peer session we talked about perceptual loss, and digging into the loss mathematically seems to leave some confusing parts. Tomorrow there is a GitHub special lecture all day long (..), and since it is the last special lecture, I should listen carefully!
