
[Boostcamp AI Tech] WEEK 03_DAY 12

iihye_ 2022. 10. 6. 02:18

๐ŸŒฑ Individual Study


[3] Optimization

1. Important Concepts in Optimization

1) Generalization(์ผ๋ฐ˜ํ™”)

- Generalization gap(training error์™€ test error ์‚ฌ์ด์˜ ์ฐจ)๋ฅผ ์ค„์ด๋Š” ๊ฒƒ์ด ๋ชฉ์ 

- ํ•™์Šต์„ ํ†ตํ•ด training error๊ฐ€ ์ค„์–ด 0์ด ๋˜์–ด๋„ test error๊ฐ€ ์ปค์งˆ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์„ ๋†’์ด๋Š” ๊ฒƒ์ด ์ค‘์š”

2) Underfitting vs. Overfitting

- Overfitting: the test error is much larger than the training error because the model fits the training data too closely.

- The aim is a balanced model that sits between underfitting and overfitting.

3) Cross-Validation

- Split the data into train, valid, and test sets.

- The valid set is used to judge how well training has gone; in k-fold cross-validation the folds take turns playing this role (a sketch follows below).

4) Bias and Variance

- Bias: how close the average prediction is to the true target.

- Variance: how consistent the predictions are, i.e., how little they spread.
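For squared-error loss with noisy targets $y = f(x) + \epsilon$, the expected error splits into exactly these two quantities plus irreducible noise (the standard bias-variance decomposition, stated here for reference):

$$\mathbb{E}\big[(y-\hat{f})^2\big] = \underbrace{\big(f-\mathbb{E}[\hat{f}]\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}-\mathbb{E}[\hat{f}])^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{noise}}$$

Pushing one term down tends to push the other up, hence the bias-variance tradeoff.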

5) Bootstrapping

- Bagging (bootstrap aggregating): draw bootstrap samples of the data, train a separate model on each, and aggregate their outputs (sketch below).

- Boosting: train learners sequentially, putting more weight on the samples that are hard to classify.
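A toy sketch of bagging in Python: draw bootstrap samples (sampling with replacement), fit one model per sample, and aggregate by majority vote (the data and model here are made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))    # toy features
y = rng.integers(0, 2, 200)      # toy binary labels

models = []
for _ in range(10):                          # 10 bootstrap rounds
    idx = rng.integers(0, len(X), len(X))    # sample indices with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.stack([m.predict(X) for m in models])  # each model votes
pred = (votes.mean(axis=0) > 0.5).astype(int)     # aggregate by majority vote
```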

 

2. Practical Gradient Descent Methods

1) Gradient Descent Methods

- Stochastic Gradient Descent (SGD): update with the gradient computed from a single sample.

- Mini-batch Gradient Descent: update with the gradient computed from a subset of the data (sketch below).

- Batch Gradient Descent: update with the gradient computed from the entire dataset.
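In PyTorch the three regimes differ only in the DataLoader's batch_size; a minimal sketch with toy data (names are illustrative):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X, y = torch.randn(1000, 10), torch.randn(1000, 1)   # toy regression data
ds = TensorDataset(X, y)

# batch_size=1 -> SGD, 1 < batch_size < len(ds) -> mini-batch, = len(ds) -> batch GD
loader = DataLoader(ds, batch_size=32, shuffle=True)

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for xb, yb in loader:
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()   # gradient from this mini-batch only
    opt.step()        # one parameter update per mini-batch
```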

2) Batch-size Matters

- Small batch sizes tend to lead to flat minima.

- Large batch sizes tend to lead to sharp minima.

- Generalization means that doing well on the training function carries over to the testing function; near a flat minimum the two stay close, which is why flat minima are preferable.

Keskar, Nitish Shirish, et al. "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima." arXiv preprint arXiv:1609.04836 (2016).

3) Gradient Descent

- Its drawback is that the learning rate (η) is hard to choose.
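The update itself, with learning rate η, is the standard one; too small and training crawls, too large and it diverges:

$$W_{t+1} = W_t - \eta\, g_t, \qquad g_t = \nabla_W \mathcal{L}(W_t)$$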

4) Momentum

- Uses a gradient update that incorporates momentum (β).

- Momentum keeps a portion of the direction the update last moved in, which helps training (update rule below).

5) Nesterov Accelerated Gradient

- Uses a lookahead gradient: the gradient is evaluated at the point the momentum step would reach (update rule below).
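One common form: the gradient is taken at the point the momentum step would land on, rather than at the current point:

$$a_{t+1} = \beta\, a_t + \nabla \mathcal{L}(W_t - \eta\,\beta\, a_t), \qquad W_{t+1} = W_t - \eta\, a_{t+1}$$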

6) Adagrad

- ์ง€๊ธˆ๊นŒ์ง€์˜ gradient๊ฐ€ ์–ผ๋งˆ๋‚˜ ๋ณ€ํ–ˆ๋Š”์ง€(sum of gradient squares)๋ฅผ ์‚ฌ์šฉ

- G๊ฐ€ ๊ณ„์† ์ปค์ง€๋ฉด ํ•™์Šต์ด ๋ฉˆ์ถ”๋Š” ํ˜„์ƒ ๋ฐœ์ƒ

7) Adadelta

- learning rate ์—†์ด ๊ณ„์† ์ปค์ง€๋Š” G๋ฅผ ๋ง‰๊ธฐ ์œ„ํ•ด์„œ ๊ณ ์•ˆ

8) RMSprop

- step ์‚ฌ์ด์ฆˆ๋ฅผ ์ด์šฉํ•˜์—ฌ gradient ๊ณ„์‚ฐ

9) Adam

- Uses an adaptive learning rate based on the moving average of squared gradients.

- Combines this with momentum, the moving average of past gradients (update rule below).
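One common way to write the update keeps moving averages of both the gradient ($m$, momentum) and its square ($v$, adaptive scale), with bias correction:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$

$$W_{t+1} = W_t - \eta\, \frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t}\, \frac{m_t}{\sqrt{v_t}+\epsilon}$$

All of the optimizers above ship with torch.optim; a sketch of swapping them in (the hyperparameters are common defaults, not tuned values):

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)  # SGD + momentum
# opt = torch.optim.Adagrad(model.parameters(), lr=1e-2)
# opt = torch.optim.Adadelta(model.parameters())
# opt = torch.optim.RMSprop(model.parameters(), lr=1e-2, alpha=0.99)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
```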

 

 

3. Regularization

1) Early Stopping: stop training while the generalization gap has not yet started to grow, i.e., before validation error rises (sketch below).
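A minimal patience-based sketch; the loss curve here is faked for illustration, while in practice val_loss would come from the validation set each epoch:

```python
val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58]  # fake validation curve

best, patience, bad = float("inf"), 2, 0
for epoch, val_loss in enumerate(val_losses):
    if val_loss < best:
        best, bad = val_loss, 0   # still improving: reset patience
    else:
        bad += 1
        if bad >= patience:       # no improvement for `patience` epochs
            print(f"early stop at epoch {epoch}, best={best}")
            break
```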

2) Parameter Norm Penalty: keep the parameters from growing large, which lowers the model's complexity (formula below).
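With an L2 penalty this is the familiar form below, which is what the weight_decay argument of PyTorch optimizers implements:

$$\text{total cost} = \text{loss}(W) + \frac{\alpha}{2}\,\|W\|_2^2$$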

3) Data Augmentation: transform the data to improve the model's performance.

- Transform the data only in ways that do not change the label (example pipeline below).

- e.g., on MNIST, augmentations such as flip and rotate should not be used, since they can change the label (a flipped or rotated 6 can turn into a 9).

4) Noise Robustness: add noise to the inputs or weights to build a more robust model.

5) Label Smoothing: draw training examples and mix them, softening the labels accordingly (as in Mixup/CutMix, which blend pairs of inputs together with their labels).

6) Dropout: randomly set some neurons' outputs to 0 during training.

7) Batch Normalization: normalize each layer's activations using the mean and variance of the current mini-batch (sketch below).
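A sketch of where the last three techniques appear in PyTorch (not the lecture's code; the label_smoothing argument needs a recent PyTorch version):

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # normalize activations with mini-batch statistics
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zero activations during training
    nn.Linear(256, 10),
)
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)  # soften the one-hot targets
```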



๐ŸŒฑ ์˜ค๋Š˜์˜ ํšŒ๊ณ 

I finished the basic assignment and watched the optimizer lecture! The lecture itself isn't that long, but it took me longer than expected to listen and actually digest it... It isn't material I'm seeing for the first time, but listening again there were some 'aha!' moments and also some 'huh?' moments (sheepish). Tomorrow's goal is to get through the CNN and RNN lectures and finish the assignments..! During the peer session someone suggested we pick a DACON competition and enter it together. Looking at the monthly DACON competition, it's a task of identifying the artist from part of a painting; there are 50 artists, so does that mean 50-way classification..? Looking at the data I do worry whether we can pull it off, but competitions are where your skills grow the most, so whatever happens, let's give it our best 😊

(+) I had a feeling about today, so I kept checking my inbox, and the document-screening pass email arrived! Ten days of hard grinding for the coding test ✨
