hye-log

[Boostcamp AI Tech] WEEK 06_DAY 25

Boostcourse/AI Tech 4th cohort


iihye_ 2022. 10. 25. 19:31

🔥 Individual study


[2] Image Classification & EDA

1. EDA(Exploratory Data Analysis)

1) EDA

- Exploratory data analysis

- An effort to understand the data

2) EDA tools -> any tool is fine as long as it helps you understand the data

- Inspecting samples by hand, one at a time

- Python

- Excel
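As one illustration of the Python route, a first EDA pass often just counts how many samples each class has. The following is a minimal sketch, assuming a hypothetical list of labels (the `labels` values and the `class_distribution` helper are made up for illustration and are not from the lecture data):

```python
from collections import Counter

# Hypothetical labels, invented for this sketch.
labels = ["cat", "dog", "cat", "bird", "cat", "dog"]

def class_distribution(labels):
    """Return {class: (count, fraction)} ordered from most to least frequent."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: (n, n / total) for cls, n in counts.most_common()}

distribution = class_distribution(labels)
for cls, (n, frac) in distribution.items():
    print(f"{cls}: {n} samples ({frac:.0%})")
```

A skewed distribution here would be an early hint that class imbalance needs attention later on.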

 

2. Image Classification

1) Image

- An artifact that represents visual perception

- (width, height, channel)

2) Model

- Input + Model = Output

3) Image Classification Model

- Image + Classification Model = Class

 

3. Baseline


[3] Dataset

1. Pre-processing

1) Data Science

- Pre-processing, which is what produces good data, is important

- The rest is handled by the model and later stages

2) Bounding box

- Not all of the information in an image is useful, so a bounding box is drawn to mark the boundary of the relevant region

3) Resize

- Resize images to a suitable size for computational efficiency

 

2. Generalization

1) Bias & Variance

- High Bias (Underfitting): the model has not trained enough and fails to capture the data

- High Variance (Overfitting): the model captures the data but has trained too much and fits the training data too closely

2) Train / Validation

- Set aside a portion of the train set and use it as the valid set

- The model is fitted to the train set, so the valid set is used to check how well it generalizes

- Even if the train set is too small, the test set must not be touched
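The split described above can be sketched without any framework. This is a minimal illustration, assuming a hypothetical helper `train_valid_split`, a 20% validation ratio, and a fixed seed; none of these values come from the lecture:

```python
import random

def train_valid_split(samples, valid_ratio=0.2, seed=42):
    """Shuffle a copy of `samples` and split off `valid_ratio` of it as the valid set."""
    shuffled = list(samples)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)     # seeded for a reproducible split
    n_valid = int(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]

samples = list(range(100))   # stand-in for 100 training samples
train_set, valid_set = train_valid_split(samples)
```

Only the train set is divided here; the test set never enters the function, which matches the rule of leaving it untouched.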

3) Data Augmentation

- Adds diversity in the cases and states that the given data can take

- torchvision.transforms

- Albumentations
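To make the idea of augmentation concrete, here is a library-free stand-in for a random horizontal flip. It is only a sketch on a tiny nested-list "image", not the implementation used by torchvision.transforms or Albumentations; the function name and the sample image are assumptions made for illustration:

```python
import random

def random_horizontal_flip(image, p=0.5, rng=random):
    """Flip a row-major image (a list of rows) left-to-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]   # reverse every row
    return [row[:] for row in image]          # return an unflipped copy

image = [[1, 2, 3],
         [4, 5, 6]]
flipped = random_horizontal_flip(image, p=1.0)    # p=1.0 forces the flip
unchanged = random_horizontal_flip(image, p=0.0)  # p=0.0 never flips
```

Each call may return a different variant of the same sample, which is how augmentation adds cases the original data did not contain.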


[4] Data Generation

1. Data Feeding

1) The speed at which data is produced, and tuning it, are important

- Even with the same RandomRotation and Resize, performance can differ depending on the order in which the transforms are applied
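One plausible reason the order matters is cost: rotating a large image and then shrinking it touches far more pixels than shrinking first. The estimate below is a deliberately crude sketch that assumes each step costs as many operations as the pixels it outputs; the image sizes and the `pipeline_cost` helper are hypothetical, and real timings depend on the library and interpolation used:

```python
def pixels(size):
    width, height = size
    return width * height

def pipeline_cost(input_size, steps):
    """Sum the pixels each step has to produce.

    `steps` is a list of (name, output_size); output_size=None means the step
    keeps the current size, as a rotation without expansion does.
    """
    current, total = input_size, 0
    for _name, output_size in steps:
        current = output_size or current
        total += pixels(current)
    return total

original = (1024, 1024)   # hypothetical input size
target = (224, 224)       # hypothetical model input size

rotate_then_resize = pipeline_cost(original, [("rotate", None), ("resize", target)])
resize_then_rotate = pipeline_cost(original, [("resize", target), ("rotate", None)])
print(rotate_then_resize, resize_then_rotate)
```

Under this model, resizing first is roughly an order of magnitude cheaper, though the two orders can also produce slightly different images, so it is worth measuring both speed and accuracy.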

2. torch.utils.data

1) Dataset

- Converts vanilla (raw) data into a Dataset

2) Structure of a Dataset

from torch.utils.data import Dataset    # base class from torch.utils.data, to be subclassed

class MyDataset(Dataset):
    def __init__(self, samples):        # called once, when a MyDataset instance is created
        self.samples = samples

    def __getitem__(self, index):       # return the item at position `index`
        return self.samples[index]

    def __len__(self):                  # total number of items in MyDataset
        return len(self.samples)

3) DataLoader

- Dataset -> DataLoader -> (Batch, Channel, Height, Width)

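import torch

train_loader = torch.utils.data.DataLoader(
    train_set,                  # a Dataset instance, e.g. MyDataset
    batch_size=batch_size,      # number of samples per batch
    num_workers=num_workers,    # number of subprocesses used for loading
    drop_last=True,             # drop the final batch if it is smaller than batch_size
)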
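The effect of `batch_size` and `drop_last` on the number of batches per epoch can be checked with simple arithmetic. The helper below is a hypothetical sketch, not part of PyTorch, and the dataset length of 1000 and batch size of 64 are example values only:

```python
import math

def num_batches(dataset_len, batch_size, drop_last):
    """Number of batches per epoch for a dataset of `dataset_len` items."""
    if drop_last:
        return dataset_len // batch_size          # incomplete final batch is discarded
    return math.ceil(dataset_len / batch_size)    # incomplete final batch is kept

kept = num_batches(1000, 64, drop_last=False)
dropped = num_batches(1000, 64, drop_last=True)
print(kept, dropped)
```

With `drop_last=True` every batch has exactly `batch_size` samples, at the price of skipping the leftover samples in that epoch.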

4) Dataset and DataLoader do different jobs, so it is better to keep them separate

- Dataset: a class that outputs vanilla data in the desired form

- DataLoader: used to consume a Dataset efficiently



🔥 Today's retrospective

There are not many lectures released for the CV competition, so we decided to watch the EDA, Dataset, and Data Generation lectures and then discuss them. The lectures lay out the framework of each topic, and the missions then give guidance on how to apply it to the dataset. For EDA, the lecture alone did not give me a good sense of what to analyze, but the mission code showed me which aspects of the data to look at. For data augmentation, I applied a variety of augmentation techniques and looked at how the images changed. I want to run experiments to see how performance changes with these augmentations! In the peer session we talked about our EDA results and which augmentations would be good to apply. Since this is a classification task, methods such as cutmix and mixup were also suggested; I have never applied them as augmentations before, so I am curious to see what results they give!
