Stanford CS231n Lecture 2. Image Classification
This post summarizes Stanford CS231n (2017). It is written more as personal notes than to convey information! :)
Image Classification
- Computer Vision의 핵심 Task
- Input : Image
- Output : Category Labels
- Semantic Gap
- The gap between the information we can extract from an image (color, texture) and the abstract information people actually want
- Challenges
- Viewpoint Variation (viewing angle)
- Illumination (lighting)
- Deformation
- Occlusion (the object is partially hidden)
- Background Clutter (the object blends into the background)
- Intraclass Variation (variety within a single class)
- We want to classify every conceivable image, in real time
- Attempts
- Find edges
- Find corners
- Brittle : these hand-coded rules break easily and don't scale to new categories
Data-Driven Approach
- Collect a dataset of images and labels
- Use Machine Learning to train a classifier
- Evaluate the classifier on new images
Classifier : Nearest Neighbor
- Train : Memorize all data and labels
- Predict : Predict the label of the most similar training image
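This memorize-then-compare loop is short enough to write out. A minimal numpy sketch along the lines of the classifier shown in the lecture (using the L1 distance defined below):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # "Training" is just memorizing every image and its label.
        # X: N x D array (one flattened image per row); y: N integer labels.
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        # Label each test image with the label of the single closest
        # training image under L1 (sum of absolute pixel differences).
        num_test = X.shape[0]
        y_pred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            y_pred[i] = self.ytr[np.argmin(distances)]
        return y_pred
```

Note the asymmetry: train is O(1), but predict scans the entire training set for every query.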
Example Dataset: CIFAR10
- 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
- 50,000 training images
- 10,000 test images
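If you want the images in hand, one convenient loader is below (a sketch assuming TensorFlow is installed; the lecture itself does not prescribe any particular loader):

```python
from tensorflow.keras.datasets import cifar10

# CIFAR-10: 50,000 training and 10,000 test images, each 32x32x3.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 1)
```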
Distance Metric
- L1 (Manhattan) distance : \(d_1(I_1, I_2) = \sum_p \lvert I_1^p - I_2^p \rvert\)
- L2 (Euclidean) distance : \(d_2(I_1, I_2) = \sqrt{\sum_p (I_1^p - I_2^p)^2}\)
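Both metrics are one-liners in numpy; the helper names here are my own:

```python
import numpy as np

def l1_distance(i1, i2):
    # Manhattan: sum of absolute pixel-wise differences.
    return np.sum(np.abs(i1 - i2))

def l2_distance(i1, i2):
    # Euclidean: square root of the sum of squared pixel-wise differences.
    return np.sqrt(np.sum((i1 - i2) ** 2))

a = np.random.rand(32, 32, 3)
b = np.random.rand(32, 32, 3)
print(l1_distance(a, b), l2_distance(a, b))
```

L1 depends on the choice of coordinate frame, while L2 is rotation-invariant; in practice the choice of metric is itself a hyperparameter (next section).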
Hyperparameters
- What is the best value of k to use?
- What is the best distance metric to use?
- Choices about the algorithm that we set rather than learn
- Setting Hyperparameters
- Idea 1. Choose hyperparameters that work best on your data
- Bad : \(K=1\) always works perfectly on training data
- Idea 2. Split data into train and test; choose hyperparameters that work best on test data
- Bad : No idea how the algorithm will perform on new data
- Idea 3. Split data into train, val, and test; choose hyperparameters on val and evaluate on test (see the sketch after this list)
- Better!
- validation set : used to check accuracy and see how well the algorithm is doing
- Idea 4. Cross-Validation : split data into folds, try each fold as validation, and average the results
- Useful for small datasets, but not used too frequently in deep learning
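A minimal sketch of Idea 3 for kNN (the candidate k values and function names are my own choices; labels are assumed to be non-negative integers):

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k):
    # Label each test point by majority vote among its k nearest
    # training points under L1 distance.
    preds = np.zeros(Xte.shape[0], dtype=ytr.dtype)
    for i in range(Xte.shape[0]):
        dists = np.sum(np.abs(Xtr - Xte[i]), axis=1)
        nearest = ytr[np.argsort(dists)[:k]]
        preds[i] = np.bincount(nearest).argmax()
    return preds

def choose_k(Xtr, ytr, Xval, yval, candidates=(1, 3, 5, 10, 20)):
    # Idea 3: score each candidate k on the validation set only;
    # the test set stays untouched until the final evaluation.
    accs = {k: np.mean(knn_predict(Xtr, ytr, Xval, k) == yval)
            for k in candidates}
    return max(accs, key=accs.get)
```

Idea 4 would wrap choose_k in a loop over folds and average the per-fold accuracies before picking k.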
k-Nearest Neighbor on images is never used
- Very slow at test time
- Distance metrics on pixels are not informative
- Curse of dimensionality
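The lecture makes the second point with images that are shifted, tinted, or occluded yet keep the same L2 distance to the original. A toy version of that observation (my own example):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
shifted = np.roll(img, shift=2, axis=1)  # same pixel values, shifted 2 pixels
other = rng.random((32, 32, 3))          # a completely unrelated image

# A tiny shift leaves the content unchanged, yet its L2 distance to the
# original is of the same order as the distance to an unrelated image.
print(np.sqrt(np.sum((img - shifted) ** 2)))
print(np.sqrt(np.sum((img - other) ** 2)))
```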
Parametric Approach
- Image : array of \(32\times 32\times 3\) numbers (3072 total)
- Function : \(f(x,W)\), e.g. the linear classifier \(f(x,W) = Wx + b\) (see the sketch below)
- Output : 10 numbers giving class scores
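For the linear classifier \(f(x,W) = Wx + b\), the whole forward pass is a single matrix-vector product. A minimal sketch with random, untrained weights:

```python
import numpy as np

def linear_classifier(x, W, b):
    # x: flattened image (3072,); W: weights (10, 3072); b: bias (10,).
    # Returns 10 class scores, one per CIFAR-10 category.
    return W.dot(x) + b

x = np.random.rand(3072)              # a flattened 32x32x3 image
W = np.random.randn(10, 3072) * 0.01  # random weights, not yet trained
b = np.zeros(10)
print(linear_classifier(x, W, b).shape)  # (10,)
```

Unlike Nearest Neighbor, all knowledge of the training set is compressed into W and b, so prediction cost is independent of the training set size.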
Coming up
- Loss function : quantifying what it means to have a “good” W
- Optimization : start with random W and find a W that minimizes the loss
- ConvNets : tweak the functional form of f
Reference
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition, Spring 2017, Lecture 2 (http://cs231n.stanford.edu/)