CS231n Lecture 9: CNN Architectures
A summary of Stanford CS231n 2017 Lecture 9. These are personal study notes rather than a full tutorial! :)
Today
- Case Study
- AlexNet
- VGG
- GoogLeNet
- ResNet
- Also
- NiN (Network in Network)
- DenseNet
- Wide ResNet
- FractalNet
- ResNeXt
- SqueezeNet
- Stochastic Depth
LeNet
- The first ConvNet put to practical use
- Used for zip code and digit recognition
AlexNet
- First large scale ConvNet
- ILSVRC'12 winner
- Input : 227x227x3 images
- First layer(CONV1)
- 96 11x11 filters applied at stride 4
- output size?
- 55x55x96
- total number of parameters?
- (11x11x3)x96 = 35K
- Second layer(POOL1)
- 3x3 filters applied at stride 2
- output size?
- 27x27x96
- total number of parameters?
- no parameter
- Parameters are the weights we are trying to learn. Convolutional layers have weights to learn, but pooling just applies a fixed rule (take the max over the pooling region), so it has no learned parameters
- CONV/FC layers have parameters; ReLU/POOL layers do not (a quick check of these output-size and parameter-count calculations is sketched at the end of this AlexNet section)
- Details
- first use of ReLU
- used Norm layers (not common anymore)
- heavy data augmentation (flipping, jittering, cropping, color normalization …)
- dropout 0.5
- batch size 128
- SGD Momentum 0.9
- Learning rate : 1e-2, reduced by 10 manually when val accuracy plateaus
- L2 weight decay : 5e-4
- 7 CNN ensemble: 18.2% -> 15.4%
- Why the early ConvNet layers are drawn as two separate streams
- The network was trained on GTX 580 GPUs, so the model was split across 2 GPUs
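To make the CONV1/POOL1 arithmetic above concrete, here is a minimal Python check using the usual output-size formula (W - F + 2P) / S + 1; the helper name `conv_output_size` is just for illustration.

```python
# Sanity check of the AlexNet CONV1 / POOL1 numbers above.
def conv_output_size(w, f, stride, pad=0):
    """Output spatial size of a conv/pool layer: (W - F + 2P) / S + 1."""
    return (w - f + 2 * pad) // stride + 1

# CONV1: 96 filters of 11x11x3 applied at stride 4 to a 227x227x3 input
print(conv_output_size(227, 11, 4))   # 55 -> output volume 55x55x96
print(11 * 11 * 3 * 96)               # 34848 (~35K) learned weights

# POOL1: 3x3 max pooling at stride 2 -> just a rule, no learned weights
print(conv_output_size(55, 3, 2))     # 27 -> output volume 27x27x96
```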
- ImageNet winners
ZFNet
- Improved hyperparameters over AlexNet
VGGNet
- much deeper networks, much smaller filters
- Increases the number of layers to 16-19 (AlexNet had 8)
- 3x3 CONV stride 1, pad 1
- 2x2 MAX POOL stride 2
- Q1) Why use smaller filters? (3x3 conv)
- A1) A stack of three 3x3 conv (stride 1) layers has the same effective receptive field as a single 7x7 conv layer
- Q2) What is the effective receptive field of three 3x3 conv (stride 1) layers?
- Using smaller filters reduces the number of parameters, and stacking several of them covers the same receptive field that a single larger filter would
- Stacking two 3x3 convs captures a 5x5 region; stacking three captures a 7x7 region (so the answer to Q2 is 7x7)
- Three 3x3 layers: 3 x (3x3xCxC) = 27C² parameters, where C is the number of channels per layer
- One 7x7 layer: 7x7xCxC = 49C² parameters
- The result: fewer parameters and a deeper network with more nonlinearity (a quick check of these numbers is sketched after the VGG notes below)
- Details
- ILSVRC’14 2nd in classification, 1st in localization
- Localization: finding where the object is (bounding box) + classification
- Similar training procedure as Krizhevsky 2012
- No Local Response Normalisation (LRN)
- Use VGG16 or VGG19 (VGG19 only slightly better, more memory)
- Use ensembles for best results
- FC7 features generalize well to other tasks
- Feature representation!
- Two different meanings of "depth"
- depth: the channel dimension, as in width x height x depth
- depth: the total number of layers in the network
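A quick check of the receptive-field and parameter arguments above; a minimal sketch assuming C input and C output channels per layer.

```python
# Receptive field of n stacked 3x3 stride-1 conv layers: each extra layer
# grows the field by 2 pixels, so the total is 2n + 1.
def stacked_3x3_receptive_field(n):
    return 2 * n + 1

print(stacked_3x3_receptive_field(2))  # 5 -> same region as one 5x5 conv
print(stacked_3x3_receptive_field(3))  # 7 -> same region as one 7x7 conv

# Parameter counts, assuming C input channels and C output channels per layer
C = 64
print(3 * (3 * 3 * C * C))  # 27 * C^2 for three stacked 3x3 conv layers
print(7 * 7 * C * C)        # 49 * C^2 for a single 7x7 conv layer
```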
GoogLeNet
- much deeper networks with computational efficiency (22 layers)
- Efficient Inception modules
- No FC layers
- 5 million parameters (12x fewer than AlexNet)
- Inception Module
- Good local network topology (a network within a network), with modules stacked on top of each other
- Apply several filter operations in parallel and concatenate the outputs depth-wise
- Q) What is the problem with this?
- Computationally expensive
- Solution : “bottleneck” layers that use 1x1 convolutions to reduce feature depth
- Use 1x1 convs to reduce the depth (a minimal Inception-style module is sketched at the end of this GoogLeNet section)
- See Google's Inception models for reference: v1 is GoogLeNet, and the family goes up to Inception-v4
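Below is a minimal PyTorch sketch of an Inception-style module with 1x1 bottleneck convolutions; the branch widths are illustrative, not the exact GoogLeNet configuration.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / pool branches, concatenated depth-wise."""
    def __init__(self, in_ch, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_1x1, kernel_size=1)
        # 1x1 "bottleneck" convs reduce depth before the expensive 3x3/5x5 convs
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, red_3x3, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(red_3x3, out_3x3, kernel_size=3, padding=1))
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, red_5x5, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(red_5x5, out_5x5, kernel_size=5, padding=2))
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1))

    def forward(self, x):
        # Run the branches in parallel and concatenate along the depth dimension
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

# Spatial size is preserved in every branch, so depth-wise concat is valid.
x = torch.randn(1, 256, 28, 28)
print(InceptionModule(256, 128, 64, 192, 16, 48, 64)(x).shape)
# torch.Size([1, 432, 28, 28])  (128 + 192 + 48 + 64 output channels)
```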
ResNet
- very deep networks using residual connections
- 152 layers
- classification / detection
- What happens when we continue stacking deeper layers on a “plain” convolutional neural network?
- Hypothesis: it is an optimization problem; deeper models are simply harder to optimize
- A deeper model should be able to perform at least as well as a shallower one
- A solution by construction is copying the learned layers from the shallower model and setting additional layers to identity mapping.
- Residual: the output from a few layers earlier is added to the current layer's output before it is passed on (a minimal residual block is sketched at the end of this ResNet section)
- Full ResNet architecture
- bottleneck layer
- Used to improve efficiency
- Similar to GoogLeNet (1x1 convs to reduce depth)
- Training ResNet in practice (an optimizer-config sketch is included at the end of this ResNet section)
- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used
- Result
- Shows better performance than humans
- Comparing complexity
- inception-v4 : ResNet + Inception
- VGG : Highest memory, most operations
- GoogLeNet : most efficient
- AlexNet : Smaller compute, still memory heavy, lower accuracy
- ResNet : Moderate efficiency depending on model, highest accuracy
- Time and power consumption
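A minimal PyTorch sketch of a basic residual block with an identity shortcut, assuming the input and output shapes match so the shortcut is a plain addition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Two 3x3 convs whose output F(x) is added to the input x (the shortcut)."""
    def __init__(self, channels):
        super().__init__()
        # Batch Normalization after every CONV layer, as noted above
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                        # shortcut path
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + identity)       # residual connection: F(x) + x

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```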
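The "Training ResNet in practice" recipe above maps onto standard PyTorch components; a hedged sketch (model and data loading omitted, `val_error` is a placeholder for the measured validation error).

```python
import torch
import torchvision

model = torchvision.models.resnet18()  # any model with parameters works here

# SGD + Momentum (0.9), learning rate 0.1, weight decay as in the notes above;
# the mini-batch size of 256 would be set in the DataLoader
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-5)

# Divide the learning rate by 10 when the validation error plateaus
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1)

# Inside the training loop, after each validation pass:
# scheduler.step(val_error)
```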
Other architectures
Network in Network (NiN)
Identity Mappings in Deep Residual Networks
- Improving ResNets
Wide Residual Networks
- Improving ResNets
ResNeXt (Aggregated Residual Transformations for Deep Neural Networks)
- Improving ResNets
Deep Networks with Stochastic Depth
- Improving ResNets
FractalNet: Ultra-Deep Neural Networks without Residuals
- Beyond ResNet
Densely Connected Convolutional Networks
- Beyond ResNet
SqueezeNet: AlexNet-level Accuracy With 50x Fewer Parameters and <0.5MB Model Size
- Efficient networks
Summary
Reference