CS231n Lecture 9. CNN Architectures

This post summarizes Lecture 9 of Stanford CS231n (2017). It is written more as personal study notes than as a tutorial! :)


  • Case Study
    • AlexNet
    • VGG
    • GoogLeNet
    • ResNet
  • Also
    • NiN (Network in Network)
    • DenseNet
    • Wide ResNet
    • FractalNet
    • ResNeXt
    • SqueezeNet
    • Stochastic Depth

LeNet-5

  • One of the first ConvNets successfully used in practice
  • Used for zip code and digit recognition

AlexNet

  • First large-scale ConvNet
  • ILSVRC’12 winner
  • Input : 227x227x3 images
  • First layer (CONV1)
    • 96 11x11 filters applied at stride 4
    • output size?
      • 55x55x96
    • total number of parameters?
      • (11x11x3)x96 = 35K
  • Second layer (POOL1)
    • 3x3 filters applied at stride 2
    • output size?
      • 27x27x96
    • total number of parameters?
      • no parameters
      • Parameters are the weights we are trying to learn. Convolutional layers have weights that are learned, but pooling only applies a fixed rule: look at the pooling region and take the max. So nothing is learned.
      • CONV/FC layers have parameters; ReLU/POOL layers do not
  • Details
    • first use of ReLU
    • used Norm layers (not common anymore)
    • heavy data augmentation (flipping, jittering, cropping, color normalization …)
    • dropout 0.5
    • batch size 128
    • SGD Momentum 0.9
    • Learning rate : 1e-2, reduced by 10 manually when val accuracy plateaus
    • L2 weight decay : 5e-4
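    • 7 CNN ensemble: 18.2% -> 15.4% (top-5 error)
  • Why the network is split into two streams
    • It was trained on GTX 580 GPUs with only 3GB of memory each, so the network was spread across two GPUs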
  • ImageNet winners
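The CONV1 and POOL1 numbers above follow the usual conv arithmetic, output = (W - F + 2P) / S + 1. Here is a minimal Python sketch that reproduces them, assuming no padding (P = 0) and counting weights only (biases ignored, which matches the 35K figure). The helper names are hypothetical.

```python
def conv_output_size(input_size, filter_size, stride, pad=0):
    """Spatial output size of a conv or pooling layer: (W - F + 2P) / S + 1."""
    numerator = input_size - filter_size + 2 * pad
    if numerator % stride != 0:
        raise ValueError("filter does not tile the input evenly with this stride")
    return numerator // stride + 1


def conv_weight_count(filter_size, in_channels, num_filters):
    """Learnable weights in a conv layer, ignoring biases."""
    return filter_size * filter_size * in_channels * num_filters


# CONV1: 96 filters of 11x11 on a 227x227x3 input, stride 4, no padding
conv1_size = conv_output_size(227, 11, 4)       # (227 - 11) / 4 + 1 = 55
conv1_weights = conv_weight_count(11, 3, 96)    # 34,848, i.e. ~35K

# POOL1: 3x3 max pooling at stride 2 on the 55x55x96 volume
pool1_size = conv_output_size(conv1_size, 3, 2)  # (55 - 3) / 2 + 1 = 27
pool1_weights = 0  # pooling applies a fixed rule (max), so nothing is learned

print(f"CONV1 output: {conv1_size}x{conv1_size}x96, weights: {conv1_weights}")
print(f"POOL1 output: {pool1_size}x{pool1_size}x96, weights: {pool1_weights}")
```

Pooling keeps the depth (96) unchanged because it operates on each channel independently.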
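
ZFNet

  • Same overall design as AlexNet with improved hyperparameters (e.g., CONV1 changed from 11x11 stride 4 to 7x7 stride 2)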

VGGNet

  • Much deeper networks, much smaller filters
  • Increased the number of layers to 16 to 19 (AlexNet had 8)
  • 3x3 CONV stride 1, pad 1
  • 2x2 MAX POOL stride 2
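  • Q1) Why use smaller filters? (3x3 conv)
    • A1) A stack of three 3x3 conv (stride 1) layers has the same effective receptive field as a single 7x7 conv layer
    • Q2) What is the effective receptive field of three 3x3 conv (stride 1) layers?
      • A2) 7x7. Stacking two 3x3 convs extracts features over a 5x5 region, and stacking three covers a 7x7 region
    • Small filters use fewer parameters, and stacking several of them covers the same region (receptive field) as one larger filter
    • Parameter count, with C input channels and C output channels per layer:
      • three 3x3 conv layers: 3 x (3x3xC) x C = 27C²
      • one 7x7 conv layer: (7x7xC) x C = 49C²
      • Fewer parameters and a deeper network (more nonlinearity)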
  • Details
    • ILSVRC’14 2nd in classification, 1st in localization
      • localization : finding where the object is (bounding box) + classification
    • Similar training procedure to Krizhevsky 2012
    • No Local Response Normalisation (LRN)
    • Use VGG16 or VGG19 (VGG19 only slightly better, more memory)
    • Use ensembles for best results
    • FC7 features generalize well to other tasks
      • Feature representation!
  • Two meanings of "depth"
    • depth : the depth in width x height x depth (number of channels of a volume)
    • depth : total number of layers
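The receptive-field and parameter arguments for 3x3 filters can be checked with a few lines of arithmetic. This Python sketch assumes stride-1 convs that keep C channels in and out (C = 256 is an arbitrary illustrative value) and ignores biases; the function names are hypothetical.

```python
def stacked_receptive_field(num_layers, filter_size=3):
    """Effective receptive field of `num_layers` stacked stride-1 convs.

    Each extra stride-1 layer widens the field by (filter_size - 1).
    """
    return 1 + num_layers * (filter_size - 1)


def conv_stack_weights(num_layers, filter_size, channels):
    """Weights in a stack of convs with `channels` in and out (no biases)."""
    return num_layers * (filter_size * filter_size * channels) * channels


C = 256  # assumed channel count, for illustration only

for n in (1, 2, 3):
    rf = stacked_receptive_field(n)
    print(f"{n} stacked 3x3 conv(s): {rf}x{rf} receptive field")

print("three 3x3 layers:", conv_stack_weights(3, 3, C))  # 27 * C^2
print("one 7x7 layer:   ", conv_stack_weights(1, 7, C))  # 49 * C^2
```

Each additional stride-1 3x3 layer grows the receptive field by 2, so three layers reach 7x7 while using 27C² weights instead of 49C², with a ReLU between every layer.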

GoogLeNet

  • Much deeper networks with computational efficiency (22 layers)
  • Efficient "Inception" module
  • No FC layers
  • 5 million parameters (12x fewer than AlexNet)
  • Inception Module
    • Design a good local network topology (a "network within a network") and stack these modules on top of each other
    • Apply several filter operations in parallel, then concatenate the outputs depth-wise
    • Q) What is the problem with this?
      • Computationally expensive
      • Solution : “bottleneck” layers that use 1x1 convolutions to reduce feature depth
    • 1x1 convs reduce the depth before the expensive operations
  • See also the Google Inception Model post: v1 is GoogLeNet, and it covers versions up to v4
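To see why the naive Inception module is expensive and how 1x1 bottlenecks help, here is a rough multiply count in Python. The input size (28x28x256), the branch widths (128, 192, 96) and the 64-filter 1x1 reductions are assumed example values. Every conv is also assumed to be stride 1 with same padding, so each branch outputs 28x28.

```python
H = W = 28        # assumed spatial size of the module's input and outputs
IN_DEPTH = 256    # assumed input depth


def conv_ops(filter_size, in_channels, out_channels, h=H, w=W):
    """Multiplies of a stride-1, same-padded conv producing an h x w output map."""
    return h * w * out_channels * filter_size * filter_size * in_channels


# Naive module: (filter_size, in_channels, out_channels) per conv branch.
# The 3x3 max-pool branch has no multiplies and passes all 256 channels through.
naive = [
    (1, IN_DEPTH, 128),
    (3, IN_DEPTH, 192),
    (5, IN_DEPTH, 96),
]
naive_ops = sum(conv_ops(*c) for c in naive)
naive_depth = 128 + 192 + 96 + IN_DEPTH  # concat depth, including the pool branch

# Bottleneck module: 1x1 convs shrink depth to 64 before the 3x3/5x5 convs
# and after the pooling branch.
bottleneck = [
    (1, IN_DEPTH, 64),   # reduce before 3x3
    (1, IN_DEPTH, 64),   # reduce before 5x5
    (1, IN_DEPTH, 64),   # project after 3x3 max pool
    (1, IN_DEPTH, 128),
    (3, 64, 192),
    (5, 64, 96),
]
bottleneck_ops = sum(conv_ops(*c) for c in bottleneck)
bottleneck_depth = 128 + 192 + 96 + 64

print(f"naive:      {naive_ops:,} multiplies, output depth {naive_depth}")
print(f"bottleneck: {bottleneck_ops:,} multiplies, output depth {bottleneck_depth}")
```

With these assumed numbers, the naive module needs about 854M multiplies and outputs a 672-deep volume. The bottleneck version needs about 271M and outputs depth 480. In the naive version, the pooling branch keeps all input channels, so depth can only grow from module to module.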

ResNet

  • very deep networks using residual connections
  • 152 layers
  • classification / detection
  • What happens when we continue stacking deeper layers on a “plain” convolutional neural network?
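  • Hypothesis : the problem is an optimization problem! Deeper models are harder to optimize
    • A deeper model should be able to perform at least as well as a shallower model
    • A solution by construction is copying the learned layers from the shallower model and setting additional layers to identity mapping.
  • Residual connection : the input x of a block skips over a few layers and is added to their output, so the stacked layers learn the residual F(x) = H(x) - x instead of the full mapping H(x)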
  • Full ResNet architecture
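  • bottleneck layer
    • Used to improve efficiency (a 1x1 conv reduces the depth before the 3x3 conv, and another 1x1 conv restores it)
    • Similar to GoogLeNet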
  • Training ResNet in practice
    • Batch Normalization after every CONV layer
    • Xavier/2 initialization from He et al.
    • SGD + Momentum (0.9)
    • Learning rate: 0.1, divided by 10 when validation error plateaus
    • Mini-batch size 256
    • Weight decay of 1e-5
    • No dropout used
  • Result
    • Better-than-human performance on ImageNet classification
  • Comparing complexity
    • Inception-v4 : ResNet + Inception
    • VGG : Highest memory, most operations
    • GoogLeNet : most efficient
    • AlexNet : Smaller compute, still memory heavy, lower accuracy
    • ResNet : Moderate efficiency depending on model, highest accuracy
  • Time and power consumption
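Below is a toy Python sketch of a residual block, y = ReLU(F(x) + x). It is a simplification: dense layers on a short vector (hypothetical helpers `linear` and `residual_block`) stand in for the conv + BN layers of a real ResNet block. With all weights set to zero, F(x) = 0 and the block passes its (non-negative, post-ReLU) input through unchanged. That is the identity mapping the "solution by construction" argument relies on. The last two helpers compare weight counts for a bottleneck block (1x1 reduce, 3x3, 1x1 expand) and a plain block of two 3x3 convs, assuming 256 channels reduced to 64 and ignoring biases.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Toy residual block: y = ReLU(F(x) + x), with F(x) = W2 · ReLU(W1 · x).

    Dense layers stand in for the conv layers of a real ResNet block.
    """
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])  # skip connection: add x back


def bottleneck_weights(channels=256, reduced=64):
    # 1x1 reduce -> 3x3 at reduced width -> 1x1 expand (biases ignored)
    return channels * reduced + 9 * reduced * reduced + reduced * channels


def plain_block_weights(channels=256):
    # two 3x3 convs at full width (biases ignored)
    return 2 * 9 * channels * channels


x = [0.5, 1.0, 2.0]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, zeros))  # [0.5, 1.0, 2.0]: F(x) = 0, so identity
print(bottleneck_weights(), plain_block_weights())
```

With these assumed sizes, the bottleneck block has 69,632 weights versus 1,179,648 for two full-width 3x3 convs, roughly 17x fewer. This is the same 1x1 depth-reduction trick GoogLeNet uses.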

Other architectures

Network in Network (NiN)

Identity Mappings in Deep Residual Networks

  • Improving ResNets

Wide Residual Networks

  • Improving ResNets

ResNeXt (Aggregated Residual Transformations for Deep Neural Networks)

  • Improving ResNets

Deep Networks with Stochastic Depth

  • Improving ResNets

FractalNet: Ultra-Deep Neural Networks without Residuals

  • Beyond ResNet

Densely Connected Convolutional Networks

  • Beyond ResNet

SqueezeNet: AlexNet-level Accuracy With 50x Fewer Parameters and <0.5MB Model Size

  • Efficient networks



© 2017. by Seongyun Byeon
