CS231n Lecture 9. CNN Architectures


This post summarizes Lecture 9 of Stanford CS231n (2017). It is written more as a personal study note than as a tutorial! :)

Today

  • Case Study
    • AlexNet
    • VGG
    • GoogLeNet
    • ResNet
  • Also
    • NiN (Network in Network)
    • DenseNet
    • Wide ResNet
    • FractalNet
    • ResNeXt
    • SqueezeNet
    • Stochastic Depth

LeNet

  • The first successful application of ConvNets
  • Used for zip code and digit recognition

AlexNet

  • First large scale ConvNet
  • ILSVRC’12 winner
  • Input : 227x227x3 images
  • First layer(CONV1)
    • 96 11x11 filters applied at stride 4
    • output size?
      • 55x55x96
    • total number of parameters?
      • (11x11x3)x96 = 35K
  • Second layer(POOL1)
    • 3x3 filters applied at stride 2
    • output size?
      • 27x27x96
    • total number of parameters?
      • no parameters
      • Parameters are the weights we are trying to learn. Convolutional layers have weights that we learn, but pooling just applies a fixed rule (take the max over the pooling region), so there are no learned parameters
      • CONV/FC layers have parameters; ReLU/POOL layers do not (a small sketch of these size/parameter calculations follows this list)
  • Details
    • first use of ReLU
    • used Norm layers (not common anymore)
    • heavy data augmentation (flipping, jittering, cropping, color normalization …)
    • dropout 0.5
    • batch size 128
    • SGD Momentum 0.9
    • Learning rate : 1e-2, reduced by 10 manually when val accuracy plateaus
    • L2 weight decay : 5e-4
    • 7 CNN ensemble: 18.2% -> 15.4%
  • Why the early conv layers are split in two
    • The network was trained on GTX 580 GPUs, so the model was split across 2 GPUs
  • ImageNet winners
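As referenced above, a minimal sketch of the CONV1/POOL1 calculations, using the standard output-size formula (N - F + 2P) / S + 1 (plain Python, no framework assumed):

```python
def conv_output_size(n, f, stride, pad=0):
    """Spatial output size of a conv/pool layer: (N - F + 2P) / S + 1."""
    return (n - f + 2 * pad) // stride + 1

def conv_params(f, in_depth, num_filters):
    """Learned weights of a conv layer (ignoring biases)."""
    return f * f * in_depth * num_filters

# CONV1: 96 11x11 filters at stride 4 on a 227x227x3 input
print(conv_output_size(227, 11, 4))   # 55 -> output volume 55x55x96
print(conv_params(11, 3, 96))         # 34848 ~= 35K parameters

# POOL1: 3x3 filters at stride 2 -> 27x27x96, with no learned parameters
print(conv_output_size(55, 3, 2))     # 27
```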

ZFNet

  • Same architecture as AlexNet with improved hyperparameters (e.g., CONV1 uses 7x7 filters at stride 2 instead of 11x11 at stride 4, and the middle conv layers use more filters)

VGGNet

  • much deeper networks, much smaller filters
  • Increased the number of layers to 16–19 (AlexNet had 8)
  • 3x3 CONV stride 1, pad 1
  • 2x2 MAX POOL stride 2
  • Q1) Why use smaller filters? (3x3 conv)
    • A1) A stack of three 3x3 conv (stride 1) layers has the same effective receptive field as a single 7x7 conv layer
    • Q2) What is the effective receptive field of three 3x3 conv (stride 1) layers?
      • A2) 7x7
    • Smaller filters mean fewer parameters, and stacking them lets the network cover the receptive field of a larger filter
    • Stacking two 3x3 convs captures a 5x5 region; stacking three captures a 7x7 region
      • 3 x (3x3xCxC) = 27C^2 parameters (with C input and C output channels)
      • 7x7xCxC = 49C^2 parameters
      • Fewer parameters and a deeper network (more nonlinearity); see the sketch after this list
  • Details
    • ILSVRC’14 2nd in classification, 1st in localization
      • localization : finding where the object is (bounding box) + classification
    • Similar training procedure as Krizhevsky 2012
    • No Local Response Normalisation (LRN)
    • Use VGG16 or VGG19 (VGG19 only slightly better, more memory)
    • Use ensembles for best results
    • FC7 features generalize well to other tasks
      • Feature representation!
  • Two meanings of "depth"
    • depth : the depth in width x height x depth (number of channels)
    • depth : the total number of layers in the network
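As referenced above, a minimal sketch of the 27C^2 vs 49C^2 parameter comparison (plain Python; C = 64 is just an illustrative channel depth, assuming C input and C output channels per layer):

```python
C = 64  # example channel depth (assumption for illustration)

params_7x7 = 7 * 7 * C * C              # one 7x7 conv layer: 49 * C^2
params_3x3_stack = 3 * (3 * 3 * C * C)  # three stacked 3x3 conv layers: 27 * C^2

print(params_7x7)        # 200704
print(params_3x3_stack)  # 110592 -> fewer parameters for the same 7x7 receptive field
```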

GoogLeNet

  • much deeper networks with computational efficiency (22 layers)
  • Efficient "Inception" module
  • No FC layers
  • 5 million parameters (12x fewer than AlexNet)
  • Inception Module
    • good local network topology (a "network within a network"), with these modules stacked on top of each other
    • run several filter operations in parallel and concatenate the outputs depth-wise
    • Q) What is the problem with this?
      • expensive compute
      • Solution : “bottleneck” layers that use 1x1 convolutions to reduce feature depth
    • 1x1 convs reduce the depth; see the sketch after this list
  • See the Google Inception Model post for reference: v1 is GoogLeNet, and the post covers up to v4
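A minimal sketch of an Inception-style module with 1x1 bottleneck convs, written in PyTorch (the framework choice and the branch channel sizes are assumptions for illustration, loosely following GoogLeNet's first inception module; ReLU and batch norm are omitted for brevity):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 conv and 3x3 pool branches, concatenated depth-wise.
    The 1x1 "bottleneck" convs reduce channel depth before the expensive 3x3/5x5 convs."""
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, kernel_size=1),              # bottleneck
            nn.Conv2d(c3_reduce, c3, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, kernel_size=1),              # bottleneck
            nn.Conv2d(c5_reduce, c5, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1))              # 1x1 after pooling

    def forward(self, x):
        # every branch keeps the same spatial size, so we can concat along channels
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
out = InceptionModule(192, 64, 96, 128, 16, 32, 32)(x)
print(out.shape)  # torch.Size([1, 256, 28, 28]) -- 64 + 128 + 32 + 32 channels
```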

ResNet

  • very deep networks using residual connections
  • 152 layers
  • classification / detection
  • What happens when we continue stacking deeper layers on a “plain” convolutional neural network?
  • Hypothesis: the problem is an optimization problem; deeper models are simply harder to optimize
    • A deeper model should be able to perform at least as well as a shallower one
    • A solution by construction is copying the learned layers from the shallower model and setting the additional layers to identity mappings
  • Residual : the output from a few layers earlier is added to the current layer's output before being passed on (see the sketch after this list)
  • Full ResNet architecture
  • bottleneck layer
    • used to improve efficiency
    • similar to the 1x1 bottlenecks in GoogLeNet
  • Training ResNet in practice
    • Batch Normalization after every CONV layer
    • Xavier/2 initialization from He et al.
    • SGD + Momentum (0.9)
    • Learning rate: 0.1, divided by 10 when validation error plateaus
    • Mini-batch size 256
    • Weight decay of 1e-5
    • No dropout used
  • Result
    • Showed better performance than humans
  • Comparing complexity
    • inception-v4 : ResNet + Inception
    • VGG : Highest memory, most operations
    • GoogLeNet : most efficient
    • AlexNet : Smaller compute, still memory heavy, lower accuracy
    • ResNet : Moderate efficiency depending on model, highest accuracy
  • Time and power consumption
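As referenced above, a minimal sketch of a basic residual block in PyTorch (the framework choice is an assumption; the bottleneck variant used in deeper ResNets and the downsampling shortcuts are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Basic residual block: output = relu(F(x) + x) with an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)   # batch norm after every conv layer
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                # residual connection: add the input back

block = BasicResidualBlock(64)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 64, 56, 56]) -- same shape as the input

# Optimizer configured with the training hyperparameters listed above
optimizer = torch.optim.SGD(block.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-5)
```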

Other architectures

Network in Network (NiN)

Identity Mappings in Deep Residual Networks

  • Improving ResNets

Wide Residual Networks

  • Improving ResNets

ResNeXt (Aggregated Residual Transformations for Deep Neural Networks)

  • Improving ResNets

Deep Networks with Stochastic Depth

  • Improving ResNets

FractalNet: Ultra-Deep Neural Networks without Residuals

  • Beyond ResNet

Densely Connected Convolutional Networks

  • Beyond ResNet

SqueezeNet: AlexNet-level Accuracy With 50x Fewer Parameters and <0.5MB Model Size

  • Efficient networks

Summary

Reference


