Deep Morning Week 5: PR12-022~025

This post summarizes the videos I watched in the “Deep Morning” study group, where we watch one PR12 video a day.
PR-022: InfoGAN
PR-023: YOLO9000: Better, Faster, Stronger
PR-024: Pixel Recurrent Neural Network
PR-025: Learning with side information through modality hallucination
PR-026 is skipped

PR-022: InfoGAN

Why this paper was chosen
- Cool idea
- Cool mathematical background
- Good results as well
- Cool implementations
- Connections to other papers

GAN

Generator: generates fake images from noise
Discriminator: distinguishes fake images from real ones
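
For reference, the two roles above are usually written as the standard GAN minimax game. This is the well-known objective from the original GAN paper, not something taken from the video slides; \(p_{\text{data}}\) and \(p_z\) denote the data and noise distributions.

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
\]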

Information theory

Entropy
- The number of bits needed when a probability distribution p is coded with its optimal coding scheme
  - How disordered (spread out) p is
Cross Entropy
- The number of bits needed when samples from the true distribution p are coded with a coding scheme optimized for an unnatural distribution q
Mutual information
- How does the entropy of Y change once X is known?
- How strongly do the two distributions depend on each other?
- If they are independent, \(I(X; Y) = 0\)
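
The three quantities above can be checked on small discrete distributions. The following is a minimal sketch using only the standard library; the function names and the toy distributions are my own, chosen for illustration.

```python
import math

def entropy(p):
    """H(p) in bits: expected code length under the optimal code for p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) in bits: samples come from p, but the code is optimized for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X; Y) in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Toy distributions (made up for this example).
fair = [0.5, 0.5]
skewed = [0.9, 0.1]
independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y independent
dependent = [[0.5, 0.0], [0.0, 0.5]]        # Y is fully determined by X

print(entropy(fair))                    # one bit for a fair coin
print(cross_entropy(fair, skewed))      # larger than entropy(fair)
print(mutual_information(independent))  # zero when independent
print(mutual_information(dependent))    # one bit when fully dependent
```

Coding with the wrong scheme always costs at least as many bits as the entropy, which is why the cross entropy here is larger than the entropy.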

InfoGAN

Motivation
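- After training a plain GAN, there is no way to tell how the representation is organized (it is entangled), so it is hard to make use of
- Could the codes be made to represent meaningful factors of the distribution?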
Main idea
- Train G and D as in a GAN, but for a latent code \(c \sim P(c)\), train so that c and \(G(z, c)\) depend on each other!
Variational Mutual Information Maximization
- Bring in an auxiliary distribution \(Q(c \mid x)\) to approximate the posterior
- Use Lemma 5.1 to handle the expectation
Lemma 5.1
- Random variables X, Y and function f(x,y) under suitable regularity conditions
- Reconstruction loss!
DiscoGAN material
- The reconstruction loss causes oscillation
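
To make the steps above concrete, here are the variational lower bound, Lemma 5.1, and the final objective as I understand them from the InfoGAN paper. The notation (\(L_I\), \(\lambda\), \(V\)) follows the paper rather than the video, so treat this as a reference sketch.

\[
I(c; G(z, c)) = H(c) - H(c \mid G(z, c)) \ge \mathbb{E}_{c \sim P(c),\, x \sim G(z, c)}[\log Q(c \mid x)] + H(c) = L_I(G, Q)
\]

\[
\mathbb{E}_{x \sim X,\, y \sim Y \mid x}[f(x, y)] = \mathbb{E}_{x \sim X,\, y \sim Y \mid x,\, x' \sim X \mid y}[f(x', y)]
\]

\[
\min_{G, Q} \max_D V_{\text{InfoGAN}}(D, G, Q) = V(D, G) - \lambda L_I(G, Q)
\]

The lemma is what removes the need to sample from the true posterior: the code c that was fed into G can be reused directly, so maximizing \(L_I\) amounts to having Q recover c from \(G(z, c)\), which is why it looks like a reconstruction loss.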

PR-023: YOLO9000: Better, Faster, Stronger

Better (YOLOv2)
- Batch normalization
- High resolution classifier
- Convolution with anchor boxes
- Dimension clusters
- Direct location prediction
- Fine-grained features
- Multi-scale training
Faster (YOLOv2)
- Darknet-19
- Training for classification
- Training for detection
Stronger (YOLO9000)
- Hierarchical classification
- Dataset combination with WordTree
- Joint classification and detection

Introduction & Motivation

Detection datasets have far too few classes
Scaling them up to the level of classification is hard because labeling is difficult
- So the idea is to mix detection and classification data together
Joint training algorithm

YOLO

Accuracy is relatively low
Recall is low

Better

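Uses Batch Normalization
High Resolution Classifier
- Fine-tuning at 448x448
Convolution with Anchor Boxes
- Introduces anchor boxes
- 416x416 input: the resulting 13x13 feature map has odd dimensions, so a single cell sits at the center of the image!
- Predicts class and objectness for every anchor box
Dimension Clusters
- Which anchor boxes are good?
- k-means where each box goes to the centroid with the highest IoU (distance \(d = 1 - \text{IoU}\))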
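Direct Location Prediction
- With unconstrained anchor offsets a box can move anywhere in the image, so the location is predicted relative to the grid cell instead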
Fine-Grained Features
- 13x13 feature maps
Multi-Scale Training
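
The Dimension Clusters step above can be sketched as k-means over (width, height) pairs with \(d = 1 - \text{IoU}\) as the distance. This is a minimal illustration, not the Darknet implementation; `iou_wh`, `kmeans_anchors`, and the toy boxes are hypothetical names and data.

```python
import random

def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs with distance d = 1 - IoU; returns k sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # The smallest 1 - IoU is the same as the largest IoU.
            best = max(range(k), key=lambda c: iou_wh(box, centroids[c]))
            clusters[best].append(box)
        new_centroids = []
        for c, members in enumerate(clusters):
            if not members:  # keep the centroid of an empty cluster unchanged
                new_centroids.append(centroids[c])
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_centroids.append((w, h))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)

# Hypothetical toy data: small square-ish boxes and large tall boxes.
boxes = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (4.0, 8.0), (4.2, 7.5), (3.8, 8.5)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the error, which is the motivation given in the paper.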

Faster

Darknet-19
- Mostly 3x3 filter
- Uses global average pooling
  - Greatly reduces the number of parameters
Training for classification
- ImageNet 1000 classes
- 448x448 fine-tuning
Training for detection
- Adding 3x3 conv layer
- 5 boxes with 5 coordinates each and 20 classes per box
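
As a quick check of the detection head size, and of how direct location prediction turns raw outputs into a box, here is a small sketch. The decoding formulas follow the YOLOv2 paper as I understand it; the function and variable names are my own.

```python
import math

NUM_ANCHORS = 5   # boxes per grid cell
NUM_COORDS = 5    # tx, ty, tw, th, objectness
NUM_CLASSES = 20  # classes per box

# Number of output channels of the final detection layer per grid cell.
out_channels = NUM_ANCHORS * (NUM_COORDS + NUM_CLASSES)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(t, cell, prior):
    """Direct location prediction: map raw outputs to a box in grid-cell units.

    t     = (tx, ty, tw, th, to) raw network outputs
    cell  = (cx, cy) top-left corner of the grid cell
    prior = (pw, ph) anchor (prior) width and height
    """
    tx, ty, tw, th, to = t
    bx = sigmoid(tx) + cell[0]  # the sigmoid keeps the center inside its own cell
    by = sigmoid(ty) + cell[1]
    bw = prior[0] * math.exp(tw)
    bh = prior[1] * math.exp(th)
    objectness = sigmoid(to)
    return bx, by, bw, bh, objectness

box = decode_box((0.0, 0.0, 0.0, 0.0, 0.0), cell=(6, 6), prior=(2.0, 3.0))
print(out_channels, box)
```

With all raw outputs at zero, the box sits at the center of cell (6, 6) and has exactly the prior's size.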

Stronger

YOLO9000
Hierarchical classification
- ImageNet labels are pulled from WordNet
- structured as a directed graph, not a tree
- Predicts conditional probabilities
Joint Classification and Detection
- 9000 classes from the full ImageNet release
- 3 anchor boxes
- For classification images, only the classification loss is backpropagated
Dataset Combination with WordTree
- COCO : general concepts
- ImageNet : specific concepts
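
The conditional probabilities in hierarchical classification combine by multiplying along the path from a node up to the root of the WordTree. Below is a minimal sketch with a made-up toy tree; the probabilities are invented for illustration, not model outputs.

```python
# Hypothetical toy WordTree: child -> parent.
parent = {
    "animal": "physical object",
    "dog": "animal",
    "terrier": "dog",
    "Norfolk terrier": "terrier",
}

# cond_prob[node] = Pr(node | parent of node); values are made up.
cond_prob = {
    "animal": 0.9,
    "dog": 0.8,
    "terrier": 0.5,
    "Norfolk terrier": 0.5,
}

def absolute_probability(node, root="physical object"):
    """Multiply conditional probabilities along the path from node to the root."""
    prob = 1.0
    while node != root:
        prob *= cond_prob[node]
        node = parent[node]
    return prob

print(absolute_probability("Norfolk terrier"))
print(absolute_probability("dog"))
```

Because a coarse node such as "dog" always has a probability at least as high as its descendants, the model can fall back to a general label when it is unsure about the specific breed.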

Conclusion

YOLOv2 is fast and accurate
YOLO9000's main contribution is combining datasets

PR-024: Pixel Recurrent Neural Network

A type of generative model
Models image pixels with a recurrent neural network!

Generative Model

Generative models come in several families!
- Pixel RNN belongs to the family that asks how to model the data distribution directly

Intuition

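How do we model the relationships among pixels as their number grows?
Generate with a sequential model!
- Given a pixel, predict the next pixel
- Pixel i is conditioned on pixels i-1, …, 1, and these conditionals together give the desired p(x)
Autoregressive Model
- A model that predicts each value from its own earlier values
- Simple and intuitive, and trains well
- No latent variables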
- tractable likelihood
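
Written out, the sequential view above is the chain-rule factorization of the joint distribution over the \(n \times n\) pixels of an image, taken in raster-scan order. This is the form used in the Pixel RNN paper.

\[
p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})
\]

Each factor is an ordinary conditional distribution, so the likelihood can be evaluated exactly, which is what "tractable likelihood" refers to.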

Mask

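A mask is applied during computation so that no information flows in from future pixels
Future pixels are set to 0
With multiple channels
- Start from R (then G, then B)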
- Mask A, B
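
A rough sketch of the two mask types for a single-channel kernel is below. It is a simplification: in the paper the masks also control how the R, G, and B channels of the current pixel connect to each other, which this example ignores. The name `make_mask` is hypothetical.

```python
def make_mask(kernel_size, mask_type):
    """Build a 2D mask for a masked convolution kernel (single channel).

    Type "A" also hides the center pixel (used in the first layer);
    type "B" keeps the center pixel (used in later layers).
    """
    assert kernel_size % 2 == 1 and mask_type in ("A", "B")
    center = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for row in range(kernel_size):
        for col in range(kernel_size):
            above = row < center
            left_in_same_row = row == center and col < center
            is_center = row == center and col == center
            if above or left_in_same_row or (is_center and mask_type == "B"):
                mask[row][col] = 1
    return mask

for row in make_mask(3, "A"):
    print(row)
for row in make_mask(3, "B"):
    print(row)
```

Multiplying the kernel weights by this mask before each convolution zeroes out every position to the right of and below the current pixel.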

Receptive Fields

Architecture

Add residual connections to go deeper!
Row LSTM
Diagonal LSTM

PixelCNN

LSTMs are too slow (although they perform well), so this is the improvement!
Stacks multiple CNN layers
256 classes

KL Divergence

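If the model understands the data properly, it can compress the data with no additional information, because it already knows the data perfectly
- So model quality can be described by how much data is needed to compress it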
Negative log-likelihood
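
The compression view can be made concrete with a small sketch: KL divergence counts the extra bits paid for coding with the wrong model, and the average negative log-likelihood in bits is the code length per sample. The function names and toy distributions are hypothetical, chosen for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits: extra bits paid for coding p with a code built for q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def negative_log_likelihood_bits(q, samples):
    """Average -log2 q(x) over samples; lower means the model compresses better."""
    return -sum(math.log2(q[x]) for x in samples) / len(samples)

data_dist = [0.5, 0.25, 0.25]      # hypothetical true distribution
good_model = [0.5, 0.25, 0.25]     # matches the data exactly
bad_model = [1 / 3, 1 / 3, 1 / 3]  # ignores the structure of the data
samples = [0, 0, 1, 2]             # empirical frequencies match data_dist

print(kl_divergence(data_dist, good_model))  # no extra bits
print(kl_divergence(data_dist, bad_model))   # positive
print(negative_log_likelihood_bits(good_model, samples))
print(negative_log_likelihood_bits(bad_model, samples))
```

Minimizing the negative log-likelihood on data therefore pushes the KL divergence between the data distribution and the model toward zero.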

PR-025: Learning with side information through modality hallucination

hallucination: an illusion, a copy
Learning with side information
- Multi-modal
- Takes various inputs such as RGB, depth, and so on
- The same inputs are needed at test time as well, and that is the problem

Missing Input during test

Zero padding
- May not work well
Generate the missing modality and feed it in together with RGB
- Generation is hard
Hallucinate in feature space

Hallucination

The blue part learns to copy the red part

Loss function

Several issues

First train the RGB & D-Net, then copy the D-Net to the H-Net
The pool5 layer is the one that hallucinates
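
As a rough sketch of what "hallucinating" at pool5 means, the loss below penalizes the squared L2 distance between sigmoid-squashed activations of the depth network and the hallucination network. This is my reading of the hallucination loss in the paper, so treat the exact form as an assumption; the function names and inputs are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hallucination_loss(depth_acts, halluc_acts):
    """Squared L2 distance between sigmoid-squashed activations.

    depth_acts  : flattened pool5 activations of the depth network (D-Net)
    halluc_acts : flattened pool5 activations of the hallucination network (H-Net)
    """
    return sum(
        (sigmoid(d) - sigmoid(h)) ** 2
        for d, h in zip(depth_acts, halluc_acts)
    )

# Identical activations give zero loss; mismatched ones are penalized.
print(hallucination_loss([0.0, 1.0], [0.0, 1.0]))
print(hallucination_loss([0.0, 0.0], [10.0, -10.0]))
```

Initializing the H-Net from the trained D-Net gives it a starting point that already produces depth-like features, and this loss then teaches it to produce them from RGB input alone.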
