Deep learning tutorial (part 1) – Deep Networks (general)

http://videolectures.net/kdd2014_salakhutdinov_deep_learning/?q=Deep%20Learning

http://www.vision.is.tohoku.ac.jp/files/9313/6601/7876/CVIM_tutorial_deep_learning.pdf

  1. Introduction
    • Previous approach: Data -> Features (e.g. SIFT, spectrogram, MFCC, flux, ZCR) -> Learning
    • Key question for unsupervised feature learning: can we learn meaningful features from unlabeled data?
  2. Sparse Coding
    • Given a set of input data vectors {x_1, x_2, …, x_N}, learn a dictionary of bases {b_1, b_2, …, b_K} such that x_n ≈ a_n1 * b_1 + a_n2 * b_2 + … + a_nK * b_K
    • {a_nk} is sparse (mostly zero)
    • Each data vector is represented as a sparse linear combination of bases.
    • Example: a patch of a natural image ≈ a combination of learned bases
    • Training: minimize cost = reconstruction error + sparsity penalty, alternating between the coefficients and the bases; each subproblem is convex (a numpy sketch follows the outline)
    • Testing: given a new input x, keep the learned bases fixed and solve only for its sparse coefficients a, which serve as the feature representation
  3. Autoencoders
    • Decoder (feed-back, generative, top-down): feature representation (binary features z) -> Dz -> reconstructed input image
    • Encoder (feed-forward, bottom-up): input image x -> z = sigmoid(Wx) -> feature representation (binary features z)
    • Consider an autoencoder with D inputs, D outputs, and K hidden units, where K < D
    • Given an input x, its reconstruction is y_j(x, W, D) = Sum_{k=1}^K D_{jk} sigma(Sum_{i=1}^D W_{ki} x_i), for j = 1, …, D
    • We determine the network parameters W and D by minimizing the reconstruction error E(W, D) = 1/2 Sum_{n=1}^N ||y(x_n, W, D) - x_n||^2 (a training sketch also follows the outline)
    • If the hidden and output units are linear, the network learns hidden units that are a linear function of the data and minimizes the squared error; the solution spans the same subspace as PCA
    • With nonlinear hidden units, we get a nonlinear generalization of PCA
    • Predictive Sparse Decomposition (Kavukcuoglu et al., 09)
    • Stacked Autoencoders (greedy layer-wise learning): Input x -> (Encoder, Decoder, Sparsity) -> Features -> (Encoder, Decoder, Sparsity) -> Features -> (Encoder, Decoder) -> Class Labels
  4. Introduction to Graphical models
    • Graphical models represent the dependency structure between random variables.
    • Two types:
      • Directed (Bayesian networks)
      • Undirected (Markov random fields, Boltzmann machines)
    • For a directed graph, the joint distribution is given by the product of a conditional distribution for each node conditioned on its parents (written out at the end of this post)
    • Markov random fields
      • Maximum Likelihood Learning
      • MRFs with Latent Variables
  5. Restricted Boltzmann Machines: Learning low-level features
  6. Deep Belief Networks: Learning Part-based Hierarchies
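
To make the sparse coding training/testing recipe above concrete, here is a minimal numpy sketch of alternating minimization on cost = reconstruction error + L1 sparsity penalty. This is an illustrative sketch, not the exact algorithm from the tutorial: ISTA (soft-thresholding) steps update the codes while the dictionary is fixed, then a least-squares step updates the dictionary. The function names, lambda, and iteration counts are my own choices.

```python
# Illustrative sketch of sparse coding by alternating minimization
# (not the tutorial's exact algorithm).
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding operator used by ISTA for the L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_coding(X, K, lam=0.1, n_outer=20, n_ista=50, seed=0):
    """X: (N, D) data matrix. Returns dictionary B (D, K) and codes A (N, K)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    B = rng.standard_normal((D, K))
    B /= np.linalg.norm(B, axis=0, keepdims=True)       # unit-norm bases
    A = np.zeros((N, K))
    for _ in range(n_outer):
        # 1) Fix B, update codes A with ISTA on 1/2 ||x - B a||^2 + lam * ||a||_1
        L = np.linalg.norm(B, 2) ** 2                    # Lipschitz constant of the gradient
        for _ in range(n_ista):
            grad = (A @ B.T - X) @ B                     # gradient of the reconstruction term
            A = soft_threshold(A - grad / L, lam / L)
        # 2) Fix A, update B by least squares, then renormalize its columns
        B = np.linalg.lstsq(A, X, rcond=None)[0].T       # solves min_B ||X - A B^T||^2
        B /= np.linalg.norm(B, axis=0, keepdims=True) + 1e-12
    return B, A

# Toy usage: 200 random 64-dim "patches", 32 bases
X = np.random.default_rng(1).standard_normal((200, 64))
B, A = sparse_coding(X, K=32)
print("fraction of nonzero coefficients:", np.mean(np.abs(A) > 1e-6))
```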

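Likewise, a minimal numpy sketch of the single-hidden-layer autoencoder above: z = sigmoid(Wx), reconstruction y = Dz, trained by batch gradient descent on the squared reconstruction error E(W, D). The shapes, learning rate, and epoch count are assumptions chosen for illustration; the decoder matrix is named D_ in code to avoid clashing with the input dimension D.

```python
# Illustrative sketch of a single-hidden-layer autoencoder trained by
# gradient descent on the squared reconstruction error.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, K, lr=0.1, n_epochs=200, seed=0):
    """X: (N, Dim) data. Returns encoder W (K, Dim) and decoder D_ (Dim, K)."""
    rng = np.random.default_rng(seed)
    N, Dim = X.shape
    W = 0.1 * rng.standard_normal((K, Dim))      # encoder weights
    D_ = 0.1 * rng.standard_normal((Dim, K))     # decoder weights
    for _ in range(n_epochs):
        Z = sigmoid(X @ W.T)                     # hidden features, shape (N, K)
        Y = Z @ D_.T                             # reconstructions, shape (N, Dim)
        R = Y - X                                # residuals
        # Gradients of E = 1/2 * Sum_n ||y(x_n) - x_n||^2
        grad_D = R.T @ Z                         # shape (Dim, K)
        grad_W = ((R @ D_) * Z * (1 - Z)).T @ X  # shape (K, Dim), chain rule through sigmoid
        D_ -= lr * grad_D / N
        W -= lr * grad_W / N
    return W, D_

# Toy usage: compress 20-dim data down to 5 hidden units
X = np.random.default_rng(1).standard_normal((500, 20))
W, D_ = train_autoencoder(X, K=5)
Z = sigmoid(X @ W.T)
print("reconstruction MSE:", np.mean((Z @ D_.T - X) ** 2))
```
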
I will cover MRFs, Restricted Boltzmann Machines, and Deep Belief Networks in detail in separate posts.
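
As a quick reference until those posts, here are the two joint-distribution factorizations mentioned in section 4, written in their standard textbook forms (not reproduced from the slides):

```latex
% Standard factorizations (textbook forms).
\[
  % Directed graph (Bayesian network): product of per-node conditionals,
  % each node x_i conditioned on its parents pa(i).
  p(x_1, \dots, x_D) \;=\; \prod_{i=1}^{D} p\!\left(x_i \mid x_{\mathrm{pa}(i)}\right)
\]
\[
  % Undirected graph (Markov random field): product of clique potentials
  % \psi_C over cliques C, normalized by the partition function Z.
  p(x) \;=\; \frac{1}{Z} \prod_{C} \psi_C(x_C),
  \qquad
  Z \;=\; \sum_{x} \prod_{C} \psi_C(x_C)
\]
```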
