Deep learning tutorial (part 1)

http://videolectures.net/kdd2014_salakhutdinov_deep_learning/?q=Deep%20Learning

http://www.vision.is.tohoku.ac.jp/files/9313/6601/7876/CVIM_tutorial_deep_learning.pdf

**Introduction**
- Previous approach: Data -> Features (e.g. SIFT, spectrogram, MFCC, Flux, ZCR) -> Learning
- Question for unsupervised feature learning: can we learn meaningful features from unlabeled data?

**Sparse Coding**
- Given a set of input data vectors {x_1, x_2, …, x_N}, learn a dictionary of bases {b_1, b_2, …, b_K} such that x_n = a_n1 * b_1 + a_n2 * b_2 + … + a_nK * b_K
- The coefficients {a_nk} are sparse (mostly zero)
- Each data vector is represented as a sparse linear combination of bases.
- Example: a patch of a natural image = a combination of learned bases
- Training: cost = Reconstruction Error + Sparsity Penalty -> find the minimum over both the coefficients and the bases (alternating optimization; each subproblem is convex, e.g. a convex QP for the bases)
- Testing: keep the learned bases fixed and solve the same optimization over the coefficients only; the resulting sparse coefficients are the feature representation of the new input (a small sketch follows below)
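
As a minimal illustration of the testing step (not from the tutorial; the dictionary, penalty weight, and step sizes are made up), the sketch below infers sparse coefficients for a new input with the bases held fixed, minimizing reconstruction error plus an L1 sparsity penalty via ISTA (a gradient step followed by soft-thresholding). Learning the dictionary itself would alternate this inference with an update of the bases.

```python
import numpy as np

def sparse_code(x, B, lam=0.1, lr=0.01, n_iter=200):
    """Infer sparse coefficients a with x ~= B @ a, using ISTA:
    a gradient step on the reconstruction error, then soft-thresholding
    (the proximal step for the L1 sparsity penalty)."""
    a = np.zeros(B.shape[1])
    for _ in range(n_iter):
        residual = B @ a - x                                     # reconstruction error
        a = a - lr * (B.T @ residual)                            # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lr * lam, 0.0)   # soft-threshold
    return a

# Toy usage: a random unit-norm dictionary stands in for learned bases.
rng = np.random.default_rng(0)
B = rng.standard_normal((64, 128))        # 64-dimensional inputs, 128 bases
B /= np.linalg.norm(B, axis=0)            # normalize each basis vector
x = rng.standard_normal(64)
a = sparse_code(x, B)
print("non-zero coefficients:", np.count_nonzero(a), "out of", a.size)
```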

**Autoencoders**
- Decoder (feed-back, generative, top-down): Feature Representation (binary features z) -> Dz -> Input Image
- Encoder (feed-forward, bottom-up): Input Image -> z = sigmoid(Wx) -> Feature Representation (binary features z)
- An autoencoder with D inputs, D outputs, and K hidden units with K < D
- Given an input x, its reconstruction is y_j(x, W, D) = Sum_{k=1}^K D_{jk} sigma(Sum_{i=1}^D W_{ki} x_i), j = 1, …, D
- We can determine the network parameters W and D by minimizing the reconstruction error E(W, D) = 1/2 Sum_{n=1}^N ||y(x_n, W, D) - x_n||^2 (see the numpy sketch after this list)
- If the **hidden and output layers are linear**, it will learn hidden units that are a linear function of the data and minimize the squared error; in this case the autoencoder is similar to PCA.
- With nonlinear hidden units, we get a nonlinear generalization of PCA.
- Predictive Sparse Decomposition (Kavukcuoglu et al., 09)
- Stacked Autoencoders (Greedy Layer-wise Learning): Input x -> (Encoder, Decoder, Sparsity) -> Features -> (Encoder, Decoder, Sparsity) -> Features -> (Encoder, Decoder) -> Class Labels
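
As a rough numpy sketch of the equations above (my own illustration; the data, sizes, and learning rate are arbitrary), the code below trains a single autoencoder with a sigmoid encoder z = sigmoid(Wx), a linear decoder y = Dz (named Dec in the code to avoid clashing with the input dimension D), and the squared reconstruction error minimized by gradient descent. Greedy layer-wise stacking, as in the last bullet, would train a second autoencoder on the resulting codes Z and then fine-tune with class labels.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)
N, D_in, K = 500, 20, 5                       # N samples, D inputs, K < D hidden units
X = rng.standard_normal((N, D_in))            # toy data

W = 0.1 * rng.standard_normal((K, D_in))      # encoder weights (K x D)
Dec = 0.1 * rng.standard_normal((D_in, K))    # decoder weights (D x K)
lr = 0.05

for epoch in range(500):
    Z = sigmoid(X @ W.T)                      # z = sigmoid(W x)
    Y = Z @ Dec.T                             # y = D z (linear output layer)
    R = Y - X                                 # reconstruction residual
    E = 0.5 * np.mean(np.sum(R ** 2, axis=1)) # squared reconstruction error, averaged over N

    grad_Y = R / N                            # dE/dy
    grad_Dec = grad_Y.T @ Z                   # dE/d(decoder weights)
    grad_Z = grad_Y @ Dec                     # dE/dz
    grad_pre = grad_Z * Z * (1.0 - Z)         # back through the sigmoid
    grad_W = grad_pre.T @ X                   # dE/d(encoder weights)

    Dec -= lr * grad_Dec
    W -= lr * grad_W

print(f"final reconstruction error: {E:.4f}")
```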

**Introduction to Graphical Models**
- Represent the dependency structure between random variables.
- Two types:
- Directed (Bayesian networks)
- Undirected (Markov random fields, Boltzmann machines)

- For a directed graph (Bayesian network), the joint distribution defined by the graph is given by **the product of a conditional distribution for each node conditioned on its parents** (the factorizations are written out after this list).
- Markov random fields
- Maximum Likelihood Learning
- MRFs with Latent Variables
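
For reference, the standard factorizations behind the two graph types (textbook forms, not copied from the slides):

```latex
% Directed graph (Bayesian network): product of per-node conditionals
p(x_1, \dots, x_D) = \prod_{i=1}^{D} p\big(x_i \mid \mathrm{pa}(x_i)\big)

% Undirected graph (Markov random field): product of clique potentials,
% normalized by the partition function Z
p(x) = \frac{1}{Z} \prod_{c} \psi_c(x_c), \qquad
Z = \sum_{x} \prod_{c} \psi_c(x_c)
```

Maximum likelihood learning in an MRF is difficult mainly because the partition function Z depends on all the parameters.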

- Restricted Boltzmann Machines: Learning low-level features
- Deep Belief Networks: Learning Part-based Hierarchies

I will cover MRFs, Restricted Boltzmann Machines, and Deep Belief Networks in detail in separate posts.
