This tutorial will cover
Principal Component Analysis (PCA),
spectral clustering via
the Laplacian matrix of a graph,
nonnegative matrix factorization (NMF),
and other matrix models used in machine learning.
Principal Component Analysis (PCA) and its underlying
Singular Value Decomposition (SVD)
are widely used in machine learning,
image/signal processing, data analysis and many other fields.
This area is going through a renaissance
with many new developments:
kernel PCA, probabilistic PCA, and independent component analysis;
sparsification of PCA using L1 constraints extending the LASSO,
truncation and discretization, and direct sparsification.
Recent results show that the principal components provide
solutions to K-means clustering, thus
connecting dimension reduction to clustering.
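As a concrete illustration of this connection, the following
minimal sketch runs K-means on the raw data and in the PCA
subspace; the synthetic dataset, the scikit-learn calls, and all
parameter choices are our own illustrative assumptions, not part
of the tutorial.

    # Sketch: K-means in the PCA subspace vs. on the raw data.
    # Dataset and parameters are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.decomposition import PCA
    from sklearn.metrics import adjusted_rand_score

    X, _ = make_blobs(n_samples=300, centers=3, n_features=50, random_state=0)

    # Project onto the top K-1 = 2 principal components.
    Z = PCA(n_components=2).fit_transform(X)

    labels_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    labels_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)

    # The two partitions typically agree closely on such data.
    print(adjusted_rand_score(labels_raw, labels_pca))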
Solutions of Kernel K-means clustering are
given by the eigenvectors of the kernel matrix
(or a matrix of pairwise similarities),
whereas
solutions of the spectral clustering normalized cut objective are given by
the generalized eigenvectors of the same matrix.
Thus Kernel K-means clustering and
spectral clustering are closely related.
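The two eigenproblems can be sketched numerically as follows
(our own illustration; the random point cloud and the RBF
similarity matrix are assumptions made for the example).

    # Sketch: the eigenproblems behind kernel K-means and the normalized cut.
    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))

    # Pairwise similarity matrix W (an RBF kernel, an illustrative choice).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / 2.0)

    # Kernel K-means relaxation: leading eigenvectors of W itself.
    _, evecs = eigh(W)
    kkm_vecs = evecs[:, -3:]      # eigenvectors of the 3 largest eigenvalues

    # Normalized cut relaxation: generalized eigenvectors W v = lam D v,
    # where D is the diagonal degree matrix of W.
    D = np.diag(W.sum(axis=1))
    _, gvecs = eigh(W, D)         # scipy solves the generalized problem
    ncut_vecs = gvecs[:, -3:]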
From a matrix perspective, PCA/SVD are
matrix factorizations (approximations by
lower-rank matrices with clear meaning).
Thus K-means and spectral clustering also fall under this
broad matrix model framework.
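For concreteness, the low-rank approximation underlying
PCA/SVD is the truncated SVD (a standard construction; the
matrix shape and rank below are illustrative).

    # Sketch: SVD as low-rank matrix factorization (Eckart-Young).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 5
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

    # Frobenius error equals the norm of the discarded singular values.
    print(np.linalg.norm(X - X_k), np.sqrt((s[k:] ** 2).sum()))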
Aside from eigenvector-based factorizations,
nonnegative matrix factorization (NMF) has
many desirable properties.
It is recognized that NMF provides a continuous nonnegative solution
to K-means clustering
and also a solution to spectral clustering.
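As a sketch of NMF itself, here are the classical
multiplicative updates of Lee and Seung (the data, rank, and
iteration count are illustrative assumptions).

    # Sketch: NMF via multiplicative updates, V ~ W H with W, H >= 0.
    import numpy as np

    rng = np.random.default_rng(0)
    V = rng.random((100, 40))     # nonnegative data matrix
    k = 5                         # factorization rank (illustrative)
    W = rng.random((100, k))
    H = rng.random((k, 40))
    eps = 1e-9                    # guards against division by zero

    for _ in range(200):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)

    print(np.linalg.norm(V - W @ H))   # reconstruction error

One way the K-means connection manifests is that each column of V
can be assigned to a cluster by the largest entry of the
corresponding column of H.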
Besides NMF, other matrix factorizations, such
as maximum-margin factorization and
partitioned-column-based factorizations, are also useful.
The key defining properties of the matrix model are
orthogonality and nonnegativity.
Enforcing orthogonality while ignoring nonnegativity,
we get spectral clustering.
Ignoring orthogonality while enforcing nonnegativity,
we get NMF.
We may also impose
orthogonality and nonnegativity simultaneously.
This leads to orthogonal NMF on the NMF side and
indicator-matrix quadratic clustering on the
spectral clustering side.
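Schematically, in our own notation (not necessarily the
tutorial's), these variants can be written as one factorization
problem over a similarity matrix A with a candidate cluster
indicator matrix H:

    % Schematic (our notation): one objective, three constraint choices.
    \min_{H}\ \| A - H H^{\top} \|_F^2
    \quad\text{subject to}\quad
    \begin{cases}
      H^{\top}H = I & \text{only: spectral clustering (eigenvectors of } A\text{)}\\
      H \ge 0 & \text{only: (symmetric) NMF}\\
      H^{\top}H = I,\ H \ge 0 & \text{both: } H \text{ is a cluster indicator matrix}
    \end{cases}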
Standard SVD/PCA deals with a set of 1D vectors, such as
data points in a high-dimensional space.
When the input
is a set of 2D objects such as images or weather maps,
2DSVD computes their low-dimensional approximations
using the principal eigenvectors of the row-row and
column-column covariance matrices.
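A minimal numpy sketch of the construction just described
(the array shapes and the ranks are illustrative assumptions).

    # Sketch: 2DSVD for a set of 2D arrays (e.g., images).
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 32, 24))   # 50 "images" of shape 32 x 24
    Abar = A.mean(axis=0)
    Ac = A - Abar                       # center the set

    # Row-row and column-column covariance matrices.
    F = sum(Ai @ Ai.T for Ai in Ac)     # 32 x 32
    G = sum(Ai.T @ Ai for Ai in Ac)     # 24 x 24

    # Principal eigenvectors of each (eigh sorts eigenvalues ascending).
    k, l = 8, 6
    U = np.linalg.eigh(F)[1][:, -k:]    # top-k row-space eigenvectors
    V = np.linalg.eigh(G)[1][:, -l:]    # top-l column-space eigenvectors

    # Low-dimensional core M_i = U^T A_i V and reconstruction per image.
    M = np.einsum('ri,nrc,cj->nij', U, Ac, V)
    A_hat = Abar + np.einsum('ri,nij,cj->nrc', U, M, V)
    print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))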
The tutorial starts with classic PCA
and provides a unified review of these recent developments.
We hope to convey the breadth and
depth of matrix-model-based approaches to machine learning.