(See the related tutorial on Principal Component Analysis and Matrix Factorizations for Learning.)
Spectral Clustering
Tutorial slides for Part I (pdf file)
Tutorial slides for Part II (pdf file)
Summary.
Spectral methods have recently emerged as effective methods
for data clustering, image segmentation, Web ranking
analysis, and dimension reduction.
They start with well-motivated objective functions;
optimization eventually leads to eigenvectors, which have
many clear and interesting algebraic properties.
At the core of spectral clustering is the Laplacian of the
graph adjacency (pairwise similarity) matrix, which evolved from
spectral graph partitioning.
This tutorial provides a survey of recent advances
after a brief review of the historical developments.
Mathematical proofs will be outlined, and examples from
gene expression and internet newsgroups will be given to illustrate
the ideas and results.
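To make the central object concrete, the following minimal Python sketch builds a Gaussian similarity matrix, forms the Laplacian L = D - W, and splits the data by the sign of the Fiedler vector; the similarity choice, toy data, and function name are illustrative assumptions, not code from the tutorial materials.

import numpy as np

def spectral_bipartition(X, sigma=1.0):
    # Gaussian pairwise similarities: W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    D = np.diag(W.sum(axis=1))            # degree matrix
    L = D - W                             # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # eigenvector of 2nd-smallest eigenvalue
    return fiedler >= 0                   # sign split gives the 2-way partition

# Two well-separated blobs (illustrative toy data); the sign split recovers them.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
print(spectral_bipartition(X).astype(int))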
Tutorial contents:
Part I. Basic Algorithms (90 min)
-
Spectral graph partitioning.
What is spectral relaxation? How does it relate to the graph Laplacian?
Properties of the Laplacian.
(30 min)
-
Spectral 2-way clustering.
Clustering objective functions: Ratio cut, Normalized cut, Min-max cut.
Cluster balance analysis. Random graphs.
(30 min)
-
Extension to bipartite graphs.
Simultaneous clustering of rows and columns of contingency tables
such as word-document matrices.
Extension to directed graphs.
(15 min)
-
Spectral relaxation of multi-way clusterings.
Lower bounds.
(15 min)
Part II. Advanced Topics (90 min)
-
Spectral embedding. Simplex cluster structure.
Perturbation analysis.
(20 min)
-
K-means clustering in eigenspace.
Equivalence of K-means clustering and PCA.
(15 min)
-
Connectivity network.
Scaled PCA. Green's function.
Other projection methods.
Extension to bipartite graphs.
Correspondence Analysis.
(25 min)
-
Random walks.
Semi-definite programming.
(10 min)
-
Spectral ordering (distance-sensitive ordering).
(10 min)
-
Spectral web ranking: PageRank and HITS.
Closed-form solutions.
(10 min)
Audience.
Spectral clustering is not just an effective data clustering algorithm.
It has a rich structure with
interesting properties and deep connections to
principal component analysis, Hopfield networks,
and semidefinite programming.
The intended audience includes both people interested in clustering and
people interested in the theoretical development of
unsupervised learning and dimension reduction.
Prerequisites.
Basic matrix algebra at the level of
G. Strang,
Introduction to Linear Algebra;
G. Golub and C. Van Loan, Matrix Computations.
Brief Introduction.
Spectral clustering has its origin in
spectral graph partitioning (Fiedler, 1973; Donath & Hoffman, 1973),
a popular algorithm in high-performance computing
(Pothen, Simon & Liou, 1990). This led to ratio-cut clustering
(Hagen & Kahng, 1992; Chan, Schlag & Zien, 1994).
These algorithms use eigenvectors of the Laplacian of the
graph adjacency (pairwise similarity) matrix.
Recent work on the Normalized cut (Shi & Malik, 2000)
uses the eigenvectors of the generalized/normalized Laplacian
(Chung, 1997) and has brought renewed interest in the topic.
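As a hedged sketch of the normalized-cut relaxation (the 5-node toy graph is an illustrative assumption): it leads to the generalized eigenproblem (D - W) v = lambda D v, whose second eigenvector gives the relaxed 2-way partition.

import numpy as np
from scipy.linalg import eigh

# Toy 5-node graph (illustrative); W is the adjacency/similarity matrix.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W
# Generalized symmetric eigenproblem L v = lambda D v.
eigvals, eigvecs = eigh(L, D)
v2 = eigvecs[:, 1]                 # second-smallest generalized eigenvector
print((v2 >= 0).astype(int))       # sign split approximates the normalized cut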
This has been extended to bipartite graphs for
simultaneous clustering of rows and columns of contingency tables
such as word-document matrices (Zha et al., 2001; Dhillon, 2001).
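A hedged illustration of the bipartite idea (the count matrix is made up; the degree scaling follows the common presentation, but the code is not from the cited papers): the second left and right singular vectors of the scaled word-document matrix co-cluster words and documents.

import numpy as np

A = np.array([[3, 2, 0, 0],        # toy word-document count matrix
              [2, 3, 0, 1],
              [0, 0, 2, 3],
              [0, 1, 3, 2]], dtype=float)
d1 = A.sum(axis=1)                 # word (row) degrees
d2 = A.sum(axis=0)                 # document (column) degrees
A_hat = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]   # D1^{-1/2} A D2^{-1/2}
U, s, Vt = np.linalg.svd(A_hat)
print("word clusters:    ", (U[:, 1] >= 0).astype(int))
print("document clusters:", (Vt[1, :] >= 0).astype(int))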
Many new properties have recently been established, such as the
random-walk view (Meila & Shi, 2001),
self-aggregation (Ding et al., 2002),
semidefinite relaxation (Xing & Jordan, 2003), and
perturbation analysis (Ding et al., 2002).
Standard spectral clustering deals with 2-way clustering.
Multi-way clustering methods have also been proposed
(Ng, Jordan & Weiss, 2001; Ding et al., 2002; Yu & Shi, 2003), along with
multi-way spectral relaxation and lower bounds (Gu et al., 2001).
The widely used K-means clustering
has recently been shown (Zha et al., 2001; Ding & He, 2004)
to be directly related to PCA:
(a) the continuous solutions for the cluster membership indicators
in K-means clustering
are given by the principal components, i.e., eigenvectors of the Gram
(inner-product kernel) matrix;
(b) the PCA subspace is identical to the subspace
spanned by the K cluster centroids.
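A small numerical check of claim (a), on made-up two-cluster data (the data and the sign threshold are illustrative assumptions, not an excerpt from the cited papers): the top eigenvector of the centered Gram matrix separates the two clusters by sign.

import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (15, 3)), rng.normal(2, 0.5, (15, 3))])
Xc = X - X.mean(axis=0)            # center the data
G = Xc @ Xc.T                      # Gram (inner-product kernel) matrix
eigvals, eigvecs = np.linalg.eigh(G)
v1 = eigvecs[:, -1]                # top eigenvector = 1st principal component scores
print((v1 >= 0).astype(int))       # matches the two-cluster membership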
Another popular use of eigenvectors is in webpage ranking algorithms
such as PageRank (used in Google) and HITS (Kleinberg, 1999),
for which closed-form solutions have been obtained (Ding et al., 2001, 2002).
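As a hedged sketch of the eigenvector view of ranking (toy link graph; the damping factor 0.85 is the conventional choice, and this power iteration is not the closed-form analysis of the cited papers): PageRank is the principal eigenvector of a damped transition matrix.

import numpy as np

links = np.array([[0, 1, 1, 0],    # toy graph: row i links to column j
                  [1, 0, 0, 1],
                  [0, 1, 0, 1],
                  [1, 0, 0, 0]], dtype=float)
P = links / links.sum(axis=1, keepdims=True)   # row-stochastic transitions
d, n = 0.85, links.shape[0]
M = d * P.T + (1 - d) / n * np.ones((n, n))    # damped transition matrix
r = np.ones(n) / n
for _ in range(100):                           # power iteration
    r = M @ r
    r /= r.sum()
print(r)                                       # stationary ranking vector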
Finally, efficient linear algebra software
for computing eigenvectors is fully developed
and freely available, which facilitates spectral clustering
on large datasets.
Selected References
-
F.R. Bach and M.I. Jordan.
Learning spectral clustering.
Neural Info. Processing Systems 16 (NIPS 2003), 2003.
-
M. Belkin and P. Niyogi.
Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering,
Advances in Neural Information Processing Systems 14 (NIPS 2001), pp:
585-591, MIT Press, Cambridge, 2002.
-
M. Brand and K. Huang.
A unifying theorem for spectral embedding and clustering.
Int'l Workshop on AI & Stat (AI-STAT 2003) 2003.
-
S. Brin and L. Page.
The anatomy of a large-scale hypertextual web search engine.
Proc. of 7th WWW Conference, 1998.
-
P.K. Chan, M. Schlag, and J.Y. Zien.
Spectral k-way ratio-cut partitioning and clustering.
IEEE Trans. CAD-Integrated Circuits and Systems, 13:1088--1096, 1994.
-
F.R.K. Chung.
Spectral Graph Theory.
Amer. Math. Society Press, 1997.
-
I. S. Dhillon.
Co-clustering documents and words using bipartite spectral graph
partitioning.
Proc. ACM Int'l Conf Knowledge Disc. Data Mining (KDD 2001),
2001.
-
C. Ding, H. Zha, X. He, P. Husbands & H.D. Simon.
Link Analysis: Hubs and Authorities on the World Wide Web.
LBNL Tech Report 47847. 2001. To appear in SIAM Review June 2004.
-
C. Ding, X. He, H. Zha, M. Gu, and H. Simon.
A min-max cut algorithm for graph partitioning and data clustering.
Proc. IEEE Int'l Conf. Data Mining, 2001.
-
C. Ding, X. He, H. Zha, and H. Simon.
Unsupervised learning: self-aggregation in scaled principal component
space.
Proc. 6th European Conf. Principles of Data Mining and Knowledge
Discovery (PKDD 2002), pages 112--124, 2002.
-
C. Ding.
Document Retrieval and Clustering: from Principal Component Analysis
to Self-aggregation Networks.
Int'l Workshop on AI & Stat (AI-STAT 2003) 2003.
-
C. Ding & X. He.
Principal Components and K-means Clustering.
LBNL Tech Report 52983. 2003.
-
W.E. Donath and A. J. Hoffman.
Lower bounds for partitioning of graphs.
IBM J. Res. Develop., 17:420--425, 1973.
-
M. Fiedler.
Algebraic connectivity of graphs.
Czech. Math. J., 23:298--305, 1973.
-
M. Fiedler.
A property of eigenvectors of non-negative symmetric matrices and its
application to graph theory.
Czech. Math. J., 25:619--633, 1975.
-
M. Gu, H. Zha, C. Ding, X. He, and H. Simon.
Spectral relaxation models and structure analysis for k-way graph
Clustering and bi-clustering.
Penn State Univ Tech Report CSE-01-007, 2001.
-
L. Hagen and A.B. Kahng.
New spectral methods for ratio cut partitioning and clustering.
IEEE Trans. on Computer-Aided Design, 11:1074--1085, 1992.
-
S.D. Kamvar, D. Klein, and C.D. Manning, Spectral Learning,
IJCAI-03, 2003.
-
J.M. Kleinberg.
Authoritative sources in a hyperlinked environment.
J. ACM, 48:604--632, 1999.
-
M. Meila and J. Shi.
A random walks view of spectral segmentation.
Int'l Workshop on AI & Stat (AI-STAT 2001).
-
M. Meila and L. Xu.
Multiway cuts and spectral clustering.
U. Washington Tech Report, 2003.
-
A.Y. Ng, M.I. Jordan, and Y. Weiss.
On spectral clustering: Analysis and an algorithm.
Proc. Neural Info. Processing Systems (NIPS 2001), 2001.
-
A. Pothen, H. D. Simon, and K. P. Liou.
Partitioning sparse matrices with eigenvectors of graphs.
SIAM J. Matrix Anal. Appl., 11:430--452, 1990.
-
J. Shi and J. Malik.
Normalized cuts and image segmentation.
IEEE Trans. on Pattern Analysis and Machine Intelligence,
22:888--905, 2000.
-
E.P. Xing and M.I. Jordan.
On semidefinite relaxation for normalized k-cut and
connections to spectral clustering.
Tech Report CSD-03-1265, UC Berkeley, 2003.
-
S.X. Yu and J. Shi.
Multiclass spectral clustering.
Int'l Conf. on Computer Vision, 2003.
-
H. Zha, C. Ding, M. Gu, X. He, and H.D. Simon.
Spectral relaxation for K-means clustering.
Advances in Neural Information Processing Systems 14
(NIPS 2001). pp. 1057-1064, Vancouver, Canada.
Dec. 2001.
-
H. Zha, X. He, C. Ding, M. Gu & H. Simon.
Bipartite Graph Partitioning and Data Clustering,
Proc. of ACM 10th Int'l Conf. Information and Knowledge Management (CIKM 2001), pp.25-31,
2001, Atlanta.
-
Y. Zhao and G. Karypis.
Criterion functions for document clustering: Experiments and analysis.
Univ. Minnesota, CS Dept. Tech Report 01-40, 2001.
Presenter biography.
Chris Ding is a staff computer scientist at
Lawrence Berkeley National Laboratory.
He began working on spectral methods for mesh/graph partitioning
in 1995 and
has since worked extensively on spectral clustering:
min-max cut, spectral relaxation on multi-way cuts and lower
bounds, extension to bipartite graphs,
K-means relaxation, and perturbation analysis,
as well as on Latent Semantic Indexing in information retrieval
and web ranking algorithms using spectral methods,
with about 15 publications in this area.
This tutorial grew out of his research experience
in these areas.