Spectral Clustering, ICML 2004 Tutorial by Chris Ding

Chris Ding , Comp Sci & Eng Dept, Univ of Texas Arlington

(See related tutorial on Principal Component Analysis and Matrix Factorizations for Learning )

Tutorial given at ICML 2004 ( International Conference on Machine Learning, July 2004, Banff, Alberta, Canada )

Spectral Clustering

Tutorial slides for Part I (pdf file)
Tutorial slides for Part II (pdf file)

Summary. Spectral methods recently emerge as effective methods for data clustering, image segmentation, Web ranking analysis and dimension reduction. They start with well-motivated objective functions; optimization eventually leads to eigenvectors, with many clear and interesting algebraic properties. At the core of spectral clustering is the Laplacian of the graph adjacency (pairwise similarity) matrix, evolved from spectral graph partitioning. This tutorial provides a survey of recent advances after brief historical developments. Mathematical proofs will be outlined and examples in gene expresions and internet newsgroups will given to illustrate the ideas and results.

Tutorial contents:

Part I. Basic Algorithms (90 min)

Spectral graph partitioning. What is spectral relaxation? How it relates to Graph Laplacian. Properties of the Laplacian. (30 min)
Spectral 2-way clustering. Clustering objective functions: Ratio cut, Normalized cut, Min-max cut. Cluster balance analysis. Random graphs. (30min)
Extension to Bipartite graphs. Simultaneous clustering of rows and columns of contingency table such as word-document matrix. Extension to directed graphs. (15min)
Spectral relaxation of multi-way clusterings. Lower bounds. (15min)

Part II. Advanced Topics (90 min)

Spectral embedding. Simplex cluster structure. Perturbation analysis. (20min)
K-means clustering in eigenspace. Equivalence of K-means clustering and PCA (15min)
Connectivity network. Scaled PCA. Green's function. Other projection methods. Extension to bipartite graphs. Correspondence Anslysis. (25min)
Random walks. Semi-definite programming. (10min)
Spectral ordering (distance sensitive oredering) (10min)
Spectral web ranking: PageRank and HITS. Closed-form solutions. (10min)

Audience. Spectral clustering is not just effective data clustering algorithms. Its has a rich structure with interesting properties and deep connections to principal component analysis, Hopfield networks, and semidefinite programming. Intended audience are both people interested in clustering and people interested in theoretical developments of unsupervised learning and dimension reduction.

Prerequisites. Basic matrix algebra at the level of G. Strang, Introduction to Linear Algebra; G. Golub and C.V. Loan, Matrix Computation.

Brief Introduction. Spectral clustering has its origin in spectral graph partitioning (Fiedler 1973; Donath & Hoffman 1972), a popular algorithm in high performance computing (Pothen, Simon & Liou, 1990). This led to Ratio-cut clustering (Hagen & Kahng, 92; Chan, Schlag & Zien, 1994). These algorithms use eigenvectors of the Laplacian of the graph adjacency (pairwise similarity) matrix. Recent work on Normalized-cut (Shi & Malik, 2000) uses the eigenvector of the generalized/normalized Laplacian (Chung, 1997) and brought renewed interest in the topic. This has been extended to bipartite graphs for simulataneous clustering of rows and columns of contingency table such as word-document matrix (Zha et al,2001; Dhillon,2001). Many new properties have been recently proved, such as random walks (Meila & Shi, 2001), self-aggregation (Ding et al, 2002), semidefinite relaxation (Xing & Jordan, 2003), and perturbation analysis (Ding et al,2002). Standard spectral clustering deals with 2-way clustering. Multi-way clustering methods are also proposed (Ng, Jordan & Weiss, 2001; Ding et al, 2002; Xu & Shi, 2003) and multi-way spectral relaxation and lower bounds (Gu et al, 2001). The widely used K-means clustering is shown recently (Zha,et al 2001; Ding & He, 2004) to be directly related to PCA: (a) the solution for cluster membership indicators in K-means clustering are given by PCA components, eigenvectors of the Gram (inner-product kernel) matrix; (b) PCA subspace is identical to the subspace spanned by K cluster centroids. Another popular use of eigenvectors is the webpage ranking algorithms, such as PageRank (used in Google) and HITS (Kleinberg, 1999), where closed-form solutions are obtained (Ding, et al, 2001, 2002). Finally, efficent linear algebra software for computing eigenvectors are fully developed and freely available, which will facilitate spectral clustering on large datasets.

Selected References

F.R. Bach and M.I. Jordan. Learning spectral clustering. Neural Info. Processing Systems 16 (NIPS 2003), 2003.
M. Belkin and P. Niyogi. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, Advances in Neural Information Processing Systems 14 (NIPS 2001), pp: 585-591, MIT Press, Cambridge, 2002.
M. Brand and K. Huang. A unifying theorem for spectral embedding and clustering. Int'l Workshop on AI & Stat (AI-STAT 2003) 2003.
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Proc. of 7th WWW Conferece, 1998.
P.K. Chan, M.Schlag, and J.Y. Zien. Spectral k-way ratio-cut partitioning and clustering. IEEE Trans. CAD-Integrated Circuits and Systems, 13:1088--1096,
F.R.K. Chung. Spectral Graph Theory. Amer. Math. Society Press, 1997.
I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. Proc. ACM Int'l Conf Knowledge Disc. Data Mining (KDD 2001), 2001.
C. Ding, H. Zha, X. He, P. Husbands & H.D. Simon. Link Analysis: Hubs and Authorities on the World Wide Web. LBNL Tech Report 47847. 2001. To appear in SIAM Review June 2004.
C. Ding, X. He, H. Zha, M. Gu, and H. Simon. A min-max cut algorithm for graph partitioning and data clustering. Proc. IEEE Int'l Conf. Data Mining, 2001.
C. Ding, X. He, H. Zha, and H. Simon. Unsupervised learning: self-aggregation in scaled principal component space. Proc. 6th European Conf. Principles of Data Mining and Knowledge Discovery (PDKK 2002), pages 112--124, 2002.
C. Ding. Document Retrieval and Clustering: from Principal Component Analysis to Self-aggregation Networks. Int'l Workshop on AI & Stat (AI-STAT 2003) 2003.
C. Ding & X. He. Principal Components and K-means Clustering. LBNL Tech Report 52983. 2003.
W.E. Donath and A. J. Hoffman. Lower bounds for partitioning of graphs. IBM J. Res. Develop., 17:420--425, 1973.
M. Fiedler. Algebraic connectivity of graphs. Czech. Math. J., 23:298--305, 1973.
M. Fiedler. A property of eigenvectors of non-negative symmetric matrices and its application to graph theory. Czech. Math. J., 25:619--633, 1975.
M. Gu, H. Zha, C. Ding, X. He, and H. Simon. Spectral relaxation models and structure analysis for k-way graph Clustering and bi-clustering. Penn State Univ Tech Report CSE-01-007, 2001.
L. Hagen and A.B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE. Trans. on Computed Aided Desgin, 11:1074--1085, 1992.
S.D. Kamvar, D. Klein, and C.D. Manning, Spectral Learning, IJCAI-03, 2003.
J.M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM}, 48:604--632, 1999.
M. Meila and J. Shi. A random walks view of spectral segmentation. Int'l Workshop on AI & Stat (AI-STAT 2001).
M. Meila and L. Xu. Multiway cuts and spectral clustering. U. Washington Tech Report, 2003.
A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. Proc. Neural Info. Processing Systems (NIPS 2001), 2001.
A. Pothen, H. D. Simon, and K. P. Liou. Partitioning sparse matrices with egenvectors of graph. SIAM Journal of Matrix Anal. Appl., 11:430--452, 1990.
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE. Trans. on Pattern Analysis and Machine Intelligence, 22:888--905, 2000.
E.P. Xing and M.I. Jordan. On semidefinite relaxation for normalized k-cut and connections to spectral clustering. Tech Report CSD-03-1265, UC Berkeley, 2003.
S.X. Yu and J. Shi. Multiclass spectral clustering. Int'l Conf. on Computer Vision, 2003.
H. Zha, C. Ding, M. Gu, X. He, and H.D. Simon. Spectral relaxation for K-means clustering. Advances in Neural Information Processing Systems 14 (NIPS 2001). pp. 1057-1064, Vancouver, Canada. Dec. 2001.
H. Zha, X. He, C. Ding, M. Gu & H. Simon. Bipartite Graph Partitioning and Data Clustering, Proc. of ACM 10th Int'l Conf. Information and Knowledge Management (CIKM 2001), pp.25-31, 2001, Atlanta.
Y. Zhao and G. Karypis. Criterion functions for document clustering: Experiments and analysis. Univ. Minnesota, CS Dept. Tech Report 01-40, 2001.

Presenter biography. Chris Ding is a staff computer scientist at Lawrence Berkeley National Laboratory. He started work on mesh/graph partitioning used spectral methods since 1995 and has been working extensively on spectral clustering: min-max cut, spectral relaxation on multi-way cuts and bounds, extension to bipartite graphs, K-means relaxation, and perturbation analysis; Latent Semantic Indexing in IR and web ranking algorithms using spectral methods, with about 15 publications in this area. This tutorial grows out of his research experiences in this area.