My main research areas are machine learning / data mining, bioinformatics, information retrieval, web link analysis, and high performance computing. My research is supported by National Science Foundation grants and a University of Texas Regents STARS Award. I have published about 200 papers, which have been cited 12,200 times (Google Scholar).
Our work on multi-class protein fold prediction is now a standard benchmark for protein 3D structure prediction. We discovered that Principal Component Analysis (PCA) provides the solution to K-means clustering. We proved that nonnegative matrix factorization is equivalent to K-means/spectral clustering. We generalized PCA to 2D Singular Value Decomposition for dimension reduction of a set of 2D matrices. Our MPH technology/software for integrating multi-component executables on distributed memory architectures has been adopted in many state-of-the-art large-scale models for predicting the long-term climate. We developed the vacancy tracking algorithm for provably optimal in-place multi-dimensional array index reshuffling.
After placing 27th in China's nationwide CUSPEA Program, I came to Columbia University in 1981 and earned a Ph.D. in Theoretical Physics and Computer Science for building a parallel processor using Intel 80286s and commodity FPUs (Science, front cover story, March 18, 1988), designing algorithms for it, and running large-scale QCD simulations on it. From 1987 to 1993, I worked at the California Institute of Technology on the Caltech Hypercubes, developing parallel algorithms for Materials Science (see Nature article by Editor John Maddox) and Computational Biology (see a National Research Council Report). From 1993 to 1996, I worked at NASA's Jet Propulsion Laboratory on algorithms for climate data assimilation (SIAM News, front page, October 1996), sparse matrix linear solvers, and parallel graph partitioning. In 1996 I joined Lawrence Berkeley National Laboratory, working on high performance computing, algorithmic R&D for climate models, application benchmarking, giving tutorials on HPF, MPI, etc., and exploring new frontiers ... the magic of matrices for clustering, ordering, ranking, embedding ... bipartite graphs for systematic representation of protein interaction networks, motifs, domains, complexes, functional modules, pathways ...
I received a Pfister Fellowship at Columbia (1981-83), four Best Paper Awards for a parallel climate data assimilation algorithm and for supernova detection using support vector machines, a NASA Group Achievement Award at JPL, and two Outstanding Performance Awards at Lawrence Berkeley National Laboratory. I have served on review panels for the US National Science Foundation and as a reviewer of research proposals for the National Science Foundations of Ireland and Israel and the Research Grants Council of Hong Kong. I have also served for a bioinformatics journal and on the program committees of leading conferences in data mining, machine learning, and bioinformatics. Together with Prof. Tao Li, I organize annual workshops on data mining using matrices and tensors.
Webpage Ranking. Type query words into a search engine and you get millions of returned webpages. The challenge is to rank them so that the most informative webpages come out on top. Two popular ranking algorithms are PageRank (Google) and HITS (IBM). We used matrix techniques and proved via closed-form solutions that these elaborate rankings are in fact equivalent to ranking by indegree (the number of inbound hyperlinks), assuming the web is an expected-degree-sequence random graph. In other words, your webpage must be valuable since it is pointed to by many other links!
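The indegree claim can be checked numerically on a small synthetic web. The following is only a rough sketch under stated assumptions, not the closed-form proof: the graph size, the degree distributions, the damping factor of 0.85, and the helper names `expected_degree_digraph` and `pagerank` are all illustrative choices made here. It samples a directed expected-degree random graph, computes PageRank by power iteration, and compares the scores with the raw indegree counts.

```python
import numpy as np


def expected_degree_digraph(w_out, w_in, rng):
    """Sample a directed graph in which link i -> j appears independently with
    probability w_out[i] * w_in[j] / sum(w_in), capped at 1 (no self-links)."""
    prob = np.minimum(np.outer(w_out, w_in) / w_in.sum(), 1.0)
    adj = (rng.random(prob.shape) < prob).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration on a dense adjacency matrix."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    dangling = out_deg == 0
    # Row-normalize the links; a page without outlinks jumps uniformly.
    trans = np.where(dangling[:, None], 1.0 / n,
                     adj / np.maximum(out_deg, 1.0)[:, None])
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (rank @ trans)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


rng = np.random.default_rng(0)
n = 1000

# Assumed degree model: heavy-tailed expected indegrees (capped at 200) and
# uniformly spread expected outdegrees, rescaled so both sequences have the same sum.
w_in = np.minimum(10.0 * (1.0 - rng.random(n)) ** -0.5, 200.0)
w_out = rng.uniform(10.0, 50.0, n)
w_out *= w_in.sum() / w_out.sum()

adj = expected_degree_digraph(w_out, w_in, rng)
pr = pagerank(adj)
indeg = adj.sum(axis=0)  # number of inbound links per page

# Agreement between the two rankings: linear correlation and top-50 overlap.
corr = np.corrcoef(pr, indeg)[0, 1]
top_pr = set(np.argsort(-pr)[:50].tolist())
top_in = set(np.argsort(-indeg)[:50].tolist())
overlap = len(top_pr & top_in)

print(f"correlation(PageRank, indegree) = {corr:.3f}")
print(f"top-50 overlap = {overlap} of 50 pages")
```

On graphs sampled this way, the PageRank scores should track the indegree counts closely, which is the behavior the closed-form result describes. Real web graphs are not expected-degree random graphs, so the agreement there can be weaker.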