Statseminars: Joint Biostatistics / Stat & Data Science Seminar, Speaker Carey E. Priebe, 4/9 @ 4:15pm-5:30pm

BIOSTATISTICS / STATISTICS & DATA SCIENCE JOINT SEMINAR

Date: Monday, April 9, 2018

Time: 4:15pm – 5:30pm

Place: Yale Institute for Network Science, 17 Hillhouse Avenue, 3rd Floor, Rm 328

Seminar Speaker: Carey E. Priebe

Department of Applied Mathematics & Statistics, Johns Hopkins University

Personal Website: https://www.ams.jhu.edu/~priebe/

Title: On Spectral Graph Clustering

Abstract: Clustering is a many-splendored thing. As the ill-defined cousin of classification, in which the observation to be classified X comes with a true but unobserved class label Y, clustering is concerned with coherently grouping observations without any explicit concept of true groupings. Spectral graph clustering (clustering the vertices of a graph based on their spectral embedding) is all the rage, and recent theoretical results provide new understanding of the problem and its solutions. In particular, we reset the field of spectral graph clustering, demonstrating that spectral graph clustering should not be thought of as k-means clustering composed with Laplacian spectral embedding, but rather as Gaussian mixture model (GMM) clustering composed with either Laplacian or adjacency spectral embedding (LSE or ASE). In the context of the stochastic blockmodel (SBM), we use eigenvector CLTs & Chernoff analysis to show that (1) GMM dominates k-means and (2) neither LSE nor ASE dominates, and we present an LSE vs ASE characterization in terms of affinity vs core-periphery SBMs. Along the way, we describe our recent asymptotic efficiency results, as well as an interesting twist on the eigenvector CLT when the block connectivity probability matrix is not positive semidefinite. (And, time permitting, we will touch on essential results using the matrix two-to-infinity norm.) We conclude with a ‘Two Truths’ LSE vs ASE spectral graph clustering result, necessarily including model selection for both embedding dimension & number of clusters, convincingly illustrated via an exciting new diffusion MRI connectome data set: different embedding methods yield different clustering results, with one (ASE) capturing gray matter/white matter separation and the other (LSE) capturing left hemisphere/right hemisphere characterization.
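For readers unfamiliar with the pipeline the abstract describes (spectral embedding of a graph followed by GMM clustering of the embedded vertices), the following is a minimal Python sketch, not the speaker's code. It assumes NumPy; samples a hypothetical two-block affinity SBM; uses the normalized form L = D^{-1/2} A D^{-1/2} for the Laplacian embedding; and fixes the embedding dimension d = 2 and number of clusters k = 2 rather than performing the model selection the abstract emphasizes. The GMM is a small hand-rolled EM with full covariances, standing in for a library routine. All function names (`sample_sbm`, `ase`, `lse`, `gmm_em`, `spectral_cluster`) are hypothetical.

```python
import numpy as np


def sample_sbm(block_sizes, B, rng):
    """Draw a symmetric, hollow adjacency matrix from a stochastic blockmodel."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = B[np.ix_(labels, labels)]                   # edge probability for each vertex pair
    upper = np.triu(rng.random(P.shape) < P, k=1)   # independent edges above the diagonal
    A = (upper | upper.T).astype(float)             # symmetrize; diagonal stays zero
    return A, labels


def top_eigs(M, d):
    """Top-d eigenpairs of a symmetric matrix, ranked by eigenvalue magnitude."""
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vals[idx], vecs[:, idx]


def ase(A, d):
    """Adjacency spectral embedding: U |S|^{1/2} from the top-d eigenpairs of A."""
    vals, vecs = top_eigs(A, d)
    return vecs * np.sqrt(np.abs(vals))


def lse(A, d):
    """Laplacian spectral embedding, assuming L = D^{-1/2} A D^{-1/2}."""
    inv_sqrt_deg = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L = inv_sqrt_deg[:, None] * A * inv_sqrt_deg[None, :]
    vals, vecs = top_eigs(L, d)
    return vecs * np.sqrt(np.abs(vals))


def gmm_em(X, k, n_iter=100, reg=1e-6):
    """Full-covariance Gaussian mixture fit by EM; returns hard cluster labels."""
    n, d = X.shape
    # Deterministic farthest-point initialization of the k means.
    idx = [0]
    for _ in range(1, k):
        dist2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(dist2)))
    means = X[idx].copy()
    cov0 = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(d)
    covs = np.array([cov0] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log responsibilities under each Gaussian component.
        log_r = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(covs[j]), diff)
            _, logdet = np.linalg.slogdet(covs[j])
            log_r[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and covariances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return resp.argmax(axis=1)


def spectral_cluster(A, d, k, embedding="ase"):
    """GMM clustering composed with ASE or LSE."""
    X = ase(A, d) if embedding == "ase" else lse(A, d)
    return gmm_em(X, k)


def two_cluster_accuracy(pred, truth):
    """Fraction of vertices correctly clustered, up to swapping the two labels."""
    agree = float(np.mean(pred == truth))
    return max(agree, 1.0 - agree)


rng = np.random.default_rng(0)
B = np.array([[0.5, 0.1], [0.1, 0.5]])  # hypothetical two-block affinity SBM
A, truth = sample_sbm([100, 100], B, rng)
acc_ase = two_cluster_accuracy(spectral_cluster(A, d=2, k=2, embedding="ase"), truth)
acc_lse = two_cluster_accuracy(spectral_cluster(A, d=2, k=2, embedding="lse"), truth)
print(f"ASE+GMM accuracy: {acc_ase:.3f}, LSE+GMM accuracy: {acc_lse:.3f}")
```

On an easy, well-separated affinity SBM like this one, both embeddings should recover the blocks. The abstract's point is that on other SBMs (e.g. core-periphery structure) the two embeddings can disagree and neither dominates. A fuller treatment would choose d and k by model selection (for example with BIC) and would likely use a library GMM such as scikit-learn's `GaussianMixture`.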

4:00 p.m. Pre-talk Refreshments

4:15 p.m. – 5:30 p.m. Seminar, Room 328, 17 Hillhouse Avenue

For more details and upcoming events, visit our website at http://statistics.yale.edu/.

Gerstein Lab Linkstream

Land of the data, home of the links
