Social networks today face several fundamental problems. The research of WNCG Profs. Sujay Sanghavi and Sanjay Shakkottai and student collaborators sift through the information overload to bring sense to large-scale social networks in three primary ways.
The first problem in social networks involves finding communities that share denser connections between themselves, as compared to the density of connections across the network. WNCG Prof. Sujay Sanghavi and collaborators Yudong Chen and Huan Xu develop a convex version of the maximum-likelihood algorithm, and show that it outperforms a long list of methods for this classic problem. The research team demonstrates that, in the classic stochastic block model setting, this algorithm outperforms all existing methods by polynomial factors. In fact, it is within logarithmic factors of known lower bounds for spectral methods, and there is evidence suggesting that no polynomial time algorithm would do significantly better. We then show that this guarantee carries over to a more general semi-random extension of the stochastic block model; our method can handle the settings of semi-random graphs, heterogeneous degree distributions, unequal cluster sizes, outlier nodes, planted k-cliques, planted coloring etc.
Paper: Clustering Sparse Graphs
The second problem with social networks is that the social graph between people may not often be explicit and needs to be learned from observations. Prof. Sujay Sanghavi and student Praneeth Netrapalli look at the problem of finding the graph on which an epidemic cascade spreads, given only the times when each node gets infected. While this is a problem of importance in several contexts -- offline and online social networks, e-commerce, epidemiology, vulnerabilities in infrastructure networks -- there has been very little work, analytical or empirical, on finding the graph.
For the classic independent cascade SIR epidemics, the researchers analytically establish the number of cascades required by both the global maximum-likelihood (ML) estimator, and a natural greedy algorithm. Both results are based on a key observation: the global graph learning problem decouples into n local problems -- one for each node. For a node of degree d, the WNCG research teams shows that its neighborhood can be reliably found once it has been infected O(d2logn) times (for ML on general graphs) or O(dlogn) times (for greedy on trees). They also provide a corresponding information-theoretic lower bound of Ω(dlogn); thus their bounds are essentially tight. Furthermore, if given side information in the form of a super-graph of the actual graph, then the number of cascade samples required -- in all cases -- becomes independent of the network size n. Finally, the researchers show that for a very general SIR epidemic cascade model, the Markov graph of infection times is obtained via the moralization of the network graph.
Finding the topics and labels of internet content is a problem of growing importance. It is particularly challenging for image and video media that do not have easily recognizable features. WNCG Profs. Sujay Sanghavi and Sanjay Shakkottai and student Avik Ray provide a new algorithm that demonstrates how to infer content topics based on the way social network users interact with the content. The researchers study its theoretical performance and demonstrate its empirical effectiveness over standard topic modeling algorithms.
WNCG website research feature: Sifting Through Social Noise