Community Detection in Massive Graphs

06 May 2014

WNCG Ph.D Students Dimitris Papailiopoulos and Yannis Mitliagkas, along with WNCG Professors Alex Dimakis and Constantine Caramanis, have developed an efficient low-rank framework for finding dense components of graphs with billions of connections.

The authors have developed a novel low-rank approximation framework that finds provably good solutions for intractable big-graph problems such as the densest k-subgraph. Their framework operates by solving smaller instances of these problems, appropriately sampled from a low-rank subspace of the graph. Their algorithm comes with novel performance bounds that depend on the graph spectrum. For most real-world graphs these bounds translate to 70%-80% approximation ratios. These guarantees are surprisingly tighter compared to worst-case approximation results, which can only guarantee a 10% approximation ratio even for moderately sized data sets.

A major advantage of their framework is that it runs in nearly linear time, under mild conditions on the graph. Moreover, it is scalable and parallelizable. They illustrate this by implementing it in MapReduce and by scaling out to more than 800 cores on Amazon EC2. This enables us to solve large instances of the densest k-subgraph problem on massive graphs with billions of edges.

For the details see: Paper .

This work was partially supported by NSF grants CCF-1344364, CCF-1344179, DARPA XDATA and research gifts by Google and Docomo.