WNCG - Wireless Networking and Communications Group - Statistics
http://wncg.org/tags/statistics
Community Detection in Massive Graphs
http://wncg.org/research/briefs/community-detection-massive-graphs
<div class="field field-name-field-publish-date field-type-datetime field-label-hidden"><div class="field-items"><div class="field-item even"><span class="date-display-single">Tuesday, May 6, 2014</span></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"> <p>WNCG Ph.D. students Dimitris Papailiopoulos and Yannis Mitliagkas, along with WNCG Professors Alex Dimakis and Constantine Caramanis, have developed an efficient low-rank framework for finding dense components of graphs with billions of connections.</p>
<p>The authors have developed a novel low-rank approximation framework that finds provably good solutions for intractable big-graph problems such as the densest k-subgraph. Their framework operates by solving smaller instances of these problems, appropriately sampled from a low-rank subspace of the graph. Their algorithm comes with novel performance bounds that depend on the graph spectrum. For most real-world graphs these bounds translate to 70%-80% approximation ratios. These guarantees are surprisingly tight compared to worst-case approximation results, which can guarantee only a 10% approximation ratio even for moderately sized data sets.</p>
<p>A major advantage of their framework is that it runs in nearly linear time, under mild conditions on the graph. Moreover, it is scalable and parallelizable. They illustrate this by implementing it in MapReduce and by scaling out to more than 800 cores on Amazon EC2. This enables them to solve large instances of the densest k-subgraph problem on massive graphs with billions of edges.</p>
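The spectral intuition behind the framework can be illustrated with a toy NumPy sketch. This is not the authors' algorithm (which samples candidate solutions from the low-rank subspace and carries graph-dependent guarantees); it only shows the simplest low-rank heuristic, scoring vertices by their weight in the leading eigenvector of the adjacency matrix, on a graph with a planted dense block:

```python
import numpy as np

def densest_k_candidate(A, k, rank=1):
    # Score each vertex by its weight in the top-`rank` eigenvectors of the
    # (symmetric) adjacency matrix, then keep the k highest-scoring vertices.
    _, vecs = np.linalg.eigh(A)              # eigenvalues in ascending order
    score = np.abs(vecs[:, -rank:]).sum(axis=1)
    return np.sort(np.argsort(score)[-k:])

def density(A, S):
    # Fraction of vertex pairs inside S that are joined by an edge.
    k = len(S)
    return A[np.ix_(S, S)].sum() / (k * (k - 1))

# Toy instance: sparse background graph with a dense block planted on 0..9.
rng = np.random.default_rng(0)
n, k = 60, 10
A = (rng.random((n, n)) < 0.03).astype(float)
A[:k, :k] = (rng.random((k, k)) < 0.8).astype(float)
A = np.maximum(A, A.T)                       # make the graph undirected
np.fill_diagonal(A, 0)

S = densest_k_candidate(A, k)                # should land on the planted block
```

On this instance the candidate's density far exceeds the ~3% background rate, which is the effect the low-rank bounds formalize.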
<p>For the details, see the <a href="https://webspace.utexas.edu/dp26726/papers/DkS_long.pdf">paper</a>.</p>
<p>This work was partially supported by NSF grants CCF-1344364, CCF-1344179, DARPA XDATA and research gifts by Google and Docomo.</p>
</div></div></div><div class="field field-name-field-related-faculty field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Faculty: </div><div class="field-items"><div class="field-item even"><a href="/people/faculty/constantine-caramanis">Constantine Caramanis</a></div><div class="field-item odd"><a href="/people/faculty/alex-dimakis">Alex Dimakis</a></div></div></div><div class="field field-name-field-related-students field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Researchers: </div><div class="field-items"><div class="field-item even"><a href="/people/students/ioannis-mitliagkas">Ioannis Mitliagkas</a></div><div class="field-item odd"><a href="/people/students/dimitris-papailiopoulos">Dimitris Papailiopoulos</a></div></div></div><div class="field field-name-field-tags field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Keywords: </div><div class="field-items"><div class="field-item even"><a href="/tags/ml">ML</a>, <a href="/tags/graph-algorithms">graph algorithms</a>, <a href="/tags/machine-learning">Machine Learning</a>, <a href="/tags/statistics">Statistics</a>, <a href="/tags/computation">Computation</a>, <a href="/tags/mapreduce">MapReduce</a></div></div></div>
Detecting Epidemics in Networks
http://wncg.org/research/briefs/detecting-epidemics-networks
<div class="field field-name-field-publish-date field-type-datetime field-label-hidden"><div class="field-items"><div class="field-item even"><span class="date-display-single">Saturday, March 22, 2014</span></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"> <p>WNCG Ph.D. student Chris Milling, along with WNCG Professors Constantine Caramanis and Sanjay Shakkottai, and Technion Professor Shie Mannor, has developed efficient algorithms for quickly determining whether an epidemic is spreading through a social network.</p>
<p>The history of infections and epidemics holds famous examples where understanding, containing, and ultimately treating an outbreak began with understanding its mode of spread. Influenza, HIV, and most computer viruses spread person to person, device to device, through contact networks; cholera, cancer, and seasonal allergies, on the other hand, do not. In this paper we study two fundamental questions of detection: first, given a snapshot view of a (perhaps vanishingly small) fraction of those infected, under what conditions is an epidemic spreading via contact (e.g., influenza) distinguishable from a "random illness" operating independently of any contact network (e.g., seasonal allergies); second, if we do have an epidemic, under what conditions is it possible to determine which network of interactions is the main cause of the spread -- the <em>causative network</em> -- without any knowledge of the epidemic, other than the identity of a minuscule subsample of infected nodes? The core of this paper, therefore, is to obtain an understanding of the <em>diagnostic power of network information</em>. We derive sufficient conditions networks must satisfy for these problems to be identifiable, and produce efficient, highly scalable algorithms that solve them. We show that the identifiability condition we give is fairly mild and, in particular, is satisfied by two common graph topologies: the grid and the Erdos-Renyi graph. For the details, see:</p>
<ul><li><a href="http://arxiv.org/pdf/1309.6545v1.pdf">Epidemic Detection on Networks</a></li>
</ul><p>This work was partially supported by the National Science Foundation (NSF) and the Defense Threat Reduction Agency (DTRA).</p>
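The contact-versus-random distinction can be illustrated with a toy simulation. This is a simplified stand-in for the paper's algorithms, not their actual test statistic: on a grid, a contact epidemic produces a connected cluster of infected nodes, so the fraction of infected nodes' grid edges leading to other infected nodes is far higher than under a random illness with the same number of cases:

```python
import numpy as np

def grid_neighbors(i, j, m):
    # 4-neighborhood of cell (i, j) on an m x m grid.
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, b = i + di, j + dj
        if 0 <= a < m and 0 <= b < m:
            yield a, b

def si_epidemic(m, steps, rng, beta=0.5):
    # Susceptible-Infected spread over the grid from one random seed node.
    infected = np.zeros((m, m), dtype=bool)
    infected[rng.integers(m), rng.integers(m)] = True
    for _ in range(steps):
        new = infected.copy()
        for i in range(m):
            for j in range(m):
                if infected[i, j]:
                    for a, b in grid_neighbors(i, j, m):
                        if rng.random() < beta:
                            new[a, b] = True
        infected = new
    return infected

def random_illness(m, count, rng):
    # Same number of cases, placed independently of the network.
    infected = np.zeros((m, m), dtype=bool)
    infected.flat[rng.choice(m * m, size=count, replace=False)] = True
    return infected

def edge_statistic(infected, m):
    # Fraction of infected nodes' grid edges whose other end is also infected.
    hits = total = 0
    for i in range(m):
        for j in range(m):
            if infected[i, j]:
                for a, b in grid_neighbors(i, j, m):
                    total += 1
                    hits += bool(infected[a, b])
    return hits / max(total, 1)

rng = np.random.default_rng(1)
m = 20
epi = si_epidemic(m, steps=8, rng=rng)        # contact spread: connected blob
rnd = random_illness(m, int(epi.sum()), rng)  # random illness: scattered cases
```

The gap between the two statistics is exactly the kind of network signature the paper's identifiability conditions make rigorous.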
</div></div></div><div class="field field-name-field-related-faculty field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Faculty: </div><div class="field-items"><div class="field-item even"><a href="/people/faculty/constantine-caramanis">Constantine Caramanis</a></div><div class="field-item odd"><a href="/people/faculty/sanjay-shakkottai">Sanjay Shakkottai</a></div></div></div><div class="field field-name-field-related-students field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Researchers: </div><div class="field-items"><div class="field-item even"><a href="/people/students/p-chris-milling">P. Chris Milling</a></div></div></div><div class="field field-name-field-tags field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Keywords: </div><div class="field-items"><div class="field-item even"><a href="/tags/networks">networks</a>, <a href="/tags/social-networks">social networks</a>, <a href="/tags/machine-learning">Machine Learning</a>, <a href="/tags/statistics">Statistics</a></div></div></div>
Memory-Limited Learning
http://wncg.org/research/briefs/memory-limited-learning
<div class="field field-name-field-publish-date field-type-datetime field-label-hidden"><div class="field-items"><div class="field-item even"><span class="date-display-single">Monday, March 3, 2014</span></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"> <p>WNCG Prof. Constantine Caramanis, along with Ph.D. student Ioannis Mitliagkas and MSR Bangalore researcher Dr. Prateek Jain, has obtained the first-ever linear-memory algorithm for Principal Component Analysis. Their algorithm is efficient to implement, needs to see each data point only once, and works even in the setting of many missing entries.</p>
<div>
<p>Principal component analysis is a fundamental tool for dimensionality reduction, clustering, classification, and many more learning tasks. It is a basic preprocessing step for learning, recognition, and estimation procedures. The core computational element of PCA is performing a (partial) singular value decomposition, and much work over the last half century has focused on efficient algorithms and hence on computational complexity. The recent focus on understanding high-dimensional data (for example, video or image data, medical or DNA data), where the dimensionality of the data scales together with the number of available sample points, has led to an exploration of the sample complexity of covariance estimation. What has not been considered is the memory complexity of PCA algorithms. The only algorithms with known performance guarantees thus far require O(p<sup>2</sup>) memory in p dimensions. This can be prohibitive for modern high-dimensional applications.</p>
<p>This work fills precisely this need. We develop an algorithm with O(p) memory requirement (the best possible) and with performance matching state-of-the-art memory-intensive algorithms. Moreover, in follow-up work, we also develop an algorithm that works even when each data point has suffered a vast number of deletions or erasures.</p>
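The flavor of a linear-memory streaming method can be sketched with an Oja-style update, a simplified relative of the paper's block algorithm (the step-size schedule and spiked-covariance data below are illustrative choices, not from the paper): the only state kept is a p x k orthonormal basis, each sample is touched exactly once, and the basis is re-orthonormalized after every update.

```python
import numpy as np

def streaming_pca(stream, p, k):
    # Oja-style streaming PCA: the only state is a p x k orthonormal basis Q,
    # so memory is O(pk) and each sample is seen exactly once.
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((p, k)))
    for t, x in enumerate(stream):
        Q += (1.0 / (t + 10)) * np.outer(x, x @ Q)  # stochastic power step
        Q, _ = np.linalg.qr(Q)                      # re-orthonormalize
    return Q

# Spiked-covariance stream: strong signal along coordinate 0, isotropic noise.
p, k, n = 30, 1, 2000
rng = np.random.default_rng(1)
u = np.zeros(p)
u[0] = 1.0
samples = (3.0 * rng.standard_normal() * u + 0.5 * rng.standard_normal(p)
           for _ in range(n))
Q = streaming_pca(samples, p, k)  # Q should align with the planted direction u
```

Note that at no point is the p x p sample covariance formed, which is exactly what drops the memory cost from O(p<sup>2</sup>) to O(p) for constant k.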
<ul><li>Paper 1: <a href="http://users.ece.utexas.edu/~cmcaram/pubs/Streaming-PCA.pdf">Memory-Limited Streaming PCA</a></li>
<li>Paper 2: <a href="https://webspace.utexas.edu/im4454/www/kdd2014long.pdf">Streaming PCA with Many Missing Entries</a></li>
</ul><p>This research was partially funded by the National Science Foundation (NSF) and the Defense Threat Reduction Agency (DTRA).</p>
</div>
</div></div></div><div class="field field-name-field-related-faculty field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Faculty: </div><div class="field-items"><div class="field-item even"><a href="/people/faculty/constantine-caramanis">Constantine Caramanis</a></div></div></div><div class="field field-name-field-related-students field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Researchers: </div><div class="field-items"><div class="field-item even"><a href="/people/students/ioannis-mitliagkas">Ioannis Mitliagkas</a></div></div></div><div class="field field-name-field-tags field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Keywords: </div><div class="field-items"><div class="field-item even"><a href="/tags/statistics">Statistics</a>, <a href="/tags/machine-learning">Machine Learning</a>, <a href="/tags/optimization">Optimization</a></div></div></div>
Mixed Regression: Disentangling Mixed Data
http://wncg.org/research/briefs/mixed-regression-disentangling-mixed-data
<div class="field field-name-field-publish-date field-type-datetime field-label-hidden"><div class="field-items"><div class="field-item even"><span class="date-display-single">Friday, February 7, 2014</span></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"> <p>In two recent papers, Caramanis, Chen, Sanghavi and Yi obtain the best known statistical and computational complexity bounds for mixed regression. </p>
<p>Mixture models carry much explanatory power, and are natural modeling tools: rather than asking for a single model to explain all observations, they treat observed data as a superposition of simple statistical processes. Due to the wide applicability and naturalness of this modeling approach, their popularity extends across many application areas and domains, including health care, object recognition, and natural language processing. Yet the inherently combinatorial nature of the mixture -- the assumption that one subset of the data comes from one model, and another subset from another -- presents significant algorithmic challenges in learning. Essentially, the core of the challenge is that clustering and fitting must be performed simultaneously.</p>
<p>In two recent papers, WNCG faculty Constantine Caramanis and Sujay Sanghavi, in collaboration with Xinyang Yi and Yudong Chen, provide efficient algorithms that give the best known statistical and computational complexity bounds for this problem. In the first paper, we use alternating minimization, essentially showing that the EM algorithm has fast convergence. In the second, we use convex optimization techniques to derive an efficient algorithm for mixed regression; we also obtain minimax optimal rates.</p>
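The alternating-minimization idea can be sketched in a few lines as hard-assignment EM for two linear models. This toy version (the data, initialization, and restart heuristic are illustrative, not the paper's initialization or guarantees) alternates between assigning each sample to the better-fitting model and refitting both models by least squares:

```python
import numpy as np

def alt_min(X, y, iters=30, seed=0):
    # Hard-assignment EM: assign each sample to the better-fitting model,
    # then refit both models by least squares, and repeat.
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    b1, b2 = rng.standard_normal(p), rng.standard_normal(p)
    for _ in range(iters):
        m = (y - X @ b1) ** 2 <= (y - X @ b2) ** 2
        if m.sum() >= p:
            b1 = np.linalg.lstsq(X[m], y[m], rcond=None)[0]
        if (~m).sum() >= p:
            b2 = np.linalg.lstsq(X[~m], y[~m], rcond=None)[0]
    loss = np.minimum((y - X @ b1) ** 2, (y - X @ b2) ** 2).mean()
    return loss, b1, b2

def mixed_regression(X, y, restarts=8):
    # Alternating minimization is sensitive to its start; keep the best restart.
    return min((alt_min(X, y, seed=s) for s in range(restarts)),
               key=lambda t: t[0])

# Toy mixture: each response is generated by one of two separated models.
rng = np.random.default_rng(0)
n, p = 400, 5
X = rng.standard_normal((n, p))
beta_a = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
beta_b = np.array([0.0, 0.0, 0.0, 0.0, -3.0])
labels = rng.random(n) < 0.5
y = np.where(labels, X @ beta_a, X @ beta_b) + 0.05 * rng.standard_normal(n)
loss, b1, b2 = mixed_regression(X, y)
```

The two steps make the clustering-and-fitting coupling explicit: once the assignments are right, each refit is an ordinary regression.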
<ul><li>Paper 1. <a href="http://arxiv.org/pdf/1310.3745v1.pdf">http://arxiv.org/pdf/1310.3745v1.pdf</a></li>
<li>Paper 2. <a href="http://arxiv.org/pdf/1312.7006.pdf">http://arxiv.org/pdf/1312.7006.pdf</a></li>
</ul><p>This research was partially funded by the National Science Foundation (NSF) and the Defense Threat Reduction Agency (DTRA).</p>
</div></div></div><div class="field field-name-field-related-faculty field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Faculty: </div><div class="field-items"><div class="field-item even"><a href="/people/faculty/constantine-caramanis">Constantine Caramanis</a></div><div class="field-item odd"><a href="/people/faculty/sujay-sanghavi">Sujay Sanghavi</a></div></div></div><div class="field field-name-field-related-students field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Researchers: </div><div class="field-items"><div class="field-item even"><a href="/people/students/xinyang-yi">Xinyang Yi</a></div></div></div><div class="field field-name-field-tags field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Keywords: </div><div class="field-items"><div class="field-item even"><a href="/tags/machine-learning">Machine Learning</a>, <a href="/tags/optimization">Optimization</a>, <a href="/tags/statistics">Statistics</a></div></div></div>