Mixed Regression: Disentangling Mixed Data

07 Feb 2014

In two recent papers, Caramanis, Chen, Sanghavi and Yi obtain the best known statistical and computational complexity bounds for mixed regression. 

Mixture models carry much explanatory power and are natural modeling tools: rather than asking a single model to explain all observations, they treat observed data as a superposition of simple statistical processes. Because this modeling approach is so widely applicable and natural, its popularity extends across many application areas and domains, including healthcare, object recognition, and natural language processing. Yet the inherently combinatorial nature of the mixture -- the assumption that one subset of the data comes from one model, and another subset from another -- presents significant algorithmic challenges in learning. The core of the challenge is that clustering and fitting must be performed simultaneously. 

In two recent papers, WNCG faculty Constantine Caramanis and Sujay Sanghavi, in collaboration with Xinyang Yi and Yudong Chen, provide efficient algorithms that give the best known statistical and computational complexity bounds for this problem. In the first paper, they use alternating minimization, essentially showing that the EM algorithm converges quickly. In the second, they use convex optimization techniques to derive an efficient algorithm for mixed regression, and also obtain minimax optimal rates.
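To give a flavor of the alternating-minimization approach, here is a minimal sketch for two-component mixed linear regression. This is an illustrative toy, not the papers' exact algorithm: it uses random initialization (the papers analyze a careful initialization), and it alternates between hard-assigning each sample to whichever regressor fits it better and refitting each regressor by least squares on its assigned samples.

```python
import numpy as np

def mixed_regression_am(X, y, n_iter=50, seed=0):
    """Alternating minimization for two-component mixed linear regression.

    Illustrative sketch (hypothetical helper, not the authors' code):
    alternate between (1) assigning each sample to the better-fitting
    regressor and (2) refitting each regressor by least squares.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random initialization; the papers use a more careful initialization.
    beta1 = rng.standard_normal(d)
    beta2 = rng.standard_normal(d)
    for _ in range(n_iter):
        # Cluster step: hard-assign each point to its better-fitting model.
        r1 = (y - X @ beta1) ** 2
        r2 = (y - X @ beta2) ** 2
        mask = r1 <= r2
        # Fit step: ordinary least squares on each cluster.
        if mask.any():
            beta1 = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
        if (~mask).any():
            beta2 = np.linalg.lstsq(X[~mask], y[~mask], rcond=None)[0]
    return beta1, beta2
```

On noiseless, well-separated data this procedure typically recovers both regression vectors exactly (up to label swap) once the cluster assignments stabilize, which is the intuition behind the fast-convergence result.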

This research was partially funded by the National Science Foundation (NSF) and the Defense Threat Reduction Agency (DTRA).