Quadratic Maximization Problems
Several optimization problems in machine learning, data mining and graph theory can be expressed as quadratic maximization problems, subject to integrality, positivity, or sparsity constraints. These include Sparse PCA, Densest Subgraph, Nonnegative matrix factorization, MaxCut, Maximum clique and many others. These problems are known to be computationally intractable and, in many cases, hard to approximate. WNCG Profs. Alex Dimakis and Constantine Caramanis, with students Dimitrios Papailioupoulos, Ioannis Mitliagkas and Megasthenis Asteris are developing a novel technique that can solve these problems exactly when the involved quadratic form matrix is positive semidefinite and low rank, even under combinatorial constraints. This is achieved by transforming the low-rank space using hyperspherical coordinates in a method called the spannogram, which allows researchers to handle constraints like integrality or sparsity.
Clearly, real-world data sets rarely produce exactly low-rank matrices. For that reason, WNCG researchers obtain low-rank approximations and performance bounds that depend on the spectral decay of the data matrix eigenvalues. The WNCG team are developing a general framework by combining low-rank approximations with low-rank quadratic optimization. For some problems, the researchers obtain excellent data-dependent bounds and algorithms that outperform the previous state of the art. These papers have been accepted to the highly selective International Conference on Machine Learning (ICML).
Principal Component Analysis (PCA) reduces data dimensionality by projecting it onto principal subspaces spanned by the leading eigenvectors of the sample covariance matrix. PCA is arguably the workhorse of high dimensional analysis (one of the most widely used algorithms with applications ranging from computer vision, document clustering to network anomaly detection). Sparse PCA is a useful variant that offers higher data interpretability. In WNCG's recent work, the team developed their framework and used it to design a novel algorithm for Sparse PCA. For several datasets, the WNCG researchers obtained excellent empirical performance and provable upper bounds that guarantee their objective is close to the unknown optimum.
Given a large graph and a parameter k, WNCG researchers are interested in detecting a small dense subgraph of size k embedded into an unweighted undirected background. The Densest-k-Subgraph (DkS) problem is fundamental for many applications, including graph and cluster analysis, cyber-community detection and computer security. Using this framework, the team developed a novel algorithm with provable approximation guarantees for DkS. Furthermore, the research team implemented a distributed version of our algorithm using the MapReduce framework, scaling up to 800 cores on Amazon EC2. This allowed us to find dense clusters in massive graphs with billions of edges.
Nonnegative Sparse PCA
For a given matrix A, nonnegative sparse PCA requires finding a sparse vector with nonnegative entries x that maximizes the quadratic form xTAx. This is a computationally intractable problem that relates to obtaining an approximate nonnegative matrix factorization (NMF). Such nonnegative factorizations are useful in numerous applications, including air emission control, graph clustering, text mining and hyperspectral data analysis for remote sensing. In a recent work, the WNCG team extended their spannogram methodology into handling non-negativity constraints. This allows researchers to obtain a novel algorithm for NMF and some novel approximation guarantees under spectral constraints.