Abstract: Given samples from an unknown distribution, p, is it possible to distinguish whether p belongs to some class of distributions C versus p being far from every distribution in C, by at least ε in total variation distance? This fundamental question has received substantial attention in Statistics and Computer Science. Nevertheless, even for basic classes of distributions such as monotone, log-concave, unimodal, or product, the optimal sample complexity is unknown. We provide optimal testers for these families.
The 13th annual Texas Wireless Summit (TWS) provides a forum on emerging technology and business models for industry leaders and academics. Hosted by the University of Texas at Austin's Wireless Networking and Communications Group (WNCG), the Summit offers direct access to cutting-edge research and innovations from industry leaders, investors, academics and startups.
We consider the task of summing (integrating) a non-negative function over a discrete domain, e.g., to compute the partition function of a graphical model. It is known that in a probabilistic approximate sense, summation can be reduced to maximization over random subsets of the domain defined by systems of linear equations modulo 2 (parity constraints). Unfortunately, random parity constraints with many variables are computationally intractable, while random parity constraints with few variables have poor statistical performance.
Fitting a low-rank matrix to data is a fundamental and widely used primitive in machine learning. For most problems beyond the very basic PCA, theoretically sound methods have overwhelmingly combined statistical models of the data with convex optimization. As the size and dimensionality of data increases, this approach is overly computationally wasteful, not least because it represents an nr dimensional object with n^2 parameters.
Anonymous messaging platforms, such as Secret, Whisper and Yik Yak, have emerged as important social media for sharing one's thoughts without the fear of being judged by friends, family, or the public. Further, such anonymous platforms are crucial in nations with authoritarian governments, where the right to free expression and sometimes the personal safety of the message author depends on anonymity. Whether for fear of judgment or personal endangerment, it is crucial to keep anonymous the identity of the users who initially posted sensitive messages in these platforms.
Modern datasets are rapidly growing in size and complexity, and this wealth of data holds the promise for many transformational applications. Machine learning is seemingly poised to deliver on this promise, having proposed and rigorously evaluated a wide range of data processing techniques over the past several decades. However, concerns over scalability and usability present major roadblocks to the wider adoption of these methods.
A fundamental problem in Markov Chains is estimating the probability of transitioning to a given terminal state in k steps from some initial distribution. This has received added attention in recent years due to the success of PageRank and related Markov-Chain based centrality measures for networks. Standard approaches to this problem use linear-algebraic methods (power iteration) or Monte Carlo.