# Exploiting geometric and algebraic structure in data

As signal acquisition systems become increasingly interconnected, complex and diverse, new methods of data analysis and processing have become crucial. Often, the development of such methods benefits from modeling the inherent structure present in data. The structure may come from (a) low intrinsic complexity of high-dimensional data, as captured by the low-rank assumption in many problems in machine learning; (b) the physics of the signal acquisition as in distributed video sensor networks; or (c) the very nature of the process we are trying to study, for instance genome evolution via rearrangements. In this talk I will describe how we can leverage techniques from group representation theory and differential geometry to utilize structure in data and develop efficient machine learning and signal processing solutions for a wide set of problems.

The first half of the talk will focus on the problem of optimal estimation and detection in homogeneous spaces, i.e. signal spaces where a transformation group acts transitively on a set (or manifold). I will discuss the problem of recovering transformations from pairs of datasets in homogeneous spaces and specialize the discussion to the rotation group acting on the unit sphere (with application to super-resolution of omni-directional images and visual homing of robots) and the permutation group (with application to genome rearrangements and multi-object tracking). In the second half of the talk I will discuss a group theoretical approach for learning rotations in an online fashion. The proposed algorithm involves multiplicative updates that are matrix exponentials of skew-symmetric matrices comprising the Lie algebra of the rotation group. I will discuss the stability and convergence of the proposed algorithm and application to left-stochastic decomposition (LSD) of similarity matrices. LSD is a variant of non-negative matrix factorization and is equivalent to soft kernel k-means clustering. I will give conditions for existence and uniqueness of LSD factorization and clustering, present error bounds for noisy settings and discuss experimental results on simulated and real similarity datasets.