Large Scale Learning from Text, Images and Social Interactions

11 Jul 2014

In this project, WNCG Prof. Joydeep Ghosh and his students and collaborators developed numerous discriminative and generative models for solving large-scale transfer learning problems that arise in text document analysis, social network study, recommender systems, object recognition from images, evolution of financial data, and sports analytics.

Scarcity of labeled data is a challenging problem in many applications and limits the predictive capabilities of machine learning algorithms. Additionally, the distribution of data changes over time, which leaves models trained on older data less capable of discovering useful structure in newly available data. Transfer learning offers a convenient framework for overcoming these problems: learning a model for one domain can benefit the learning of models in other domains, through either simultaneous training across domains or sequential transfer of knowledge from one domain to the others.

The simultaneous-learning approaches all maintain a low-dimensional space shared across multiple domains. For sequential knowledge transfer, the parameters of a model trained with data from an older domain are carefully adapted to fit the new distributions. Applications of these frameworks to problems such as text classification, object recognition from images, network modeling for community detection, and count data evolution have shown promising results so far.
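The sequential case can be illustrated with a minimal sketch. This is not the project's models: it assumes a plain logistic regression, synthetic data, and hypothetical names (`fit`, `objective`, `make_domain`, `lam`) chosen only for illustration. A model is first trained on a large "old" domain, then adapted to a small "new" domain by starting from the old weights and penalizing movement away from them.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def objective(w, X, y, w_prior, lam):
    """Mean logistic loss plus (lam / 2) * ||w - w_prior||^2."""
    loss = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        loss -= yi * math.log(p) + (1.0 - yi) * math.log(1.0 - p)
    penalty = 0.5 * lam * sum((a - b) ** 2 for a, b in zip(w, w_prior))
    return loss / len(X) + penalty

def fit(X, y, w_init, w_prior, lam=0.0, lr=0.1, steps=300):
    """Plain gradient descent on `objective`, starting from w_init."""
    w = list(w_init)  # copy so the caller's weights are not mutated
    n = len(X)
    for _ in range(steps):
        # Gradient of the penalty term pulls w back toward w_prior.
        grad = [lam * (a - b) for a, b in zip(w, w_prior)]
        # Gradient of the mean logistic loss.
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def make_domain(rng, w_true, n):
    # Synthetic domain (assumption): Gaussian features, labels from a linear rule.
    X = [[rng.gauss(0.0, 1.0) for _ in w_true] for _ in range(n)]
    y = [1.0 if sum(a * b for a, b in zip(w_true, xi)) > 0 else 0.0 for xi in X]
    return X, y

rng = random.Random(0)
X_old, y_old = make_domain(rng, [2.0, -1.0, 0.5], 500)  # plentiful older data
X_new, y_new = make_domain(rng, [1.5, -1.5, 0.5], 20)   # scarce, shifted newer data

zeros = [0.0, 0.0, 0.0]
w_old = fit(X_old, y_old, w_init=zeros, w_prior=zeros)

# Sequential transfer: warm-start from w_old and stay close to it.
w_adapted = fit(X_new, y_new, w_init=w_old, w_prior=w_old, lam=1.0)
```

In this sketch, `lam` controls how much the older domain is trusted: a larger value keeps `w_adapted` near `w_old`, while `lam=0.0` lets the 20 new examples move the weights freely.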

Simultaneous knowledge transfer has also been integrated with active learning to gain additional benefits in domains where labeled data is expensive to obtain. Current research focuses on three applications: (1) simultaneous knowledge transfer with explicit feedback from human annotators, (2) development of non-parametric dynamic state-space models for analysis of count data that changes over time, and (3) dynamic network modeling for security applications and anomaly detection.

This research is funded by the National Science Foundation, the U.S. Army Research Laboratory and the Office of Naval Research.

Paper 1: Active Multitask Learning using Both Supervised and Latent Shared Topics

Paper 2: A Differential Evolution Algorithm to Optimize the Combination of Classifier and Cluster Ensembles

Paper 3: An Optimization Framework for Combining Ensembles of Classifiers and Clusterers with Applications to Non-Transductive Semi-Supervised Learning and Transfer Learning (to appear in ACM Transactions on Knowledge Discovery from Data, 2014)