Modeling and Algorithms for Aggregated Data
Databases in domains such as healthcare are routinely released to the public in aggregated form to preserve privacy. However, naively applying existing modeling techniques to aggregated data is subject to the ecological fallacy, which can drastically reduce the accuracy of results and often leads to misleading inferences at the individual level. This project, by Prof. Ghosh and student Avradeep Bhowmik, addresses this problem in a generalized linear model setting where features are provided at the individual level but the target variables are available only as histogram aggregates or order statistics. It involves designing simple algorithms that exploit properties of generalized linear models to accurately estimate the model parameters and to reconstruct the database at the individual level, given relatively coarse histograms for the target variables.
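To make the setting concrete, the following is a minimal sketch of one plausible approach. It is an assumption for illustration, not necessarily the algorithm developed in the project. It takes the simplest generalized linear model (a linear model with identity link), expands the target histogram into a sorted list of surrogate values, and then alternates between fitting the parameters by least squares and reassigning the surrogate values to individuals by rank. The function names `histogram_to_sorted_targets` and `fit_from_histogram`, and the choice to spread points evenly within each bin, are hypothetical.

```python
import numpy as np

def histogram_to_sorted_targets(bin_edges, counts):
    """Expand a histogram into a sorted vector of surrogate targets.

    Assumption: the points in each bin are spread evenly inside the bin,
    since the true within-bin positions are unknown.
    """
    values = []
    for lo, hi, c in zip(bin_edges[:-1], bin_edges[1:], counts):
        # Midpoints of c equal sub-intervals of [lo, hi); empty if c == 0.
        values.append(lo + (np.arange(c) + 0.5) * (hi - lo) / max(c, 1))
    return np.concatenate(values)

def fit_from_histogram(X, bin_edges, counts, n_iter=50, seed=0):
    """Alternate between least-squares fitting and rank-based reassignment.

    X has one row of features per individual; the targets are known only
    through (bin_edges, counts). Returns the fitted weights and a
    reconstructed target vector consistent with the histogram.
    """
    rng = np.random.default_rng(seed)
    z = histogram_to_sorted_targets(bin_edges, counts)  # sorted ascending
    y = rng.permutation(z)  # arbitrary initial assignment to individuals
    for _ in range(n_iter):
        # Step 1: fit the linear model to the current assignment.
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Step 2: give the i-th smallest prediction the i-th smallest
        # surrogate target, which minimizes squared error over all
        # assignments of z to individuals.
        order = np.argsort(X @ w)
        y_new = np.empty_like(z)
        y_new[order] = z
        if np.allclose(y_new, y):
            break
        y = y_new
    # Refit so the returned weights match the returned assignment.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w, y
```

Neither step can increase the squared error, so the procedure converges, but only to a local optimum that depends on the initial assignment. In particular, a histogram alone may not determine the sign of the parameters when the features are symmetrically distributed, so a sketch like this should be read as an illustration of the problem structure rather than a reliable estimator.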