Data Driven Insights for Health Informatics

11 Jul 2014

The move to Electronic Health Records (EHRs) and the rapidly expanding availability of health-related information, from billing data to body sensors, are providing unprecedented opportunities for innovative data-driven solutions to problems in personalized medicine and population health. However, many formidable challenges have so far limited the utility of EHR data for clinical research, including diverse populations, heterogeneous and noisy information, longitudinal data, interpretability, domain constraints, and privacy concerns.

Our work takes a significant step towards realizing the promise of large-scale EHR data for effective population health care and management. We are pursuing several approaches to the analysis of such data: 1) high-throughput phenotyping via sparse non-negative tensor factorization of health data tensors; 2) new models that handle very rare classes (e.g. rare conditions or diseases); 3) the utility vs. privacy trade-off in healthcare data analytics; and 4) predictive modeling from multidimensional time series of human physiological measures, e.g. determining whether a patient is likely to go into cardiac arrest in the next few hours.

This research is funded by the Schlumberger Chair and the USAA.

Paper 1: DYNACARE: Dynamic Cardiac Arrest Risk Estimation

Paper 2: Multivariate Temporal Symptomatic Characterization of Cardiac Arrest

Paper 3: Ensemble of Alpha-Trees for Imbalanced Classification Problems

Paper 4: Graph Databases for Large-Scale Healthcare Systems: A Proposal for Efficient Data Management and Service

Paper 5: Septic Shock Prediction for Patients with Missing Data

Paper 6: LAMORE: A Stable, Scalable Approach to Latent Vector Autoregressive Modeling of Categorical Time Series

Paper 7: Limestone: High-throughput Candidate Phenotype Generation via Tensor Factorization

Paper 8: Marble: High-throughput Phenotyping from Electronic Health Records via Sparse Nonnegative Tensor Factorization (to appear)

Paper 9: LUDIA: An Aggregate-Constrained Low-Rank Reconstruction Algorithm to Leverage Publicly Released Health Data (to appear)

Paper 10: Extracting Phenotypes from Patient Claim Records using Non-negative Tensor Factorization (to appear)

Paper 11: A Hierarchical Ensemble of alpha-Trees for Predicting Expensive Hospital Visits (to appear)