Digesting Big Data


August 24, 2015

With Big Data playing a role in the lives of companies and individuals across the globe, and data being collected on everything from apps to electronic health records to parking meters, society debates how best to use this mass of information.

“The stormy sea of Big Data can lead to data indigestion,” WNCG Associate Director Prof. Constantine Caramanis states. “We are interested in the application of data for engineering problems, from petroleum to health to recommendation engines.”

Instead of avoiding dirty data, Prof. Caramanis continues, researchers need to embrace it and determine how best to use it. If used correctly, Big Data can affect marketing, communications and commercialization. It could detect spammers and epidemics in networks, change shopping and eating habits as well as redefine the healthcare system and patient diagnostics. 

Prof. Caramanis, along with WNCG Profs. Alex Dimakis, Joydeep Ghosh, Sriram Vishwanath and Sujay Sanghavi, is tackling these data challenges head-on.

From graph analytics and algorithms to visualization through student startups and cell phone apps, WNCG researchers focus on bringing sense to Big Data.

“Our goal is to develop algorithms that make good recommendations while being resistant to manipulators,” Prof. Caramanis states. In addition to his work on robustness in data analytics, his recent research focuses on detecting epidemics in social and human networks. He also studies applications of graph analytics on Big Data with Prof. Dimakis.

“A graph is a network that captures interactions between data,” Prof. Dimakis states. “Using these interactions, you can learn things you wouldn’t be able to learn if you looked at the data without the structure.”

Prof. Dimakis is currently developing parallel distributed algorithms for graph problems and graph engines. These tools help simplify the work required by programmers to review data and can even affect search engine page-ranking.
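As a rough illustration of how graph structure can drive page-ranking, the sketch below runs a basic PageRank power iteration on a tiny made-up link graph. It is a single-machine toy, not the parallel distributed algorithms described above; the function name `pagerank`, the `toy_web` graph and the parameter values are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three of the pages link to "c".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
top_page = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links ends up ranked highest, which is information that looking at the pages one at a time, without the link structure, would not reveal.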

Healthcare challenges are more difficult to solve, Prof. Vishwanath mentions, since they involve private data that is difficult to sift through.

As the adoption of Electronic Health Records (EHR) increases in the U.S., the complexity of EHR data is growing dramatically. EHR data now covers diverse information about patients, including diagnosis, medication, lab results, genomic information and clinical notes.

However, such large volumes of information do not readily provide accurate and succinct patient representations for effective and customized healthcare.

According to Prof. Ghosh, the trick is to transform data into knowledge by translating complex, interconnected EHR data into concise and meaningful clinical concepts, or phenotypes, about patients. 

A phenotype is a collection of observable traits that results from the interactions between genetic expression and environmental influence. These phenotypes can be more easily interpreted, accepted and used by physicians. However, deriving phenotypes from EHRs is currently time-consuming and demands considerable human expertise.

The goal of Prof. Ghosh’s research is to model data as multiple, interconnected relationships, such as the relationship between a patient, their medication and diagnosis, or a patient and their symptoms. His research team is developing scalable algorithms to analyze these relationships and derive hidden concepts from the available data. Clinical experts will refine these concepts into specific phenotypes.
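To make the idea of interconnected relationships concrete, here is a minimal sketch that stores records as (patient, diagnosis, medication) triples and counts how often each diagnosis and medication appear together. The article does not say which algorithms the team uses, so this simple co-occurrence count is only a stand-in; the record values and function names are hypothetical placeholders, not real patient data.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (patient_id, diagnosis_code, medication_code).
records = [
    ("p1", "dx_A", "med_X"),
    ("p1", "dx_B", "med_Y"),
    ("p2", "dx_A", "med_X"),
    ("p3", "dx_B", "med_Y"),
    ("p3", "dx_A", "med_Z"),
]


def count_pairs(records):
    """Count how many records share each (diagnosis, medication) pair."""
    pairs = Counter()
    for _patient, diagnosis, medication in records:
        pairs[(diagnosis, medication)] += 1
    return pairs


def patients_per_pair(records):
    """Map each (diagnosis, medication) pair to the set of patients showing it."""
    groups = defaultdict(set)
    for patient, diagnosis, medication in records:
        groups[(diagnosis, medication)].add(patient)
    return dict(groups)


pair_counts = count_pairs(records)
pair_patients = patients_per_pair(records)
```

Pairs that recur across many patients are candidate groupings that a clinical expert could then review, which mirrors the refinement step described above.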

Prof. Ghosh predicts his research will lay the foundation for large-scale studies of EHR data that combine computer science and medical informatics to enable new clinical discoveries.

“The key is to match patterns,” Prof. Vishwanath states. “If you match patterns, you can look at a particular population and make conclusions. This type of extremely fine, predictive targeting will play a big role in improving healthcare relations.”

To address these healthcare challenges in a real-world setting, Profs. Ghosh and Vishwanath teamed up with WNCG students Joyce Ho and Yubin Park to launch Accordion Health, a company that uses Big Data insights to help consumers make healthcare decisions.

“We can use Big Data to solve problems that directly influence consumer decisions,” Prof. Vishwanath, who focuses primarily on the applications and consumer-products side of Big Data research, mentions. “As I walk down the street, anything and everything I’m about to buy can change depending on my behavior.”

To implement this idea, Prof. Vishwanath, along with students from the 2014 Eureca undergraduate summer research program, founded Tastebud, an app that offers discounts to restaurant users based on timing and location.

The main goal of sifting through Big Data and making it digestible, Prof. Caramanis mentions, is to use algorithms wisely to manage the data. However, many research challenges remain before Big Data can be seamlessly accessible to all. Many companies and organizations do not currently share their data with researchers, often citing privacy concerns.

“I’m particularly interested in the power of the crowd in finding and suggesting products and making recommendations,” Prof. Caramanis states. “Big Data harnesses the power of the crowd.” 

To help harness this power of the crowd and solve these complex engineering problems, the WNCG Big Data team started a Data Sciences and Engineering Initiative. 

WNCG Data Science Research supported by: WNCG Affiliates, NSF, USDOT, DTRA, Google and DOCOMO Innovations. 
