Near-optimal message-passing algorithm for crowdsourcing

Friday, April 26, 2013

Crowdsourcing systems, like Amazon Mechanical Turk, provide platforms where large-scale projects are broken into small tasks that are electronically distributed to numerous on-demand contributors. Because these low-paid workers can be unreliable, we need to devise schemes to increase confidence in our answers, typically by assigning each task multiple times and combining the answers in some way. I will present a rigorous treatment of this problem, and provide both an optimal task assignment scheme (using a random graph) and an optimal inference algorithm (based on low-rank matrix approximation and belief propagation) for that task assignment.We represent crowdsourcing systems using graphical models and address the problem of inference in this graphical model. Standard techniques like belief propagation are difficult to implement in practice because they require knowledge of a priori distribution of the problem parameters. Instead, we propose a message-passing algorithm that does not require any knowledge of the apriori distributions. We show that this algorithm achieves performance close to a minimax lower bound. To analyze the performance of this message-passing algorithm, we borrow techniques from statistical physics and coding theory such as phase transition, correlation decay, and density evolution. Precisely, we show that above a phase transition, the graphical model exhibits correlation decay property. Then, an analysis technique known as density evolution gives a precise description of the density (or distribution) of the messages. Time permitting, I will discuss an interesting connection between this message-passing algorithm and the singular vectors of sparse random matrices.This is an on-going joint work with David R. Karger (MIT) and Devavrat Shah (MIT).


Sewoong Oh is an Assistant Professor of Industrial and Enterprise Systems Engineering at UIUC. His research interest is in understanding how to extract meaningful information from societal data, such as aggregating opinions on social computation platforms like Mechanical Turk, making recommendations from comparisons, and finding ranking from comparisons.