Title: The Intelligent Management of Crowd-Powered Machine Learning
Advisors: Dan Weld and Mausam
Supervisory Committee: Dan Weld (co-Chair), Mausam (co-Chair), Thomas Richardson (GSR, STAT), Eric Horvitz (MSR), and Carlos Guestrin
Abstract: Artificial intelligence and machine learning power many technologies today, from spam filters to self-driving cars to medical decision assistants. While this revolution has benefited hugely from developments in core areas like deep learning, it could not have occurred without data, which is now frequently procured at massive scale from crowds. Because data is so crucial, a key next step towards truly autonomous agents is the design of better methods for intelligently managing the now-ubiquitous crowd-powered data-gathering process.
This dissertation takes this key next step by developing algorithms for the online and dynamic control of data acquisition. We consider how to gather data for its two primary and independent purposes: training and evaluation.
In the first part of the dissertation, we develop algorithms for obtaining data for testing. The most important requirement of testing data is that it must be extremely clean. Thus, to deal with noisy human annotations, machine learning practitioners typically rely on careful workflow design and advanced statistical techniques for label aggregation. A common process involves designing and testing multiple crowdsourcing workflows for a task, identifying the single best-performing workflow, and then aggregating worker responses from redundant runs of that single workflow. We improve upon this process in two ways: we build control models that allow for switching among many workflows depending on how well a particular workflow is performing for a given example and worker; and we build a control model that can aggregate labels from tasks that do not have a finite predefined set of multiple-choice answers (e.g., counting tasks).
We then implement agents that use our new models to dynamically choose whether to acquire more labels from the crowd or stop, and show that they can produce higher-quality labels at lower cost than state-of-the-art baselines.
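To make the acquire-or-stop decision concrete, the following is a minimal sketch and not the dissertation's control models. It assumes binary labels, independent workers who share a single known accuracy, and a simple confidence threshold as the stopping rule; the names `posterior_true`, `collect_labels`, and `ask_worker` are hypothetical.

```python
def posterior_true(labels, worker_accuracy=0.75, prior=0.5):
    """Posterior probability that the true binary label is 1.

    Assumption: workers are independent and each is correct with the
    same known probability `worker_accuracy`.
    """
    p1, p0 = prior, 1.0 - prior
    for y in labels:
        p1 *= worker_accuracy if y == 1 else 1.0 - worker_accuracy
        p0 *= worker_accuracy if y == 0 else 1.0 - worker_accuracy
    return p1 / (p1 + p0)


def collect_labels(ask_worker, confidence=0.95, max_labels=9,
                   worker_accuracy=0.75):
    """Request labels one at a time until the belief is confident enough
    or the per-example label budget is exhausted.

    `ask_worker` is a hypothetical callable returning one crowd label (0 or 1).
    Returns (predicted_label, posterior_of_label_1, labels_used).
    """
    labels = []
    while len(labels) < max_labels:
        belief = posterior_true(labels, worker_accuracy)
        if max(belief, 1.0 - belief) >= confidence:
            break  # confident enough: stop paying for labels
        labels.append(ask_worker())
    belief = posterior_true(labels, worker_accuracy)
    return (1 if belief >= 0.5 else 0), belief, len(labels)


if __name__ == "__main__":
    responses = iter([1, 0, 1, 1, 1, 1])
    print(collect_labels(lambda: next(responses)))
```

Under these assumptions, easy examples with unanimous workers stop early, while examples that draw disagreement consume more of the budget, which is the intuition behind spending labels dynamically rather than using a fixed redundancy.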
In the second part of the dissertation, we shift to tackle the second purpose of data: training.
Because learning algorithms are often robust to noise, training sets, unlike test sets, need not optimize for accuracy alone and can instead make tradeoffs among quantity, accuracy, and diversity. We first investigate the tradeoff between quantity and accuracy: given a fixed budget, one can spend money either on acquiring cleaner labels or on acquiring more examples. We survey how inductive bias, worker accuracy, and budget affect whether a larger and noisier training set or a smaller and cleaner one will train better classifiers. We then set up a formal framework for dynamically choosing the next example to label or relabel by generalizing active learning to allow for relabeling, which we call re-active learning, and we design new algorithms for re-active learning that outperform active learning baselines. Finally, we consider the effect of domain skew on strategies for increasing label diversity in a training set. We introduce an algorithm that dynamically switches between crowd generation and crowd labeling and show that it achieves better performance than state-of-the-art baselines across several domains with varying skew.
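The quantity-versus-accuracy tradeoff can be illustrated with a small calculation. This is a sketch under stated assumptions, not the dissertation's analysis: each label costs one unit, workers are independent with a shared accuracy `p`, and labels for an example are aggregated by majority vote over an odd number `k` of workers. The function names are hypothetical.

```python
from math import comb


def majority_vote_accuracy(p, k):
    """Probability that a majority of k independent workers, each correct
    with probability p, yields the right binary label (k must be odd)."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that ties cannot occur")
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i)
               for i in range(k // 2 + 1, k + 1))


def budget_options(budget, p, ks=(1, 3, 5)):
    """For each redundancy level k, report how many examples a fixed label
    budget buys and how accurate each aggregated label is expected to be."""
    return [(k, budget // k, majority_vote_accuracy(p, k)) for k in ks]


if __name__ == "__main__":
    for k, n_examples, acc in budget_options(budget=300, p=0.75):
        print(f"k={k}: {n_examples} examples, label accuracy {acc:.3f}")
```

With `p = 0.75`, moving from one label to three labels per example raises expected label accuracy from 0.75 to about 0.844 but cuts the number of examples to a third. Which side of that trade trains a better classifier depends on the factors the text names: inductive bias, worker accuracy, and budget.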