Accurate inference about crowdsourcing problems when using efficient allocation strategies
Conference Year
January 2019
Abstract
Crowdsourcing is a modern technique to solve complex and computationally challenging sets of problems using the abilities of human participants [1]. However, human participants are relatively expensive compared with computational methods, so considerable research has investigated algorithmic strategies for efficiently distributing problems to participants and determining when problems have been sufficiently completed [2, 3, 4]. Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual problems. We show that allocation algorithms introduce bias by allocating workers to easy tasks at the expense of difficult tasks and by ceasing to obtain information about tasks once the algorithm has concluded. As a result, data gathered with allocation algorithms are biased and not representative of the true distribution of those data. This bias challenges inference of crowdsourcing features such as typical task difficulty or worker completion times. To study crowdsourcing algorithms and problem bias, we introduce a model for crowdsourcing a set of problems in which we can tune the distribution of problem difficulty. We then apply an allocation algorithm, Requallo [2], to our model and find that the distribution of problem difficulty is biased: Requallo-completed tasks are more likely to be easy tasks and less likely to be hard tasks. Finally, we introduce an inference procedure, Decision-Explicit Probability Sampling (DEPS), to estimate the true problem difficulty distribution given only an allocation algorithm's responses, allowing us to reason about the larger problem space while leveraging the efficiency of the allocation method. Results on real and synthetic crowdsourcing classifications show that DEPS produces a more accurate representation of the underlying distribution than baseline methods. The ability to perform accurate inference from non-representative data allows crowdsourcers to extract more knowledge from a given budget.

References

[1] Brabham, D. C. (2008). Crowdsourcing as a model for problem solving: An introduction and cases. Convergence, 14(1), 75-90.
[2] Li, Q., Ma, F., Gao, J., Su, L., & Quinn, C. J. (2016). Crowdsourcing high quality labels with a tight budget. In WSDM'16, ACM.
[3] Chen, X., Lin, Q., & Zhou, D. (2013). Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing. In International Conference on Machine Learning.
[4] McAndrew, T. C., Guseva, E. A., & Bagrow, J. P. (2017). Reply & Supply: Efficient crowdsourcing when workers do more than answer questions. PLoS ONE, 12(8), e0182662.
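To make the completion bias described in the abstract concrete, the following minimal Python sketch simulates a set of tasks with tunable difficulty and a simple confidence-based allocator. This is not the paper's model, Requallo, or DEPS; the Beta difficulty prior, the margin-based stopping rule, and all parameter values are assumptions chosen purely for illustration.

```python
# Minimal sketch (assumed model, not Requallo): each task's difficulty d is the
# per-vote error rate, drawn from a Beta distribution. The allocator collects
# votes until the vote margin reaches MARGIN (task "completed") or the per-task
# budget is exhausted (task abandoned).
import numpy as np

rng = np.random.default_rng(0)

N_TASKS = 5000   # number of tasks in the problem set
MARGIN = 4       # stop once |correct - incorrect| votes reaches this margin
BUDGET = 12      # maximum votes any single task may receive

difficulty = rng.beta(2, 5, size=N_TASKS)  # P(a worker votes incorrectly)

completed = np.zeros(N_TASKS, dtype=bool)
for i, d in enumerate(difficulty):
    margin = 0
    for _ in range(BUDGET):
        # a correct vote (probability 1 - d) pushes the margin up
        margin += 1 if rng.random() > d else -1
        if abs(margin) >= MARGIN:
            completed[i] = True
            break

print(f"completed {completed.mean():.1%} of tasks")
print(f"mean difficulty, all tasks:       {difficulty.mean():.3f}")
print(f"mean difficulty, completed tasks: {difficulty[completed].mean():.3f}")
# Easy tasks reach the margin quickly while hard tasks tend to exhaust the
# budget, so the completed set is skewed toward low difficulty.
```

Running this sketch shows completed tasks are easier on average than the full task set, which is the bias the abstract describes. An inference procedure such as DEPS would aim to recover the shape of the full `difficulty` distribution from only the completed tasks and the allocator's stopping decisions.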
Primary Faculty Mentor Name
James Bagrow
Status
Graduate
Student College
College of Engineering and Mathematical Sciences
Program/Major
Statistics
Primary Research Category
Engineering & Physical Sciences
Secondary Research Category
Social Sciences