#### Presentation Title

Accurate inference about crowdsourcing problems when using efficient allocation strategies

#### Abstract

Crowdsourcing is a modern technique for solving complex and computationally challenging sets of problems using the abilities of human participants [1]. However, human participants are relatively expensive compared with computational methods, so considerable research has investigated algorithmic strategies for efficiently distributing problems to participants and determining when problems have been sufficiently completed [2, 3, 4]. Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual problems. We show that allocation algorithms introduce bias by allocating workers to easy tasks at the expense of difficult tasks and by ceasing to obtain information about tasks once the algorithm has concluded. As a result, data gathered with allocation algorithms are biased and not representative of the true distribution of those data. This bias complicates inference about crowdsourcing features such as typical task difficulty or worker completion times. To study crowdsourcing algorithms and problem bias, we introduce a model for crowdsourcing a set of problems in which we can tune the distribution of problem difficulty. We then apply an allocation algorithm, Requallo [2], to our model and find that the distribution of problem difficulty is biased: Requallo-completed tasks are more likely to be easy tasks and less likely to be hard tasks. Finally, we introduce an inference procedure, Decision-Explicit Probability Sampling (DEPS), to estimate the true problem difficulty distribution given only an allocation algorithm’s responses, allowing us to reason about the larger problem space while leveraging the efficiency of the allocation method. Results on real and synthetic crowdsourcing classifications show that DEPS creates a more accurate representation of the underlying distribution than baseline methods. The ability to perform accurate inference when using non-representative data allows crowdsourcers to extract more knowledge out of a given budget.

References

[1] Brabham, D. C. (2008). Crowdsourcing as a model for problem solving: An introduction and cases. Convergence, 14(1), 75-90.

[2] Li, Q., Ma, F., Gao, J., Su, L., & Quinn, C. J. (2016). Crowdsourcing high quality labels with a tight budget. In WSDM’16, ACM.

[3] Chen, X., Lin, Q., & Zhou, D. (2013). Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing. In International Conference on Machine Learning.

[4] McAndrew, T. C., Guseva, E. A., & Bagrow, J. P. (2017). Reply & Supply: Efficient crowdsourcing when workers do more than answer questions. PLoS ONE, 12(8), e0182662.
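The selection effect described in the abstract can be illustrated with a small simulation. The sketch below is not Requallo, DEPS, or the authors' model. It assumes a hypothetical setup in which each binary task has a probability `p_correct` of receiving a correct label (easy tasks near 1.0, hard tasks near 0.5), a task counts as completed once its vote lead reaches a fixed `margin`, and responses are handed out round-robin until a fixed `budget` is spent. All names and parameter values are illustrative assumptions.

```python
import random


def simulate(num_tasks=500, budget=2000, margin=3, seed=0):
    """Toy allocation with an early-stopping rule (illustrative assumption only)."""
    rng = random.Random(seed)
    # Hypothetical difficulty model: chance that one worker labels the task correctly.
    p_correct = [rng.uniform(0.5, 1.0) for _ in range(num_tasks)]
    lead = [0] * num_tasks  # correct votes minus incorrect votes, per task
    open_tasks = list(range(num_tasks))
    spent = 0
    while spent < budget and open_tasks:
        still_open = []
        for t in open_tasks:
            if spent >= budget:
                still_open.append(t)  # budget exhausted: task stays incomplete
                continue
            lead[t] += 1 if rng.random() < p_correct[t] else -1
            spent += 1
            if abs(lead[t]) < margin:
                still_open.append(t)  # not yet decided, keep requesting labels
        open_tasks = still_open
    open_set = set(open_tasks)
    completed = [t for t in range(num_tasks) if t not in open_set]
    return p_correct, completed


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    p_correct, completed = simulate()
    print("tasks completed:", len(completed), "of", len(p_correct))
    print("mean p_correct, all tasks:      ", round(mean(p_correct), 3))
    print("mean p_correct, completed tasks:", round(mean([p_correct[t] for t in completed]), 3))
```

Under these assumptions, easy tasks reach the margin after only a few responses, while hard tasks are still undecided when the budget runs out. The completed set therefore over-represents easy tasks, and a naive estimate of typical difficulty taken from completed tasks alone would be too optimistic.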

#### Primary Faculty Mentor Name

James Bagrow

#### Status

Graduate

#### Student College

College of Engineering and Mathematical Sciences

#### Program/Major

Statistics

#### Primary Research Category

Engineering & Physical Sciences

#### Secondary Research Category

Social Sciences
