Researchers have many options for web-based survey data collection. These range from curated probability-based panels, to which individuals are selectively invited on the basis of their membership in a representative population, to convenience panels, which anyone can join. The available respondents also vary greatly in how well they represent a population of interest and in their motivation to provide thoughtful and accurate responses. Although they require additional dataset-building labor from the researcher, convenience panels are much less expensive than probability-based panels. However, it is important to understand what may be given up in data quality in exchange for those cost savings.

This study examined the relative costs and data quality of fielding equivalent surveys on Amazon's Mechanical Turk (MTurk), a convenience panel, and KnowledgePanel, a nationally representative probability-based panel.

We administered the same survey measures to MTurk (in 2021) and KnowledgePanel (in 2022) members. We applied several recommended quality assurance steps to enhance the data quality achieved using MTurk; Ipsos, the owner of KnowledgePanel, followed its usual (industry standard) protocols. The survey was designed to support psychometric analyses and included >60 items from the Patient-Reported Outcomes Measurement Information System (PROMIS), demographics, and a list of health conditions. We included 2 fake conditions ("syndomitis" and "chekalism") to identify respondents more likely to be honest, that is, those who endorsed neither condition. We examined each platform's data quality using several recommended metrics (eg, consistency, reliability, representativeness, missing data, and correlations), both including and excluding respondents who had endorsed a fake condition, and examined the impact of weighting on representativeness.
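The two data-handling steps named above, prescreening on the fake conditions and poststratification weighting, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' or Ipsos's procedure: the abstract does not describe the weighting method, so the sketch assumes simple single-variable cell weighting, and the field names (`syndomitis`, `chekalism`, the cell key) and the function names are hypothetical.

```python
from collections import Counter


def prescreen(respondents):
    """Keep only respondents who endorsed neither fake condition.

    Assumes (hypothetically) that each respondent is a dict with boolean
    'syndomitis' and 'chekalism' fields.
    """
    return [r for r in respondents if not (r["syndomitis"] or r["chekalism"])]


def poststratify(respondents, cell_key, population_shares):
    """Simple cell (poststratification) weighting on one variable.

    Each respondent's weight is the population share of their cell divided
    by that cell's share of the sample, so weights sum to the sample size
    when the population shares sum to 1.
    """
    counts = Counter(r[cell_key] for r in respondents)
    n = len(respondents)
    return [population_shares[r[cell_key]] / (counts[r[cell_key]] / n)
            for r in respondents]


def weighted_mean(values, weights):
    """Weighted point estimate, e.g. the weighted share of a demographic group."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

A weighted estimate for a variable used to build the cells will match the supplied population share by construction; real survey weighting typically rakes over several variables at once, which this sketch does not attempt.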
We found that prescreening the MTurk sample (removing those who endorsed a fake health condition) improved data quality, but KnowledgePanel data quality generally remained superior. MTurk's unweighted demographic point estimates showed the commonly observed mismatch with national averages (respondents were younger, better educated, and lower income), but the weighted MTurk data matched national estimates. KnowledgePanel's point estimates matched national benchmarks better even before poststratification weighting. Correlations of PROMIS measures with age and income were similar in MTurk and KnowledgePanel: the mean absolute difference between the 2 platforms' 137 corresponding correlations was 0.06, and 92% of these differences were <0.15. However, correlations between PROMIS measures and educational level differed dramatically: across these 17 correlation pairs, the mean absolute difference was 0.15, the largest difference was 0.29, and in the MTurk sample more than half of these relationships ran in the opposite direction from that expected from theory.

Therefore, caution is needed when using MTurk for studies in which educational level is a key variable. The data quality of our MTurk sample was often inferior to that of the KnowledgePanel sample, but possibly not so inferior as to negate the benefits of its cost savings for some uses.

International Registered Report Identifier (IRRID): RR2-10.1186/s12891-020-03696-2
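The cross-platform comparison above reports the mean absolute difference between paired correlations, the largest difference, and the share of differences below 0.15, plus a check of direction against theory. A minimal sketch of those summaries follows, assuming each platform's correlations are already computed and listed in the same order; the function names and the `expected_signs` convention (+1 or -1 per hypothesized relationship) are hypothetical, not taken from the study.

```python
def compare_correlations(r_a, r_b, threshold=0.15):
    """Summarize differences between paired correlations from two samples."""
    if len(r_a) != len(r_b) or not r_a:
        raise ValueError("need two equal-length, non-empty lists of correlations")
    diffs = [abs(a - b) for a, b in zip(r_a, r_b)]
    return {
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_abs_diff": max(diffs),
        # Share of pairs whose absolute difference is strictly below the threshold.
        "share_below_threshold": sum(d < threshold for d in diffs) / len(diffs),
        # Pairs where the two samples disagree in sign (zeros are not counted).
        "sign_disagreements": sum(a * b < 0 for a, b in zip(r_a, r_b)),
    }


def count_against_theory(correlations, expected_signs):
    """Count correlations whose sign is opposite to the hypothesized sign."""
    return sum(r * s < 0 for r, s in zip(correlations, expected_signs))
```

Note that `sign_disagreements` compares the platforms with each other, whereas the finding about educational level compares MTurk with theory, which is what `count_against_theory` approximates.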