Attention checks and how to use them: Review and practical recommendations

Abstract

Web surveys dominate contemporary data collection in numerous disciplines within the broadly understood social sciences. However, this mode of data collection comes with additional challenges, particularly careless or insufficient effort responding (C/IER), which can distort study results and poses a direct threat to the validity of conclusions. One of the recommended approaches to address this problem is using attention checks, which are additional tasks or items with objective answers that indicate attentive responding. Despite the potential benefits of attention checks, recent evidence suggests that they are still not sufficiently researched to justify their uncritical use in screening out inattentive participants. This article provides an abridged review of the attention-check literature, offers evidence-based practical recommendations, and highlights crucial gaps in research regarding attention checks. Evidence-based recommendations concerning the type, number, and placement of attention checks in a survey are presented. Generally, including more than one attention check in a survey is advisable, especially for longer surveys. Long instructed manipulation checks should be avoided; instead, covert attention checks, which are difficult for participants to identify, are recommended to reduce negative side effects such as noncompliance. In addition to attention checks, other criteria, such as item-level response time analysis, should be used in combination to identify inattentive participants. It is crucial to carefully analyse all data before making decisions about participant elimination. Ethical considerations related to the use of attention checks are also discussed, recognising the importance of maintaining participant trust and understanding the potential impact on survey completion rates and data quality. Overall, attention checks hold promise as a tool to enhance data quality, but further research and thoughtful implementation are necessary to maximise their effectiveness.
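
The abstract's core recommendation (combine attention checks with other indicators such as item-level response times, and inspect flagged cases before excluding anyone) can be illustrated with a minimal sketch. All column names, the data file, and the 2-second-per-item threshold below are illustrative assumptions, not values from the article.

```python
# Minimal sketch (not from the article): combining attention-check failures
# with item-level response times to flag potentially careless respondents.
# Column names ("ac1", "ac2", "rt_item_*") and the 2-second threshold are
# illustrative assumptions, not recommendations from the paper.
import pandas as pd

df = pd.read_csv("survey_responses.csv")          # hypothetical data file

# Attention checks: 1 = passed, 0 = failed (items with objective correct answers)
ac_cols = ["ac1", "ac2"]
df["ac_failures"] = (df[ac_cols] == 0).sum(axis=1)

# Item-level response times in seconds for the substantive items
rt_cols = [c for c in df.columns if c.startswith("rt_item_")]
df["median_item_rt"] = df[rt_cols].median(axis=1)

# Flag, do not automatically delete: review flagged cases before any exclusion
df["flagged"] = (df["ac_failures"] >= 1) | (df["median_item_rt"] < 2.0)
print(df["flagged"].value_counts())
```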

Similar Papers
  • Research Article
  • Citations: 1
  • 10.1177/01492063251330268
Insufficient Effort Responding in Management Research: A Critical Review and Future Directions
  • Apr 30, 2025
  • Journal of Management
  • Jason L Huang + 4 more

Insufficient effort responding (IER) presents a significant challenge in management research, potentially leading to flawed inferences. This review critically examines IER practices in 17 leading management journals from 2012 to 2023, highlighting inconsistencies in screening methods, cutoffs, and reporting. We find that IER screening is more prevalent in studies using online paid samples, experimental tasks, and computerized data collection. However, researchers’ IER-related practices, specifically the use of multiple detection methods, predicted IER removal rate above and beyond these study characteristics. Our review revealed that, despite increasing awareness, IER detection and reporting remain unstandardized, with varied practices across studies. While attention checks are frequently used, details about their implementation are often inadequately reported, and multiple detection methods, though recommended, are inconsistently applied. Variability in cutoffs and reliance on single-item checks raise concerns about the risk of retaining IER cases or mistakenly excluding attentive respondents. Our assessment of the impact of IER removal suggests that while it generally improves reliability and model fit, its effect can vary widely across measures and studies. We call on methodologists to resolve existing inconsistencies by developing clearer, empirically derived guidelines for IER detection and removal. We urge researchers to adopt more comprehensive and transparent reporting practices to enhance replicability and methodological rigor, with a flowchart to guide research design and method communication. This review underscores the need for a more systematic approach to IER mitigation in management research to enhance data quality and research validity.

  • Research Article
  • Citations: 5
  • 10.1037/pha0000645
Are the attention checks embedded in delay discounting tasks a valid marker for data quality?
  • Oct 1, 2023
  • Experimental and clinical psychopharmacology
  • Shahar Almog + 3 more

To ensure good-quality delay discounting (DD) data in research recruiting via crowdsourcing platforms, including attention checks within DD tasks has become common. These attention checks are typically identical in format to the task questions but have one sensical answer (e.g., "Would you prefer $0 now or $100 in a month?"). However, the validity of these attention checks as a marker of DD or overall survey data quality has not been directly examined. To address this gap, using data from two studies (total N = 700), the validity of these DD attention checks was tested by assessing performance on other, non-DD attention checks and on data quality measures both specific to DD and for the overall survey (e.g., providing nonsystematic DD data, responding inconsistently in questionnaires). We also tested whether failing the attention checks was associated with degree of discounting or other participant characteristics to screen for potential bias. While failing the DD attention checks was associated with a greater likelihood of nonsystematic DD data, their discriminability was inadequate, and failure was sometimes associated with individual differences (suggesting that data exclusion might introduce bias). Failing the DD attention checks was also not associated with failing other attention checks or data quality indicators. Overall, the DD attention checks do not appear to be an adequate indicator of data quality on their own, for either the DD task or surveys overall. Strategies to enhance the validity of DD attention checks and data cleaning procedures are suggested, which should be evaluated in future research. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
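
A small sketch of the kind of embedded check described above, where choosing the immediate $0 is the only implausible answer. The variable names and the cross-tabulation against a nonsystematic-data flag are assumptions for illustration, not the authors' analysis.

```python
# Illustrative sketch only: scoring an embedded delay-discounting attention
# check ("Would you prefer $0 now or $100 in a month?"), where choosing the
# delayed $100 is the only sensical answer. All values are made up.
import pandas as pd

dd = pd.DataFrame({
    "participant": [1, 2, 3, 4],
    "dd_check_choice": ["delayed", "now", "delayed", "delayed"],  # hypothetical
    "nonsystematic_dd": [False, True, False, True],               # hypothetical
})

dd["failed_dd_check"] = dd["dd_check_choice"] == "now"

# Overlap between failing the embedded check and nonsystematic discounting data
print(pd.crosstab(dd["failed_dd_check"], dd["nonsystematic_dd"]))
```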

  • Research Article
  • Citations: 1
  • 10.1037/pas0001379
Investigating the effect of experience sampling study design on careless and insufficient effort responding identified with a screen-time-based mixture model.
  • Aug 1, 2025
  • Psychological assessment
  • Esther Ulitzsch + 6 more

When using the experience sampling method (ESM), researchers must navigate a delicate balance between obtaining fine-grained snapshots of phenomena of interest and avoiding undue respondent burden, which can lead to disengagement and compromise data quality. To guide that process, we investigated how questionnaire length and sampling frequency impact careless and insufficient effort responding (C/IER) as an important yet understudied aspect of ESM data quality. To this end, we made use of existing experimental ESM data (Eisele et al., 2022) from 163 students randomly assigned to one of two questionnaire lengths (30/60 items) and one of three sampling frequencies (3/6/9 assessments per day). We employed a novel mixture modeling approach (Ulitzsch, Nestler, et al., 2024) that leverages screen time data to disentangle attentive responding from C/IER and allows investigating how the occurrence of C/IER evolved within and across ESM study days. We found sampling frequency, but not questionnaire length, impacted C/IER, with higher frequencies resulting in higher overall C/IER proportions and sharper increases of C/IER across, but not within days. These effects proved robust across various model specifications. Further, we found no substantial relationships between model-implied C/IER and other engagement measures, such as self-reported attentiveness, attention checks, response-pattern-based attentiveness indicators, and compliance. Our findings contrast previous studies on noncompliance, suggesting that respondents may employ different strategies to lower the different types of burden imposed by questionnaire length and sampling frequency. Implications for designing ESM studies are discussed. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
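
The study's screen-time-based mixture model is purpose-built; as a loose illustration of the underlying idea only, the toy sketch below fits a two-component Gaussian mixture to simulated log screen times to separate a fast (potentially careless) class from an attentive one. It is not the model of Ulitzsch, Nestler, et al. (2024), and the simulated values are arbitrary.

```python
# Toy illustration, not the authors' model: a two-component Gaussian mixture
# on simulated log screen times per assessment, separating a fast (possibly
# careless) class from an attentive class.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
log_screen_time = np.concatenate([
    rng.normal(2.0, 0.3, 300),   # fast, potentially careless assessments
    rng.normal(3.5, 0.4, 700),   # attentive assessments
]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(log_screen_time)
labels = gm.predict(log_screen_time)
print("estimated class means (log seconds):", gm.means_.ravel())
```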

  • Research Article
  • Citations: 1
  • 10.1111/modl.70000
Quiet threat: Insufficient effort responding in applied linguistics and its impact
  • Oct 26, 2025
  • The Modern Language Journal
  • Melissa Dan Wang + 2 more

Self‐report surveys are widely used in applied linguistics. Nevertheless, insufficient effort responding—often stemming from a lack of motivation from participants—can compromise survey data quality and distort research findings. This study investigated insufficient effort responding through an online survey assessing second language (L2) teachers’ assessment literacy, employing multiple methods to identify insufficient effort responding: an attention‐check item, self‐reported engagement, response time, and the post hoc index lz. Results indicated that 1.9% to 26.4% of participants exhibited signs of insufficient effort responding, depending on the method used. Insufficient effort responding distorted sample distributions by inflating means and standard deviations and negatively impacted scale reliability and model fit, both of which improved after removing flagged responses. Additionally, insufficient effort responding was less prevalent among older participants, and participants who perceived the survey as “intriguing” or “valuable” reported higher engagement. Furthermore, completing the survey as a favor for friends encouraged participants to put more effort into their responses. Our findings highlight the importance of addressing insufficient effort responding in survey‐based studies. Practical guidelines are provided for improving survey design and administration, emphasizing strategies to prevent and identify insufficient effort responding, thereby enhancing the reliability and validity of self‐report measures in applied linguistics research.
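
A rough sketch of how flag rates from several of the listed indicators might be compared. The thresholds, column names, and input file are assumptions, and the lz person-fit index is omitted because it requires a fitted IRT model.

```python
# Hedged sketch, not the authors' code: share of respondents flagged by each
# of several IER indicators and by at least one of them. All names and
# cutoffs are illustrative assumptions.
import pandas as pd

df = pd.read_csv("teacher_survey.csv")   # hypothetical file

flags = pd.DataFrame({
    "failed_attention_check": df["attention_check_correct"] == 0,
    "low_self_reported_engagement": df["self_reported_engagement"] <= 2,  # 1-5 scale assumed
    "too_fast": df["total_seconds"] / df["n_items"] < 2.0,                # < 2 s per item
})

print(flags.mean().mul(100).round(1))      # % flagged by each indicator
print(flags.any(axis=1).mean() * 100)      # % flagged by at least one indicator
```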

  • Research Article
  • Citations: 145
  • 10.1007/s10869-016-9479-0
Intra-individual Response Variability as an Indicator of Insufficient Effort Responding: Comparison to Other Indicators and Relationships with Individual Differences
  • Nov 22, 2016
  • Journal of Business and Psychology
  • Alexandra M Dunn + 3 more

Surveys are one of the most popular ways to collect employee information. Because of their widespread use, data quality is an increasingly important concern. The purpose of this paper is to (1) introduce the intra-individual response variability (IRV) index as an easily calculated and flexible way to detect insufficient effort responding (IER); (2) examine the extent to which various IER indices detect the same or different respondents engaging in IER behavior; and (3) investigate relationships between individual differences and commonly used IER indices to better understand systematic and theoretically relevant IER behavior. In a two-part study, 199 undergraduates responded to questionnaires online, and various IER indices were calculated. The IRV index identifies different respondents than other IER indices. Values on the IRV index (as well as other IER indices) are related to scores on theoretically meaningful individual differences in conscientiousness, agreeableness, and boredom proneness. This study provides researchers with a robust, easily calculated, and flexible means for screening questionnaire data for IER behavior. Practical recommendations for finding and making decisions about IER behavior patterns are provided. This study introduces the IRV index, an extension of the long string, used to identify survey research participants who likely engaged in one type of IER behavior. It is also one of the first studies to evaluate the extent to which IER indices identify different respondents as having engaged in IER and provides additional evidence that values on these indices are related to individual differences.
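
A minimal sketch of the IRV idea, computed here as the standard deviation of a respondent's raw responses overall and within blocks of consecutive items. The block size and item names are assumptions made for illustration; consult the paper for the exact operationalisation.

```python
# Sketch of the intra-individual response variability (IRV) idea: the SD of a
# respondent's raw item responses, overall and within blocks of consecutive
# items. Simulated 5-point data; block size of 10 is an assumption.
import numpy as np
import pandas as pd

responses = pd.DataFrame(np.random.randint(1, 6, size=(100, 40)),
                         columns=[f"item{i}" for i in range(1, 41)])

# Overall IRV: SD across all items for each respondent
irv_total = responses.std(axis=1)

# Block-wise IRV: SD within consecutive 10-item blocks, then averaged
blocks = [responses.iloc[:, i:i + 10].std(axis=1) for i in range(0, 40, 10)]
irv_blockwise = pd.concat(blocks, axis=1).mean(axis=1)

# Very low IRV can indicate straight-lining, one form of IER
print(irv_total.describe())
```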

  • Research Article
  • 10.1016/j.actpsy.2025.106162
Uncovering multidimensional patterns of insufficient effort responding: A latent profile analysis integrating survey data and cognitive task performance.
  • Feb 1, 2026
  • Acta psychologica
  • Giryong Park + 3 more


  • Research Article
  • 10.1186/s40536-025-00260-z
Data quality disparities in large-scale assessments: insufficient effort responding across student groups, schools, and cultures
  • Jul 24, 2025
  • Large-scale Assessments in Education
  • Melissa Dan Wang

Although self-report surveys are widely used for data collection, data quality can vary across populations because certain groups are more likely to engage in insufficient effort responding (IER). Our study examined how different levels of the educational system—student groups, schools, and cultural contexts—affect data quality due to IER, using a three-level Poisson regression analysis of the PISA 2018 dataset. We observed IER prevalence ranging from 14% to 74% in the PISA questionnaire, depending on the indicators used. Notably, students with low academic performance were more likely to engage in IER. Additionally, students from low-performing or high-SES schools exhibited higher IER tendencies, while IER showed minimal variation across cultural contexts. These findings emphasize that IER introduces systematic biases into survey data, undermining the fairness of group comparisons and confounding results with participants’ characteristics. Our study advocates a careful approach to addressing IER to enhance the validity of self-report measurements.
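
As a simplified, single-level stand-in for the three-level Poisson regression described above, the sketch below fits an ordinary Poisson GLM of an IER count on assumed student- and school-level predictors. The actual analysis requires a multilevel (mixed-effects) Poisson model and the real PISA 2018 variables; everything here is an assumption for illustration.

```python
# Simplified single-level sketch of a Poisson regression on an IER count.
# Variable names and the data extract are hypothetical; the study itself
# uses a three-level (student/school/country) model.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

pisa = pd.read_csv("pisa2018_ier.csv")   # hypothetical extract with one IER count per student

model = smf.glm(
    "ier_count ~ student_achievement + school_achievement + school_ses",
    data=pisa,
    family=sm.families.Poisson(),
).fit()
print(model.summary())
```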

  • Research Article
  • Citations: 46
  • 10.1111/apps.12058
Insufficient Effort Survey Responding: An Under‐Appreciated Problem in Work and Organisational Health Psychology Research
  • Dec 30, 2015
  • Applied Psychology
  • Alyssa K Mcgonagle + 2 more

Insufficient effort responding (IER) is problematic in that it can add a systematic source of variance for variables with average responses that depart from the scale midpoints. We present a rationale for why IER is of particular importance to Work and Organisational Health Psychology (WOHP) researchers. We also demonstrate its biasing effects using several variables of interest to WOHP researchers (perceived work ability, negative affectivity, perceived disability, work–safety tension, accident/injury frequencies, and experienced and instigated incivility) in two datasets. As expected, IER was significantly correlated with the focal study variables. We also found some evidence that hypothesised bivariate correlations between these variables were inflated when IER respondents were included. Corroborating IER's potential confounding role, we further found significant declines in the magnitude of the hypothesised bivariate correlations after partialling out IER. In addition, we found evidence for biasing (under‐estimation) effects for predictors not contaminated by IER in multiple regression models where some predictors and the outcome were both contaminated by IER. We call for WOHP researchers to routinely discourage IER from occurring in their surveys, screen for IER prior to analyzing survey data, and establish a standard practice for handling IER cases.
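
A small sketch of the "partialling out IER" comparison described above: the zero-order correlation between two focal variables versus their partial correlation controlling for an IER index. The variable names and data file are assumptions, not the authors' materials.

```python
# Hedged sketch: zero-order vs. partial correlation (controlling for an IER
# index) between two focal variables, using assumed column names.
import numpy as np
import pandas as pd

df = pd.read_csv("wohp_survey.csv")   # hypothetical

def partial_corr(x, y, z):
    # residualise x and y on z, then correlate the residuals
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

r_zero = df["work_ability"].corr(df["negative_affectivity"])
r_partial = partial_corr(df["work_ability"], df["negative_affectivity"], df["ier_index"])
print(f"zero-order r = {r_zero:.2f}, partial r (IER removed) = {r_partial:.2f}")
```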

  • Research Article
  • Citations: 9
  • 10.2196/39488
Data Quality and Study Compliance Among College Students Across 2 Recruitment Sources: Two Study Investigation
  • Dec 9, 2022
  • JMIR Formative Research
  • Abby L Braitman + 5 more

Background: Models of satisficing suggest that study participants may not fully process survey items and provide accurate responses when survey burden is higher and when participant motivation is lower. Participants who do not fully process survey instructions can reduce a study’s power and hinder generalizability. Common concerns among researchers using self-report measures are data quality and participant compliance. Similarly, attrition can hurt the power and generalizability of a study. Objective: Given that college students comprise most samples in psychological studies, especially examinations of student issues and psychological health, it is critical to understand how college student recruitment sources impact data quality (operationalized as attention check items with directive instructions and correct answers) and retention (operationalized as the completion of follow-up surveys over time). This study aimed to examine the following: whether data quality varies across recruitment sources, whether study retention varies across recruitment sources, the impact of data quality on study variable associations, the impact of data quality on measures of internal consistency, and whether the demographic qualities of participants significantly vary across those who failed attention checks versus those who did not. Methods: This examination was a follow-up analysis of 2 previously published studies to explore data quality and study compliance. Study 1 was a cross-sectional, web-based survey examining college stressors and psychological health (282/407, 69.3% female; 230/407, 56.5% White, 113/407, 27.8% Black; mean age 22.65, SD 6.73 years). Study 2 was a longitudinal college drinking intervention trial with an in-person baseline session and 2 web-based follow-up surveys (378/528, 71.6% female; 213/528, 40.3% White, 277/528, 52.5% Black; mean age 19.85, SD 1.65 years). Attention checks were included in both studies to assess data quality. Participants for both studies were recruited from a psychology participation pool (a pull-in method; for course credit) and the general student body (a push-out method; for monetary payment or raffle entry). Results: A greater proportion of participants recruited through the psychology pool failed attention checks in both studies, suggesting poorer data quality. The psychology pool was also associated with lower retention rates over time. After screening out those who failed attention checks, some correlations among the study variables were stronger, some were weaker, and some were fairly similar, potentially suggesting bias introduced by including these participants. Differences among the indicators of internal consistency for the study measures were negligible. Finally, attention check failure was not significantly associated with most demographic characteristics but varied across some racial identities. This suggests that filtering out data from participants who failed attention checks may not limit sample diversity. Conclusions: Investigators conducting college student research should carefully consider recruitment and include attention checks or other means of detecting poor-quality data. Recommendations for researchers are discussed.
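
A minimal sketch of one of the comparisons described above: a bivariate correlation computed before and after excluding respondents who failed a directive attention check (e.g., "select 'strongly agree' for this item"). The variables and file are hypothetical, not the study's data.

```python
# Illustrative sketch only: compare a correlation in the full sample with the
# same correlation after screening out attention-check failures.
import pandas as pd

df = pd.read_csv("college_survey.csv")   # hypothetical

passed = df[df["attention_check_passed"] == 1]

r_all = df["stress"].corr(df["depression"])
r_screened = passed["stress"].corr(passed["depression"])
print(f"r (full sample) = {r_all:.2f}, r (screened sample) = {r_screened:.2f}")
```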

  • Single Report
  • 10.15760/etd.6337
Insufficient Effort Responding on Mturk Surveys: Evidence-Based Quality Control for Organizational Research
  • Jan 1, 2000
  • Lee Cyr

Each year, crowdsourcing organizational research grows increasingly popular. However, this source of sampling receives much scrutiny focused on data quality and related research methods. Specific to the present research, survey attentiveness poses a unique dilemma. Research on updated conceptualizations of attentiveness--insufficient effort responding (IER)--shows that it carries substantial concerns for data quality beyond random noise, which further warrants deleting inattentive participants. However, personal characteristics predict IER, so deleting data may cause sampling bias. Therefore, preventing IER becomes paramount, but research seems to ignore whether IER prevention itself may create systematic error. This study examines the detection and prevention of IER in Amazon's Mechanical Turk (Mturk) by evaluating three IER detection methods pertinent to concerns of attentiveness on the platform and using two promising IER prevention approaches--Mturk screening features and IER preventive warning messages. I further consider how these issues relate to organizational research and answer the call for a more nuanced understanding of the Mturk population by focusing on psychological phenomena often studied/measured in organizational literature--the congruency effect and approach-avoidance motivational theories, Big Five personality, positive and negative affectivity, and core self-evaluations. I collected survey data from screened and non-screened samples and manipulated warning messages using four conditions--no warning, gain-framed, loss-framed, and combined-framed messages. I used logistic regression to compare the prevalence of IER across conditions and the effectiveness of warning messages given positively or negatively valenced motivational tendencies. I also used 4x2 factorial ANCOVAs to test for differences in personal characteristics across conditions. The sample consisted of 1071 Mturk workers (turkers). Results revealed differences in IER prevalence among detection methods and between prevention conditions, counter-intuitive results for congruency effects and motivational theories, and differences across conditions for agreeableness, conscientiousness, and positive and negative affectivity. Implications, future research, and recommendations are discussed.
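
A hedged sketch of the kind of logistic regression the dissertation describes, regressing a binary IER flag on the warning-message condition. The variable names, data file, and reference category are assumptions.

```python
# Sketch under assumed names: logistic regression of an IER flag (0/1) on the
# warning-message condition (no warning, gain-framed, loss-framed, combined),
# with "none" as the assumed reference category.
import pandas as pd
import statsmodels.formula.api as smf

mturk = pd.read_csv("mturk_sample.csv")   # hypothetical

fit = smf.logit("ier_flag ~ C(warning_condition, Treatment('none'))",
                data=mturk).fit()
print(fit.summary())
```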

  • Research Article
  • Citations: 4
  • 10.3389/fpsyg.2021.784375
The Relationship of Insufficient Effort Responding and Response Styles: An Online Experiment.
  • Jan 12, 2022
  • Frontiers in Psychology
  • Gene M Alarcon + 1 more

While self-report data are a staple of modern psychological studies, they rely on participants reporting about themselves accurately. Two constructs that impede accurate results are insufficient effort responding (IER) and response styles. These constructs share conceptual underpinnings, and both are used to reduce cognitive effort when responding to self-report scales. Little research has extensively explored the relationship between the two constructs. The current study explored the relationship of the two constructs across even-point and odd-point scales, as well as before and after data cleaning procedures. We utilized IRTrees, a statistical method for modeling response styles, to examine the relationship between IER and response styles. To capture the wide range of IER metrics available, we employed several forms of IER assessment in our analyses and generated IER factors based on the type of IER being detected. Our results indicated an overall modest relationship between IER and response styles, which varied depending on the type of IER metric being considered or the type of scale being evaluated. As expected, data cleaning also changed the relationships among some of the variables. We posit that the difference between the constructs may be the degree of cognitive effort participants are willing to expend. Future research and applications are discussed.
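
The study models response styles with IRTrees, which is beyond a short snippet; the sketch below instead computes simple descriptive proxies (proportions of extreme and midpoint responses) only to illustrate the kind of response-style quantities that get related to IER indices. It is explicitly not the IRTree model used in the paper.

```python
# Descriptive proxies only, not the IRTree model: per-person proportions of
# extreme (1 or 5) and midpoint (3) responses on simulated 5-point items.
import numpy as np
import pandas as pd

resp = pd.DataFrame(np.random.randint(1, 6, size=(50, 20)))

extreme_rs = resp.isin([1, 5]).mean(axis=1)    # extreme response style proxy
midpoint_rs = (resp == 3).mean(axis=1)         # midpoint response style proxy
print(pd.DataFrame({"extreme": extreme_rs, "midpoint": midpoint_rs}).describe())
```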

  • Research Article
  • Citations: 64
  • 10.1177/0013164419865316
Methods of Detecting Insufficient Effort Responding: Comparisons and Practical Recommendations.
  • Jul 31, 2019
  • Educational and Psychological Measurement
  • Maxwell Hong + 2 more

Insufficient effort responding (IER) affects many forms of assessment in both educational and psychological contexts. Much research has examined different types of IER, IER's impact on the psychometric properties of test scores, and preprocessing procedures used to detect IER. However, there is a gap in the literature in terms of practical advice for applied researchers and psychometricians when evaluating multiple sources of IER evidence, including the best strategy or combination of strategies when preprocessing data. In this study, we demonstrate how the use of different IER detection methods may affect psychometric properties such as predictive validity and reliability. Moreover, we evaluate how different data cleansing procedures can detect different types of IER. We provide evidence via simulation studies and applied analysis using the ACT's Engage assessment as a motivating example. Based on the findings of the study, we provide recommendations and future research directions for those who suspect their data may contain responses reflecting careless, random, or biased responding.
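
One concrete way to see how a cleansing procedure affects a psychometric property, in the spirit of this study, is to compare Cronbach's alpha before and after removing flagged cases. The flag column, item names, and file below are assumptions, and this is not the authors' simulation code.

```python
# Hedged sketch: Cronbach's alpha for a scale before and after removing
# respondents flagged for IER. Column names are assumed.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

df = pd.read_csv("assessment.csv")                 # hypothetical
items = [f"scale_item_{i}" for i in range(1, 11)]  # hypothetical item names

alpha_all = cronbach_alpha(df[items])
alpha_clean = cronbach_alpha(df.loc[~df["ier_flag"], items])  # ier_flag assumed boolean
print(f"alpha before cleaning: {alpha_all:.2f}, after cleaning: {alpha_clean:.2f}")
```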

  • Research Article
  • Citations: 354
  • 10.1016/j.jom.2017.06.001
Attention by design: Using attention checks to detect inattentive respondents and improve data quality
  • Jul 5, 2017
  • Journal of Operations Management
  • James D Abbey + 1 more


  • Research Article
  • 10.1177/00027642221132801
In the Mode. . .Text-to-Web Survey Data Collection: An Exploratory Study in Preelection Polling of the U.S. Presidential Election
  • Nov 5, 2022
  • American Behavioral Scientist
  • Spencer Kimball + 1 more

As our society rapidly employs new forms of communication, new modes of data collection are challenging the best practices developed over years of polling. Preelection polling must simultaneously evolve, as new modes have emerged in the past few decades, including computer-mediated communication, mobile texting, and the use of touch-tone keypads to communicate information. A tension exists between traditional and novel means of interpersonal communication, and researchers are struggling to determine which traditional methods of data collection still have a place in the modern industry. This study examined three relatively new modes of preelection poll data collection (online, mobile, and interactive voice response [IVR]) to determine what relationships exist, if any, between the mode of data collection and the composition of a sample across eight demographic variables: age, education, gender, political affiliation, race, region, 2016 Vote History, and 2020 Vote Intention. Twenty-six preelection polls were used in the study, with each poll ranging in collection dates between August 30 and October 31, 2020. The total combined sample size for this study is n = 19,886; 49% were IVR respondents (n = 9,795), 25% were collected from online panels (n = 5,039), and 25% were collected from short message service (SMS)-to-web respondents (n = 5,052). A χ2 (chi-square) test for association was conducted using a significance level of p < .05 and a 95% confidence interval (CI) and found significant differences between the modes of data collection across the eight aforementioned variables. A significant difference between political party affiliation/registration and mode of data collection was attributed to the educational attainment of individuals participating in each preelection poll based on the mode of data collection. This study suggests that underlying variables within the sample composition of different modes of data collection can have an impact on the accuracy of preelection polls.
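
A sketch of the type of test reported (a chi-square test of association between data-collection mode and one demographic variable) using scipy. The contingency counts are invented for illustration and are not the study's data.

```python
# Illustrative chi-square test of association between data-collection mode
# and a demographic variable. All counts below are made up.
import numpy as np
from scipy.stats import chi2_contingency

# rows: IVR, online panel, SMS-to-web; columns: age bands (hypothetical counts)
table = np.array([
    [3200, 4100, 2495],
    [1800, 2000, 1239],
    [2100, 1900, 1052],
])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.4f}")
```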

  • Research Article
  • Citations: 53
  • 10.2196/jmir.7331
Building the Evidence Base for Remote Data Collection in Low- and Middle-Income Countries: Comparing Reliability and Accuracy Across Survey Modalities.
  • May 5, 2017
  • Journal of Medical Internet Research
  • Abigail R Greenleaf + 4 more

Background: Given the growing interest in mobile data collection due to the proliferation of mobile phone ownership and network coverage in low- and middle-income countries (LMICs), we synthesized the evidence comparing estimates of health outcomes from multiple modes of data collection. In particular, we reviewed studies that compared a mode of remote data collection with at least one other mode of data collection to identify mode effects and areas for further research. Objective: The study systematically reviewed and summarized the findings from articles and reports that compare a mode of remote data collection to at least one other mode. The aim of this synthesis was to assess the reliability and accuracy of results. Methods: Seven online databases were systematically searched for primary and grey literature pertaining to remote data collection in LMICs. Remote data collection included interactive voice response (IVR), computer-assisted telephone interviews (CATI), short message service (SMS), self-administered questionnaires (SAQ), and Web surveys. Two authors of this study reviewed the abstracts to identify articles which met the primary inclusion criteria. These criteria required that the survey collected the data from the respondent via mobile phone or landline. Articles that met the primary screening criteria were read in full and were screened using secondary inclusion criteria. The four secondary inclusion criteria were that two or more modes of data collection were compared, at least one mode of data collection in the study was a mobile phone survey, the study had to be conducted in an LMIC, and finally, the study should include a health component. Results: Of the 11,568 articles screened, 10 articles were included in this study. Seven distinct modes of remote data collection were identified: CATI, SMS (singular sitting and modular design), IVR, SAQ, and Web surveys (mobile phone and personal computer). CATI was the most frequent remote mode (n=5 articles). Of the three in-person modes (face-to-face [FTF], in-person SAQ, and in-person IVR), FTF was the most common (n=11) mode. The 10 articles made 25 mode comparisons, of which 12 comparisons were from a single article. Six of the 10 articles included sensitive questions. Conclusions: This literature review summarizes the existing research about remote data collection in LMICs. Due to both heterogeneity of outcomes and the limited number of comparisons, this literature review is best positioned to present the current evidence and knowledge gaps rather than attempt to draw conclusions. In order to advance the field of remote data collection, studies that employ standardized sampling methodologies and study designs are necessary to evaluate the potential for differences by survey modality.
