A Framework to Improve Data Quality and Manage Dropout in Web-Based Medical Surveys: Insights from an AI Awareness Study among Italian Physicians

Abstract

Background: Ensuring data quality in self-reported online surveys remains a critical challenge in digital health research, particularly when targeting healthcare professionals [1,2]. Self-reported data are susceptible to multiple biases, including careless responding, social desirability bias, and dropout-related attrition, all of which may compromise the validity of findings [3,4]. In web-based surveys where researcher oversight is limited, structured quality control measures are essential to detect low-quality responses, minimise sampling bias, and enhance data reliability [5]. Previous studies have demonstrated that inadequate quality checks can lead to inflated error rates, reduced statistical power, and misleading conclusions [6].

Objective: This study presents a comprehensive methodological framework for optimising data quality in web-based medical surveys, applied to a national study on AI awareness among Italian physicians. By integrating pre-survey validation, real-time dashboards, response-time filtering, and post-hoc careless responding detection, the framework addresses key challenges in digital research while providing a replicable model for future studies.

Methods: We conducted a national web-based survey using a validated instrument (doi:10.1101/2025.04.11.25325592) via the LimeSurvey platform. The survey comprised two main sections: (1) a core module assessing knowledge, attitudes, and practices regarding AI in medicine; and (2) clinical scenarios evaluating diagnostic agreement with AI-generated proposals. Multiple quality control strategies were implemented throughout the survey lifecycle. In terms of survey design and logic, the questionnaire employed an adaptive flow structure, whereby respondents were routed through clinical scenarios relevant to their medical speciality. To reduce the incidence of partial completions and missing data, key questions were marked as mandatory, and completion status was actively tracked. In the monitoring and recruitment phase, a real-time dashboard monitored participant distribution (gender, geographical area, speciality), and referral links were rotated to minimise snowball bias [7]. Time-based data quality checks excluded outliers (completion time below the 1st or above the 99th percentile) [8]. Completion time for the first section was analysed for all completers to assess correlations between response speed and quality indicators. Dropout patterns were analysed using Kaplan-Meier survival analysis and logistic regression to identify systematic attrition predictors. Data quality assessments were performed on the outlier-cleaned dataset (n=587). Response quality was assessed using complementary careless responding indicators applied specifically to opinion scale items (Likert 1-5). Two detection methods were used: low response variance analysis, identifying respondents with insufficient variability (SD < 0.5), and excessive same-response detection, flagging participants who gave identical responses to >75% of items. Internal consistency analysis (Cronbach's α) evaluated scale reliability across different quality levels. (Illustrative code sketches of these quality checks and analyses follow the abstract.)

Results: A total of 736 accesses were recorded on the survey platform. As an initial inclusion criterion, only participants who indicated current registration with the Italian Medical Council were considered eligible: 79 (10.7%) were excluded, yielding a sample of 657 eligible participants (89.3%). Among eligible respondents, 597 completed the first section, yielding a dropout rate of 9.1% (n=60).
A Kaplan-Meier survival analysis using total survey time revealed that most dropouts occurred early, with critical points at 45% (after the demographic items), 51% (after the personal AI knowledge items), 71% (after the opinion items), and 100% (before the clinical scenarios). Logistic regression showed no significant predictors of completion (LR χ²(6)=3.46, p=0.7497; pseudo-R²=0.014; AUC=0.60, 95% CI: 0.50–0.70). Completion time showed no correlation with response quality (Spearman's ρ = -0.019, p = 0.645). Following outlier removal, data quality assessment among the 587 respondents who completed the first section revealed two complementary patterns of careless responding: 50 (8.52%) exhibited low response variance, while 32 (5.45%) demonstrated excessive same-response patterns. Cross-classification analysis showed that 23 participants (3.92%) were flagged by both indicators, with 71.88% of excessive same-responders also showing low variance. Overall, 59 participants (10.05%, 95% CI: 7.9%–12.8%) exhibited careless responding detectable by at least one indicator. Internal consistency analysis showed robust scale reliability (Cronbach's α = 0.754) that remained stable across quality levels.

Conclusion: The integration of real-time monitoring, adaptive design, time-based validation, and systematic careless responding detection provides a robust methodological framework for web-based medical surveys, particularly for complex topics such as AI adoption. Comprehensive data quality assessment revealed a 10.05% careless responding rate among completers, which aligns with the literature. The absence of a correlation between completion time and response quality suggests that careless responding reflects attentional rather than temporal factors. Our findings indicate that both phenomena likely reflect situational or contextual factors rather than systematic participant characteristics or survey design flaws. This supports the validity and generalizability of the final dataset while providing a replicable quality control framework for future web-based medical research.
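The two careless-responding indicators described in the Methods can be screened for with a few lines of code. The sketch below is a minimal illustration, not the study's actual pipeline: it assumes the 1-5 Likert opinion items sit in a pandas DataFrame (one row per respondent, one column per item), and the thresholds simply mirror those reported in the abstract (within-person SD < 0.5; identical answers on >75% of items).

```python
import pandas as pd

def flag_careless(likert: pd.DataFrame,
                  sd_cutoff: float = 0.5,
                  same_cutoff: float = 0.75) -> pd.DataFrame:
    """Flag low-variance and excessive same-response patterns per respondent."""
    # Indicator 1: insufficient within-person variability across the opinion items
    within_sd = likert.std(axis=1)
    # Indicator 2: share of items answered with the respondent's single most frequent value
    same_prop = likert.apply(
        lambda row: row.value_counts().max() / row.notna().sum(), axis=1
    )
    flags = pd.DataFrame({
        "low_variance": within_sd < sd_cutoff,
        "excessive_same": same_prop > same_cutoff,
    })
    flags["both"] = flags["low_variance"] & flags["excessive_same"]
    flags["any"] = flags["low_variance"] | flags["excessive_same"]
    return flags

# Cross-classification of the two indicators (cf. the Results section):
# flags = flag_careless(opinion_items)
# print(pd.crosstab(flags["low_variance"], flags["excessive_same"]))
```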
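Likewise, the time-based check (dropping completion times below the 1st or above the 99th percentile), the completion-time/quality correlation, and the Cronbach's α comparison across quality levels could look roughly like the following. `completion_time` and `quality_score` are hypothetical column names; the percentile bounds follow the abstract.

```python
import pandas as pd
from scipy.stats import spearmanr

def drop_time_outliers(df: pd.DataFrame, time_col: str = "completion_time") -> pd.DataFrame:
    """Keep respondents whose completion time lies within the 1st-99th percentile window."""
    low, high = df[time_col].quantile([0.01, 0.99])
    return df[df[time_col].between(low, high)]

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

# cleaned = drop_time_outliers(completers)          # e.g. 597 completers -> outlier-cleaned n = 587
# rho, p = spearmanr(cleaned["completion_time"], cleaned["quality_score"])
# print(cronbach_alpha(opinion_items))                        # full sample
# print(cronbach_alpha(opinion_items[~flags["any"]]))         # careful responders only (flags from the previous sketch)
```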
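For the dropout analyses in the Results, a Kaplan-Meier curve over total survey time (with dropout as the event) and a logistic regression of completion status on respondent characteristics can be sketched with the lifelines and statsmodels libraries. The six predictor names below are placeholders (assumed to be already numerically coded), not the study's actual covariates.

```python
import statsmodels.api as sm
from lifelines import KaplanMeierFitter

def dropout_analyses(df):
    # Kaplan-Meier survival over total survey time; "dropped_out" = 1 if the survey was abandoned
    kmf = KaplanMeierFitter()
    kmf.fit(durations=df["survey_time"], event_observed=df["dropped_out"])
    print(kmf.survival_function_.tail())

    # Logistic regression of completion on six candidate predictors (placeholder names);
    # the abstract reports no significant predictors (LR chi2(6)=3.46, p=0.7497)
    predictors = ["age", "gender", "surgical_specialty",
                  "region_north", "region_centre", "years_in_practice"]
    X = sm.add_constant(df[predictors])
    fit = sm.Logit(df["completed"], X).fit(disp=False)
    print(fit.summary())
```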

Similar Papers
  • Front Matter
  • Citations: 39
  • 10.1111/add.13221
Unfaithful findings: identifying careless responding in addictions research.
  • Dec 14, 2015
  • Addiction
  • Alexandra Godinho + 2 more

Keywords: Careless responding; data cleaning; data integrity; invalid responding; online survey; research outcome quality

  • Research Article
  • 10.1111/acer.70024
Check your data before you wreck your model: The impact of careless responding on substance use data quality.
  • Mar 16, 2025
  • Alcohol, clinical & experimental research
  • Abby L Braitman + 6 more

The accuracy of survey responses is a concern in research data quality, especially in college student samples. However, examination of the impact of removing participants from analyses who respond inaccurately or carelessly is warranted given the potential for loss of information or sample diversity. This study aimed to understand if careless responding varies across a number of demographic indices, substance use behaviors, and the timing of survey completion. College students (N = 5809; 70.7% female; 75.7% White, non-Hispanic) enrolled in psychology classes from six universities completed an online survey assessing a variety of demographic and substance use-related information, which included four attention check questions dispersed throughout the hour-long survey. Differences in careless responding were assessed across multiple demographic groups, and we examined the impact of careless responding on data quality via a confirmatory factor analysis of a validated substance use measure, the Drinking Motives Questionnaire-Revised Short Form. Careless responding varied significantly by participant race, sex, gender, sexual orientation, and socioeconomic status. Substance use was generally unassociated with careless responding, though careless responding was associated with experiencing more alcohol-related problems. Careless responding was more prevalent when the survey was completed near the end of the semester. Finally, the factor structure of the drinking motives measure was affected by the inclusion of those who failed two or more attention check questions. Including attention checks in surveys is an effective method to detect and address careless responding. However, omitting participants from analyses who evidence any careless responding may bias the sample demographics. We discuss recommendations for the use of attention check questions in undergraduate substance use cross-sectional surveys, including retaining participants who fail only one attention check, as this has a minimal impact on data quality while preserving sample diversity.

  • Research Article
  • Citations: 5
  • 10.1080/15305058.2021.2019747
Survey mode and data quality: Careless responding across three modes in cross-cultural contexts
  • Dec 18, 2021
  • International Journal of Testing
  • Zoe Magraw-Mickelson + 2 more

Much psychological research depends on participants’ diligence in filling out materials such as surveys. However, not all participants are motivated to respond attentively, which leads to unintended issues with data quality, known as careless responding. Our question is: how do different modes of data collection—paper/pencil, computer/web-based, and smartphone—affect participants’ diligence vs. “careless responding” tendencies and, thus, data quality? Results from prior studies suggest that different data collection modes produce a comparable prevalence of careless responding tendencies. However, as technology develops and data are collected with increasingly diversified populations, this question needs to be readdressed and taken further. The present research examined the effect of survey mode on careless responding in a repeated-measures design with data from three different samples. First, in a sample of working adults from China, we found that participants were slightly more careless when completing computer/web-based survey materials than in paper/pencil mode. Next, in a German student sample, participants were slightly more careless when completing the paper/pencil mode compared to the smartphone mode. Finally, in a sample of Chinese-speaking students, we found no difference between modes. Overall, in a meta-analysis of the findings, we found minimal difference between modes across cultures. Theoretical and practical implications are discussed.

  • Research Article
  • 10.5465/ambpp.2019.16004abstract
Survey Mode and Data Quality: A Cross-Cultural Comparison of Careless Responding Across Three Modes
  • Aug 1, 2019
  • Academy of Management Proceedings
  • Zoe Magraw‐Mickelson + 2 more

Much psychological research depends on participants’ diligence in filling out materials such as tests or surveys. However, not all participants are motivated to respond attentively, which leads to unintended issues with the quality of the data. Our question is: how do different modes of data collection - paper/pencil, computer/web-based, and smartphone - affect participants’ diligence vs. “careless responding” tendencies and, thus, the data quality? Results from prior studies suggest that different modes of data collection produce a comparable prevalence of careless responding tendencies. However, as technology develops and data are collected with increasingly diverse populations, this question needs to be readdressed and taken further by looking at cultural differences. The present research examined the effect of survey mode on careless responding across three waves in a repeated-measures design. Following recommendations in the literature, we computed a careless responding index as a composite of eight indicators that capture aspects of a participant’s inattentiveness. In a sample of working adults from China, we found that participants were significantly more careless when completing computer/web-based survey materials than in paper/pencil mode. In a sample of German students, participants were significantly more careless when completing the paper/pencil mode compared to the smartphone mode. In a sample of Chinese-speaking students, we found no difference between the modes. This paper will discuss why these results deviate from past findings that investigated study modes and hypothesize about potential cross-cultural differences.

  • Research Article
  • Citations: 286
  • 10.1177/1073191120957102
The Effects of Sampling Frequency and Questionnaire Length on Perceived Burden, Compliance, and Careless Responding in Experience Sampling Data in a Student Population.
  • Sep 10, 2020
  • Assessment
  • Gudrun Eisele + 6 more

Currently, little is known about the association between assessment intensity, burden, data quantity, and data quality in experience sampling method (ESM) studies. Researchers therefore have insufficient information to make informed decisions about the design of their ESM study. Our aim was to investigate the effects of different sampling frequencies and questionnaire lengths on burden, compliance, and careless responding. Students (n = 163) received either a 30- or 60-item questionnaire three, six, or nine times per day for 14 days. Preregistered multilevel regression analyses and analyses of variance were used to analyze the effect of design condition on momentary outcomes, changes in those outcomes over time, and retrospective outcomes. Our findings offer support for increased burden and compromised data quantity and quality with longer questionnaires, but not with increased sampling frequency. We therefore advise against the use of long ESM questionnaires, while high-sampling frequencies do not seem to be associated with negative consequences.

  • Research Article
  • Citations: 74
  • 10.1093/jamia/ocaa245
Assessing the practice of data quality evaluation in a national clinical data research network through a systematic scoping review in the era of real-world data.
  • Nov 9, 2020
  • Journal of the American Medical Informatics Association : JAMIA
  • Jiang Bian + 11 more

Objective: To synthesize data quality (DQ) dimensions and assessment methods of real-world data, especially electronic health records, through a systematic scoping review and to assess the practice of DQ assessment in the national Patient-centered Clinical Research Network (PCORnet). Materials and Methods: We started with 3 widely cited DQ publications—2 reviews from Chan et al (2010) and Weiskopf et al (2013a) and 1 DQ framework from Kahn et al (2016)—and expanded our review systematically to cover relevant articles published up to February 2020. We extracted DQ dimensions and assessment methods from these studies, mapped their relationships, and organized a synthesized summarization of existing DQ dimensions and assessment methods. We reviewed the data checks employed by the PCORnet and mapped them to the synthesized DQ dimensions and methods. Results: We analyzed a total of 3 reviews, 20 DQ frameworks, and 226 DQ studies and extracted 14 DQ dimensions and 10 assessment methods. We found that completeness, concordance, and correctness/accuracy were commonly assessed. Element presence, validity check, and conformance were commonly used DQ assessment methods and were the main focuses of the PCORnet data checks. Discussion: Definitions of DQ dimensions and methods were not consistent in the literature, and the DQ assessment practice was not evenly distributed (eg, usability and ease-of-use were rarely discussed). Challenges in DQ assessments, given the complex and heterogeneous nature of real-world data, exist. Conclusion: The practice of DQ assessment is still limited in scope. Future work is warranted to generate understandable, executable, and reusable DQ measures.

  • Book Chapter
  • Citations: 4
  • 10.1093/acrefore/9780190224851.013.303
Careless Responding and Insufficient Effort Responding
  • Aug 31, 2021
  • Jason L Huang + 1 more

Careless responding, also known as insufficient effort responding, refers to survey/test respondents providing random, inattentive, or inconsistent answers to question items due to lack of effort in conforming to instructions, interpreting items, and/or providing accurate responses. Researchers often use these two terms interchangeably to describe deviant behaviors in survey/test responding that threaten data quality. Careless responding threatens the validity of research findings by bringing in random and systematic errors. Specifically, careless responding can reduce measurement reliability, while under specific circumstances it can also inflate the substantive relations between variables. Numerous factors can explain why careless responding happens (or does not happen), such as individual difference characteristics (e.g., conscientiousness), survey characteristics (e.g., survey length), and transient psychological states (e.g., positive and negative affect). To identify potential careless responding, researchers can use procedural detection methods and post hoc statistical methods. For example, researchers can insert detection items (e.g., infrequency items, instructed response items) into the questionnaire, monitor participants’ response time, and compute statistical indices, such as psychometric antonym/synonym, Mahalanobis distance, individual reliability, individual response variability, and model fit statistics. Application of multiple detection methods would be better able to capture careless responding given convergent evidence. Comparison of results based on data with and without careless respondents can help evaluate the degree to which the data are influenced by careless responding. To handle data contaminated by careless responding, researchers may choose to filter out identified careless respondents, recode careless responses as missing data, or include careless responding as a control variable in the analysis. To prevent careless responding, researchers have tried utilizing various deterrence methods developed from motivational and social interaction theories. These methods include giving warning, rewarding, or educational messages, proctoring the process of responding, and designing user-friendly surveys. Interest in careless responding has been growing not only in business and management but also in other related disciplines. Future research and practice on careless responding in the business and management areas can also benefit from findings in other related disciplines.

  • Research Article
  • Citations: 20
  • 10.5334/egems.286
DataGauge: A Practical Process for Systematically Designing and Implementing Quality Assessments of Repurposed Clinical Data.
  • Jul 25, 2019
  • eGEMs (Generating Evidence & Methods to improve patient outcomes)
  • Jose-Franck Diaz-Garelli + 5 more

The well-known hazards of repurposing data make Data Quality (DQ) assessment a vital step towards ensuring valid results regardless of analytical methods. However, there is no systematic process to implement DQ assessments for secondary uses of clinical data. This paper presents DataGauge, a systematic process for designing and implementing DQ assessments to evaluate repurposed data for a specific secondary use. DataGauge is composed of five steps: (1) Define information needs, (2) Develop a formal Data Needs Model (DNM), (3) Use the DNM and DQ theory to develop goal-specific DQ assessment requirements, (4) Extract DNM-specified data, and (5) Evaluate according to DQ requirements. DataGauge’s main contribution is integrating general DQ theory and DQ assessment methods into a systematic process. This process supports the integration and practical implementation of existing Electronic Health Record-specific DQ assessment guidelines. DataGauge also provides an initial theory-based guidance framework that ties the DNM to DQ testing methods for each DQ dimension to aid the design of DQ assessments. This framework can be augmented with existing DQ guidelines to enable systematic assessment. DataGauge sets the stage for future systematic DQ assessment research by defining an assessment process, capable of adapting to a broad range of clinical datasets and secondary uses. Defining DataGauge sets the stage for new research directions such as DQ theory integration, DQ requirements portability research, DQ assessment tool development and DQ assessment tool usability.

  • Research Article
  • Citations: 7
  • 10.1037/met0000580
Modeling careless responding in ambulatory assessment studies using multilevel latent class analysis: Factors influencing careless responding.
  • May 11, 2023
  • Psychological methods
  • Kilian Hasselhorn + 2 more

As the number of studies using ambulatory assessment (AA) has been increasing across diverse fields of research, so has the necessity to identify potential threats to AA data quality such as careless responding. To date, careless responding has primarily been studied in cross-sectional surveys. The goal of the present research was to identify latent profiles of momentary careless responding on the occasion level and latent classes of individuals (who differ in the distribution of careless responding profiles across occasions) on the person level using multilevel latent class analysis (ML-LCA). We discuss which of the previously proposed indices seem promising for investigating careless responding in AA studies, and we show how ML-LCA can be applied to model careless responding in intensive longitudinal data. We used data from an AA study in which the sampling frequency (3 vs. 9 occasions per day, 7 days, n = 310 participants) was experimentally manipulated. We tested the effect of sampling frequency on careless responding using multigroup ML-LCA and investigated situational and respondent-level covariates. The results showed that four Level 1 profiles ("careful," "slow," and two types of "careless" responding) and four Level 2 classes ("careful," "frequently careless," and two types of "infrequently careless" respondents) could be identified. Sampling frequency did not have an effect on careless responding. On the person (but not the occasion) level, motivational variables were associated with careless responding. We hope that researchers might find the application of an ML-LCA approach useful to shed more light on factors influencing careless responding in AA studies. (PsycInfo Database Record (c) 2025 APA, all rights reserved).

  • Research Article
  • Citations: 5
  • 10.1016/j.ijmedinf.2024.105381
Informing nursing policy: An exploration of digital health research by nurses in England
  • Feb 23, 2024
  • International Journal of Medical Informatics
  • Siobhan O'Connor + 2 more


  • Conference Article
  • 10.4995/head25.2025.20018
The impact of careless responding on employability research
  • Jun 17, 2025
  • Inés Tomás + 3 more

Careless Responding (CR) compromises data quality in psychological and educational contexts. In this study we examine the impact of CR on research involving a key construct in Higher Education: personal employability. Particularly we assess how this impact depends on the strategy used to address CR by comparing four strategies: 1) using the total sample without taking any action, 2) eliminating careless respondents, 3) introducing CR as a control variable, 4) introducing CR as a moderating variable. Using a sample of 360 university graduates, results show that removing careless respondents reduces statistical power and some hypothesized employability effects become non-significant. In contrast, incorporating CR as a moderating variable was highly informative and yielded the best goodness-of-fit. These findings highlight the importance of considering CR behaviours and suggest that introducing CR as a moderator is an effective approach to maintaining data quality, statistical power and representativeness in employability research.

  • Research Article
  • Citations: 2
  • 10.1136/bmjopen-2023-075009
How digital health translational research is prioritised: a qualitative stakeholder-driven approach to decision support evaluation
  • Nov 1, 2023
  • BMJ Open
  • Adeola Bamgboje-Ayodele + 10 more

Objectives: Digital health is now routinely being applied in clinical care, and with a variety of clinician-facing systems available, healthcare organisations are increasingly required to make decisions about technology implementation and...

  • Research Article
  • Citations: 428
  • 10.13063/2327-9214.1244
A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data.
  • Sep 11, 2016
  • eGEMs (Generating Evidence & Methods to improve patient outcomes)
  • Michael G Kahn + 19 more

Objective: Harmonized data quality (DQ) assessment terms, methods, and reporting practices can establish a common understanding of the strengths and limitations of electronic health record (EHR) data for operational analytics, quality improvement, and research. Existing published DQ terms were harmonized to a comprehensive unified terminology with definitions and examples and organized into a conceptual framework to support a common approach to defining whether EHR data is ‘fit’ for specific uses. Materials and Methods: DQ publications, informatics and analytics experts, managers of established DQ programs, and operational manuals from several mature EHR-based research networks were reviewed to identify potential DQ terms and categories. Two face-to-face stakeholder meetings were used to vet an initial set of DQ terms and definitions that were grouped into an overall conceptual framework. Feedback received from data producers and users was used to construct a draft set of harmonized DQ terms and categories. Multiple rounds of iterative refinement resulted in a set of terms and organizing framework consisting of DQ categories, subcategories, terms, definitions, and examples. The harmonized terminology and logical framework’s inclusiveness was evaluated against ten published DQ terminologies. Results: Existing DQ terms were harmonized and organized into a framework by defining three DQ categories: (1) Conformance, (2) Completeness, and (3) Plausibility, and two DQ assessment contexts: (1) Verification and (2) Validation. Conformance and Plausibility categories were further divided into subcategories. Each category and subcategory was defined with respect to whether the data may be verified with organizational data, or validated against an accepted gold standard, depending on proposed context and uses. The coverage of the harmonized DQ terminology was validated by successfully aligning to multiple published DQ terminologies. Discussion: Existing DQ concepts, community input, and expert review informed the development of a distinct set of terms, organized into categories and subcategories. The resulting DQ terms successfully encompassed a wide range of disparate DQ terminologies. Operational definitions were developed to provide guidance for implementing DQ assessment procedures. The resulting structure is an inclusive DQ framework for standardizing DQ assessment and reporting. While our analysis focused on the DQ issues often found in EHR data, the new terminology may be applicable to a wide range of electronic health data such as administrative, research, and patient-reported data. Conclusion: A consistent, common DQ terminology, organized into a logical framework, is an initial step in enabling data owners and users, patients, and policy makers to evaluate and communicate data quality findings in a well-defined manner with a shared vocabulary. Future work will leverage the framework and terminology to develop reusable data quality assessment and reporting methods.

  • Research Article
  • Citations: 8
  • 10.4103/jphi.jphi_31_18
Digital health research: A scientometric assessment of global publications output during 2007–2016
  • Jan 1, 2018
  • International Journal of Pharmaceutical Investigation
  • Kk Mueen Ahmed + 2 more

Aim and Scope: To study the scientometric assessment of global publications on Digital Health Research. Methods: The paper examines digital health research covering 6981 global publications sourced from Scopus database during 2007–2016. Results: Digital health research across 109 countries registered 8.03% growth and averaged to 7.33 citations per paper. The top 10 most productive countries individually contributed 2.75% to 33.82% share to global publications output and together they accounted for 79.30% share during the period. Their international collaborative publications varied from 3% to 14.49%. Medicine is the most studied subject with largest publication share in digital health research (53.55%), followed by computer science (33.85%), engineering (24.97%), health profession (13.24%), and others. The top 20 most productive organizations and authors together contributed 12.32% and 2.99% of global publications share, respectively, and 38.91% and 3.28% of global citations share, respectively. The top 20 journals contributed 12.32% share to the global output in journals during 2007–2016. Of the total digital health research, 46 (0.65%) were highly cited papers, citations to them ranged from 100 to 1104 per paper, with 257.76 citations per paper. Conclusion: A total of 415 authors from 242 organizations contributed 46 highly cited papers which appeared in 37 journals. Four papers appeared in CA Cancer Journal of Clinicians, three papers in Annals of Internal Medicine, two papers each in European Urology, Journal of American Medical Informatics Association, New England Journal of Medicine, Pediatrics and Stroke, and one paper each in 30 other journals.

  • Research Article
  • 10.1017/psy.2025.10041
A Beta Mixture Model for Careless Respondent Detection in Visual Analogue Scale Data
  • Sep 1, 2025
  • Psychometrika
  • Lijin Zhang + 3 more

Visual Analogue scales (VASs) are increasingly popular in psychological, social, and medical research. However, VASs can also be more demanding for respondents, potentially leading to quicker disengagement and a higher risk of careless responding. Existing mixture modeling approaches for careless response detection have so far only been available for Likert-type and unbounded continuous data but have not been tailored to VAS data. This study introduces and evaluates a model-based approach specifically designed to detect and account for careless respondents in VAS data. We integrate existing measurement models for VASs with mixture item response theory models for identifying and modeling careless responding. Simulation results show that the proposed model effectively detects careless responding and recovers key parameters. We illustrate the model’s potential for identifying and accounting for careless responding using real data from both VASs and Likert scales. First, we show how the model can be used to compare careless responding across different scale types, revealing a higher proportion of careless respondents in VAS compared to Likert scale data. Second, we demonstrate that item parameters from the proposed model exhibit improved psychometric properties compared to those from a model that ignores careless responding. These findings underscore the model’s potential to enhance data quality by identifying and addressing careless responding.
