A hierarchical clustering approach to identify repeated enrollments in web survey data.

Elizabeth A Handorf, Lee Ritterband, Susan Darlow, Michael Slifker, Carolyn J Heckman, Ivo D Dinov

doi:10.1371/journal.pone.0204394


Abstract

Introduction: Online surveys are a valuable tool for social science research, but the perceived anonymity of online administration may lead to problematic behaviors from study participants. In particular, if a study offers incentives, some participants may attempt to enroll multiple times. We propose a method to identify clusters of non-independent enrollments in a web-based study, motivated by an analysis of survey data from a study testing the effectiveness of an online skin-cancer risk reduction program.

Methods: To identify groups of enrollments, we used a hierarchical clustering algorithm based on the Euclidean distance matrix formed by participant responses to a series of Likert-type eligibility questions. We then systematically identified clusters that were unusual in terms of both size and similarity by repeatedly simulating datasets from the empirical distribution of responses under the assumption of independent enrollments. Running the clustering algorithm on the simulated datasets gave the distribution of cluster size and similarity under independence, which we then used to identify groups of outliers in the observed data. Next, we assessed 12 other quality indicators, including previously proposed and study-specific measures. We summarized the quality measures by cluster membership and compared the cluster groupings to those found by applying latent class modeling to the quality indicators.

Results and conclusions: When we excluded the clustered enrollments and/or lower-quality latent classes from the analysis of study outcomes, the estimated intervention effects were larger. This shows how including repeat or low-quality participants can introduce bias into a web-based study. As far as possible, web-based surveys should be designed to verify participant quality. Our method can be used to verify survey quality and, when necessary, to identify problematic groups of enrollments.
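The Methods paragraph combines two steps: clustering the eligibility responses, and calibrating what counts as an "unusual" cluster by simulating independent enrollments. The Python sketch below illustrates both under assumptions the abstract does not settle: complete linkage, a dendrogram cut at one fixed height, simulated datasets in which each item is drawn independently from its observed marginal distribution, and a cluster flagged when its size exceeds the 95th percentile of the largest cluster size across simulated datasets. It is a simplification rather than the authors' procedure (in particular it ignores the similarity dimension they also use), and all function names are hypothetical.

```python
import math
import random


def complete_linkage_clusters(rows, height):
    """Cut a complete-linkage dendrogram of `rows` at `height` (assumed linkage).

    Repeatedly merges the two closest clusters, where the distance between
    clusters is the largest Euclidean distance between any of their members,
    and stops once the closest pair is farther apart than `height`.
    Returns a list of clusters, each a list of row indices.
    """
    n = len(rows)
    dist = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        d, a, b = min(
            (max(dist[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > height:
            break  # complete-linkage merge heights never decrease
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def simulate_independent(rows, rng):
    """Resample a dataset of the same shape in which every item is drawn
    independently from its observed (empirical) marginal distribution."""
    columns = list(zip(*rows))
    return [[rng.choice(col) for col in columns] for _ in rows]


def flag_large_clusters(rows, height, n_sims=100, alpha=0.05, seed=0):
    """Hypothetical decision rule: flag observed clusters larger than the
    (1 - alpha) quantile of the largest cluster size found in datasets
    simulated under independent enrollment."""
    rng = random.Random(seed)
    null_max = sorted(
        max(len(c) for c in complete_linkage_clusters(simulate_independent(rows, rng), height))
        for _ in range(n_sims)
    )
    cutoff = null_max[math.ceil((1 - alpha) * n_sims) - 1]
    observed = complete_linkage_clusters(rows, height)
    return cutoff, [c for c in observed if len(c) > cutoff]
```

For instance, calling `flag_large_clusters(data, height=1.0)` on toy 8-item responses in which one answer pattern is repeated six times flags the group containing those six rows, because near-identical rows almost never arise when items are drawn independently. The naive merge loop is roughly cubic in the number of enrollments; for real datasets, SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` followed by `fcluster(..., criterion="distance")` would be the practical choice.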

Highlights

Online surveys are a valuable tool for social science research, but the perceived anonymity of online administration may lead to problematic behaviors from study participants.
Applying the hierarchical clustering approach described in Section 2.2.1, we created a dendrogram to illustrate the structure of the study data (Fig 1).
Because cluster membership depended on the height at which the dendrogram was cut, we explored several thresholds: 4.5, 5.0, and 5.5.
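Because the clusters depend on where the dendrogram is cut, it is convenient to build the dendrogram once and then cut it at each candidate height. The Python sketch below does this for the three thresholds listed above, assuming complete linkage on Euclidean distances (the linkage rule is an assumption, not stated here). The toy data are randomly generated 1-5 responses to 12 items, not the study data, and the function names are hypothetical.

```python
import math
import random


def complete_linkage_merges(rows):
    """Build a complete-linkage dendrogram over Euclidean distances.

    Returns merge events as (height, sorted member indices) in merge order;
    for complete linkage the merge heights never decrease.
    """
    n = len(rows)
    dist = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > 1:
        height, a, b = min(
            (max(dist[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
        merges.append((height, clusters[a]))
    return merges


def cut_dendrogram(merges, n, height):
    """Clusters obtained by cutting the dendrogram at `height`, largest first."""
    label = list(range(n))
    for h, members in merges:
        if h > height:
            break  # every later merge is at least this high
        for m in members:
            label[m] = members[0]  # each merged set contains its sub-clusters
    groups = {}
    for i, lab in enumerate(label):
        groups.setdefault(lab, []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Toy data: 20 simulated respondents answering 12 Likert-type (1-5) items.
rng = random.Random(1)
rows = [[rng.randint(1, 5) for _ in range(12)] for _ in range(20)]
merges = complete_linkage_merges(rows)
for threshold in (4.5, 5.0, 5.5):
    clusters = cut_dendrogram(merges, len(rows), threshold)
    print(threshold, len(clusters), [len(c) for c in clusters if len(c) > 1])
```

With complete linkage, every pair of enrollments inside a cluster cut at height h lies within Euclidean distance h of each other, so raising the threshold from 4.5 to 5.5 can only merge clusters, never split them.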

Summary

Objectives

Unlike earlier clustering applications, whose goal was to identify stable clusters encompassing the whole dataset, our objective was to identify large, unusually similar groups of responses.



Journal: PLOS ONE
Publication Date: Sep 25, 2018
Citations: 7
License type: CC BY 4.0


Similar Papers

Strategies and Lessons Learned During Cleaning of Data From Research Panel Participants: Cross-sectional Web-Based Health Behavior Survey Study
Mariana Arevalo ... Anna R Giuliano
JMIR Formative Research | VOL. 6
23 Jun 2022

Accurate measurement of field size is essential for analysis of smallholder survey data
Rica Joy Flor ... Shen Yuan
Field Crops Research | VOL. 311
22 Apr 2024

Are There Distinct Cardiovascular Subclasses in Acute Respiratory Distress Syndrome? Maybe.
Pratik Sinha ... Patrick R Lawler
Critical Care Medicine | VOL. 51
18 Mar 2023

Clusters Across Multiple Domains of Health-Related Quality of Life Reveal Complex Patient Outcomes After Subarachnoid Hemorrhage.
Julianne Murphy ... Yuan Luo
Critical Care Explorations | VOL. 3
14 Sep 2021
