Taking Issue: Large Data Sets Can Be Dangerous!

Robert E. Drake, M.D., Ph.D., and Gregory J. McHugo, Ph.D., Dartmouth Medical School, Hanover, New Hampshire

Published online 1 February 2003; Volume 54, Issue 2, page 133. https://doi.org/10.1176/appi.ps.54.2.133

Researchers generally believe in the advantages of having more data, often as an antidote to problems with recruitment, retention, and statistical power. Yet the increasing availability of large administrative databases and computerized clinical records, together with the ease of manipulating data in statistical software, has created a different set of problems that journal reviewers now encounter more and more often. Among these problems are poor data quality, statistical significance without meaningfulness, multiple tests that capitalize on chance, and post hoc interpretation.

First, data collected for purposes other than research, such as billing or clinical records, are rarely of research quality. To complicate matters, researchers often have little information about the reliability and validity of such data. The danger, and a common occurrence, is that invalid data feed invalid analyses that lead to invalid conclusions.

Second, very large samples yield numerous statistically significant but meaningless associations, for a variety of well-documented reasons such as shared biases across measures. Statistically significant findings are unimportant when they reflect measurement error or represent tiny differences that do not approach clinical significance. Without studying measurement accuracy and specifying a meaningful difference a priori, researchers can assemble a pattern of trivial findings into a publishable paper.

Third, with computers and large data sets, the temptation to sift through numerous associations and pick out those that seem to fit the investigators' hypotheses, or, worse, those that cohere only under post hoc explanations, is ever present. Many investigators do not report all the tests they have run or all the variables they have examined, and they do not correct for multiple tests. The inevitable result is a proliferation of Type I errors.

Fourth, large existing data sets encourage investigators to look for research questions that fit the data, usually imperfectly, rather than to find data that can answer a meaningful question. For example, investigators are tempted to use whatever comparison group happens to exist rather than one chosen on the basis of logic and a priori hypotheses.

What is to be done? Researchers can emphasize research ethics, oversight by senior researchers, common sense as a criterion in research training, quality over quantity of publications, and adherence to scientific standards. Mental health journals are, of necessity, adopting new standards for disclosure and review, such as reporting effect sizes and correcting for multiple tests.
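The second point, that very large samples make trivial differences statistically significant, can be illustrated with a short simulation. The sketch below is an editorial illustration, not part of the original commentary or the authors' analyses; it assumes Python with NumPy and SciPy, and the variable names (group_a, group_b) and the symptom-score scale are hypothetical. Two groups differ by a clinically negligible 0.2 points on a scale whose standard deviation is 10, yet with 500,000 cases per group the p value is vanishingly small while the effect size stays near zero.

```python
# Illustrative sketch (not from the original article): with a very large n,
# a clinically negligible difference still yields a tiny p value.
# Assumes Python with NumPy and SciPy installed; all names are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500_000                                           # cases per group
group_a = rng.normal(loc=50.0, scale=10.0, size=n)    # hypothetical symptom scores
group_b = rng.normal(loc=50.2, scale=10.0, size=n)    # true difference of only 0.2 points

t, p = stats.ttest_ind(group_a, group_b)

# Cohen's d: standardized mean difference using the pooled standard deviation
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"p = {p:.2e}")          # far below .05, purely because n is huge
print(f"Cohen's d = {d:.3f}")  # about 0.02, an order of magnitude below Cohen's "small" benchmark of 0.2
```

Reporting the effect size alongside the p value, one of the disclosure standards mentioned in the closing paragraph, makes the triviality of such a "finding" visible.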
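The third point, that uncorrected multiple tests produce a proliferation of Type I errors, can be illustrated the same way. The following sketch is likewise hypothetical and assumes Python with NumPy and SciPy: it correlates a pure-noise outcome with 100 unrelated predictors, counts the tests that come out "significant" at p < .05, and then applies a Bonferroni correction, one simple form of the correction for multiple tests mentioned above.

```python
# Illustrative sketch (not from the original article): running many tests on
# pure noise yields "significant" results by chance alone; a Bonferroni
# correction removes most of them. Assumes NumPy and SciPy; names are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_variables = 10_000, 100
outcome = rng.normal(size=n_subjects)                     # hypothetical outcome, pure noise
predictors = rng.normal(size=(n_subjects, n_variables))   # 100 unrelated predictors

p_values = [stats.pearsonr(predictors[:, j], outcome)[1] for j in range(n_variables)]

alpha = 0.05
uncorrected = sum(p < alpha for p in p_values)
bonferroni = sum(p < alpha / n_variables for p in p_values)

print(f"'Significant' at p < .05 without correction: {uncorrected}")  # about 5, on average
print(f"Significant after Bonferroni correction:     {bonferroni}")   # almost always 0
```

Roughly five of the hundred tests will appear significant by chance alone; after the correction, almost none do. Disclosing every test that was run is what makes such a correction possible in the first place.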
