Abstract
In omic research, such as genome-wide association studies (GWAS), researchers seek to repeat their results in other datasets to reduce false positive findings and thus provide evidence for the existence of true associations. Unfortunately, this standard validation approach cannot completely eliminate false positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues raise the question: how can we increase the amount of knowledge gained from high-throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false negative and false positive conclusions, and by providing broad information for directing future research. The Diverse Convergent Evidence (DiCE) approach we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that is useful when etiologic heterogeneity, variable risk factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof-of-principle examples in which DiCE identified strong evidence for associations of established biological importance where standard validation methods alone did not provide support. If used as an adjunct to standard validation methods, this approach can leverage multiple distinct data types to improve genetic risk factor discovery and validation, promote effective science communication, and guide future research directions.
Highlights
The accepted gold standard for demonstrating associations in omic research settings, such as genome-wide association studies, is the independent replication of preliminary findings [1].
If our goal is to find factors, such as genetic or environmental factors, that contribute to pathophysiology, we need to consider whether using standard validation methodology alone provides the best approach.
We propose an additional validation framework that can be used to enhance discovery and validation in omic research settings, such as transcriptome, exposome, and genome-wide association studies (GWAS).
Summary
The validation of findings in complex disease research
The accepted gold standard for demonstrating associations in omic research settings, such as genome-wide association studies, is the independent replication of preliminary findings [1]. Many large genetic epidemiology studies and meta-analyses do not use samples from a single source population, and so attempt validation rather than replication per se [2]. This conventional confirmation process can help to minimize false positive findings, and in doing so provides fairly compelling evidence for the existence of true associations. In recent years, however, it has become evident that chance, limited power, publication bias, and a variety of other factors can make this evidence less compelling than it otherwise would be [3,4]. This methodology can also mask many true associations that would otherwise advance etiological research. Given that the efficacy and efficiency of research depend on reducing both false positive and false negative conclusions, validation approaches should be developed that can better prevent both types of erroneous conclusions.