Abstract

A considerable debate is ongoing on genome-wide association studies (GWAS). A key point is their capability to identify low-penetrance variants across the human genome. Among the phenomena that reduce the power of these analyses, the phenocopy level (PE) seriously hampers the investigation of complex diseases, as is well known for neurological disorders and cancer, and it is likely of primary importance in human ageing. PE seems to be the norm rather than the exception, especially when considering the role of epigenetics and environmental factors in shaping the phenotype. Despite some attempts, no recognized solution has been proposed, particularly for estimating the effects of phenocopies on study planning or on the design of the analysis. We present a simulation in which we attempt to define more precisely how phenocopy affects different analytical methods under different scenarios. Our approach highlights the critical role of phenocopy: the higher the PE level, the more the initial difficulty in detecting gene-gene interactions is amplified. In particular, our results show that, if the study is sufficiently powered, strong main effects remain detectable despite an increasing amount of phenocopy in the study sample, although the significance of the association is progressively reduced. By contrast, when purely epistatic effects are simulated, the capability of identifying the association depends on several parameters, such as the strength of the interaction between the polymorphic variants, the penetrance of the polymorphism, which alleles (minor or major) produce the combined effect, and their frequency in the population. We conclude that neglecting the possible presence of phenocopies in complex traits heavily affects the analysis of their genetic data.
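The dilution effect of phenocopies on a single-marker test can be illustrated with a minimal sketch. It assumes a hypothetical single-SNP model, not the PM1, PM2 or genomeSIMLA procedures used in the study: true cases carry the risk allele at an elevated frequency, while phenocopies (affected for non-genetic reasons) carry it at the control, i.e. population, frequency. The observed case frequency is therefore a mixture, and the expected allelic chi-square shrinks as the phenocopy fraction grows. All function names and parameter values below are illustrative assumptions; the p-value uses the identity that the upper tail of a 1-df chi-square equals erfc(sqrt(x/2)).

```python
import math


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def allelic_chi_square(case_alt: float, case_ref: float,
                       ctrl_alt: float, ctrl_ref: float) -> float:
    """Pearson chi-square (no continuity correction) for a 2 x 2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def expected_chi_square(p_case: float, p_ctrl: float, n_cases: int,
                        n_controls: int, phenocopy: float) -> float:
    """Expected allelic chi-square when a fraction `phenocopy` of cases are phenocopies.

    Hypothetical model: phenocopies carry the risk allele at the control
    (population) frequency, so the observed case frequency is a mixture of
    the true-case and population frequencies.
    """
    p_obs = (1.0 - phenocopy) * p_case + phenocopy * p_ctrl
    case_alleles = 2 * n_cases        # diploid: two alleles per individual
    ctrl_alleles = 2 * n_controls
    return allelic_chi_square(p_obs * case_alleles, (1.0 - p_obs) * case_alleles,
                              p_ctrl * ctrl_alleles, (1.0 - p_ctrl) * ctrl_alleles)


if __name__ == "__main__":
    # Illustrative values only: risk-allele frequency 0.40 in true cases,
    # 0.30 in controls, 1000 cases and 1000 controls.
    for phi in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        stat = expected_chi_square(0.40, 0.30, 1000, 1000, phi)
        print(f"phenocopy={phi:.1f}  chi2={stat:7.2f}  p={chi2_1df_pvalue(stat):.2e}")
```

Under this toy model the expected statistic falls steadily as the phenocopy fraction rises and vanishes when every case is a phenocopy; because the statistic scales with sample size, a larger study partly offsets the loss for a strong main effect.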

Highlights

  • High-throughput genetic analysis represents the present and the future in identifying the genetic determinants of complex diseases[1,2,3,4,5,6]

  • The following datasets have been extracted from the population: a) 6 case-control datasets with increasing phenocopy level, generated with the method implemented within the software (PM1); b) 6 case-control datasets with increasing phenocopy level, generated with an alternative method (PM2) developed in our lab, as described in Materials and Methods; c) 6 pedigree datasets with increasing phenocopy level, generated as implemented in genomeSIMLA

  • We can argue that the genetic scenario of the most important complex traits cannot be explained in black and white, i.e. solely by the presence of very rare variants yet to be discovered by sequencing or by the presence of purely epistatic effects

Introduction

High-throughput genetic analysis represents the present and the future in identifying the genetic determinants of complex diseases[1,2,3,4,5,6]. The most widely used statistical tests are single-point statistics (chi-square or Cochran-Armitage trend test) applied along the genome; these tests can be complemented by haplotype (or multi-marker) analysis once the linkage disequilibrium (LD) structure has been mapped and haplotype blocks have been identified. All these tests can be performed under different assumptions and with slightly different approaches, and multivariate analyses are generally performed as well. To control false positives, many different approaches have been proposed and, provided that the sample collection is large enough, a multi-stage design has been shown to be very effective in detecting key leads in the genome, which are often replicated in other populations.
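As an illustration of the single-point statistics mentioned above, the following is a minimal sketch of the Cochran-Armitage trend test on a 2 x 3 genotype table. It assumes additive weights (0, 1, 2 copies of the tested allele) by default and uses the textbook form of the statistic T and its null variance; the function name and the example counts are hypothetical, and the sketch is not the pipeline used in the study.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage_trend(cases: Sequence[int], controls: Sequence[int],
                           weights: Sequence[float] = (0.0, 1.0, 2.0)) -> Tuple[float, float]:
    """Cochran-Armitage trend test for a 2 x k genotype table.

    cases[i], controls[i]: number of individuals with genotype i
    (e.g. 0, 1, 2 copies of the tested allele). The default weights assume
    an additive model. Returns (chi-square statistic with 1 df, p-value).
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have the same length")
    r1, r2 = sum(cases), sum(controls)           # total cases, total controls
    n = r1 + r2
    col = [a + b for a, b in zip(cases, controls)]  # genotype column totals
    k = len(col)
    # Trend statistic: T = sum_i t_i * (N_1i * R_2 - N_2i * R_1)
    t_stat = sum(w * (a * r2 - b * r1) for w, a, b in zip(weights, cases, controls))
    # Variance of T under the null hypothesis of no association
    var = sum(w * w * c * (n - c) for w, c in zip(weights, col))
    var -= 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                   for i in range(k - 1) for j in range(i + 1, k))
    var *= r1 * r2 / n
    if var <= 0:
        raise ValueError("degenerate table: variance of the trend statistic is zero")
    chi2 = t_stat ** 2 / var
    # Upper tail of a 1-df chi-square: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts (0, 1, 2 copies) for cases and controls
    stat, p = cochran_armitage_trend([10, 20, 30], [30, 20, 10])
    print(f"trend chi2={stat:.2f}  p={p:.2e}")
```

Compared with a 2-df genotypic chi-square, the trend test spends its single degree of freedom on a dose-response pattern, which is why it is a common default for single-point scans under an additive model.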
