Abstract
Background: Whole exome sequencing (WES) has recently emerged as an effective approach for identifying genetic variants underlying human diseases. However, considerable time and labour are needed for careful investigation of candidate variants. Although filtration based on population frequencies and functional prediction scores can effectively remove common and neutral variants, hundreds or even thousands of rare deleterious variants still remain. In addition, current WES platforms also provide variant information in flanking noncoding regions, such as promoters, introns and splice sites. Although these regions are recognized to harbour causal variants, they are usually ignored by current analysis pipelines.

Results: We present a novel computational method, called Glints, to overcome the above limitations. Glints is capable of identifying disease-causing SNVs in both coding and flanking noncoding regions from exome sequencing data. The principle behind Glints is that disease-causing variants should manifest their effect at both the variant and the gene level. Specifically, Glints integrates 14 types of functional scores, including predictions for both coding and noncoding variants, and 9 types of association scores, which help identify disease-relevant genes. We conducted large-scale simulation studies based on 1000 Genomes Project data and demonstrated the effectiveness of our method in both coding and flanking noncoding regions. We also applied Glints to two real exome sequencing studies and demonstrated its effectiveness for uncovering disease-causing SNVs. Both standalone software and a web server are available at our website http://bioinfo.au.tsinghua.edu.cn/jianglab/glints.

Conclusions: Glints is effective for uncovering disease-causing SNVs in coding and flanking noncoding regions, as supported by both simulation and real case studies. Glints is expected to be a useful tool for human genetics research based on exome sequencing data.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1325-x) contains supplementary material, which is available to authorized users.
Highlights
Whole exome sequencing (WES) has recently emerged as an effective approach for identifying genetic variants underlying human diseases
Exon refers to nonsynonymous SNVs in coding regions, Promoter refers to regions overlapping the 500 bp upstream of the TSS plus the UTR5 and UTR3 regions, Intron refers to the inner 3–10 bp regions from exon/intron boundaries, and Splice site refers to the inner 2 bp regions from exon/intron boundaries
According to the group it belongs to, we annotate each variant with functional prediction scores that estimate its damaging effect
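The four variant groups defined in the highlights can be illustrated with a small classifier. This is only a minimal sketch and not the Glints implementation: the `Transcript` model, the `classify` function, the plus-strand restriction, and the reading of "inner" as "inside the intron, counted from the nearest exon boundary" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Transcript:
    """Toy plus-strand transcript, 1-based inclusive coordinates (hypothetical model)."""
    exons: List[Tuple[int, int]]  # sorted exon intervals; the first exon starts at the TSS
    cds_start: int                # first coding base
    cds_end: int                  # last coding base


def classify(pos: int, tx: Transcript, nonsynonymous: bool = False) -> Optional[str]:
    """Assign a variant position to Exon, Promoter, Intron, Splice site, or None."""
    tss = tx.exons[0][0]

    # Promoter group, part 1: the 500 bp immediately upstream of the TSS.
    if tss - 500 <= pos < tss:
        return "Promoter"

    # Exonic positions: UTRs are grouped with Promoter; coding positions
    # count as Exon only when the SNV is nonsynonymous.
    for start, end in tx.exons:
        if start <= pos <= end:
            if pos < tx.cds_start or pos > tx.cds_end:
                return "Promoter"
            return "Exon" if nonsynonymous else None

    # Intronic positions: distance (in bp) to the nearest exon/intron boundary.
    for (_, left_end), (right_start, _) in zip(tx.exons, tx.exons[1:]):
        if left_end < pos < right_start:
            dist = min(pos - left_end, right_start - pos)
            if dist <= 2:
                return "Splice site"
            if dist <= 10:
                return "Intron"
            return None  # deep intronic, outside the flanking regions

    return None  # intergenic, outside all four groups


tx = Transcript(exons=[(1000, 1200), (1500, 1700), (2000, 2300)],
                cds_start=1100, cds_end=2100)
labels = {p: classify(p, tx, nonsynonymous=True) for p in (900, 1050, 1150, 1202, 1203, 1300)}
```

A real pipeline would take these labels from an annotation tool and handle minus-strand genes and multiple transcripts; the sketch only makes the boundary arithmetic of the four groups explicit.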
Summary
Whole exome sequencing (WES) has recently emerged as an effective approach for identifying genetic variants underlying human diseases. Such prediction scores, although reported to achieve high accuracy on public data sets such as HGMD [11], Swiss-Prot [12] and ClinVar [13], usually yield many false positives and have low explanatory power in real experimental studies [14, 15]. To overcome this limitation, the second group of methods, represented by eXtasy [16], SPRING [17] and snvForest [18], integrates multiple functional predictions of variants, association information between genes and diseases, and phenotype information to prioritize candidate variants. This strategy is local in nature, because only diseases with very high phenotype similarity to the query disease contribute to the inference procedure.