Abstract

Computational tools are widely used for interpreting variants detected in sequencing projects. The choice of these tools is critical for reliable variant impact interpretation in precision medicine and should be based on systematic performance assessment. The performance of the methods varies widely between assessments, for example because of differences in the contents and sizes of the test datasets. To address this issue, we obtained 63,160 common amino acid substitutions (allele frequency ≥1% and <25%) from the Exome Aggregation Consortium (ExAC) database, which contains variants from 60,706 genomes or exomes. We evaluated the specificity, that is, the ability to correctly identify benign variants, of 10 variant interpretation tools. In addition to the overall specificity of the tools, we tested their performance for variants in six geographical populations. PON-P2 had the best performance (95.5%), followed by FATHMM (86.4%) and VEST (83.5%). While these tools performed very well, the poorest method predicted more than one third of the benign variants to be disease-causing. The results allow reliable methods to be chosen for benign variant interpretation, for both research and clinical purposes, and provide a benchmark for method developers.
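
As an illustration of the evaluation described above, the following minimal sketch selects common variants by allele frequency (≥1% and <25%) and computes specificity as the fraction of benign variants that a tool predicts to be benign, TN / (TN + FP). The `Variant` class, its field names, and the toy data are hypothetical and serve only to illustrate the calculation; they do not reproduce the ExAC data format or the output of any of the evaluated tools.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str          # hypothetical identifier
    allele_frequency: float  # as a fraction, e.g. 0.05 for 5%
    predicted_benign: bool   # hypothetical prediction from one tool


def select_common(variants, min_af=0.01, max_af=0.25):
    """Keep variants with min_af <= allele frequency < max_af."""
    return [v for v in variants if min_af <= v.allele_frequency < max_af]


def specificity(variants):
    """Return TN / (TN + FP), assuming every variant passed in is truly benign."""
    if not variants:
        raise ValueError("specificity is undefined for an empty dataset")
    true_negatives = sum(v.predicted_benign for v in variants)
    return true_negatives / len(variants)


# Toy data, invented for illustration only.
toy = [
    Variant("v1", 0.05, True),
    Variant("v2", 0.10, True),
    Variant("v3", 0.20, False),   # benign variant wrongly called disease-causing
    Variant("v4", 0.005, False),  # excluded: allele frequency below 1%
    Variant("v5", 0.25, True),    # excluded: allele frequency not below 25%
    Variant("v6", 0.01, True),    # included: lower bound is inclusive
]

common = select_common(toy)
spec = specificity(common)
print(f"{len(common)} common variants, specificity = {spec:.2%}")
```

Because every variant in such a dataset is treated as benign, only true negatives and false positives occur, so specificity is the only performance measure that can be computed from it.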

Highlights

  • Next-generation sequencing (NGS) is widely used in clinical diagnosis as well as in population genetics to investigate patterns of genetic variants in healthy individuals

  • Variants were obtained from the high-quality Exome Aggregation Consortium (ExAC) database and selected to have a minor allele frequency between 1% and 25%

  • We further investigated performance across populations and allele frequencies, separately for males and females, chromosome by chromosome, and for population-unique and non-unique variants


Summary

Introduction

Next-generation sequencing (NGS) is widely used in clinical diagnosis as well as in population genetics to investigate patterns of genetic variants in healthy individuals. On average, there are about 10,000 variants per genome that cause amino acid substitutions [1]. Several databases enable annotation of the disease relevance of variants and of their frequencies among healthy individuals. These include numerous locus-specific variation databases (LSDBs) that are curated by experts in the relevant genes and diseases. While LSDBs typically concentrate on individual genes, proteins or diseases, general databases such as ClinVar [2], Online Mendelian Inheritance in Man (OMIM) [3] and the UniProt Knowledgebase (UniProtKB) [4] have a much wider scope.

Methods
Results
Discussion
Conclusion
