Abstract
As the use of next-generation sequencing (NGS) for the diagnosis of Mendelian diseases expands, the performance of the method has to improve in order to achieve higher quality. Typically, performance measures are expected to be designed in the context of each application and, therefore, to account for the spectrum of clinically relevant variants. We present EphaGen, a new computational methodology for bioinformatics quality control (QC). Given a single NGS dataset in BAM format and a pre-compiled VCF file of targeted clinically relevant variants, it assigns the dataset a single arbiter parameter. Intrinsically, EphaGen estimates the probability of missing any variant from the defined spectrum within a particular NGS dataset. This performance measure closely resembles the diagnostic sensitivity of the given NGS dataset. Here we present case studies of the use of EphaGen in the context of BRCA1/2 and CFTR sequencing in a series of 14 runs across 43 blood samples and 504 publicly available NGS datasets. EphaGen is superior to conventional bioinformatics metrics such as coverage depth and coverage uniformity. We recommend using this software as a QC step in NGS studies in the clinical context. Availability: https://github.com/m4merg/EphaGen or https://hub.docker.com/r/m4merg/ephagen.
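To make the idea of a single arbiter parameter concrete, the following minimal sketch computes a frequency-weighted miss probability over a variant spectrum. It is an illustration under stated assumptions, not EphaGen's actual algorithm: the function names, the weighting scheme, and the per-variant detection probabilities (which in practice would have to be derived from the BAM data) are all hypothetical.

```python
from typing import Sequence


def expected_sensitivity(freqs: Sequence[float],
                         detect_probs: Sequence[float]) -> float:
    """Frequency-weighted mean probability of detecting a variant.

    freqs        -- relative frequency of each variant in the target
                    spectrum (hypothetical input, e.g. taken from a VCF)
    detect_probs -- assumed probability that each variant would be
                    detected in this dataset (hypothetical input)
    """
    if len(freqs) != len(detect_probs):
        raise ValueError("freqs and detect_probs must have the same length")
    total = sum(freqs)
    if total <= 0:
        raise ValueError("variant frequencies must sum to a positive value")
    # Weight each detection probability by how common the variant is.
    weighted = sum(f * p for f, p in zip(freqs, detect_probs))
    return weighted / total


def miss_probability(freqs: Sequence[float],
                     detect_probs: Sequence[float]) -> float:
    """Probability of missing a variant drawn from the spectrum."""
    return 1.0 - expected_sensitivity(freqs, detect_probs)


if __name__ == "__main__":
    # Three illustrative variants: a common, well-covered one and two
    # rarer ones with weaker assumed detectability.
    freqs = [0.5, 0.3, 0.2]
    detect_probs = [1.0, 0.9, 0.5]
    print(f"estimated sensitivity: {expected_sensitivity(freqs, detect_probs):.3f}")
    print(f"miss probability:      {miss_probability(freqs, detect_probs):.3f}")
```

Under this weighting, a poorly covered but common variant lowers the score far more than a poorly covered rare one, which is the intuition behind tying a QC metric to the clinically relevant spectrum rather than to average coverage alone.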
Highlights
Next-generation sequencing has transformed the landscape of the whole field of medical genetics
We have developed EphaGen, an open-source application implemented in Perl/R, which can be used as a standalone version
Progress in the development of appropriate performance measures is essential to advancing applied science and engineering
Summary
Next-generation sequencing has transformed the landscape of the whole field of medical genetics. It has enhanced the performance of genetic testing and has expanded and facilitated the understanding of clinical genetics [1–4]. Decades of research efforts and routine testing have shed light on the spectrum of variations in human genes associated with a wide range of genetic disorders, and on their clinical significance in terms of variable penetrance and expressivity [5]. For the most widespread genetic diseases, numerous research collaborations and public databases have provided information on common and population-specific minor allele frequencies for clinically significant variants. As of May 2018, the Breast Cancer Information Core database [6] contains information on relative clinically relevant variants across the BRCA1 and BRCA2 genes, implicated in hereditary breast cancer development, based on an affected population of 11 344