Abstract

The growing volume of next-generation sequencing (NGS) data presents a unique opportunity to study the combined impact of mitochondrial and nuclear-encoded genetic variation in complex disease. Mitochondrial DNA variants, and in particular heteroplasmic variants, are critical in determining human disease severity. While there are approaches for obtaining mitochondrial DNA variants from NGS data, these tools do not account for the unique characteristics of mitochondrial genetics and can be inaccurate even for homoplasmic variants. We introduce MitoScape, a novel big-data software tool for extracting mitochondrial DNA sequences from NGS data. MitoScape departs from other algorithms by using machine learning to model the unique characteristics of mitochondrial genetics. We also employ a novel approach of using rho-zero (mitochondrial DNA-depleted) data to model nuclear-encoded mitochondrial sequences. We show that MitoScape produces accurate heteroplasmy estimates using gold-standard mitochondrial DNA data. We provide a comprehensive comparison of the most common tools for obtaining mtDNA variants from NGS data and show that MitoScape outperformed the other tools in every statistical category we compared, including false positives and false negatives. By applying MitoScape to common disease examples, we illustrate how it facilitates important heteroplasmy-disease association discoveries by expanding upon a reported association between hypertrophic cardiomyopathy and mitochondrial haplogroup T in men (adjusted p-value = 0.003). The improved accuracy of the mitochondrial DNA variant calls produced by MitoScape will be instrumental in diagnosing disease in the context of personalized medicine and clinical diagnostics.

Highlights

  • Both mitochondrial DNA and nuclear DNA variants are known to impair the function and structure of mitochondria, leading to primary mitochondrial disease [1]

  • Identifying mitochondrial DNA (mtDNA) sequences accurately is complicated by the presence of nuclear-encoded mitochondrial sequences (NUMTs), which are homologous to mtDNA

  • We introduce MitoScape, a novel big-data software tool that models mitochondrial genetics through machine learning to accurately identify mtDNA sequences from next-generation sequencing (NGS) data

Introduction

Both mitochondrial DNA (mtDNA) and nuclear DNA (nDNA) variants are known to impair the function and structure of mitochondria, leading to primary mitochondrial disease [1]. Studies have implicated mtDNA variants in a myriad of common, complex human diseases, including cancer, cardiovascular disease, diabetes and neurodegenerative disease [2,3,4,5,6]. There is a need to interrogate mtDNA and nDNA variants simultaneously in both primary mitochondrial disease and complex disease. Large-scale next-generation sequencing (NGS) datasets are a valuable resource for retrospectively analyzing both mtDNA and nDNA variation in an array of common diseases. Today, such large datasets are both abundant and necessary in genetic association studies for overcoming biases and false negatives due to a lack of statistical power. The Cancer Mitochondrial Atlas (TCMA) identified signatures of mtDNA variation in different forms of cancer, using data from thousands of whole genome sequencing (WGS) samples [5].
