Abstract

Whole-genome sequencing (WGS) data present a readily available resource for mitochondrial genome (mitogenome) haplotypes that can be utilized for genetics research including population studies. However, the reconstruction of the mitogenome is complicated by nuclear mitochondrial DNA (mtDNA) segments (NUMTs) that co-align with the mtDNA sequences and mimic authentic heteroplasmy. Two minimum variant detection thresholds, 5% and 10%, were assessed for the ability to produce authentic mitogenome haplotypes from a previously generated WGS dataset. Variants associated with NUMTs were detected in the mtDNA alignments for 91 of 917 (~10%) Swedish samples when the 5% frequency threshold was applied. The 413 observed NUMT variants were predominantly detected in two regions (nps 12,612–13,105 and 16,390–16,527), which were consistent with previously documented NUMTs. The number of NUMT variants was reduced by ~97% (400 variants) using a 10% frequency threshold. Furthermore, the 5% frequency data were inconsistent with a platinum-quality mitogenome dataset with respect to observed heteroplasmy. These analyses illustrate that a 10% variant detection threshold may be necessary to ensure the generation of reliable mitogenome haplotypes from WGS data resources.
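As a rough illustration of how a minimum variant frequency threshold operates, the following Python sketch filters a handful of made-up variant calls at 5% and 10% and then reproduces the reduction arithmetic reported above. The `Variant` class, the `apply_threshold` function and the read counts are hypothetical and are not taken from the workflow used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant call; not the study's data model."""
    position: int     # nucleotide position (np) in the mitogenome
    alt_reads: int    # reads supporting the variant allele
    total_reads: int  # total reads covering the position

    @property
    def frequency(self) -> float:
        # Variant frequency = supporting reads / total coverage.
        return self.alt_reads / self.total_reads if self.total_reads else 0.0


def apply_threshold(variants, min_frequency):
    """Keep only variants at or above the minimum frequency threshold."""
    return [v for v in variants if v.frequency >= min_frequency]


# Made-up example calls (assumed values, for illustration only).
calls = [
    Variant(position=12_700, alt_reads=6, total_reads=100),    # 6%
    Variant(position=16_400, alt_reads=12, total_reads=100),   # 12%
    Variant(position=13_000, alt_reads=4, total_reads=100),    # 4%
    Variant(position=150, alt_reads=100, total_reads=100),     # 100%
]

kept_5 = apply_threshold(calls, 0.05)    # the 4% call is dropped
kept_10 = apply_threshold(calls, 0.10)   # the 4% and 6% calls are dropped

# Arithmetic from the abstract: 413 NUMT variants at 5%, 400 removed at 10%.
numt_total = 413
numt_removed = 400
numt_remaining = numt_total - numt_removed
reduction_pct = 100 * numt_removed / numt_total

print(len(kept_5), len(kept_10))
print(numt_remaining, round(reduction_pct, 1))
```

Raising the threshold trades sensitivity for specificity: the low-frequency calls that disappear at 10% may include authentic low-level heteroplasmy as well as NUMT artifacts, which is why the choice of threshold matters for haplotype reliability.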

Highlights

  • Whole-genome sequencing (WGS) data generated for genetic variation studies are often submitted to public databases, such as GenBank [1], or made publicly available as part of the publication

  • Generating authentic mitogenome haplotypes from WGS data proved difficult at a 5% frequency threshold, even with specialized bioinformatic workflows and multiple assessments to prevent the inclusion of low-level variants caused by nuclear mitochondrial DNA (mtDNA) segments (NUMTs)

  • The 13 NUMT variants that exceeded the 10% threshold were localized to a NUMT hotspot region and were clearly distinguishable from point heteroplasmies (PHPs)


Summary

Introduction

Whole-genome sequencing (WGS) data generated for genetic variation studies are often submitted to public databases, such as GenBank [1], or made publicly available as part of the publication. Public WGS datasets provide a wealth of mitochondrial genome (mitogenome) sequences that could benefit several fields, including population studies, disease association studies and forensic genetics. Donor metadata are often available since WGS researchers are interested in associations between genetic information and individual characteristics. These metadata, such as geographic origin, ancestry, metapopulation, phenotype, age and sex, may also be useful in mitochondrial studies. WGS studies typically involve a large number of samples that are sequenced using next-generation sequencing (NGS) technologies and high-throughput instruments, which makes such studies more cost-effective.

