Bioinformatic removal of NUMT-associated variants in mitotiling next-generation sequencing data from whole blood samples.

Charla Marshall, Joseph David Ring, Kimberly Sturk-Andreaggi, Michelle Alyse Peck

doi:10.1002/elps.201800135

Abstract

Nuclear mitochondrial DNA segments (NUMTs) have arisen through the transposition of segments of the mitochondrial DNA genome (mitogenome) into the nuclear genome. With a "mitotiling" strategy, NUMTs may be amplified more readily when the entire mitogenome is targeted than when only the control region is targeted, because hundreds of primers are required for complete sequencing coverage. In samples with a high percentage of nuclear DNA copies per cell, such as whole blood, NUMT coenrichment may be exacerbated. The present study examined bioinformatic approaches for removing NUMTs and NUMT-associated variants (NAVs) from next-generation sequence data generated using two mitotiling kits (Precision ID and QIAseq). Across 16 samples with low mtDNA copy number, NUMT coenrichment produced 890 NAVs with >5% variant frequency. Using the consensus sequence to eliminate NUMT reads proved effective for QIAseq data and resulted in >85% NAV removal in Precision ID data. In the Precision ID analysis, this method was bolstered by NAV filtering. For the QIAseq data, the alternative of high-stringency mapping to the revised Cambridge Reference Sequence (rCRS) and the human genome reference GRCh38 reduced mitogenome coverage without completely removing NUMTs. These bioinformatic solutions facilitate mitotiling sequence data analysis for low-level variant detection.
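The two ideas named above, discarding reads that disagree with the sample consensus and then filtering any remaining NAVs, can be illustrated with a toy sketch. This is not the study's pipeline: the `Read` type, the function names, the mismatch cutoff, and the example positions are all hypothetical, and reads are assumed to be gapless and already placed on the consensus. Only the 5% variant-frequency threshold is taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Read:
    start: int  # 0-based position of the read's first base on the consensus
    seq: str    # read bases (toy assumption: gapless alignment)


def mismatches(read, consensus):
    """Count bases in the read that differ from the sample consensus."""
    window = consensus[read.start:read.start + len(read.seq)]
    return sum(1 for r, c in zip(read.seq, window) if r != c)


def drop_numt_like_reads(reads, consensus, max_mismatches=1):
    """Keep reads close to the consensus; divergent reads are treated as
    NUMT-like. The cutoff of 1 is an illustrative assumption."""
    return [r for r in reads if mismatches(r, consensus) <= max_mismatches]


def filter_navs(variants, known_navs, min_freq=0.05):
    """Keep variants above the frequency threshold that are not on a
    (hypothetical) list of known NAVs.

    variants: iterable of (position, alt_base, frequency)
    known_navs: set of (position, alt_base)
    """
    return [
        (pos, alt, freq)
        for pos, alt, freq in variants
        if freq > min_freq and (pos, alt) not in known_navs
    ]


if __name__ == "__main__":
    consensus = "ACGTACGTAC"
    reads = [Read(0, "ACGTA"), Read(2, "GTTCG"), Read(4, "TGGTT")]
    kept = drop_numt_like_reads(reads, consensus)
    print(len(kept), "of", len(reads), "reads kept")

    variants = [(3, "A", 0.08), (5, "G", 0.02), (7, "C", 0.12)]
    print(filter_navs(variants, known_navs={(7, "C")}))
```

In this sketch the read filter acts first, so that variant frequencies are computed only from reads consistent with the sample's own mitogenome; the NAV list then serves as a second pass for variants that survive the read filter.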

Full Text