Mitochondrial DNA (mtDNA) has an important yet often overlooked role in health and disease. Constraint models quantify the removal of deleterious variation from the population by selection and represent powerful tools for identifying genetic variation that underlies human phenotypes (refs. 1-4). However, nuclear constraint models are not applicable to mtDNA, owing to its distinct features. Here we describe the development of a mitochondrial genome constraint model and its application to the Genome Aggregation Database (gnomAD), a large-scale population dataset that reports mtDNA variation across 56,434 human participants (ref. 5). Specifically, we analyse constraint by comparing the observed variation in gnomAD to that expected under neutrality, which was calculated using an mtDNA mutational model and observed maximum heteroplasmy-level data. Our results highlight strong depletion of expected variation, which suggests that many deleterious mtDNA variants remain undetected. To aid their discovery, we compute constraint metrics for every mitochondrial protein, tRNA and rRNA gene, which reveal a range of intolerance to variation. We further characterize the most constrained regions within genes through regional constraint and identify the most constrained sites within the entire mitochondrial genome through local constraint; these sites show enrichment of pathogenic variation. Constraint also clusters in three-dimensional structures, which provides insight into functionally important domains and their disease relevance. Notably, we identify constraint at often overlooked sites, including in rRNA and noncoding regions. Last, we demonstrate that these metrics can improve the discovery of deleterious variation that underlies rare and common phenotypes.