Collagen type I alpha 1 (COL1a1) encodes the primary subunit of type I collagen, the main structural protein and the most abundant protein in vertebrates, and harbors hundreds of mutations linked to human diseases such as osteoporosis and osteogenesis imperfecta. Previous studies have attempted to predict the phenotypic severity associated with type I collagen mutations, yet no evolutionary analysis comparing historical and recent selective pressures, including across noncoding regions, has been conducted. Here, we use a comparative genomic and evolutionary analysis of species spanning ∼450 My of vertebrate history to investigate functional constraints on both the exons and introns of the >17-kb COL1a1 gene. We find that although the COL1a1 amino acid sequence is highly conserved, there are both spatial and temporal signatures of varying selective constraint across protein domains. Furthermore, sites of high evolutionary constraint significantly correlate with the locations of disease-associated mutations, which also cluster by the severity classes typically used in clinical studies. Finally, we find that COL1a1 introns are significantly short and GC-rich, patterns that are shared across highly diverged vertebrates and that may be a signature of strong stabilizing selection for high COL1a1 gene expression. In conclusion, whereas previous studies focused on COL1a1 coding regions, our results implicate introns as regions of high selective constraint and as targets of bone-related phenotypic variation. More broadly, our comparative evolutionary approach provides further resolution to models predicting mutations associated with bone-related function and disease severity.