Abstract
Gene families underlie genetic innovation and phenotypic diversification. However, our understanding of the early genomic and functional evolution of tandemly arranged gene families remains incomplete as paralog sequence similarity hinders their accurate characterization. The Drosophila melanogaster-specific gene family Sdic is tandemly repeated and impacts sperm competition. We scrutinized Sdic in 20 geographically diverse populations using reference-quality genome assemblies, read-depth methodologies, and qPCR, finding that ∼90% of the individuals harbor 3–7 copies as well as evidence of population differentiation. In strains with reliable gene annotations, copy number variation (CNV) and differential transposable element insertions distinguish one structurally distinct version of the Sdic region per strain. All 31 annotated copies featured protein-coding potential and, based on the protein variant encoded, were categorized into 13 paratypes differing in their 3′ ends, with 3–5 paratypes coexisting in any strain examined. Despite widespread gene conversion, the only copy present in all strains has functionally diverged at both coding and regulatory levels under positive selection. Contrary to artificial tandem duplications of the Sdic region that resulted in increased male expression, CNV in cosmopolitan strains did not correlate with expression levels, likely as a result of differential genome modifier composition. Duplicating the region did not enhance sperm competitiveness, suggesting a fitness cost at high expression levels or a plateau effect. Beyond facilitating a minimally optimal expression level, Sdic CNV acts as a catalyst of protein and regulatory diversity, showcasing a possible evolutionary path recently formed tandem multigene families can follow toward long-term consolidation in eukaryotic genomes.
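The read-depth methodologies mentioned above rest on simple arithmetic. Reads from every copy of a repeated unit pile up on the sequence they map to, so the ratio of depth over that unit to depth over single-copy regions approximates the copy number. The sketch below is a generic illustration and not the authors' pipeline. It assumes, hypothetically, that reads from all paralogous copies were mapped to a single representative copy and that the control regions are single-copy in the same inbred genome. The function names and toy depths are invented for illustration.

```python
from statistics import median


def estimate_copy_number(target_depths, control_depths):
    """Continuous copy-number estimate from per-base read depths.

    Hypothetical setup: target_depths come from reads of all copies mapped to
    one representative copy; control_depths come from single-copy regions of
    the same (inbred) genome, so their median approximates one-copy depth.
    """
    if not target_depths or not control_depths:
        raise ValueError("depth lists must be non-empty")
    single_copy_depth = median(control_depths)
    if single_copy_depth <= 0:
        raise ValueError("control depth must be positive")
    # Medians are used to dampen local coverage spikes and dropouts.
    return median(target_depths) / single_copy_depth


def call_copy_number(target_depths, control_depths):
    """Round the continuous estimate to the nearest integer copy number."""
    return round(estimate_copy_number(target_depths, control_depths))


# Toy example: ~150x over the representative copy, ~30x over single-copy regions.
print(call_copy_number([148, 152, 150, 151, 149], [30, 29, 31, 30]))
```

Medians rather than means are an arbitrary but common choice for robustness. Real analyses also have to handle mapping ambiguity among near-identical paralogs, GC bias, and partial copies, none of which this sketch addresses.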
Highlights
Copy number variation (CNV) occurring in the Sdic region: to generate a global portrait of Sdic CNV in D. melanogaster, we examined two different panels of strains.
Thirteen of them correspond to strains from the Drosophila Synthetic Population Resource (DSPR) and are virtually isogenic (King, Merkes, et al. 2012).
Clifton et al., doi:10.1093/molbev/msaa109
Summary
Structural variants have been largely overlooked in genetic variation surveys, limiting our understanding of the genetic basis of phenotypic change (Feyereisen et al. 2015; Huddleston and Eichler 2016; Chakraborty et al. 2019). Structural variants include duplications and deletions longer than 50 nt, transpositions, inversions, and translocations. Complex genomic regions, which exhibit unusually high levels of structural variation, often in the form of multiple copies of particular high-identity sequences generated by some kind of duplicative mechanism, are predominantly affected by this oversight. These regions are often grossly misassembled or absent altogether in reference genome assemblies (Hollox 2012; Ranz and Clifton 2019).