Abstract

Insertions and deletions (INDELs) remain understudied, despite being the most common form of genetic variation after single nucleotide polymorphisms. This stems partly from the challenge of correctly identifying the ancestral state of an INDEL and thus identifying it as an insertion or a deletion. Erroneously assigned ancestral states can skew the site frequency spectrum, leading to artificial signals of selection. Consequently, the selective pressures acting on INDELs are, at present, poorly resolved. To tackle this issue, we have recently published a maximum likelihood approach to estimate the mutation rate and the distribution of fitness effects for INDELs. Our approach estimates and controls for the rate of ancestral state misidentification, overcoming issues plaguing previous INDEL studies. Here, we apply the method to INDEL polymorphism data from ten high-coverage (∼44×) European great tit (Parus major) genomes. We demonstrate that coding INDELs are under strong purifying selection, with only a small proportion (∼4%) making it into the population. However, among fixed coding INDELs, 71% of insertions and 86% of deletions are fixed by positive selection. In noncoding regions, we estimate that ∼80% of insertions and ∼52% of deletions are effectively neutral; the remainder show signatures of purifying selection. Additionally, we see evidence of linked selection reducing INDEL diversity below background levels, both in proximity to exons and in areas of low recombination.

Highlights

  • Insertion and deletion mutations (INDELs) are an important source of genetic variation, often separated into long and short INDELs due to different calling approaches required for longer variants

  • Using the high coverage resequencing data from Corcoran et al (2017) we called polymorphic INDELs and single nucleotide polymorphisms (SNPs) according to a GATK based pipeline (Van der Auwera et al, 2013)

  • We find that the use of either neutral reference results in a very similar bimodal distribution of fitness effects (DFE), with a majority of INDELs being strongly deleterious, and a minority weakly deleterious (Table 2)

Introduction

Insertion and deletion mutations (INDELs) are an important source of genetic variation, often separated into long and short INDELs due to the different calling approaches required for longer variants. There is one short INDEL (here ≤50bp) for every 8 single nucleotide polymorphisms (SNPs) in humans (Montgomery et al, 2013), representing a significant proportion of variation. INDEL studies are nonetheless under-represented in the literature. In part, this is due to the need to categorise INDELs into insertions and deletions, which requires knowledge of the ancestral state for each variant. This can be obtained using multi-species genome alignments. However, ancestral states are not always identified correctly, and the result is that a proportion of deletions are mistakenly identified as insertions (and vice versa), which can confound estimates of selection (Kvikstad and Duret, 2014) (see figure 1 in Barton and Zeng (2018)).
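To make the effect of ancestral state misidentification concrete, the following is a minimal sketch of how it can distort the unfolded site frequency spectrum (SFS). It is an illustration under stated assumptions, not the maximum likelihood method described in the abstract: the function name `apply_polarisation_error`, the single symmetric error rate `eps`, and the toy counts are all hypothetical. The idea is that a deletion whose ancestral state is misidentified is recorded as an insertion, and its derived allele count i in a sample of n chromosomes is recorded as n - i.

```python
def apply_polarisation_error(true_ins, true_del, eps):
    """Return the observed insertion and deletion SFS under misidentification.

    true_ins, true_del: lists of variant counts, where index k holds the
        number of variants with derived allele count k + 1 (counts 1..n-1).
    eps: assumed probability that a variant's ancestral state is
        misidentified (hypothetical, applied equally to both types).
    """
    if len(true_ins) != len(true_del):
        raise ValueError("both spectra must cover the same sample size")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie between 0 and 1")

    # Reversing a spectrum maps derived count i onto n - i, which is where a
    # mispolarised variant of the other type ends up.
    obs_ins = [(1 - eps) * ins + eps * dele
               for ins, dele in zip(true_ins, reversed(true_del))]
    obs_del = [(1 - eps) * dele + eps * ins
               for dele, ins in zip(true_del, reversed(true_ins))]
    return obs_ins, obs_del


# Toy spectra for n = 6 chromosomes (derived counts 1..5); made-up numbers.
true_ins = [60.0, 30.0, 20.0, 15.0, 12.0]
true_del = [120.0, 60.0, 40.0, 30.0, 24.0]
obs_ins, obs_del = apply_polarisation_error(true_ins, true_del, eps=0.1)
```

With these toy numbers, the highest-frequency insertion class rises from 12 to about 22.8 variants, because a tenth of the abundant low-frequency deletions are counted as high-frequency insertions. An excess of high-frequency derived variants of this kind is one way an artificial signal of selection can arise.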
