Abstract

Characterization and identification of recombination hotspots provide important insights into the mechanism of recombination and genome evolution. In contrast with existing sequence-based models, which predict recombination hotspots defined in an ORF-based manner, we first defined recombination hot/cold spots based on public high-resolution Spo11-oligo-seq data, then characterized them in terms of DNA sequence and epigenetic marks, and finally presented classifiers to identify hotspots. We found that, in addition to previously discovered DNA-based features such as GC-skew, recombination hotspots in yeast can also be characterized by distinctive features associated with DNA physical properties and shape. More importantly, using DNA-based features and several epigenetic marks, we built several classifiers to discriminate hotspots from coldspots and found that the SVM classifier performed best, with an accuracy of ∼92%, which was also the highest among the models compared. Feature importance analysis combined with the prediction results shows that epigenetic marks and the variation of sequence-based features along the hotspots contribute dominantly to hotspot identification. Using an incremental feature selection method, we obtained an optimal feature subset consisting of far fewer features without sacrificing prediction accuracy.
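The incremental feature selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the features have already been ranked by importance and that some routine returns a cross-validated accuracy for any feature subset; the function name, the `evaluate` callback, the feature names, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> tuple[list[str], float, list[float]]:
    """Add ranked features one at a time and keep the best-scoring prefix.

    ranked_features: feature names sorted from most to least important.
    evaluate: hypothetical callback that returns the accuracy of a
              classifier trained on the given features only.
    """
    if not ranked_features:
        raise ValueError("ranked_features must not be empty")
    # Accuracy for the top-1, top-2, ..., top-n feature subsets.
    curve = [evaluate(ranked_features[:k]) for k in range(1, len(ranked_features) + 1)]
    best_accuracy = max(curve)
    # list.index returns the first match, i.e. the smallest subset
    # that reaches the best accuracy.
    best_k = curve.index(best_accuracy) + 1
    return list(ranked_features[:best_k]), best_accuracy, curve


# Toy stand-in for cross-validation (assumption): each feature adds a fixed
# number of accuracy points on top of a 50% baseline.
TOY_GAIN_PERCENT = {"epi_mark_1": 20, "gc_skew": 10, "mgw": 5, "noise_1": 0, "noise_2": 0}


def toy_evaluate(features: Sequence[str]) -> float:
    return (50 + sum(TOY_GAIN_PERCENT[f] for f in features)) / 100


ranked = ["epi_mark_1", "gc_skew", "mgw", "noise_1", "noise_2"]
subset, accuracy, curve = incremental_feature_selection(ranked, toy_evaluate)
print(subset, accuracy)
```

In practice `toy_evaluate` would be replaced by cross-validation of the chosen classifier (an SVM in the abstract). Keeping the smallest prefix that reaches the maximum accuracy is one simple way to obtain a reduced subset without losing accuracy.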

Highlights

  • Meiotic recombination is crucial to gametogenesis as it helps the faithful separation of homologous chromosomes into gametes by forming chiasma (Coop and Przeworski, 2007)

  • We defined the negative dataset of recombination coldspots as genomic regions at least 500 bp long with no Spo11-oligo signal, based on the full Spo11-oligo map (Pan et al, 2011). Defining coldspots in this way, we focus on relatively large cold regions with low recombination, whose lack of signal is less likely to result from noise or limited sequencing depth in Spo11-oligo-seq

  • GC-skew shows a characteristic reversal between the two sides of the hotspot center, probably due to mutational bias (Smagulova et al, 2011). Mutual information has a dramatic peak at the hotspot center (Figure 2), suggesting possible biased usage of dinucleotides (Liu and Li, 2008). DNA shape parameters such as slide, shift, rise, helical twist, roll, stretch, opening, propeller twist, and minor groove width (MGW) show a peak or dip at the hotspot center (Figure 3)
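Two of the quantities in the highlights above lend themselves to a short sketch: calling coldspots as runs of at least 500 bp without Spo11-oligo signal, and computing GC-skew in windows along a sequence. The per-base signal list, the function names, and the window and step sizes are assumptions made for illustration; GC-skew is computed with the common definition (G − C) / (G + C), which the summary does not spell out.

```python
def call_coldspots(signal, min_length=500):
    """Return half-open (start, end) intervals where the per-base signal
    is zero for at least min_length consecutive positions."""
    coldspots = []
    run_start = None  # start of the current zero-signal run, if any
    for pos, value in enumerate(signal):
        if value == 0:
            if run_start is None:
                run_start = pos
        else:
            if run_start is not None and pos - run_start >= min_length:
                coldspots.append((run_start, pos))
            run_start = None
    # Close a zero-signal run that extends to the end of the signal.
    if run_start is not None and len(signal) - run_start >= min_length:
        coldspots.append((run_start, len(signal)))
    return coldspots


def gc_skew(sequence, window=50, step=50):
    """Return (G - C) / (G + C) for each window along the sequence.

    Windows containing neither G nor C get a skew of 0.0.
    """
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Toy signal: a 600 bp silent stretch (called) and a 100 bp one (too short).
toy_signal = [3] + [0] * 600 + [5] + [0] * 100 + [2]
toy_coldspots = call_coldspots(toy_signal)

# Toy sequence whose skew flips sign between its two halves.
toy_skews = gc_skew("GGGA" + "CCCT", window=4, step=4)
print(toy_coldspots, toy_skews)
```

A sign change in the windowed skew between the two flanks, as in the toy sequence, corresponds to the reversed skew around the hotspot center described above. On real data the signal would come from the Spo11-oligo map and the sequence from the flanks of each hotspot center.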

Summary

INTRODUCTION

Meiotic recombination is crucial to gametogenesis as it helps the faithful separation of homologous chromosomes into gametes by forming chiasmata (Coop and Przeworski, 2007). DNA shape and physical properties were shown to affect recombination hotspot identification (Chen et al, 2013), but the importance of individual DNA shape features is unclear because they were incorporated into the model only implicitly, in the form of pseudo nucleotide composition. As far as we know, DNA shape parameter sets derived by different groups differ substantially (Liu et al, 2016), which leaves the accuracy of the parameter estimates uncertain. In this respect, it is worth noting that DNA shape parameters are sequence context-dependent (Zhou et al, 2013), and context-dependent estimation of DNA shape parameters may assist hotspot prediction. Comparison with other models demonstrated the good performance of our model.
