In the yeast, meiotic recombination is initiated by double-strand DNA breaks (DSBs) which occur at relatively high frequencies in some genomic regions (hotspots) and relatively low frequencies in others (coldspots). Although observations concerning individual hot/cold spots have given clues as to the mechanism of recombination initiation, the prediction of hot/cold spots from DNA sequence information is a challenging task. In this article, we introduce a random forest (RF) prediction model to detect recombination hot/cold spots from yeast genome. The out-of-bag (OOB) estimation of the model indicated that the RF classifier achieved high prediction performance with 82.05% total accuracy and 0.638 Mattew's correlation coefficient (MCC) value. Compared with an alternative machine-learning algorithm, support vector machine (SVM), the RF method outperforms it in both sensitivity and specificity. The prediction model is implemented as a web server (RF-DYMHC) and it is freely available at http://www.bioinf.seu.edu.cn/Recombination/rf_dymhc.htm. Given a yeast genome and prediction parameters (RI-value and non-overlapping window scan size), the program reports the predicted hot/cold spots and marks them in color.
Read full abstract