Abstract

Identifying cis-regulatory modules (CRMs) is an important challenge in molecular biology, and computational methods remain the main way to find them. These methods, however, generally suffer from a high false positive rate, which parameter optimization can help reduce. To overcome the shortcomings of traditional CRM identification methods, an alignment-free statistic, D2S, is proposed for predicting CRM sites. Two other statistics, D2 and D2star, are included for comparison. The results show that D2S is the most accurate of the three statistics across the tested values of the k-tuple length k and the Markov order M. Tuning k and M according to the area under the ROC curve (AUC) shows that D2S performs very well at k = 7 and M = 1. D2S can therefore be used to predict CRM sites and so increase the sensitivity and specificity of CRM prediction software.
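
The abstract does not give formulas for the three statistics. The following is a minimal sketch that assumes the definitions commonly used in the alignment-free literature: D2 sums products of k-tuple counts, while D2S and D2star use counts centered by their expected values. For brevity it assumes an order-0 (independent bases) background model rather than a Markov model of order M, and the names `kmer_counts`, `word_prob`, `d2_family`, and `base_prob` are hypothetical, not taken from the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-tuple in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_prob(word, base_prob):
    """Probability of a word under an order-0 background (assumption:
    the paper uses a Markov model of order M instead)."""
    p = 1.0
    for ch in word:
        p *= base_prob[ch]
    return p


def d2_family(seq_a, seq_b, k, base_prob):
    """Return (D2, D2S, D2star) for two DNA sequences.

    Assumed definitions, with centered counts xc = x - n * p:
      D2     = sum_w x * y
      D2S    = sum_w xc * yc / sqrt(xc^2 + yc^2)
      D2star = sum_w xc * yc / (sqrt(n_a * n_b) * p)
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    n_a = len(seq_a) - k + 1  # number of k-tuple positions in seq_a
    n_b = len(seq_b) - k + 1
    d2 = d2s = d2star = 0.0
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        p = word_prob(w, base_prob)
        x, y = counts_a[w], counts_b[w]  # Counter gives 0 for absent words
        xc = x - n_a * p
        yc = y - n_b * p
        d2 += x * y
        norm = sqrt(xc * xc + yc * yc)
        if norm > 0:  # skip words where both centered counts are zero
            d2s += xc * yc / norm
        d2star += xc * yc / (sqrt(n_a * n_b) * p)
    return d2, d2s, d2star


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(d2_family("ACGTTGCAAC", "ACGTAGCAAT", 2, uniform))
```

In a sketch like this, the reported setting would correspond to calling the function with k = 7 and replacing `word_prob` with word probabilities estimated from an order-1 Markov model of the sequences.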
