Abstract

DNA replication is central to biological inheritance, ensuring a steady flow of genetic information from parent to offspring. The site where DNA replication begins, called the origin of replication (ORI), plays a significant role in understanding the molecular mechanisms and genomic analysis of DNA. Accurate identification of the origin of replication is therefore essential for understanding the biochemical and genomic properties of DNA. In this paper, we propose a new approach named OriC-ENS that combines sequence-based feature extraction techniques (K-mer, K-gapped Mono-Di, and K-gapped Di-Mono) with an ensemble classifier that uses majority voting to identify origins of replication. We use three SVM classifiers: one for the K-mer features, one for the K-gapped Mono-Di features, and one for the K-gapped Di-Mono features. The predictions of the three classifiers are then combined by majority voting. Experimental results on the S. cerevisiae dataset show that our method achieves an accuracy of 91.62%, which outperforms other state-of-the-art methods by a significant margin. We have also evaluated our method using other metrics, namely Matthews Correlation Coefficient (MCC), Area Under Curve (AUC), Sensitivity, and Specificity, on which it achieves scores of 0.83, 0.98, 0.90, and 0.92, respectively. We further evaluated our model on an independent test set collected from OriDB, consisting of sequences of Schizosaccharomyces pombe, where our model predicts the origin of replication efficiently and with high precision. Our Python-based source code is available at https://github.com/MehediAzim/OriC-ENS.
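
To make the two ends of the pipeline concrete, the following is a minimal sketch of K-mer feature extraction and of majority voting over three classifiers' predictions. It is an illustration under stated assumptions, not the released OriC-ENS code: the function names `kmer_frequencies` and `majority_vote` are hypothetical, the choice `k=3` is arbitrary, and the SVM training step and the K-gapped features are omitted because the abstract does not specify them.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return normalized k-mer frequencies over ACGT in lexicographic order.

    Hypothetical helper: k=3 is an assumed default, not a value from the paper.
    Windows containing other symbols (e.g. N) count toward the total
    but have no slot in the returned vector.
    """
    sequence = sequence.upper()
    n_windows = len(sequence) - k + 1
    counts = Counter(sequence[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero for short sequences
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def majority_vote(predictions):
    """Combine label lists (one list per classifier) by majority vote.

    With three classifiers and binary labels a tie cannot occur.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# Example: three classifiers voting on three sequences.
votes = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

In this sketch, each inner list passed to `majority_vote` stands in for the output of one of the three SVM classifiers, and a sequence is labeled as an origin of replication when at least two of the three agree.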
