Abstract
DNase I hypersensitive sites (DHSs) are regions of chromatin that are sensitive to cleavage by the DNase I enzyme. Identifying DHSs provides useful insights for discovering functional DNA elements in non-coding sequences in biomedical research. Given the significance of DHSs, it is indispensable to develop an accurate, fast, robust, and high-throughput automated computational model. In this paper, we develop a model named iDHSs-MFF by combining multiple fusion features with an F-score feature selection approach. The multiple fusion features include three auto-correlation descriptors based on the dinucleotide property matrix (DPM) and the trinucleotide property matrix (TPM), Pseudo-DPM, and Pseudo-TPM. Evaluation by jackknife cross-validation indicates that the features selected by F-score are effective for identifying DHSs. Experimental results on two benchmark datasets demonstrate that the proposed model outperforms several closely related models. Systematic application of this computational approach will greatly facilitate the analysis of transcriptional regulatory elements. The datasets and Matlab source code are freely available at: https://github.com/shengli0201/Datasets.
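The F-score feature selection step mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in iDHSs-MFF, so this assumes the commonly used two-class F-score: between-class scatter of the feature means divided by the sum of the within-class sample variances. The function names `f_scores` and `select_top_k` and the toy data are hypothetical and are not taken from the authors' Matlab code.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Return one F-score per feature column.

    X: list of equal-length feature vectors.
    y: list of labels (1 = DHS, 0 = non-DHS); each class needs >= 2 samples.
    Assumed definition, not confirmed by the abstract.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        all_col = [row[j] for row in X]
        pos_col = [row[j] for row in pos]
        neg_col = [row[j] for row in neg]
        overall = mean(all_col)
        # Between-class scatter: squared distance of each class mean
        # from the overall mean.
        between = (mean(pos_col) - overall) ** 2 + (mean(neg_col) - overall) ** 2
        # Within-class scatter: sum of the two sample variances
        # (n - 1 denominator).
        within = variance(pos_col) + variance(neg_col)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k features with the highest F-score."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: feature 0 separates the classes, feature 1 does not.
X_demo = [[1.0, 5.0], [2.0, 4.0], [8.0, 5.0], [9.0, 4.0]]
y_demo = [1, 1, 0, 0]
print(select_top_k(X_demo, y_demo, 1))
```

In a pipeline of this kind, the retained feature indices would then be used to subset the fused feature vectors before training the classifier; how many features to keep is a tuning choice that the abstract does not specify.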