Abstract

DNase I hypersensitive sites (DHSs) are regions of active chromatin that are highly sensitive to cleavage by the DNase I enzyme. They provide a basis for studying the mechanisms of gene transcriptional regulation and play an important role in the analysis of gene expression regulatory elements. The identification of DHSs has therefore contributed to biomedical research and genome analysis. DHSs can already be identified experimentally, for example by Southern blotting and high-throughput sequencing, but these methods are often time-consuming and expensive, so novel and powerful computational methods are needed to predict DHSs. Researchers in related fields have proposed many feasible methods for identifying DHSs; however, their accuracy is not yet satisfactory, and more effective prediction methods are still needed. Building on previous studies, we therefore design a novel predictor called iDHS-DXG. First, we use three sequence-derived feature representation methods to extract features: k-mer, mismatch, and a dinucleotide property matrix based on the Moran coefficient. Truncated singular value decomposition (SVD) is then used to reduce the dimensionality of the benchmark dataset, with the optimal dimension determined through testing. Next, the synthetic minority over-sampling technique (SMOTE) is applied to balance the positive and negative samples. Finally, the extreme gradient boosting (XGBoost) ensemble classifier is used to predict DHSs. Under five-fold cross-validation, the main performance evaluation metrics of our method improve on previously reported results, and DHSs were identified on two human genome datasets with accuracies of 90.84% and 91.27%, respectively. These results show that our method is a feasible, effective and competitive tool for the analysis of gene regulatory elements.
Our research can help biologists and geneticists in genome analysis and in studying gene regulation mechanisms. It is also of great significance for research on human disease and for drug design. The datasets and code of iDHS-DXG are available at http://github.com/Xtian-696/iDHS-DXG/ .
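The three feature-extraction methods named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the iDHS-DXG implementation (that is in the repository linked above). The function names (`kmer_features`, `mismatch_features`, `moran`) are hypothetical. The Moran formula follows the common autocorrelation definition used in sequence-feature toolkits, which is assumed here rather than taken from the paper. The property values in the demo are placeholders, not real dinucleotide physicochemical indices.

```python
from itertools import product

ALPHABET = "ACGT"


def all_kmers(k):
    """Every possible k-mer over ACGT, in lexicographic order (4**k items)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_features(seq, k):
    """Normalised k-mer frequency vector of length 4**k.

    Windows containing symbols other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    index = {kmer: i for i, kmer in enumerate(all_kmers(k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_features(seq, k, m):
    """(k, m)-mismatch profile of length 4**k.

    Every valid k-mer window in seq adds one count to each feature k-mer
    within Hamming distance m of it; with m = 0 this reduces to raw k-mer counts.
    """
    seq = seq.upper()
    feature_kmers = all_kmers(k)
    counts = [0] * len(feature_kmers)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if any(ch not in ALPHABET for ch in window):
            continue
        for j, kmer in enumerate(feature_kmers):
            if hamming(window, kmer) <= m:
                counts[j] += 1
    return counts


def moran(seq, prop, lag):
    """Moran autocorrelation of one dinucleotide property at a given lag.

    prop maps each dinucleotide (e.g. "AC") to a numeric property value.
    With P_1..P_L the values of the L overlapping dinucleotides and P_bar
    their mean (assumed standard definition):
        numerator   = 1/(L - lag) * sum_{i=1}^{L-lag} (P_i - P_bar)(P_{i+lag} - P_bar)
        denominator = 1/L * sum_{i=1}^{L} (P_i - P_bar)^2
    """
    seq = seq.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n = len(values)
    if lag < 1 or lag >= n:
        raise ValueError("lag must satisfy 1 <= lag < number of dinucleotides")
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values) / n
    if denom == 0:
        return 0.0  # property is constant along the sequence; assumed to be 0 here
    numer = sum((values[i] - mean) * (values[i + lag] - mean)
                for i in range(n - lag)) / (n - lag)
    return numer / denom


if __name__ == "__main__":
    seq = "ACGTACGT"
    print(kmer_features(seq, 2))
    print(mismatch_features(seq, 2, 1))
    # Placeholder property values, NOT real physicochemical indices.
    toy_prop = {a + b: float(i) for i, (a, b) in enumerate(product(ALPHABET, repeat=2))}
    print([moran(seq, toy_prop, d) for d in (1, 2, 3)])
```

In the pipeline the abstract describes, per-sequence vectors of this kind would be combined into a feature matrix. That matrix is then reduced with truncated SVD, balanced with SMOTE, and classified with XGBoost. Those later steps are not shown here.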
