Abstract
Enhancers are short DNA regulatory elements that play a vital role in gene expression. Because of their importance in genomics, several computational models based on traditional machine learning algorithms have been proposed in the literature for identifying enhancers and their strengths. However, these models are unable to identify enhancers and their strengths with reasonable accuracy because of the high non-linearity in DNA sequences. This article proposes a two-level intelligent model, iEnhancer-DHF, based on a Deep Neural Network (DNN) combined with multiple feature extraction methods. First, the proposed model represents the given DNA sequences as feature vectors using the Pseudo K-tuple Nucleotide Composition (PseKNC) and FastText methods. Second, the feature vectors are fused into a heterogeneous feature vector that captures the local and global correlations among the given sequences along with internal structure information. Finally, the heterogeneous feature vector is passed to a DNN model to make the final predictions. iEnhancer-DHF is developed using a two-layer approach: the first layer predicts whether the given DNA samples are enhancers or non-enhancers, whereas the second layer identifies whether the enhancers are strong or weak. The performance of the proposed model was rigorously assessed on both training and independent datasets using the 10-fold cross-validation method. On the training dataset, the iEnhancer-DHF model yielded accuracies of 86.07% and 69.60% at the first and second layers, respectively. Similarly, on the independent dataset, the model yielded accuracies of 83.21% and 67.54% at the first and second layers, respectively.
Additionally, the performance of the proposed model was first compared with widely applied classifiers such as Support Vector Machine, Random Forest, and K-Nearest Neighbor, and subsequently with existing models, using both the training and independent datasets. The comparison results showed that the iEnhancer-DHF model outperformed the recently published models.
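The two-layer decision scheme and the 10-fold cross-validation protocol described above can be illustrated with a minimal Python sketch. The function names `two_layer_predict` and `k_fold_indices`, and the two classifier callables `is_enhancer` and `is_strong`, are hypothetical placeholders chosen for illustration and are not the authors' implementation; any trained model (for example a DNN) could be wrapped as one of the callables.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Features = Sequence[float]


def two_layer_predict(
    features: Features,
    is_enhancer: Callable[[Features], bool],
    is_strong: Callable[[Features], bool],
) -> str:
    """Cascade two binary classifiers (both are assumed, caller-supplied).

    Layer 1 separates enhancers from non-enhancers.
    Layer 2 is consulted only for samples that layer 1 accepts.
    """
    if not is_enhancer(features):
        return "non-enhancer"
    return "strong enhancer" if is_strong(features) else "weak enhancer"


def k_fold_indices(
    n_samples: int, k: int = 10
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one sample. No shuffling is done here;
    in practice the samples would usually be shuffled beforehand.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

In this sketch each sample appears in exactly one test fold, so averaging the per-fold accuracy gives a single cross-validated score for each layer.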
Highlights
The paper presented a reliable and robust predictor based on a Deep Neural Network (DNN) for predicting enhancers and their strengths using FastText, Pseudo K-tuple Nucleotide Composition (PseKNC), and heterogeneous features
The paper combined the FastText and PseKNC feature vectors to construct a heterogeneous feature vector containing a large number of diverse features
The performance of the DNN was compared with widely used machine learning algorithms, and the comparison results showed that the DNN performed better than the conventional learning algorithms
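As a rough illustration of how a heterogeneous feature vector might be assembled, the sketch below computes a plain k-tuple nucleotide composition and concatenates it with a second vector. This is a simplification under stated assumptions: full PseKNC also adds pseudo components derived from physicochemical properties, which are omitted here, and the embedding vector stands in for a FastText representation that would come from a separately trained model. The names `kmer_composition` and `fuse_features` are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def kmer_composition(seq: str, k: int = 2) -> List[float]:
    """Return normalized frequencies of all 4**k k-mers over A, C, G, T.

    Simplified stand-in for PseKNC: only the composition part is computed.
    Windows containing other characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def fuse_features(*vectors: Sequence[float]) -> List[float]:
    """Concatenate several feature vectors into one fused vector."""
    fused: List[float] = []
    for vector in vectors:
        fused.extend(vector)
    return fused


# Example: fuse a dinucleotide composition with a placeholder embedding.
composition = kmer_composition("ACGTACGT", k=2)
placeholder_embedding = [0.5, -0.5, 0.25]  # assumed; not a real FastText output
fused_vector = fuse_features(composition, placeholder_embedding)
```

Concatenation is only one possible fusion strategy; it is used here because it keeps every component of both representations available to the downstream classifier.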
Summary
An enhancer is a short (50–1500 bp) element of DNA which performs a […]. Large numbers of enhancers are found in the genomes of both prokaryotes and eukaryotes, including the human genome [4]. Genetic variations in human enhancers are associated with several diseases, e.g. inflammatory bowel disease and cancer [5]–[7].
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.