Abstract

Although bioinformatics-based methods can accurately identify super-enhancers (SEs), their results depend on feature design. Representing biological sequences and automatically extracting their key features is therefore fundamental to improving SE identification. We propose MuSE (Multi-Feature Fusion for Super-Enhancer), a deep learning model based on multi-feature fusion. The model uses two encoding methods, one-hot and DNA2Vec, to represent DNA sequences. Specifically, one-hot encoding reflects single-nucleotide information, while k-mer representations based on DNA2Vec capture both local sequence fragment information and global sequence characteristics. Neural networks process and combine these feature vectors for SE prediction. To validate the effectiveness of MuSE, we conduct extensive experiments on human and mouse datasets. Compared with baselines such as SENet, MuSE improves the F1 score, with a maximum gain exceeding 0.05 on the mouse dataset. Among the features used, the DNA2Vec-based k-mer representations have the greatest impact on predictions. This feature effectively captures contextual semantic knowledge and positional information of DNA sequences. However, because it also encodes species-specific characteristics, it impairs MuSE's generalization ability: after this feature is removed, the cross-species prediction performance of MuSE improves, reaching an AUC of nearly 0.8. The source code is available at https://github.com/15831959673/MuSE.
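
The two sequence views named in the abstract can be illustrated with a minimal sketch. The helper names `one_hot` and `kmers`, the choice of k = 3, the stride of 1, and the all-zero treatment of unknown bases are assumptions made here for illustration and are not taken from the MuSE repository. The sketch only produces the k-mer tokens; looking them up in a pretrained DNA2Vec embedding table is not shown.

```python
# Hypothetical helpers for illustration; not taken from the MuSE repository.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode each nucleotide as a 4-dimensional indicator vector (A, C, G, T).

    Symbols outside the alphabet, such as N, become all-zero rows
    (an assumption of this sketch).
    """
    rows = []
    for base in seq.upper():
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


def kmers(seq, k=3, stride=1):
    """Split a sequence into overlapping k-mers.

    These are the tokens that a DNA2Vec-style embedding table would map
    to dense vectors; k=3 and stride=1 are assumed defaults.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


if __name__ == "__main__":
    example = "ACGTN"
    print(one_hot(example))   # one row per nucleotide
    print(kmers(example))     # overlapping 3-mers
```

The one-hot view keeps exact per-position identity, while the k-mer view groups neighbouring bases into tokens whose learned embeddings can carry context, which matches the complementary roles the abstract assigns to the two encodings.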
