Abstract
The interaction between proteins and RNA is closely related to various human diseases. Computer-aided drug design can be facilitated by detecting the RNA sites that bind proteins. However, due to the aggregation of binding sites in RNA sequences, high sample similarity occurs when extracting RNA fragments by using a sliding window. Considering these problems, we present a method, DFpin, to predict protein-interacting nucleotides in RNA. To retain more key nucleotide sites, we used the redundancy method based on feature similarity, that is, feature redundancy is removed based on the RNA mono-nucleotide composition to maintain the diversity of RNA samples and avoid the residue of redundant data. In addition, to extract key abstract features and avoid over-fitting, we used the cascade structure of a deep forest model to predict protein-interacting nucleotides. Overall, DFpin demonstrated excellent classification with 85.4% accuracy and 93.3% area under the curve. Compared with other methods, the accuracy of DFpin was better, suggesting that feature-based redundancy removal and deep forest can help predict nucleotides of protein interactions. The source code and all dataset are available at: https://github.com/zhaoxj-tech/DFpin.git.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.