Abstract
Accumulating evidence indicates that non-coding RNAs (ncRNAs) play important roles in cellular biological processes and disease pathogenesis. High-throughput techniques have produced a large number of ncRNAs whose functions remain unknown. Because accurately identifying the family of an ncRNA helps in studying its function, predicting the family of each ncRNA is both necessary and urgent. Although several established methods can predict ncRNA families, their complex procedures and limited accuracy remain major problems. These methods first predict the secondary structure and then identify the ncRNA family from properties of that structure. Unfortunately, errors accumulate across these steps, and the imperfection of RNA secondary structure prediction tools in particular may be the cause of the low accuracy. In this paper, a novel end-to-end method, ncRFP, is proposed to complete the prediction task based on deep learning. Instead of predicting the secondary structure, ncRFP predicts the ncRNA family by automatically extracting features from ncRNA sequences. Compared with other methods, ncRFP not only simplifies the process but also improves accuracy. The source code of ncRFP is available at https://github.com/linyuwangPHD/ncRFP.
Highlights
The expression of protein-coding genes has been the focus of life studies for decades
The primary structure is the base sequence. Although ncRNA sequences appear to be less conserved than secondary or tertiary structures, members of the same ncRNA family contain unified seeds
The secondary structure is the planar structure that the sequence forms by folding on itself and pairing bases within the sequence, yielding a combination of secondary structure elements with a variety of specific shapes
Summary
The expression of protein-coding genes (messenger RNAs, mRNAs) has been the focus of life studies for decades. GraPPLE uses machine learning methods to predict ncRNA families from secondary structure features. NRC currently represents the state-of-the-art method: typical secondary structure features are first extracted by Moss [16] and processed into one-hot codes, and a convolutional neural network is then employed to identify ncRNAs. RNAcon considers 20 graph features obtained from the predicted ncRNA secondary structure and adopts a random forest (RF) classifier. These traditional methods all require the secondary structure of the ncRNA at the outset and identify the ncRNA family from secondary structure features. Predicting the family directly from the sequence, as ncRFP does, simplifies the process and has the potential to improve prediction accuracy.
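An end-to-end model that learns features from raw ncRNA sequences first needs a numeric representation of each sequence. The sketch below shows one common option, a fixed-length one-hot encoding of the bases. It is an illustration only: this page does not say which encoding ncRFP uses, and the function name `one_hot_encode`, the `ACGU` alphabet order, the `max_len` parameter, and the zero-padding scheme are all assumptions made for the example.

```python
from typing import List

# Assumed alphabet and column order; DNA-style 'T' is mapped to 'U' below.
BASES = "ACGU"


def one_hot_encode(seq: str, max_len: int) -> List[List[int]]:
    """Encode an RNA sequence as a max_len x 4 one-hot matrix.

    Hypothetical helper: unknown symbols (e.g. 'N') and padding
    positions become all-zero rows.
    """
    # Normalise case, map T to U, and truncate to the fixed length.
    seq = seq.upper().replace("T", "U")[:max_len]
    matrix = []
    for base in seq:
        row = [0] * len(BASES)
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        matrix.append(row)
    # Pad with zero rows so every sequence has the same shape.
    matrix.extend([0] * len(BASES) for _ in range(max_len - len(seq)))
    return matrix


if __name__ == "__main__":
    for row in one_hot_encode("GAUN", max_len=6):
        print(row)
```

A matrix of this shape could be fed directly to a sequence model, which is what removes the separate secondary structure prediction step from the pipeline.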
More From: IEEE/ACM Transactions on Computational Biology and Bioinformatics