Abstract

Support vector machines (SVMs) achieve strong classification performance on non-coding RNA (ncRNA) data. As the number of species and the size of ncRNA sequence collections grow rapidly, several fast SVM methods based on data distribution and contour information have been developed to reduce training time complexity. However, these methods are sensitive to both noise and class imbalance. In this paper, a fast and robust SVM with an anti-noise convex hull for large-scale ncRNA data classification (called FRSVM-ANCH) is proposed. FRSVM-ANCH discards outliers in the feature space and computes the convex hull of each class. The convex hull points, together with their weights, are then used as the training data for the SVM. Because it is less sensitive to noise, the pinball loss is adopted in the SVM classifier. Theoretical analysis and experimental results verify the advantages of FRSVM-ANCH in classification performance and training time on large-scale noisy and imbalanced ncRNA datasets.
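To make the role of the pinball loss concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard pinball loss on the margin residual u = 1 - y * f(x), which equals u when u >= 0 and -tau * u otherwise, with tau in [0, 1]. The function names, the default tau = 0.5, and the weight-normalized average in weighted_pinball_risk are illustrative assumptions; the abstract does not say how the convex hull weights enter the objective.

```python
def pinball_loss(u: float, tau: float = 0.5) -> float:
    """Pinball loss on the margin residual u = 1 - y * f(x).

    Unlike the hinge loss, samples that are classified with a large
    margin (u < 0) still incur a small penalty of -tau * u.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return u if u >= 0 else -tau * u


def hinge_loss(u: float) -> float:
    """Hinge loss, shown for comparison; pinball loss with tau = 0 matches it."""
    return max(0.0, u)


def weighted_pinball_risk(scores, labels, weights, tau=0.5):
    """Weight-normalized average pinball loss (the weighting scheme is an assumption).

    scores  : decision values f(x_i)
    labels  : class labels y_i in {-1, +1}
    weights : non-negative per-sample weights, e.g. for convex hull points
    """
    total = 0.0
    for score, label, weight in zip(scores, labels, weights):
        total += weight * pinball_loss(1.0 - label * score, tau)
    return total / sum(weights)
```

Setting tau = 0 recovers the hinge loss, so tau can be read as a knob that trades the hinge loss's sparsity for reduced sensitivity to noise near the decision boundary.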

Highlights

  • Non-coding RNAs are defined as all functional RNA transcripts other than protein-encoding messenger RNAs

  • To build a more effective support vector machine (SVM) for large-scale ncRNA classification, this study proposes a fast and robust SVM with an anti-noise convex hull, called FRSVM-ANCH, which selects training samples using random projection and anti-noise convex hull strategies in noisy and imbalanced ncRNA scenarios

  • The proposed FRSVM-ANCH method maps the original samples to multiple feature subspaces by random projection (RP) and computes the anti-noise convex hull in each feature space
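The sample-selection idea in the highlights above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the FRSVM-ANCH algorithm itself: the projection is a plain Gaussian random projection to 2D, the outlier step (drop the points farthest from the projected centroid) is a hypothetical stand-in for the paper's anti-noise procedure, the hull is computed with Andrew's monotone chain, and taking the union of hull points over several projections is also an assumption. All function names and parameters are hypothetical, and the function would be applied to one class at a time.

```python
import random


def random_projection(points, out_dim=2, seed=0):
    """Project each point onto out_dim random Gaussian directions."""
    rng = random.Random(seed)
    in_dim = len(points[0])
    scale = out_dim ** -0.5
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
              for _ in range(out_dim)]
    return [tuple(sum(r * x for r, x in zip(row, p)) for row in matrix)
            for p in points]


def drop_far_points(points2d, keep_fraction=0.9):
    """Return indices of the points closest to the centroid (stand-in outlier filter)."""
    n = len(points2d)
    cx = sum(p[0] for p in points2d) / n
    cy = sum(p[1] for p in points2d) / n
    order = sorted(range(n),
                   key=lambda i: (points2d[i][0] - cx) ** 2 + (points2d[i][1] - cy) ** 2)
    return sorted(order[:max(1, int(keep_fraction * n))])


def convex_hull_indices(points2d, candidates):
    """Indices of the 2D convex hull vertices among candidates (monotone chain)."""
    idx = sorted(candidates, key=lambda i: points2d[i])
    if len(idx) <= 2:
        return idx

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for i in sequence:
            # Pop while the last turn is clockwise or collinear.
            while len(chain) >= 2 and cross(points2d[chain[-2]],
                                            points2d[chain[-1]],
                                            points2d[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(idx)
    upper = half_hull(reversed(idx))
    return lower[:-1] + upper[:-1]


def select_hull_samples(points, n_projections=3, keep_fraction=0.9, seed=0):
    """Union of hull vertices over several random 2D projections of one class."""
    selected = set()
    for k in range(n_projections):
        projected = random_projection(points, 2, seed + k)
        kept = drop_far_points(projected, keep_fraction)
        selected.update(convex_hull_indices(projected, kept))
    return sorted(selected)
```

The returned indices identify the reduced training set for one class; a downstream SVM would then be trained only on those samples, which is where the reduction in training time would come from.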


Summary

Introduction

Non-coding RNAs (ncRNAs) are defined as all functional RNA transcripts other than protein-encoding messenger RNAs. Studies have shown that the non-coding fraction of the genome grows with the complexity of the organism, and that the share of ncRNA among genomic transcription products grows with it, especially in eukaryotes. Only 30% of genomic transcripts do not encode proteins, while in the Drosophila genome the non-coding portion reaches 75%, and in humans it reaches 98% [1]. Types of ncRNA include rRNA, tRNA, snRNA, snoRNA, and microRNA, as well as RNAs of unknown function. ncRNAs can be further divided into small ncRNAs (such as snoRNA, microRNA, siRNA, and exRNA) and long ncRNAs (lncRNAs). A growing number of studies show that ncRNAs are involved in the occurrence of various diseases.

