Abstract

A crucial step in cancer diagnosis is selecting a small number of informative genes for accurate classification. This problem has become a major focus in the data mining of gene expression profiles. Many conventional classification methods perform poorly, especially on data with a large number of cancer types. Here, we propose a new approach for gene selection and multi-cancer classification based on step-by-step improvement of classification performance (SSiCP). The SSiCP gene selection algorithms were evaluated on the NCI60 and GCM benchmark datasets, achieving accuracies of 96.6% and 95.5%, respectively, in 10-fold cross-validation. Furthermore, SSiCP outperformed recently published algorithms when applied to two other multi-cancer datasets. Computational evidence indicated that SSiCP can effectively avoid overfitting. Compared with various other gene selection algorithms, SSiCP is simple to implement, and many of the genes it selects are shown to be closely related to cancers.
