Abstract

Kidney cancer is one of the deadliest diseases, and its diagnosis and subtype classification are crucial for patients' survival. Developing automated tools that can accurately determine kidney cancer subtypes is therefore an urgent challenge. Biomedical researchers have confirmed that miRNA dysregulation can cause cancer. In this paper, we propose a machine learning approach for classifying kidney cancer subtypes using miRNA genome data. Through empirical studies, we found 35 miRNAs that possess distinct key features that aid in kidney cancer subtype diagnosis. In the proposed method, Neighbourhood Component Analysis (NCA) is employed to extract discriminative features from miRNAs, and Long Short-Term Memory (LSTM), a type of Recurrent Neural Network, is adopted to classify a given miRNA sample into kidney cancer subtypes. In the literature, only a couple of kidney cancer subtypes have been considered for classification. In the experimental study, we used the miRNA quantitative read count data provided by The Cancer Genome Atlas (TCGA) data repository. The NCA procedure selected 35 of the most discriminative miRNAs. With this subset of miRNAs, the LSTM algorithm was able to group kidney cancer miRNAs into five subtypes with an average accuracy of around 95% and a Matthews Correlation Coefficient of around 0.92 over 10 runs of randomly grouped 5-fold cross-validation. Both values were very close to the average performance obtained when using all miRNAs for classification.
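
The evaluation protocol mentioned in the abstract (10 runs of randomly grouped 5-fold cross-validation) can be sketched as follows. This is only an illustrative sketch in plain Python, not the authors' implementation: the function name `repeated_kfold_indices`, the seed handling, and the way samples are dealt into folds are assumptions made here. The shuffling is not stratified by subtype, since the abstract does not say whether stratification was used.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (run, fold, train_idx, test_idx) for repeated, randomly grouped k-fold CV.

    Hypothetical helper: every run reshuffles the sample indices, deals them
    into n_splits folds, and uses each fold once as the test set.
    """
    rng = random.Random(seed)
    for run in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random grouping for every run
        folds = [order[k::n_splits] for k in range(n_splits)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield run, k, train_idx, test_idx


# Usage: 10 runs x 5 folds gives 50 train/test splits in total.
splits = list(repeated_kfold_indices(n_samples=23))
print(len(splits))
```

A classifier would be trained on `train_idx` and scored on `test_idx` for each split, and the reported accuracy and Matthews Correlation Coefficient would then be averages over all 50 splits.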

Highlights

  • Kidney cancer is one of the deadliest diseases and it is hard to detect early through normal clinical means [1]

  • Where $f_t$ is the activation vector of the forget gate, σ is the sigmoid function, $W$ denotes the weight matrices to be learned during training, $x_t$ is the input vector to the Long Short Term Memory (LSTM) unit, $b$ denotes the bias vector parameters to be learned during training, $i_t$ is the activation vector of the input gate, $C_t$ is the cell state vector, $o_t$ is the activation vector of the output gate, and $h_t$ is the output vector of the LSTM unit

  • We used kidney cancer RNA-sequence data represented by the miRNA expression that is publicly available on The Cancer Genome Atlas (TCGA) database website
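
The LSTM highlight above defines gate symbols, but the equations it refers to are not part of this excerpt. For orientation, the standard LSTM formulation that uses those symbols is written out below. It is assumed here, not confirmed by the source, that the paper follows this common form. The candidate state $\tilde{C}_t$, the gate subscripts on $W$ and $b$, the concatenation $[h_{t-1}, x_t]$, and the element-wise product $\odot$ are notation introduced for this sketch.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```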


Summary

Introduction

Kidney cancer is one of the deadliest diseases, and it is hard to detect early through normal clinical means [1]. Various approaches have been developed to differentiate among subtypes of kidney cancer. One of these promising paths is the analysis of the patient's genetic information. We focus on kidney cancer subtype detection and classification in an effort to help medical researchers address the key points of kidney cancer subtypes and their characteristics. To assess the performance of the proposed method, we adopted the Data Analysis Protocol and the Matthews Correlation Coefficient [6]. It must be stressed that the effectiveness of the selected miRNA subset for diagnosing specific subtypes of kidney cancer needs to be investigated clinically.
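
Because the Matthews Correlation Coefficient is used here to score a classifier with more than two classes, a short sketch of its common multi-class generalization may help. This is an illustrative plain-Python version, not the authors' code: the function name `multiclass_mcc` and the toy labels are hypothetical, and it is an assumption that the paper's metric corresponds to this generalization.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews Correlation Coefficient computed from label lists.

    Uses the confusion-matrix form
        (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k how often class k truly occurs and p_k how often it is predicted.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_true = s * s - sum(v * v for v in t_counts.values())
    cov_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(cov_true * cov_pred)
    # By convention the score is 0 when either side has a single class only.
    return numerator / denominator if denominator else 0.0


# Usage with hypothetical subtype labels "A", "B", "C".
truth = ["A", "A", "B", "B", "C", "C"]
guess = ["A", "A", "B", "C", "C", "C"]
print(round(multiclass_mcc(truth, guess), 3))
```

A value of 1 means perfect agreement and a value near 0 means performance no better than chance, which makes the score more informative than raw accuracy when subtype frequencies are unbalanced.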

The RNA Sequence and Kidney Cancer
Machine Learning
Neighborhood Component Analysis
Data Preparation and Results
Data Preparation and Categorization
Results and Discussions
Conclusions