Abstract

Microarray and next-generation sequencing (NGS) data are important sources for discovering useful molecular patterns. However, the large volume of gene expression data makes it harder to identify biomarkers associated with cancer. Random forest (RF) handles large-p, small-n problems effectively, that is, datasets with many more genes (p) than samples (n), and can therefore be used to select and rank genes for the diagnosis and effective treatment of cancer. Microarray gene expression data for colon, leukemia, and prostate cancers were collected from public databases. The data were first preprocessed with the limma package, and the RF classification method was then applied to each dataset separately in R. Finally, the genes selected for each cancer were compared with those reported in previous experimental studies, and their functions in molecular cancer processes were assessed. The RF method extracted very small sets of genes while retaining its predictive performance. For the colon cancer dataset, the key genes DIEXF, GUCA2A, CA7, and IGHA1 were selected, with an accuracy of 87.39% and a precision of 85.45%. For prostate cancer, SNCA, USP20, and SNRPA1 were selected, with an accuracy of 73.33% and a precision of 66.67%. For the leukemia dataset, the key genes were BAG4, ANKHD1-EIF4EBP3, PLXNC1, and PCDH9, with an accuracy of 100% and a precision of 95.24%. The results show that most of the selected genes had previously been reported as involved in cancer-related processes and pathways and play an important role in the transition from normal to abnormal cells.
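The accuracy and precision figures quoted above can be read with the usual confusion-matrix definitions. The sketch below is illustrative only: it assumes the reported values are percentages, that "tumor" is the positive class, and it uses made-up labels rather than any data from the study. The function name `accuracy_and_precision` is hypothetical.

```python
def accuracy_and_precision(y_true, y_pred, positive="tumor"):
    """Return (accuracy, precision) as percentages for one positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = 100.0 * correct / len(pairs)
    predicted_pos = true_pos + false_pos
    # Precision is undefined when nothing is predicted positive.
    precision = 100.0 * true_pos / predicted_pos if predicted_pos else float("nan")
    return accuracy, precision


# Hypothetical labels, NOT taken from the study: 6 tumor and 4 normal samples.
y_true = ["tumor"] * 6 + ["normal"] * 4
y_pred = ["tumor"] * 5 + ["normal"] + ["normal"] * 3 + ["tumor"]

acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

Under these definitions a classifier can reach 100% accuracy only if every sample is classified correctly, so precision below 100% alongside 100% accuracy would imply that the two figures were computed on different splits or averaged differently; the abstract does not say which.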

Highlights

  • Gene expression profiling with high-throughput technology across different cells and tissues is an important source for discovering useful molecular patterns [1]

  • The datasets were first preprocessed with the limma package, and the random forest (RF) classification method was then applied to each dataset separately in R

  • The results show that most of the selected genes had previously been reported as involved in cancer-related processes and pathways and play an important role in the transition from normal to abnormal cells

Introduction

Gene expression profiling with high-throughput technology across different cells and tissues is an important source for discovering useful molecular patterns [1]. These patterns indicate gene activity in a cell and, in turn, the associated cellular states. Microarray technology makes it possible to study the whole genome, transcriptome, and proteome in different cells and tissues under diverse conditions. The method has very high throughput and can analyze a large amount of information in a short time. RF can be used to select and rank genes for the diagnosis and effective treatment of cancer.
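To make the idea of ranking genes with a forest concrete, here is a minimal sketch. It is not the study's R implementation: it swaps in a simplified forest of one-split trees (decision stumps), each grown on a bootstrap sample with a random subset of genes, and ranks genes by their accumulated Gini decrease. The data are synthetic, and all names (`rank_genes`, `best_gain`, `mtry`) are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_gain(rows, labels, gene):
    """Largest Gini decrease achievable by thresholding a single gene."""
    parent = gini(labels)
    values = sorted(set(r[gene] for r in rows))
    best = 0.0
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [y for r, y in zip(rows, labels) if r[gene] <= thr]
        right = [y for r, y in zip(rows, labels) if r[gene] > thr]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = max(best, parent - child)
    return best


def rank_genes(rows, labels, n_trees=200, seed=0):
    """Rank gene indices by accumulated Gini decrease over a stump forest."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    mtry = max(1, int(p ** 0.5))  # genes tried per tree (assumed default)
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        boot_rows = [rows[i] for i in idx]
        boot_labels = [labels[i] for i in idx]
        candidates = rng.sample(range(p), mtry)  # random gene subset
        gains = {g: best_gain(boot_rows, boot_labels, g) for g in candidates}
        winner = max(gains, key=gains.get)
        importance[winner] += gains[winner]
    return sorted(range(p), key=lambda g: importance[g], reverse=True)


# Synthetic data, NOT from the study: 30 samples, 9 genes, large p relative
# to n in spirit only. Gene 0 tracks the class label; genes 1-8 are noise.
data_rng = random.Random(1)
labels = [i % 2 for i in range(30)]
rows = [
    [3.0 * y + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(8)]
    for y in labels
]

ranking = rank_genes(rows, labels)
print(ranking[:3])  # gene 0, the informative one, is expected to rank first
```

A full random forest grows deep trees and often ranks variables by permutation importance instead; the stump version is used here only because it keeps the bootstrap, random gene subset, and impurity-based ranking visible in a few lines.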
