Abstract

In the past two decades, a huge amount of high-throughput -omics data, such as genomics, transcriptomics, metabolomics, and proteomics, have been generated regarding variations in DNA, RNA, or protein features for many cancers. The tremendous volume and complexity of these data pose significant challenges for biostatisticians, biologists, and clinicians. One of the central goals of analyzing these data is disease classification, which is fundamental to exploring knowledge, formulating diagnoses, and developing personalized treatment. Here, we review the statistical and machine learning techniques studied in cancer classification, and the process and difficulties of categorizing cancer subtypes from their genomic features.

Highlights

  • In the past two decades, a huge amount of high-throughput -omics data, such as genomics, transcriptomics, metabolomics, and proteomics, have been generated regarding variations in DNA, RNA, or protein features for many cancers

  • We review the statistical and machine learning techniques studied in cancer classification and the process or difficulties of categorizing cancer subtypes from their genomic features

  • The dramatic wave of genomic data is accelerating the trend of classifying cancer subtypes by clinical outcome or treatment option

Summary

Introduction

In the past two decades, a huge amount of high-throughput -omics data, such as genomics, transcriptomics, metabolomics, and proteomics, have been generated regarding variations in DNA, RNA, or protein features for many cancers. Classification, a supervised learning technique, predicts the value of a class outcome from input variables, using a training set of samples with known class labels [2]. Another very popular machine learning technique, clustering, falls into the category of unsupervised learning: it does not need outcome labels, and its goal is to describe the associations and patterns among a set of input variables [2]. Soft classification rules first estimate the conditional outcome class probabilities and then predict the class label with the maximum probability. The dramatic wave of genomic data is accelerating the trend of classifying cancer subtypes by clinical outcome or treatment option. Exploration of multi-class classification problems is therefore essential for successful application in precision medicine.
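The soft classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from the review: the probability model (a softmax over negative squared distances to class centroids), the function names `class_centroids`, `class_probabilities`, and `predict_label`, and the toy two-feature data with subtype labels "A", "B", and "C" are all hypothetical. What it shows is the two-step structure: estimate the conditional class probabilities first, then predict the label with the maximum probability.

```python
import math


def class_centroids(samples, labels):
    """Average the feature vectors of each class in the labeled training set."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def class_probabilities(x, centroids):
    """Soft step: estimate P(class | x) for every class.

    Assumption for illustration: probabilities come from a softmax over
    negative squared Euclidean distances to the class centroids.
    """
    scores = {
        y: -sum((a - b) ** 2 for a, b in zip(x, c))
        for y, c in centroids.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def predict_label(x, centroids):
    """Decision step: return the class with the maximum estimated probability."""
    probs = class_probabilities(x, centroids)
    return max(probs, key=probs.get)


# Hypothetical toy training set: two features per sample, three subtypes.
train_x = [[1.0, 0.2], [0.8, 0.0], [0.1, 1.1], [0.0, 0.9], [2.0, 2.1], [2.2, 1.9]]
train_y = ["A", "A", "B", "B", "C", "C"]

centroids = class_centroids(train_x, train_y)
probs = class_probabilities([0.9, 0.1], centroids)
label = predict_label([0.9, 0.1], centroids)
```

Because the rule works with any number of classes, the same two steps cover the multi-class setting; in practice the probability estimates would come from a fitted statistical model rather than this toy distance-based score.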
