Abstract

Given the heterogeneity in the clinical behavior of cancer patients with identical histopathological diagnoses, it is necessary to search for unrecognized molecular subtypes and subtype-specific markers and to evaluate their clinical and biological relevance. This task now benefits from high-throughput genomic technologies and from free access to the datasets generated by international genomic projects and to public information repositories. Machine learning strategies have proven useful for identifying hidden trends in large datasets, contributing to the understanding of the molecular mechanisms of cancer and to its subtyping. However, translating new molecular subclasses and biomarkers into clinical settings requires their analytic validation and clinical trials to determine their clinical utility. Here, we provide an overview of the workflow to identify and confirm cancer subtypes, summarize a variety of methodological principles, and highlight representative studies. The generation of public big data on the most common malignancies is turning molecular pathology into a database-driven discipline.

Highlights

  • The diagnosis of cancer is made primarily through histopathological classification systems that take into account the morphological characteristics of the tumor, allowing its identification and the assignment of a clinical stage

  • It is necessary to identify patterns in large, genome-wide datasets using machine learning strategies. This task benefits from high-throughput genomic technologies, the enormous number of genomic datasets generated by international genomic projects, and the availability of data analysis algorithms, allowing a comprehensive and unprecedented characterization of the disease

  • In the case of microarray data, raw data are preprocessed in three steps: background correction, to adjust the intensity readings for nonspecific signals; normalization, to adjust the intensity readings for technical variability so that measurements from all samples are comparable; and summarization, to compute a single value from the different probes representing each gene
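The three preprocessing steps in the last highlight can be illustrated with a minimal NumPy sketch. It is a simplification, not RMA, MAS5, or any other published method: the background is assumed to be each array's 5th-percentile intensity, normalization is plain quantile normalization, and summarization is the median of log2 probe values per gene. The function names, the `probe_to_gene` mapping, and the simulated intensities are hypothetical.

```python
import numpy as np

def background_correct(raw, floor=1.0):
    """Step 1: subtract a per-array background estimate (assumed here to be
    the 5th percentile of that array's intensities) and floor at a small value."""
    bg = np.percentile(raw, 5, axis=0)            # one background value per array (column)
    return np.maximum(raw - bg, floor)

def quantile_normalize(x):
    """Step 2: give every array (column) the same intensity distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its array
    reference = np.sort(x, axis=0).mean(axis=1)        # mean intensity at each rank
    return reference[ranks]

def summarize(log_x, probe_to_gene):
    """Step 3: collapse the probes of each gene to one value per array (median)."""
    genes = sorted(set(probe_to_gene))
    idx = np.array(probe_to_gene)
    return genes, np.vstack([np.median(log_x[idx == g], axis=0) for g in genes])

def preprocess(raw, probe_to_gene):
    corrected = background_correct(raw)
    normalized = quantile_normalize(corrected)
    return summarize(np.log2(normalized), probe_to_gene)

# Hypothetical input: 8 probes x 3 arrays, four probes per gene.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=6, sigma=1, size=(8, 3))
probe_to_gene = ["GENE_A"] * 4 + ["GENE_B"] * 4
genes, expr = preprocess(raw, probe_to_gene)
print(genes, expr.shape)   # ['GENE_A', 'GENE_B'] (2, 3)
```

The result is a genes × arrays matrix on a log2 scale, which is the usual input for the clustering and classification methods discussed in the review.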


Summary

Introduction

The diagnosis of cancer is made primarily through histopathological classification systems that take into account the morphological characteristics of the tumor, allowing its identification and the assignment of a clinical stage. The existing histopathological subtypes are heterogeneous, as is evident at the levels of molecular pathogenesis, clinical course, and treatment responsiveness [1,2]. Machine learning approaches, computational tools that recognize and classify patterns on the basis of models derived from the data, can be used to dissect the complexity of cancer. Machine learning for cancer subtyping has been performed mainly with gene expression data, but it can also be applied to other levels of biological information, such as promoter methylation, miRNAs, and single nucleotide polymorphisms, analyzed with hybridization arrays or next-generation sequencing. This allows the structure of the data to be studied at many different levels and provides an integrated view of the biological processes involved.
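The paragraph above distinguishes recognizing patterns (discovering subtypes) from classifying samples into them. A minimal sketch of both, assuming an already preprocessed samples × genes expression matrix: k-means, one of many possible clustering methods and not one the review is claimed to prescribe, groups patients into putative subtypes, and a nearest-centroid rule then assigns a new patient. The deterministic farthest-point initialization, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def assign(X, centroids):
    """Label each sample (row of X) with the index of its nearest centroid."""
    # squared Euclidean distance from every sample to every centroid
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means on a samples x genes matrix.

    Initialization is deterministic farthest-point (maximin): start from the
    first sample, then repeatedly add the sample farthest from those chosen.
    """
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # move each centroid to the mean of its samples (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids

# Hypothetical synthetic data: 20 samples x 50 genes; samples 10-19 form a
# "subtype" that over-expresses the first 10 genes.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 50))
X[10:, :10] += 4.0

labels, centroids = kmeans(X, k=2)      # unsupervised: discover putative subtypes
new_sample = rng.normal(size=(1, 50))
new_sample[0, :10] += 4.0
print(labels, assign(new_sample, centroids))  # supervised: classify a new patient
```

k-means needs the number of subtypes fixed in advance; in practice that number is usually chosen with cluster-stability criteria, and any resulting classifier still has to be validated on independent cohorts before it has clinical meaning.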

Unsupervised and Supervised Learning for Cancer Study
Classification method
Linear discriminant analysis
DLBCL
Breast cancer
Subtypes with molecular heterogeneity
Clustering of patients
Classification and validation of results
Datasets and Analysis Tools
Findings
Conclusion
