Abstract

Medulloblastoma is a highly heterogeneous pediatric brain tumor with five molecular subtypes defined by the World Health Organization: Sonic Hedgehog TP53-mutant, Sonic Hedgehog TP53-wildtype, WNT, Group 3, and Group 4. Classification into these subtypes currently relies on immunostaining, methylation, and/or genetics. We surveyed the literature and identified a number of RNA-Seq and microarray datasets with which to develop, train, test, and validate a robust classifier that identifies medulloblastoma molecular subtypes from transcriptomic profiling data. We have developed a GPL-3 licensed R package and a Shiny application that let users quickly and robustly classify medulloblastoma samples using transcriptomic data. The classifier draws on a large composite microarray dataset (15 individual datasets), an individual microarray study, and an RNA-Seq dataset, and uses gene ratios rather than gene expression measures as model features. Discriminating features were identified using the limma R package, and samples were classified using an unweighted mean of normalized scores. We used two training datasets and applied the classifier to 15 separate datasets. We observed a minimum accuracy of 85.71% in the smallest dataset and a maximum accuracy of 100% in four datasets, with an overall median accuracy of 97.8% across the 15 datasets; most misclassifications occurred between the heterogeneous Group 3 and Group 4 subtypes. We anticipate this medulloblastoma transcriptomic subtype classifier will be broadly applicable to the cancer research and clinical communities.
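
To make the scoring idea in the abstract concrete, here is a minimal Python sketch of classifying one sample from gene ratios with an unweighted mean of normalized scores. It is not the package's implementation: the gene names (`g1`, `g2`, `g3`), the `signatures` and `reference` structures, the log2 transform, the pseudocount, and the z-score normalization are all illustrative assumptions, since the abstract does not specify them.

```python
from math import log2
from statistics import mean


def classify(expr, signatures, reference, pseudocount=1.0):
    """Score one sample per subtype and return (best_subtype, scores).

    expr:       {gene: expression value} for a single sample
    signatures: {subtype: [(numerator_gene, denominator_gene), ...]} (hypothetical)
    reference:  {(numerator_gene, denominator_gene): (mean, sd)} taken from
                training data (hypothetical normalization constants)
    """
    scores = {}
    for subtype, pairs in signatures.items():
        normalized = []
        for num, den in pairs:
            # Gene ratio feature; the log2 scale and pseudocount are assumptions.
            ratio = log2((expr[num] + pseudocount) / (expr[den] + pseudocount))
            mu, sd = reference[(num, den)]
            normalized.append((ratio - mu) / sd)
        # Unweighted mean: every ratio in the signature counts equally.
        scores[subtype] = mean(normalized)
    best = max(scores, key=scores.get)
    return best, scores


# Toy data with placeholder genes; real signatures would come from limma.
signatures = {
    "SHH": [("g1", "g2"), ("g1", "g3")],
    "WNT": [("g2", "g1"), ("g2", "g3")],
}
reference = {pair: (0.0, 1.0) for pairs in signatures.values() for pair in pairs}
sample = {"g1": 100.0, "g2": 5.0, "g3": 5.0}

label, scores = classify(sample, signatures, reference)
print(label, scores)
```

Because a ratio compares two genes measured in the same sample, features of this kind are less sensitive to platform-specific scaling than raw expression values, which is one plausible reason to prefer them when mixing microarray and RNA-Seq data.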

Highlights

  • Medulloblastoma (MB) is the most common childhood brain tumor and accounts for nearly 20% of all pediatric CNS neoplasms [1]

  • The expression matrix was converted from a gene matrix to a gene expression ratio (GER) matrix, and the limma R package was used to identify the GERs specific to each subtype in this dataset

  • We filtered to the 1,795 GERs and performed t-SNE analysis using the R package Rtsne with the following parameters: initial_dims (the number of dimensions retained in the initial PCA step) set to 200, perplexity set to 10, and max_iter (the maximum number of iterations) set to 500
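
The gene-matrix-to-GER-matrix conversion mentioned in the highlights can be sketched as follows. This is a hedged illustration in Python rather than the authors' R code: the function name `to_ger_matrix`, the use of unordered gene pairs, the log2 scale, and the pseudocount are assumptions, and the limma and Rtsne steps are not reproduced here.

```python
from itertools import combinations
from math import log2


def to_ger_matrix(gene_matrix, pseudocount=1.0):
    """Convert {gene: [value per sample]} into {"a/b": [log2 ratio per sample]}.

    One row is produced per unordered gene pair, so n genes yield
    n * (n - 1) / 2 ratio features.
    """
    ger = {}
    for a, b in combinations(sorted(gene_matrix), 2):
        ger[f"{a}/{b}"] = [
            log2((x + pseudocount) / (y + pseudocount))
            for x, y in zip(gene_matrix[a], gene_matrix[b])
        ]
    return ger


# Toy matrix: three placeholder genes measured in two samples.
toy = {"g1": [7.0, 1.0], "g2": [1.0, 7.0], "g3": [3.0, 3.0]}
ger = to_ger_matrix(toy)
for name, row in ger.items():
    print(name, row)
```

The number of ratio features grows quadratically with the number of genes, which is why a filtering step (such as the selection down to 1,795 GERs described above) is needed before downstream analysis.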


Summary

Introduction

Medulloblastoma (MB) is the most common childhood brain tumor and accounts for nearly 20% of all pediatric CNS neoplasms [1]. In 2016, based on several profiling studies, five molecular subtypes of MB were recognized: Sonic Hedgehog (SHH) TP53-mutant, SHH TP53-wildtype, WNT, Group 3, and Group 4 [5,6]. These subtypes were independently identified and shown to be concordant across multiple bioinformatic analyses of gene expression, comparative genomic hybridization, and DNA methylation microarray data: prediction analysis of microarrays [7], unsupervised two-way hierarchical clustering and bootstrap analysis [8], unsupervised SubMap [7], and nonnegative matrix factorization [7,9]. We sought to develop a tool that can accurately predict the four major molecular subtypes of medulloblastoma (SHH, WNT, Group 3, and Group 4) from any type of transcriptomic data, including RNA-Seq, microarray data, or panel data from NanoString nCounter or HTG platforms. At present, we can only capture the four main subtypes of MB [11].

