Glioblastoma (GBM) presents a significant clinical challenge due to its aggressive nature and extensive heterogeneity. Tumour purity, the proportion of malignant cells within a tumour, is an important covariate for understanding the disease: it can have direct clinical relevance, and it can obscure the signal from the malignant fraction in molecular analyses of bulk samples. However, current methods for estimating tumour purity are non-specific and technically demanding. We therefore aimed to build a reliable and accessible purity estimator for GBM. We developed GBMPurity, a deep-learning model specifically designed to estimate the purity of IDH-wildtype primary GBM from bulk RNA-seq data. The model was trained on simulated pseudobulk tumours of known purity, constructed from labelled single-cell data acquired from the GBmap resource. The performance of GBMPurity was evaluated on independent datasets and compared with that of several existing tools. GBMPurity outperformed existing tools, achieving a mean absolute error of 0.15 and a concordance correlation coefficient of 0.88 on validation datasets. We demonstrate the utility of GBMPurity through inference on bulk RNA-seq samples and observe reduced purity of the Proneural molecular subtype relative to the Classical subtype, which we attribute to the increased presence of healthy brain cells. GBMPurity provides a reliable and accessible tool for estimating tumour purity from bulk RNA-seq data, enhancing the interpretation of such data and offering valuable insights into GBM biology. To facilitate the use of this model by the wider research community, GBMPurity is available as a web-based tool at: https://gbmdeconvoluter.leeds.ac.uk/.
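To make the training and evaluation set-up concrete, the following is a minimal sketch of how a pseudobulk sample of known purity could be simulated from labelled single-cell counts, together with the two reported metrics (mean absolute error and Lin's concordance correlation coefficient). It is an illustration under stated assumptions, not the GBMPurity implementation: the function names, the sampling with replacement, the fixed number of cells per sample, and the definition of purity as the fraction of sampled cells that are malignant are all hypothetical choices made here.

```python
import numpy as np


def simulate_pseudobulk(counts, is_malignant, n_cells, purity, rng):
    """Sum the counts of randomly drawn cells so that a fraction `purity`
    of them is malignant (assumed scheme, not the published one).

    counts       : (cells x genes) array of single-cell counts
    is_malignant : boolean array, one label per cell
    Returns the summed expression profile and the realised purity.
    """
    n_mal = int(round(purity * n_cells))
    n_non = n_cells - n_mal
    # Draw malignant and non-malignant cells separately, with replacement.
    mal_idx = rng.choice(np.flatnonzero(is_malignant), size=n_mal, replace=True)
    non_idx = rng.choice(np.flatnonzero(~is_malignant), size=n_non, replace=True)
    profile = counts[np.concatenate([mal_idx, non_idx])].sum(axis=0)
    return profile, n_mal / n_cells


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted purity."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return float(2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2))
```

Unlike Pearson correlation, the concordance correlation coefficient penalises systematic offsets between predicted and true purity, so it rewards agreement with the identity line, not merely a linear relationship.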