Classification of tumor types using XGBoost machine learning model: a vector space transformation of genomic alterations

Veronica Zelli, Andrea Manno, Chiara Compagnoni, Rasheed Oyewole Ibraheem, Francesca Zazzeroni, Edoardo Alesse, Fabrizio Rossi, Claudio Arbib, Alessandra Tessitore

doi:10.1186/s12967-023-04720-4

Abstract

Background

Machine learning (ML) is a powerful tool for capturing relationships between molecular alterations and cancer types and for extracting biological information. Here, we developed a plain ML model aimed at distinguishing cancer types based on genetic lesions, providing an additional tool to improve cancer diagnosis, particularly for tumors of unknown origin.

Methods

TCGA data from 9,927 samples spanning 32 different cancer types were downloaded from cBioPortal. A data transformation technique based on the vector space model was designed to build consistently homogeneous new datasets containing, as predictive features, calls for somatic point mutations and arm-level copy number variations, thus enabling the use of XGBoost classifier models. Because the dataset is imbalanced, owing to large differences in the number of cases per tumor type, two preprocessing strategies were considered: (i) setting a percentage cut-off threshold to remove less represented cancer types, and (ii) dividing cancer types into different groups based on biological criteria and training a specific XGBoost model for each group. The performance of all trained models was mainly assessed by out-of-sample balanced accuracy (BACC) and AUC scores.

Results

The XGBoost classifier achieved the best performance (BACC 77%; AUC 97%) on a dataset containing the 10 most represented tumor types. Moreover, when the 18 most represented cancers were divided into three groups (endocrine-related carcinomas, other carcinomas and other cancers), the corresponding group-specific models achieved 78%, 71% and 86% BACC, respectively, with AUC scores greater than 96%. In addition, the model linking each group to a specific cancer type reached 81% BACC and 94% AUC.
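The Methods describe a vector space model (VSM) style transformation that turns each sample's alteration calls into a fixed-length feature vector, together with a percentage cut-off that drops under-represented cancer types. The following is a minimal sketch of one possible reading of that idea, not the authors' code: it assumes binary point-mutation flags per gene and signed arm-level copy number calls (+1 gain, -1 loss, 0 no call). The sample records, gene and arm names, the `min_fraction` value and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records (not study data): mutated genes plus arm-level
# copy number calls, encoded as +1 = gain, -1 = loss (absent = no call).
samples = [
    {"type": "BRCA", "mutated": {"TP53", "PIK3CA"}, "arms": {"1q": 1, "16q": -1}},
    {"type": "BRCA", "mutated": {"PIK3CA"}, "arms": {"1q": 1}},
    {"type": "LUAD", "mutated": {"TP53", "KRAS"}, "arms": {"8q": 1}},
    {"type": "LUAD", "mutated": {"KRAS"}, "arms": {"3p": -1}},
    {"type": "COAD", "mutated": {"APC", "KRAS"}, "arms": {"18q": -1}},
]


def filter_by_frequency(records, min_fraction):
    """Strategy (i): keep only cancer types whose share of samples >= min_fraction."""
    counts = Counter(r["type"] for r in records)
    total = len(records)
    keep = {t for t, n in counts.items() if n / total >= min_fraction}
    return [r for r in records if r["type"] in keep]


def build_vocabulary(records):
    """Shared feature space: one dimension per gene and per chromosome arm.

    Names are sorted lexicographically; any fixed order works as long as
    every sample is projected onto the same one.
    """
    genes = sorted({g for r in records for g in r["mutated"]})
    arms = sorted({a for r in records for a in r["arms"]})
    return [f"mut:{g}" for g in genes] + [f"cnv:{a}" for a in arms]


def to_vector(record, vocabulary):
    """Project one sample onto the vocabulary (0 = not mutated / no CNV call)."""
    vec = []
    for feature in vocabulary:
        kind, name = feature.split(":", 1)
        if kind == "mut":
            vec.append(1 if name in record["mutated"] else 0)
        else:
            vec.append(record["arms"].get(name, 0))
    return vec


# COAD holds 1 of 5 samples (20%) and falls below the 30% cut-off.
kept = filter_by_frequency(samples, min_fraction=0.3)
vocab = build_vocabulary(kept)
X = [to_vector(r, vocab) for r in kept]
y = [r["type"] for r in kept]

for label, vec in zip(y, X):
    print(label, vec)
```

The resulting `X` and `y` could then be passed to a gradient-boosted classifier such as XGBoost's `XGBClassifier` after integer-encoding the labels. In a real pipeline the vocabulary would be built on the training split only, so that held-out samples are projected onto the same feature space.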
Overall, the diagnostic potential of our model was comparable to or higher than that of other models already described in the literature and based on similar molecular data and ML approaches.

Conclusions

A boosted ML approach able to accurately discriminate different cancer types was developed. The methodology builds datasets that are simpler and more interpretable than the original data, while retaining enough information to accurately train standard ML models without resorting to sophisticated deep learning architectures. In combination with histopathological examination, this approach could improve cancer diagnosis by using specific DNA alterations processed by a replicable and easy-to-use automated technology. The study encourages further investigations that could increase the classifier's performance, for example by considering more features and dividing tumors into their main molecular subtypes.
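Balanced accuracy (BACC), the main metric reported in this abstract, is the mean of per-class recall, so a rare tumor type weighs as much as a common one; plain accuracy, by contrast, can look high on imbalanced data simply by favoring the majority class. The following is a minimal sketch with hypothetical toy labels (not study data); it follows the same definition as scikit-learn's `balanced_accuracy_score` with its default `adjusted=False`.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for true_label, pred_label in zip(y_true, y_pred):
        totals[true_label] += 1
        if true_label == pred_label:
            hits[true_label] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced toy example: 8 majority vs 2 minority samples.
y_true = ["BRCA"] * 8 + ["LUAD"] * 2
y_pred = ["BRCA"] * 8 + ["BRCA", "LUAD"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bacc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={plain:.2f} balanced_accuracy={bacc:.2f}")
```

On this toy input plain accuracy is 0.90 while BACC is 0.75, because the minority class has a recall of only 0.5.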

Journal: Journal of Translational Medicine
Publication Date: Nov 21, 2023
Citations: 1
License type: CC BY 4.0

Similar Papers

Comparison of machine learning and logistic regression models in predicting acute kidney injury: A systematic review and meta-analysis
Xuan Song ... Chunting Wang
International Journal of Medical Informatics | VOL. 151
08 May 2021

A Review of Computational Intelligence Models for Brain Tumour Classification and Prediction
Justice Kwame Appati ... Godfred Akwetey Brown
International Journal of Software Science and Computational Intelligence | VOL. 13
01 Oct 2021

Methodological progress note: Machine learning methods in healthcare research
Colin Rogerson ... Matt Hall
Journal of Hospital Medicine | VOL. 18
13 Mar 2023

185. A predictive model for C5 palsy after instrumented cervical fusion
Akash A Shah ... Don Y Park
The Spine Journal | VOL. 22
19 Aug 2022
