Abstract
Introduction: Cell-of-origin (COO) classification of diffuse large B-cell lymphoma (DL-BCL) using gene expression signatures identifies patients with distinct molecular features and survival outcomes. Although the identification of activated B-cell (ABC) and germinal center B-cell (GCB) subtypes of DL-BCL is now routinely used in diagnostics, a significant proportion of patients display gene expression signatures that are intermediate between classes. Whilst some of these intermediate cases display genuine hallmarks of both ABC and GCB subtypes, leading to an intermediate DL-BCL COO-UNC type, others appear to display distinct molecular signatures. It is possible that these signatures represent inherent noise in the expression profiling process, or biological noise resulting from, for example, high levels of T-cell infiltration in samples. A more intriguing possibility is that these cases represent intermediate types between DL-BCL and other types of lymphoma. The existence of intermediate molecular signatures raises important questions about which type of treatment is most applicable for these cases. Yet the identification of intermediate molecular signatures is challenging, because existing classification algorithms have tended to focus on a small subset of lymphoma types rather than the pan-lymphoma spectrum. Moreover, it is not clear what features would be needed to identify the relationship of individual patients to diverse lymphoma classes.

Methods: To address this issue, we have developed a pan-lymphoma classifier using a support vector machine (SVM) with an embedded recursive feature selection (RFS) algorithm. Our training and test dataset consisted of 431 samples spanning eight types of lymphoma: ABC (49), GCB (133), COO-UNC (40), Burkitt's (60), Hodgkin's (43), PMBL (35), PBL (39), and Plasmacytoma (32). Gene expression profiles for each sample were obtained using the Illumina DASL platform. We trained the SVM using 10-fold cross-validation with a 70:30 train/test split. On each iteration of the RFS, we removed the single least informative probe. We repeated this until the average accuracy of the SVM dropped significantly.

Results: Training the SVM initially on 1125 probes, we obtained a pan-lymphoma average classification accuracy of 91%. We then used the SVM-RFS to obtain a list of just 23 probes that gave an average classification accuracy of 86%. This allowed us to develop a between-class distance metric as a score of the relative association of any given sample with a lymphoma class. Retrospective classification of COO-UNC samples showed that some share molecular features associated with non-DL-BCL classes.

Conclusions: SVM-RFS appears to be a robust approach for identifying classes of lymphoma from underlying molecular phenotypes. A small number of the most informative genes provides a separation of cases into subtypes that is in excellent agreement with laboratory diagnoses.

Keywords: activated B-cell–like (ABC); GCB lymphoma subtype; gene expression profile (GEP)
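
The elimination loop described in the Methods can be illustrated with a short sketch. This is not the authors' implementation: it assumes scikit-learn, a linear-kernel SVM, synthetic data in place of the DASL expression profiles, probe importance measured as the summed squared SVM weights, and a hypothetical `max_drop` threshold standing in for "a significant drop in average accuracy". It uses 10-fold cross-validation only and omits the 70:30 hold-out split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def cv_accuracy(X, y, folds=10):
    """Mean cross-validated accuracy of a linear SVM on the given probes."""
    model = SVC(kernel="linear", C=1.0)
    return cross_val_score(model, X, y, cv=folds).mean()


def least_informative(X, y):
    """Column index of the probe with the smallest summed squared weight."""
    model = SVC(kernel="linear", C=1.0).fit(X, y)
    # coef_ holds one row per pair of classes; aggregate over rows per probe.
    importance = (model.coef_ ** 2).sum(axis=0)
    return int(np.argmin(importance))


def svm_rfs(X, y, max_drop=0.05, min_probes=1):
    """Remove one probe per iteration until accuracy falls too far.

    max_drop is an assumed stopping rule: the loop stops before any removal
    that would lower accuracy by more than max_drop below the baseline.
    """
    kept = list(range(X.shape[1]))
    baseline = cv_accuracy(X, y)
    history = [(len(kept), baseline)]
    while len(kept) > min_probes:
        worst = least_informative(X[:, kept], y)
        candidate = kept[:worst] + kept[worst + 1:]
        acc = cv_accuracy(X[:, candidate], y)
        if baseline - acc > max_drop:
            break
        kept = candidate
        history.append((len(kept), acc))
    return kept, history


# Synthetic stand-in for expression data: 120 samples, 12 probes, 3 classes.
X, y = make_classification(
    n_samples=120, n_features=12, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
kept, history = svm_rfs(X, y)
print("probes kept:", kept)
for n_probes, acc in history:
    print(f"{n_probes:2d} probes -> mean CV accuracy {acc:.3f}")
```

Removing a single probe per iteration, as in the Methods, costs one model fit and one cross-validation run per probe, which is affordable for a panel of roughly a thousand probes but would need batching for much larger feature sets.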