Abstract

Histopathological images and omics profiles play important roles in the prognosis of cancer patients. Here, we extracted quantitative features from histopathological images to predict molecular characteristics and prognosis, and integrated image features with mutation, transcriptomics, and proteomics data for prognosis prediction in lung adenocarcinoma (LUAD). Patients from The Cancer Genome Atlas (TCGA) were divided into a training set (n = 235) and a test set (n = 235). We developed machine learning models in the training set and estimated their predictive performance in the test set. In the test set, the machine learning models could predict, from histopathological images, genetic aberrations in ALK (AUC = 0.879), BRAF (AUC = 0.847), EGFR (AUC = 0.855), and ROS1 (AUC = 0.848), as well as the transcriptional subtypes proximal-inflammatory (AUC = 0.897), proximal-proliferative (AUC = 0.861), and terminal respiratory unit (AUC = 0.894). Moreover, we obtained tissue microarrays from 316 LUAD patients, including four external validation sets. The prognostic model using image features was predictive of overall survival in the test set and the four validation sets, with 5-year AUCs from 0.717 to 0.825. High-risk and low-risk groups stratified by the model showed significantly different survival in the test set (HR = 4.94, p < 0.0001) and in three validation sets (HR = 1.64–2.20, p < 0.05). Combining image features with a single omics type gave greater prognostic power in the test set, as in the histopathology + transcriptomics model (5-year AUC = 0.840; HR = 7.34, p < 0.0001). Finally, the model integrating image features with multi-omics achieved the best performance (5-year AUC = 0.908; HR = 19.98, p < 0.0001). Our results indicate that machine learning models based on histopathological image features can predict genetic aberrations, transcriptional subtypes, and survival outcomes of LUAD patients. The integration of histopathological images and multi-omics may provide better survival prediction for LUAD.
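To make the two headline metrics concrete, the following is a minimal sketch (not the authors' pipeline) of how an AUC can be computed from model scores and how patients can be split into high-risk and low-risk groups. The function names, the toy data, and the use of the median risk score as the cutoff are assumptions for illustration; the abstract does not state which cutoff was used.

```python
from statistics import median


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def stratify_by_cutoff(risk_scores, cutoff=None):
    """Label each patient 'high' or 'low' risk.
    Assumption: the cutoff defaults to the median risk score."""
    if cutoff is None:
        cutoff = median(risk_scores)
    return ["high" if r > cutoff else "low" for r in risk_scores]


# Hypothetical toy data: 1 = aberration present, 0 = absent.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

print(f"AUC = {roc_auc(toy_labels, toy_scores):.3f}")
print(stratify_by_cutoff(toy_scores))
```

In practice the cutoff would be fixed on the training set and then applied unchanged to the test and validation sets, so that the survival comparison between groups is not tuned on the data used to evaluate it.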

Highlights

  • Lung cancer is the most common cancer and the main cause of cancer death worldwide, resulting in an estimated 2.1 million new cases and 1.8 million deaths annually (Bray et al, 2018)

  • Hematoxylin and eosin (H&E)-stained histopathological images of 522 lung adenocarcinoma (LUAD) patients were obtained from The Cancer Imaging Archive (TCIA), whereas the corresponding genomics, transcriptomics, and proteomics data were downloaded from The Cancer Genome Atlas (TCGA) and The Cancer Proteome Atlas (TCPA) repositories

  • To assess whether machine learning can be trained to predict genetic aberrations and transcriptional subtypes using histopathological image features as input, we downloaded the related data from TCGA

Introduction

Lung cancer is the most common cancer and the main cause of cancer death worldwide, resulting in an estimated 2.1 million new cases and 1.8 million deaths annually (Bray et al, 2018). Lung adenocarcinoma (LUAD) is the most common histological subtype and differs from lung squamous cell carcinoma (LUSC) in clinical manifestations and therapeutic principles. LUAD occurs more frequently in never-smokers than LUSC does (Herbst et al, 2018). Although there have been small improvements in the 5-year survival rate of lung cancer patients, the survival rates of patients with lymph node invasion (29.7%) or distant metastases (4.7%) remain poor (Schabath and Cote, 2019). Identifying high-risk patients with worse prognosis is critical to the treatment and management of cancer patients. Novel biomarkers continue to emerge that better classify LUAD patients by their probable prognosis and promote the development of precision medicine (Vargas and Harris, 2016).
