Integrating pathological images and genomics data to identify prognostic features related to recurrence of sarcoma
- Conference Article
12
- 10.1109/bibm52615.2021.9669445
- Dec 9, 2021
Survival analysis is crucial to the evaluation of cancer treatment options, and deep learning-based methods integrating pathological images and genomic data have been used for prognosis prediction. However, most methods are based on the analysis of pathological image patches, thus ignoring the morphological structure information at larger fields of view and the intrinsic relationships between patches. Meanwhile, existing models fail to exploit the powerful representation learning capabilities of neural networks for effective multimodal feature fusion of pathological images and genomic data. In this paper, we propose a novel transformer-based fusion network integrating pathological images and genomic data (PG-TFNet) for cancer survival analysis. Specifically, we present a transformer-based feature fusion module for multi-scale pathological slides to fully exploit the intra-modality relationships between image patches at various fields of view. Moreover, to achieve effective inter-modality feature fusion of pathological images and genomic data, we introduce a cross-attention transformer module that can exchange feature representations between two transformer branches, one per modality. PG-TFNet is evaluated on the colorectal cancer dataset from The Cancer Genome Atlas (TCGA), which contains paired whole-slide images and genomic data with ground-truth survival data. The experimental results from 10-fold cross-validation demonstrate that the proposed PG-TFNet facilitates the prognosis prediction of colorectal cancer and outperforms existing methods.
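As a rough illustration of the cross-attention exchange described above, the sketch below pairs two PyTorch attention branches so that each modality queries the other; all module names, dimensions, and the residual/norm layout are illustrative assumptions, not the published PG-TFNet code.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """One exchange step between two modality branches: each branch
    queries the other modality's tokens (illustrative sketch only)."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn_p2g = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_g2p = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_p = nn.LayerNorm(dim)
        self.norm_g = nn.LayerNorm(dim)

    def forward(self, path_tokens, gene_tokens):
        # pathology branch attends to genomic tokens (query = pathology)
        p, _ = self.attn_p2g(path_tokens, gene_tokens, gene_tokens)
        # genomic branch attends to pathology patch tokens (query = genomics)
        g, _ = self.attn_g2p(gene_tokens, path_tokens, path_tokens)
        return self.norm_p(path_tokens + p), self.norm_g(gene_tokens + g)

# toy usage: 100 patch embeddings and 1 genomic token per slide
path = torch.randn(2, 100, 256)
gene = torch.randn(2, 1, 256)
path_out, gene_out = CrossAttentionBlock()(path, gene)
```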
- Research Article
1
- 10.1158/1538-7445.am2022-5045
- Jun 15, 2022
- Cancer Research
Background: Cancers of unknown primary (CUP) are a type of metastatic cancer whose primary anatomical site of origin cannot be clinically determined using routine history inquiries, laboratory tests, endoscopy, and imaging. CUP accounts for ~3-5% of cancers. Empirical chemotherapy (paclitaxel, carboplatin, etc.) is generally used, although the curative effect is poor. Finding the primary site of origin for CUP patients is of great significance for clinical treatment. Method: Our study employed 10,001 pathological images and 9,775 gene detection data points based on whole-exome sequencing (WES) or YUANSU® (OrigiMed, Shanghai, China) for primary cancer within a database containing 32 common cancers. We applied machine learning algorithms (autoML, Transformers, attention) and constructed two diagnostic models, one diagnosing from pathological images (Model 1) and one from genomic data (Model 2), and verified the accuracy of each. Result: Both models were evaluated using top-k differential diagnosis accuracy, i.e., how often the ground-truth label appeared among the model's k highest-confidence predictions. The pathological model (Model 1) achieved a top-3 accuracy of 83.38% and a top-5 accuracy of 90.36%. Using the same methodology, the genomic model (Model 2) achieved 87.5% and 92.2%, respectively. Conclusion: Using deep learning, we developed diagnostic models for CUP based on pathological images and genomic data. Accuracy for both the pathological image model (Model 1) and the genomic data model (Model 2) was not satisfactory. To improve diagnostic accuracy, further studies on developing a diagnostic model that combines pathological imaging and genomic data are ongoing. Citation Format: Yanan Wang, Guanjun Zhang, Xi Liu, Yanfeng Xi, Pan Wang, Yuman Zhang, Xing Li. Genomics and pathology based deep learning to predict cancers of unknown primary [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 5045.
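The top-k metric used above is simple to reproduce; the following minimal sketch (array shapes and names are illustrative) counts how often the true label falls among the k highest-confidence predictions.

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k
    highest-confidence predictions; probs: (n_samples, n_classes)."""
    top_k = np.argsort(probs, axis=1)[:, -k:]       # indices of k largest scores
    hits = (top_k == labels[:, None]).any(axis=1)   # true label among them?
    return float(hits.mean())

# toy example: 3 samples over 4 classes
probs = np.array([[0.10, 0.60, 0.20, 0.10],
                  [0.30, 0.30, 0.20, 0.20],
                  [0.05, 0.05, 0.10, 0.80]])
labels = np.array([1, 3, 2])
print(top_k_accuracy(probs, labels, k=2))           # 2 of 3 hits -> 0.666...
```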
- Research Article
103
- 10.1016/j.cmpb.2018.04.008
- Apr 19, 2018
- Computer Methods and Programs in Biomedicine
Integrating genomic data and pathological images to effectively predict breast cancer clinical outcome
- Research Article
11
- 10.1109/tmi.2023.3274652
- Oct 1, 2023
- IEEE Transactions on Medical Imaging
Tumor-infiltrating lymphocytes (TILs) and their correlation with tumors have shown significant value in the study of cancer development. Many observations have indicated that combining whole-slide pathological images (WSIs) and genomic data can better characterize the immunological mechanisms of TILs. However, existing image-genomic studies have evaluated TILs by combining pathological images with a single type of omics data (e.g., mRNA), which makes it difficult to assess the underlying molecular processes of TILs holistically. Additionally, it is still very challenging to characterize the intersections between TILs and tumor regions in WSIs, and the high dimensionality of genomic data further complicates integrative analysis with WSIs. Based on the above considerations, we propose an end-to-end deep learning framework, IMO-TILs, that integrates pathological images with multi-omics data (i.e., mRNA and miRNA) to analyze TILs and explore survival-associated interactions between TILs and tumors. Specifically, we first apply a graph attention network to describe the spatial interactions between TILs and tumor regions in WSIs. For genomic data, the Concrete AutoEncoder (CAE) is adopted to select survival-associated eigengenes from the high-dimensional multi-omics data. Finally, deep generalized canonical correlation analysis (DGCCA), accompanied by an attention layer, is implemented to fuse the image and multi-omics data for prognosis prediction of human cancers. Experimental results on three cancer cohorts derived from The Cancer Genome Atlas (TCGA) indicate that our method achieves better prognostic performance and identifies consistent imaging and multi-omics biomarkers strongly correlated with the prognosis of human cancers.
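The Concrete AutoEncoder's selection step can be sketched as a learnable Gumbel-Softmax relaxation over input genes. The snippet below shows only that selection layer under illustrative hyperparameters; the paper's decoder, graph attention network, and DGCCA fusion are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConcreteSelector(nn.Module):
    """Selects k of in_dim input features via a learnable relaxed one-hot
    per selected feature (Gumbel-Softmax); a sketch of the CAE selection
    layer only, not the full IMO-TILs model."""

    def __init__(self, in_dim: int, k: int, temperature: float = 0.5):
        super().__init__()
        self.logits = nn.Parameter(0.01 * torch.randn(k, in_dim))
        self.temperature = temperature

    def forward(self, x):                         # x: (batch, in_dim)
        if self.training:
            # relaxed sampling keeps the selection differentiable
            w = F.gumbel_softmax(self.logits, tau=self.temperature, dim=-1)
        else:
            # at test time, commit to the highest-logit feature per slot
            w = F.one_hot(self.logits.argmax(dim=-1), x.size(-1)).float()
        return x @ w.t()                          # (batch, k) selected features

selector = ConcreteSelector(in_dim=2000, k=64)
selected = selector(torch.randn(8, 2000))         # (8, 64)
```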
- Research Article
3
- 10.1109/jbhi.2024.3363161
- May 1, 2025
- IEEE journal of biomedical and health informatics
Accurate cancer survival prediction is crucial for oncologists to determine a therapeutic plan, which directly influences treatment efficacy and patient survival outcomes. Recently, multimodal fusion-based prognostic methods have demonstrated effectiveness for survival prediction by fusing diverse cancer-related data from different medical modalities, e.g., pathological images and genomic data. However, these works still face significant challenges. First, most approaches attempt multimodal fusion with a simple one-shot fusion strategy, which is insufficient to explore the complex interactions underlying highly disparate multimodal data. Second, current methods for investigating multimodal interactions face a capability-efficiency dilemma, i.e., the difficult balance between powerful modeling capability and applicable computational efficiency, which impedes effective multimodal fusion. In this study, to address these challenges, we propose an innovative multi-shot interactive fusion method named MIF for precise survival prediction using pathological and genomic data. In particular, a novel multi-shot fusion framework is introduced to promote multimodal fusion by decomposing it into successive fusing stages, thus integrating modalities in a progressive way. Moreover, to address the capability-efficiency dilemma, various affinity-based interactive modules are introduced to synergize with the multi-shot framework. Specifically, by harnessing comprehensive affinity information as guidance for mining interactions, the proposed interactive modules can efficiently generate low-dimensional discriminative multimodal representations. Extensive experiments on different cancer datasets show that our method not only achieves state-of-the-art performance through effective multimodal fusion but also offers high computational efficiency compared with existing survival prediction methods.
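The abstract does not specify the interactive modules' internals; one plausible reading of "affinity-guided interaction" is a normalized cross-modal similarity matrix that routes information between the two feature sets, stacked over several fusion "shots". The sketch below is that interpretation, not the authors' code.

```python
import torch
import torch.nn as nn

class AffinityInteraction(nn.Module):
    """One fusion 'shot': an affinity (similarity) matrix between the two
    modalities' tokens guides a low-cost cross-update. An interpretation
    of the abstract, not the published MIF implementation."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj_a = nn.Linear(dim, dim)
        self.proj_b = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, a, b):                      # a: (B, Na, d), b: (B, Nb, d)
        sim = self.proj_a(a) @ self.proj_b(b).transpose(1, 2) * self.scale
        a_new = a + torch.softmax(sim, dim=-1) @ b                  # a mixes b
        b_new = b + torch.softmax(sim.transpose(1, 2), dim=-1) @ a  # b mixes a
        return a_new, b_new

# stacking stages yields the progressive, multi-shot fusion the paper describes
stages = nn.ModuleList([AffinityInteraction(128) for _ in range(3)])
a, b = torch.randn(4, 50, 128), torch.randn(4, 10, 128)
for stage in stages:
    a, b = stage(a, b)
```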
- Research Article
1
- 10.1158/1538-7445.am2017-2593
- Jul 1, 2017
- Cancer Research
Introduction: The Children’s Brain Tumor Tissue Consortium (CBTTC), an international repository of genomic and phenotypic data, has partnered with Blackfynn, Inc., to create a cloud-based data management platform to facilitate team-science across disciplines. Background: The CBTTC, through the CHOP Department of Biomedical and Health Informatics (DBHi), has developed a network of informatics and data applications for researchers across the globe to work together and perform real-time analyses on existing clinical, phenotypic, and genomic data. Historically, rare disease datasets are siloed, locked in proprietary formats, segregated by data types, and hidden from the view of experts in the field. This has been a significant barrier to finding effective therapeutics for children with pediatric brain tumors. Blackfynn was founded by a group of multidisciplinary experts in neuroscience, neurology, medicine, software development, engineering, computer science, and business with the goal of empowering researchers to cure neurologic disease and providing solutions to these challenges. Description of Methods: The CBTTC and Blackfynn teamed up to provide a cloud-based, team-focused data management and analytics platform. The platform provides a commercial-grade, scalable approach to upload, view, and integrate digital pathology images with relevant subject data such as MRIs, pathology reports, and genomic information. Stakeholders can search integrated data without requiring users to change their current workflow or conform to imposed data standards. This platform is a simple, intuitive, end-to-end software platform for teams of scientists and pathologists to review, annotate, and discuss cases, enabling rapid diagnostic consensus, quality control, and empowered discovery. Summary of Unpublished Results: The CBTTC/Blackfynn data platform enabled CBTTC members to engage in a cross-institutional collaboration to reach consensus on digital pathology data in ways that were previously not possible. We demonstrated that this solution removes existing barriers to collaborative efforts and provides a rich analytic and discovery platform bridging imaging with genomics and other data formats. The platform provides a new model for the scientific community to facilitate translation towards improved treatments for children diagnosed with brain tumors. Discussion and Future Direction: This pilot project will be scaled to other CBTTC sites for centralized review of pathology images to enable the research community to collaborate on specific projects. The next phase of platform development will include further integration of CBTTC platforms, fully integrating genomics data, and side-by-side viewing and analyses of MRI, pathology, and clinical data to facilitate specific project work around large and complex research data types in a cloud environment. Citation Format: Amanda Christini, Angela J. Waanders, Joost B. Wagenaar, Alex S. Felmeister, Mariarita Santi, Nitin R. Wadhwani, Jennifer L. Mason, Mateusz P. Koptyra, Jena V. Lilly, Jeffrey W. Pennington, Rishi R. Lulla, Adam C. Resnick. Accelerating pediatric brain tumor research through team science solutions [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 2593. doi:10.1158/1538-7445.AM2017-2593
- Research Article
- 10.1158/1538-7445.advbc23-b082
- Feb 1, 2024
- Cancer Research
Breast cancer prognostication guides treatment decisions, with transcriptomic assays like Prosigna and Oncotype DX providing valuable risk of recurrence (ROR) scores in ER+/HER2– patients. However, these tests are costly and require sufficient biopsy tissue for accuracy. Pathology images are routinely available and could expedite rapid risk stratification at lower cost. We propose a multi-modal, multi-task deep neural network that learns transcriptomic ROR using digital pathology images and hormone receptor expression status. Because the incorporation of both pathology images and clinical data has proven beneficial in other deep learning models, we hypothesized that incorporation of this complementary information would improve ROR predictions compared to using pathology images alone. We modified the “clustering-constrained-attention multiple-instance learning” (CLAM) deep learning method to perform multi-task regression and infer continuous risk scores (instead of discrete risk categories). We evaluated (1) PAM50 ROR-P, calculated using a multivariate model based on intrinsic subtype centroids and a proliferation score, and (2) a 21-gene assay recurrence score (RS) (research version of Oncotype DX), calculated from component scores relating to proliferation, ER, HER2, and invasion. Using data from 899 patients in the Carolina Breast Cancer Study, a population-based study of diverse patients including 50% Black women and 50% women under age 50, we constructed training (n=714), validation (n=93), and test (n=92) sets. Genomic data were assayed by Nanostring, and digital pathology was based on H&E-stained whole slide images (WSIs). Models were assessed using Pearson correlation between measured and predicted ROR-P or RS. The model with the highest correlations on the test set used H&E WSIs and ER/PR/HER2 expression status as inputs and generated 10 independent outputs: ROR-P, four PAM50 centroid correlations (i.e., Basal, HER2-enriched, Luminal A, and Luminal B), RS, and four RS component scores. Prediction was strong when considering all participants in the test set (ROR-P, 0.78; RS, 0.83; N=92) but was reduced among more clinically homogeneous ER+/HER2– patients (ROR-P, 0.55; RS, 0.49; N=49). For RS, correlations were higher in models that included ER/PR/HER2 (Overall, 0.83; ER+/HER2–, 0.50) than in those that relied on pathology alone (Overall, 0.60; ER+/HER2–, 0.34). Future models will consider strategies to enhance performance in ER+/HER2– patients, such as oversampling of ER+/HER2– patients and use of loss functions designed to focus learning on mis-predicted examples. In summary, multi-task, tissue-based regression deep learning models recapitulate transcription-based risk assays with high correlations and, with optimization (especially integration of additional clinicopathologic data), hold promise for personalized treatment decisions from early, routinely collected biopsy images. Citation Format: Jakub R Kaczmarzyk, Luke A Torre-Healy, Richard A Moffitt, Rajarsi Gupta, Alina M Hamilton, Tahsin M Kurc, Katherine A Hoadley, Melissa A Troester, Joel H Saltz. Early risk stratification of ER+/HER2– breast cancer patients using digital pathology and multi-task, weakly-supervised deep learning [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: Advances in Breast Cancer Research; 2023 Oct 19-22; San Diego, California. Philadelphia (PA): AACR; Cancer Res 2024;84(3 Suppl_1):Abstract nr B082.
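Independently of CLAM's attention pooling, the multi-task regression idea (one shared slide embedding, ten continuous outputs, Pearson-correlation evaluation) can be sketched as follows; layer sizes and the placeholder data are illustrative assumptions.

```python
import torch
import torch.nn as nn
from scipy.stats import pearsonr

class MultiTaskRegressionHead(nn.Module):
    """Shared slide-level embedding -> 10 continuous outputs (ROR-P, four
    PAM50 centroid correlations, RS, four RS components). Sizes are
    illustrative; CLAM's attention pooling is omitted."""

    def __init__(self, in_dim: int = 512, n_tasks: int = 10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, n_tasks))

    def forward(self, slide_embedding):           # (batch, in_dim)
        return self.head(slide_embedding)         # (batch, n_tasks)

# evaluation mirrors the abstract: Pearson r between measured and predicted
model = MultiTaskRegressionHead()
emb = torch.randn(92, 512)                        # one embedding per test slide
preds = model(emb).detach().numpy()
measured = torch.randn(92, 10).numpy()            # placeholder assay scores
r_rorp, _ = pearsonr(measured[:, 0], preds[:, 0]) # e.g. correlation for ROR-P
```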
- Research Article
82
- 10.1109/tmi.2019.2920608
- Jun 3, 2019
- IEEE Transactions on Medical Imaging
The integrative analysis of histopathological images and genomic data has received increasing attention for studying the complex mechanisms driving cancers. However, most image-genomic studies have been restricted to combining histopathological images with a single modality of genomic data (e.g., mRNA transcription or genetic mutation), and thus neglect the fact that the molecular architecture of cancer is manifested at multiple levels, including genetic, epigenetic, transcriptional, and post-transcriptional events. To address this issue, we propose a novel ordinal multi-modal feature selection (OMMFS) framework that can simultaneously identify important features from both pathological images and multi-modal genomic data (i.e., mRNA transcription, copy number variation, and DNA methylation data) for the prognosis of cancer patients. Our model is based on a generalized sparse canonical correlation analysis framework, through which we also take advantage of the ordinal survival information among different patients for survival outcome prediction. We evaluate our method on three early-stage cancer datasets derived from The Cancer Genome Atlas (TCGA) project. The experimental results demonstrate that the selected image and multi-modal genomic markers are strongly correlated with survival and enable more effective stratification of patients into groups with distinct survival than the compared methods, which is often difficult for early-stage cancer patients.
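OMMFS builds on generalized sparse canonical correlation analysis. For orientation, the standard two-view sparse CCA core (with Witten-style l1 budgets; the paper's ordinal survival terms and multi-view extensions are not shown) is:

```latex
\begin{aligned}
\max_{u,\,v}\quad & u^{\top} X^{\top} Y \, v \\
\text{s.t.}\quad  & \lVert X u \rVert_2^2 \le 1, \quad \lVert Y v \rVert_2^2 \le 1, \\
                  & \lVert u \rVert_1 \le c_1, \quad \lVert v \rVert_1 \le c_2,
\end{aligned}
```

where X holds image features, Y holds one genomic modality, and the l1 budgets c1 and c2 induce the sparse feature selection.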
- Research Article
189
- 10.1136/amiajnl-2012-001469
- Nov 1, 2013
- Journal of the American Medical Informatics Association : JAMIA
The integration and visualization of multimodal datasets is a common challenge in biomedical informatics. Several recent studies of The Cancer Genome Atlas (TCGA) data have illustrated important relationships between morphology observed in whole-slide images, outcome, and genetic events. The pairing of genomics and rich clinical descriptions with whole-slide imaging provided by TCGA presents a unique opportunity to perform these correlative studies. However, better tools are needed to integrate the vast and disparate data types. Our objective was to build an integrated web-based platform supporting whole-slide pathology image visualization and data integration. All images and genomic data were obtained directly from the TCGA and National Cancer Institute (NCI) websites. The resulting Cancer Digital Slide Archive (CDSA) is accessible to the public (http://cancer.digitalslidearchive.net) and currently hosts more than 20,000 whole-slide images from 22 cancer types. The capabilities of CDSA are demonstrated using TCGA datasets to integrate pathology imaging with associated clinical, genomic, and MRI measurements in glioblastomas, and can be extended to other tumor types. CDSA also allows URL-based sharing of whole-slide images and has preliminary support for directly sharing regions of interest and other annotations. Images can also be selected on the basis of other metadata, such as mutational profile, patient age, and other relevant characteristics. With the increasing availability of whole-slide scanners, analysis of digitized pathology images will become increasingly important in linking morphologic observations with genomic and clinical endpoints.
- Research Article
13
- 10.1016/j.compbiomed.2023.107796
- Dec 3, 2023
- Computers in Biology and Medicine
Multi-modal fusion network with intra- and inter-modality attention for prognosis prediction in breast cancer
- Research Article
8
- 10.1016/j.jbi.2019.103194
- Apr 29, 2019
- Journal of Biomedical Informatics
LSCDFS-MKL: A multiple kernel based method for lung squamous cell carcinomas disease-free survival prediction with pathological and genomic data.
- Research Article
80
- 10.1093/bioinformatics/btab185
- Mar 18, 2021
- Bioinformatics
Motivation: Breast cancer is a very heterogeneous disease, and there is an urgent need to design computational methods that can accurately predict its prognosis to guide the choice of an appropriate therapeutic regimen. Recently, deep learning-based methods have achieved great success in prognosis prediction, but many of them directly combine features from different modalities, which may ignore complex inter-modality relations. In addition, existing deep learning-based methods do not take intra-modality relations into consideration, which are also beneficial to prognosis prediction. Therefore, it is of great importance to develop a deep learning-based method that can take advantage of the complementary information between intra-modality and inter-modality relations by integrating data from different modalities for more accurate prognosis prediction of breast cancer. Results: We present a novel unified framework named genomic and pathological deep bilinear network (GPDBN) for prognosis prediction of breast cancer by effectively integrating both genomic data and pathological images. In GPDBN, an inter-modality bilinear feature encoding module is proposed to model complex inter-modality relations, fully exploiting the intrinsic relationship of features across different modalities. Meanwhile, intra-modality relations, which are also beneficial to prognosis prediction, are captured by two intra-modality bilinear feature encoding modules. Moreover, to take advantage of the complementary information between inter-modality and intra-modality relations, GPDBN further combines the inter- and intra-modality bilinear features using a multi-layer deep neural network for final prognosis prediction. Comprehensive experimental results demonstrate that the proposed GPDBN significantly improves the performance of breast cancer prognosis prediction and compares favorably with existing methods. Availability and implementation: GPDBN is freely available at https://github.com/isfj/GPDBN. Supplementary information: Supplementary data are available at Bioinformatics online.
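Bilinear feature encoding is commonly realized as the outer product of two bias-augmented feature vectors, flattened and projected. The sketch below illustrates that idea for the inter-modality module with made-up dimensions; it is not the released GPDBN code (see the GitHub link above for that).

```python
import torch
import torch.nn as nn

class BilinearEncoding(nn.Module):
    """Inter-modality bilinear encoding: outer product of the two
    modalities' feature vectors, flattened and projected. A sketch of
    the idea, not the released GPDBN implementation."""

    def __init__(self, dim_a: int, dim_b: int, out_dim: int = 128):
        super().__init__()
        # +1 appends a bias term so unimodal features survive the product
        self.proj = nn.Linear((dim_a + 1) * (dim_b + 1), out_dim)

    def forward(self, a, b):                      # a: (B, dim_a), b: (B, dim_b)
        ones = a.new_ones(a.size(0), 1)
        a1 = torch.cat([a, ones], dim=1)
        b1 = torch.cat([b, ones], dim=1)
        outer = torch.einsum('bi,bj->bij', a1, b1).flatten(1)
        return self.proj(outer)

fusion = BilinearEncoding(dim_a=64, dim_b=32)
z = fusion(torch.randn(8, 64), torch.randn(8, 32))  # (8, 128) fused features
```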
- Research Article
20
- 10.3389/fonc.2021.636451
- Sep 27, 2021
- Frontiers in Oncology
Background: Colon adenocarcinoma (COAD) is one of the most common malignant tumors in the world. Histopathological features are crucial for the diagnosis, prognosis, and therapy of COAD. Methods: We downloaded 719 whole-slide histopathological images from TCIA, and 459 corresponding HTSeq-counts mRNA expression profiles and clinical data were obtained from TCGA. Histopathological image features were extracted by CellProfiler. Prognostic image features were selected by the least absolute shrinkage and selection operator (LASSO) and support vector machine (SVM) algorithms. The co-expression gene module correlated with the prognostic image features was identified by weighted gene co-expression network analysis (WGCNA). A random forest was employed to construct an integrative prognostic model and calculate the histopathological-genomic prognosis factor (HGPF). Results: Five prognostic image features and one co-expression gene module were involved in the model construction. The time-dependent receiver operating characteristic curve showed that the prognostic model had significant prognostic value. Patients were divided into a high-risk group and a low-risk group based on the HGPF. Kaplan-Meier analysis indicated that the overall survival of the low-risk group was significantly better than that of the high-risk group. Conclusions: These results suggest that histopathological image features have a certain ability to predict the survival of COAD patients. The integrative prognostic model based on histopathological images and genomic features could further improve prognosis prediction in COAD and may assist clinical decisions in the future.
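The selection-then-integration pipeline described above maps naturally onto scikit-learn primitives. The condensed sketch below substitutes random placeholder data for the real CellProfiler features and WGCNA eigengene, and regresses directly on survival time, ignoring the censoring that the paper's survival analysis handles properly.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(459, 100))   # placeholder CellProfiler features
eigengene = rng.normal(size=(459, 1))       # placeholder WGCNA module eigengene
surv_time = rng.exponential(size=459)       # placeholder survival endpoint

# step 1: LASSO keeps a sparse set of prognostic image features
lasso = LassoCV(cv=5).fit(image_feats, surv_time)
kept = np.flatnonzero(lasso.coef_)          # the abstract reports 5 such features

# step 2: a random forest combines kept image features + eigengene
# into an HGPF-like risk score; thresholding splits high/low-risk groups
X = np.hstack([image_feats[:, kept], eigengene])
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, surv_time)
hgpf = rf.predict(X)
```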
- Book Chapter
- 10.1016/b978-0-443-15452-2.00019-4
- Jan 1, 2025
- Mining Biomedical Text, Images and Visual Features for Information Retrieval
Chapter 19 - Biomedical image characterization and radio genomics using machine learning techniques
- Discussion
5
- 10.21037/tcr.2019.12.17
- Dec 1, 2019
- Translational Cancer Research
In recent years, machine learning and deep learning-based approaches, two sub-fields of artificial intelligence, have emerged as key components in biomedical data analyses (1-5). They can be applied to image segmentation, identifying insertion/deletion mutations, protein alignment, and so on. Several studies have integrated pathological image data with genomics data. Yuan et al. quantitatively analyzed image data to better describe and validate independent prognostic factors in estrogen receptor-negative breast cancer (6). Another study, by Cooper et al., also used histopathology images and genomics data to identify prognostic factors in breast cancer (7). Other types of cancer, such as prostate cancer (8), renal cell carcinoma (9), low-grade glioma (10), and non-small cell lung cancer (11), to name a few, have also been studied with approaches integrating (multi-)omics data with pathology images.