Integrative analysis of histopathology images and genomic data enables the discovery of potential biomarkers and multimodal association patterns. However, few studies have established effective association models for complex diseases, such as sarcoma, by combining histopathological images with multiple types of genetic variation data. Here, we present an integrative imaging-genomics framework called multi-dimensional constrained joint non-negative matrix factorization (MDJNMF) to identify modules related to lung metastasis of sarcomas based on sample-matched whole-slide image, DNA methylation, and copy number variation features. The three types of feature matrices are projected onto a common feature space, in which heterogeneous variables with large coefficients in the same projected direction form a common module. The correlations between image features and genetic variation features are used as network-regularized constraints to improve module accuracy. Sparsity and orthogonality constraints are applied to obtain a sparse modular solution. Multi-level analysis indicates that our method effectively discovers biologically functional modules associated with sarcoma or lung metastasis. The representative module reveals a significant correlation between image features and genetic variation features and uncovers potential diagnostic biomarkers. In summary, the proposed method offers a new way to identify association patterns and biomarkers from multiple types of data sources, and may be applicable to other diseases.
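The projection step described above can be illustrated with a minimal sketch of plain joint non-negative matrix factorization. This is not the MDJNMF algorithm itself: the network-regularized, sparsity, and orthogonality constraints are omitted, and the function names (`joint_nmf`, `select_modules`), the multiplicative update rule, the z-score threshold, and the random input matrices are all assumptions made for illustration. Each non-negative matrix X_i (samples by features) is approximated as W H_i with a shared basis W, and features with unusually large coefficients in the same row of each H_i are grouped into one module.

```python
import numpy as np

def joint_nmf(Xs, k, n_iter=200, eps=1e-9, seed=0):
    """Factor each non-negative X_i (n_samples x p_i) as W @ H_i with a shared W.

    Uses standard multiplicative updates for the summed squared Frobenius error.
    Simplified: no network, sparsity, or orthogonality terms.
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[0]
    W = rng.random((n, k))
    Hs = [rng.random((k, X.shape[1])) for X in Xs]
    for _ in range(n_iter):
        # Update the shared basis using all data types at once.
        num = sum(X @ H.T for X, H in zip(Xs, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W = W * num / den
        # Update each coefficient matrix against the shared basis.
        Hs = [H * (W.T @ X) / (W.T @ W @ H + eps) for X, H in zip(Xs, Hs)]
    return W, Hs

def select_modules(H, z_thresh=2.0):
    """For each row of H, return indices of features whose z-score exceeds z_thresh."""
    mean = H.mean(axis=1, keepdims=True)
    std = H.std(axis=1, keepdims=True) + 1e-12
    z = (H - mean) / std
    return [np.flatnonzero(row > z_thresh) for row in z]

# Hypothetical stand-ins for image, methylation, and copy number features
# measured on the same 40 samples (random data, for shape illustration only).
rng = np.random.default_rng(42)
X_img, X_meth, X_cnv = rng.random((40, 60)), rng.random((40, 80)), rng.random((40, 50))

W, (H_img, H_meth, H_cnv) = joint_nmf([X_img, X_meth, X_cnv], k=5)

# Module j is the union of the features selected from row j of each H_i.
modules = list(zip(select_modules(H_img), select_modules(H_meth), select_modules(H_cnv)))
```

Because all three matrices share W, row j of every H_i refers to the same latent direction, which is what allows image, methylation, and copy number features to be collected into a single module.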