Identifying biological activities and biomarkers in tissue from the imaging features of a disease is challenging, so correlating pathological image data with gene-level data is of great significance for clinical diagnosis. This paper proposes a high-dimensional kernel non-negative matrix factorization (NMF) method based on multi-modal information fusion. The algorithm projects RNA gene expression data and pathological whole-slide images (WSI) into a common feature space, where the heterogeneous variables with the largest coefficients in the same projection direction form a co-module. In addition, the miRNA-mRNA and miRNA-lncRNA interaction networks from the ceRNA network are incorporated into the algorithm as prior information to explore the relationship between the images and the internal activities of genes. Furthermore, a radial basis kernel function is used to calculate the feature proportion between different kinds of genes mapped into the high-dimensional feature space and projected into the common feature space, in order to explore gene interactions in the high-dimensional setting. The original feature matrix is regularized to improve biological correlation, and the feature factors are made sparse by orthogonal constraints to reduce redundancy. Experimental results show that the proposed method outperforms traditional NMF in stability, decomposition accuracy, and robustness. Applied to lung cancer data, the analysis finds genes related to tissue morphology, such as COL7A1, CENPF, and BIRC5. In addition, gene pairs with a correlation degree exceeding 0.8 are found, and potential biomarkers significantly correlated with survival, such as CAPN8, are obtained. The method has potential application value for the clinical diagnosis of lung cancer.
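The idea of projecting two modalities into a common feature space and reading co-modules off the largest coefficients can be illustrated with a minimal sketch. It is not the method proposed in the paper: it uses plain joint NMF with standard multiplicative updates for the Frobenius loss, and it omits the radial basis kernel, the ceRNA network priors, the regularization, and the orthogonal constraints. The function names `joint_nmf` and `co_module`, the matrix layout (samples in rows, features in columns), and all parameter defaults are assumptions made for illustration.

```python
import numpy as np

def joint_nmf(X1, X2, k, n_iter=200, eps=1e-9, seed=0):
    """Factor X1 ~ W @ H1 and X2 ~ W @ H2 with a shared basis W.

    X1, X2: non-negative arrays with the same number of rows (samples),
    e.g. image features and gene expression (assumed layout).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X1.shape[0], k))
    H1 = rng.random((k, X1.shape[1]))
    H2 = rng.random((k, X2.shape[1]))
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        W *= (X1 @ H1.T + X2 @ H2.T) / (W @ (H1 @ H1.T + H2 @ H2.T) + eps)
        H1 *= (W.T @ X1) / (W.T @ W @ H1 + eps)
        H2 *= (W.T @ X2) / (W.T @ W @ H2 + eps)
    return W, H1, H2

def co_module(H, component, top=5):
    """Indices of the `top` features with the largest coefficients in one component."""
    return np.argsort(H[component])[::-1][:top]
```

Under these assumptions, a co-module for component `c` would pair `co_module(H1, c)` (image features) with `co_module(H2, c)` (genes), since both coefficient rows refer to the same column of the shared basis `W`.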