Abstract
Multimodal data integration is an important framework for cancer subtype discovery, as it can blend the inherent properties of individual modalities with their cross-platform correlations to infer clinically relevant subtypes. The main problem is the appropriate selection of relevant and complementary modalities. Another problem is the 'high dimension-low sample size' nature of each modality. This work proposes a novel algorithm to construct a low-rank joint subspace from the low-rank subspaces of individual high-dimensional modalities. Statistical hypothesis testing is introduced to estimate the rank of each modality by separating its signal component from its noise counterpart. Two quantitative indices are proposed to evaluate the quality of different modalities: the first assesses the relevance of the cluster structure embedded within each modality, while the second evaluates the amount of cluster information shared between two modalities. To construct the joint subspace, the algorithm selects the most relevant modalities with maximum shared information. During data integration, the intersection between two subspaces is also considered, in order to select cluster information and filter out noise from the different subspaces. The efficacy of clustering on the joint subspace extracted by the proposed algorithm is compared with that of several existing integrative clustering approaches on real-life multimodal data sets. Experimental results show that the identified subtypes resemble the clinically established subtypes more closely than the subtypes identified by the existing approaches. Survival analysis reveals significant differences between the survival profiles of the identified subtypes, while robustness analysis shows that the identified subtypes are not sensitive to perturbation of the data sets.
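The general idea of building a joint subspace from per-modality low-rank subspaces can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it assumes a plain truncated SVD per modality, ranks fixed by hand instead of estimated by hypothesis testing, and no modality selection. The function names `modality_subspace` and `joint_subspace`, the synthetic data, and all parameter values are hypothetical. The cosines of the principal angles between two subspaces are shown only as one common way to gauge how much two subspaces overlap; they are not the paper's shared-information index.

```python
import numpy as np

def modality_subspace(X, rank):
    """Orthonormal basis (n_samples x rank) of the top left singular
    subspace of the column-centered data matrix X (samples x features)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :rank]

def joint_subspace(bases, rank):
    """Assumed fusion rule: stack the per-modality bases side by side
    and keep the top left singular vectors of the stacked matrix."""
    stacked = np.hstack(bases)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :rank]

# Synthetic example: 60 samples, 3 clusters, two high-dimensional modalities
# that share the same cluster structure (all values are made up).
rng = np.random.default_rng(0)
n_samples, n_clusters = 60, 3
labels = np.repeat(np.arange(n_clusters), n_samples // n_clusters)
Z = np.eye(n_clusters)[labels]                      # one-hot cluster indicators

X1 = Z @ rng.normal(size=(n_clusters, 500)) + 0.5 * rng.normal(size=(n_samples, 500))
X2 = Z @ rng.normal(size=(n_clusters, 300)) + 0.5 * rng.normal(size=(n_samples, 300))

# Ranks are chosen by hand here (3 centered clusters span 2 dimensions).
U1 = modality_subspace(X1, rank=2)
U2 = modality_subspace(X2, rank=2)

# Cosines of the principal angles between the two subspaces:
# values near 1 indicate strongly overlapping directions.
cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)

# Joint low-rank subspace on which a clustering method could then be run.
J = joint_subspace([U1, U2], rank=2)
```

The rows of `J` give a low-dimensional representation of the samples; any standard clustering method could be applied to them. In the paper, the ranks, the choice of modalities, and the handling of subspace intersections are determined by the proposed indices and tests rather than fixed as above.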
More From: IEEE/ACM Transactions on Computational Biology and Bioinformatics