Abstract

Recent technological advances have made it progressively easier to collect diverse types of genome-wide data. It is commonly expected that analyzing these data in an integrated way will improve the understanding of complex biological systems. Current methods, however, are prone to overfitting heavy noise, which limits their applicability. High noise is one of the major challenges in multiomics data integration and may be the main cause of overfitting and poor generalization. A sample reweighting strategy is typically used to cope with this problem. In this article, we propose a robust multimodal data integration method, called SMSPL, which can simultaneously predict cancer subtypes and identify potentially significant multiomics signatures. Specifically, the proposed method leverages the linkages between different types of data to interactively recommend high-confidence samples, adopts a new soft weighting scheme to assign weights to the training samples of each type, and then iterates between recalculating the weights and updating the classifiers. Simulation and five real-data experiments substantiate the capability of the proposed method for classification and for identification of significant multiomics signatures under heavy noise. We expect SMSPL to be a small step forward in multiomics data integration and to help researchers understand biological processes more comprehensively.
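
The alternating procedure described in the abstract can be illustrated with a minimal sketch. It is not the SMSPL implementation: the abstract does not give the weighting formula or the classifier, so the code assumes a linear soft weight (1 at zero loss, 0 once the loss reaches a threshold `lam`), a binary logistic classifier per data type, and a "recommendation" step in which each data type weights its samples by the mean loss observed on the other data types. All function names and parameters (`soft_weights`, `self_paced_multiview`, `lam`, `growth`, `rounds`) are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_weighted_logistic(X, y, w, steps=300, lr=0.5):
    """Weighted logistic regression fitted by plain gradient descent."""
    beta = [0.0] * len(X[0])
    total = max(sum(w), 1e-12)  # guard against all-zero weights
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi, wi in zip(X, y, w):
            err = wi * (sigmoid(sum(b * v for b, v in zip(beta, xi))) - yi)
            for d, v in enumerate(xi):
                grad[d] += err * v
        beta = [b - lr * g / total for b, g in zip(beta, grad)]
    return beta

def sample_losses(X, y, beta):
    """Per-sample logistic loss under the current classifier."""
    out = []
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        out.append(-(yi * math.log(p) + (1 - yi) * math.log(1.0 - p)))
    return out

def soft_weights(losses, lam):
    """Assumed linear soft weighting: 1 at zero loss, 0 at loss >= lam."""
    return [min(1.0, max(0.0, 1.0 - l / lam)) for l in losses]

def self_paced_multiview(views, y, lam=0.7, growth=1.3, rounds=5):
    """Alternate between reweighting samples and refitting one classifier
    per data type. Requires at least two views (data types)."""
    n = len(y)
    weights = [[1.0] * n for _ in views]
    betas = [fit_weighted_logistic(X, y, w) for X, w in zip(views, weights)]
    for _ in range(rounds):
        losses = [sample_losses(X, y, b) for X, b in zip(views, betas)]
        for j in range(len(views)):
            # Assumed recommendation rule: view j trusts the samples that
            # the other views currently fit well.
            peers = [losses[k] for k in range(len(views)) if k != j]
            peer_loss = [sum(col) / len(col) for col in zip(*peers)]
            weights[j] = soft_weights(peer_loss, lam)
        betas = [fit_weighted_logistic(X, y, w) for X, w in zip(views, weights)]
        lam *= growth  # admit harder samples in later rounds
    return betas, weights

# Toy usage: two data types, five samples, the last label deliberately noisy.
view_a = [[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0], [1.0, 1.8]]
view_b = [[1.0, 1.0], [1.0, 2.5], [1.0, -1.5], [1.0, -0.5], [1.0, 2.0]]
labels = [1, 1, 0, 0, 0]
betas, weights = self_paced_multiview([view_a, view_b], labels)
```

Growing `lam` each round follows the usual self-paced learning idea of starting from easy, high-confidence samples and gradually admitting harder ones; the schedule used here is an arbitrary choice for illustration.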
