Integration of single cell multiomics data by deep transfer hypergraph neural network

Abstract

Multi-omics characterization of individual cells offers remarkable potential for analyzing the dynamics and relationships of gene regulatory states across millions of cells. How to integrate multimodal data remains an open problem: existing integration methods struggle with accuracy and with retaining modality-specific biological variation. In this paper, we present scHyper (scalable, interpretable machine learning for single-cell integration), a low-code and data-efficient deep transfer model designed for integrating paired and unpaired single-cell multimodal data. We benchmark scHyper on datasets spanning different multimodal assays. scHyper learns a low-dimensional representation and aligns the covariance matrices of the measured modalities, achieving high accuracy even on large-scale, atlas-level datasets with low memory and computational cost across different cell lines, shedding light on regulatory relationships between different omics types. Altogether, we show that scHyper is a versatile and robust tool for cell-type label transfer and integration of multimodal single-cell datasets.
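
The abstract names two mechanisms, a low-dimensional representation and alignment of modality covariance matrices, without implementation details. The following is a minimal numpy sketch of that idea under stated assumptions: the `rna`/`atac` matrices, the SVD-based encoder, and the Frobenius criterion are illustrative choices, not scHyper's actual code.

```python
# Sketch: embed two modalities into a shared-size latent space and score
# how well their covariance structures agree. Synthetic data throughout.
import numpy as np

rng = np.random.default_rng(0)
rna = rng.normal(size=(500, 2000))   # hypothetical scRNA-seq: cells x genes
atac = rng.normal(size=(500, 5000))  # hypothetical scATAC-seq: cells x peaks

def embed(x, dim=32):
    """Project one modality to `dim` dimensions via truncated SVD."""
    x = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:dim].T

def covariance_alignment_loss(z1, z2):
    """Frobenius distance between the latent covariance matrices;
    minimizing this aligns the second-order structure of the modalities."""
    return np.linalg.norm(np.cov(z1, rowvar=False) - np.cov(z2, rowvar=False), "fro")

print(covariance_alignment_loss(embed(rna), embed(atac)))
```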

Similar Papers
  • Research Article
  • Cited by 6
  • 10.1016/j.cag.2015.05.004
Multimodal volume illumination
  • May 19, 2015
  • Computers & Graphics
  • Erik Sundén + 2 more

  • Research Article
  • Cited by 1
  • 10.3389/conf.fnins.2015.91.00005
Constructing subject-specific virtual brains from multimodal neuroimaging data
  • Jan 1, 2015
  • Frontiers in Neuroscience
  • Michael Schirner + 2 more

Large amounts of multimodal neuroimaging data are acquired every year worldwide. Extracting high-dimensional information for computational neuroscience applications requires standardized data fusion and efficient reduction into integrative data structures. Such self-consistent multimodal data sets can be used for computational brain modeling to constrain models with individually measurable features of the brain, as is done with The Virtual Brain (TVB). TVB is a simulation platform that uses empirical structural and functional data to build full-brain models of individual humans. For convenient model construction, we developed a shell-scripted processing pipeline for structural, functional and diffusion-weighted magnetic resonance imaging (MRI) and, optionally, electroencephalography (EEG) data. The pipeline combines several state-of-the-art neuroinformatics tools to generate subject-specific cortical and subcortical parcellations, surface tessellations, structural and functional connectomes, lead-field matrices, electrical source activity estimates, and region-wise aggregated blood oxygen level dependent (BOLD) functional MRI (fMRI) time series. The output files of the pipeline can be directly uploaded to TVB to create and simulate individualized large-scale network models. We detail the pitfalls of the individual processing streams and discuss ways of validating them. With the pipeline we also introduce novel ways of estimating the transmission strengths of fiber tracts in whole-brain structural connectivity (SC) networks and compare the outcomes of different tractography and parcellation approaches. We tested the functionality of the pipeline on 50 multimodal data sets. To quantify the robustness of the connectome-extraction part of the pipeline, we computed several rescan-reliability metrics and compared them to other tractography approaches. Together with the pipeline we present several principles to guide future efforts to standardize brain model construction. The code of the pipeline and the fully processed data sets are publicly available via The Virtual Brain website (thevirtualbrain.org) and GitHub (https://github.com/BrainModes/TVB-empirical-data-pipeline), and the pipeline can be used directly with High Performance Computing (HPC) resources on the Neuroscience Gateway Portal (http://www.nsgportal.org) through a convenient web interface.
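
One concrete, easily illustrated pipeline step named above is the region-wise aggregation of BOLD fMRI time series under a parcellation. A minimal sketch on synthetic data (not the TVB pipeline's actual code; the region count of 68 is just a common cortical parcellation size):

```python
# Average voxel-level BOLD signals within each parcellation region.
import numpy as np

rng = np.random.default_rng(14)
bold = rng.normal(size=(5000, 300))      # voxels x timepoints (synthetic)
labels = rng.integers(0, 68, size=5000)  # region label for each voxel

regional = np.stack([bold[labels == r].mean(axis=0) for r in range(68)])
print(regional.shape)  # (68 regions, 300 timepoints), ready for TVB-style modeling
```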

  • Research Article
CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data
  • Apr 12, 2023
  • ArXiv
  • Hui Shen + 8 more

Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.
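
The contrastive component described here, maximizing mutual information between omics types before concatenation, is commonly realized with an InfoNCE-style loss. A hedged numpy sketch (not the CLCLSA release; embedding sizes and the temperature are illustrative):

```python
# InfoNCE over paired omics embeddings: matched pairs (the diagonal) are
# pulled together, mismatched pairs pushed apart.
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                 # all pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # matched pairs on the diagonal

rng = np.random.default_rng(1)
z_expr, z_meth = rng.normal(size=(64, 16)), rng.normal(size=(64, 16))
print(info_nce(z_expr, z_meth))
```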

  • Research Article
  • Cited by 1
  • 10.21203/rs.3.rs-2768563/v1
CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data
  • May 2, 2023
  • Research Square
  • Weihua Zhou + 8 more

Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.

  • Research Article
  • Cited by 2
  • 10.1186/s40537-024-01054-w
Transformer enabled multi-modal medical diagnosis for tuberculosis classification
  • Jan 14, 2025
  • Journal of Big Data
  • Sachin Kumar + 2 more

Recently, multimodal data analysis in the medical domain has started receiving great attention. Researchers from both computer science and medicine are trying to develop models to handle multimodal medical data, but most published work has targeted homogeneous multimodal data. The collection and preparation of heterogeneous multimodal data is a complex and time-consuming task, and developing models to handle such heterogeneous data is a further challenge. This study presents a cross-modal transformer-based fusion approach for multimodal clinical data analysis using medical images and clinical data. The proposed approach leverages an image embedding layer to convert images into visual tokens and a clinical embedding layer to convert clinical data into text tokens. A cross-modal transformer module is then employed to learn a holistic representation of the imaging and clinical modalities. The approach was tested on a multimodal tuberculosis lung disease dataset, and the results were compared with recent approaches in multimodal medical data analysis. The comparison shows that the proposed approach outperformed the other approaches considered in the study. A further advantage is that it analyzes heterogeneous multimodal medical data faster than the existing methods in the study, which is important when powerful computation is not available.
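
The core of such a fusion model is a cross-attention step in which tokens from one modality query the other. A minimal numpy sketch (token counts, dimensions, and random weights are stand-ins, not the paper's architecture):

```python
# One cross-attention step: clinical tokens (queries) attend over visual
# tokens (keys/values) to produce a clinically conditioned image summary.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries_src, kv_src, d_k=32, seed=2):
    rng = np.random.default_rng(seed)
    d = queries_src.shape[-1]
    w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))
    q, k, v = queries_src @ w_q, kv_src @ w_k, kv_src @ w_v
    return softmax(q @ k.T / np.sqrt(d_k)) @ v

rng = np.random.default_rng(3)
clinical_tokens = rng.normal(size=(8, 64))   # 8 tokenized clinical fields
visual_tokens = rng.normal(size=(49, 64))    # 7x7 grid of image patch tokens
print(cross_attention(clinical_tokens, visual_tokens).shape)  # (8, 32)
```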

  • Research Article
  • Cited by 51
  • 10.3389/fgene.2019.00617
MildInt: Deep Learning-Based Multimodal Longitudinal Data Integration Framework
  • Jun 28, 2019
  • Frontiers in Genetics
  • Garam Lee + 4 more

As large amounts of heterogeneous biomedical data become available, numerous methods for integrating such datasets have been developed to extract complementary knowledge from multiple domains. Recently, deep learning approaches have shown promising results in a variety of research areas; however, applying them requires expertise in constructing a deep architecture that can take multimodal longitudinal data. In this paper, we therefore develop a deep learning-based Python package for data integration: the multimodal longitudinal data integration framework (MildInt), which provides a preconstructed deep learning architecture for classification tasks. MildInt comprises two learning phases: learning a feature representation from each modality of data, and training a classifier for the final decision. Adopting a deep architecture in the first phase yields more task-relevant feature representations than a linear model. In the second phase, a linear regression classifier is used for detecting and investigating biomarkers from multimodal data. Thus, by combining the linear model and the deep learning model, higher accuracy and better interpretability can be achieved. We validated the performance of our package using simulated and real data. For the real data, as a pilot study, we used clinical and multimodal neuroimaging datasets in Alzheimer's disease to predict disease progression. MildInt can integrate multiple forms of numerical data, including time-series and non-time-series data, to extract complementary features from multimodal datasets.
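
The two-phase design is easy to mirror in code. Below is a hedged sketch of the data flow (assuming scikit-learn is available; the summary-statistics encoder is a lightweight stand-in for MildInt's per-modality deep encoders, and a logistic model stands in for the linear classifier):

```python
# Phase 1: compress each modality to fixed-length features.
# Phase 2: train a linear classifier on the concatenation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 100
cognitive = rng.normal(size=(n, 12, 4))  # hypothetical longitudinal scores
imaging = rng.normal(size=(n, 30))       # hypothetical cross-sectional features
labels = rng.integers(0, 2, size=n)

def encode_timeseries(x):
    """Stand-in for a deep encoder (e.g., a GRU): per-feature mean and change."""
    return np.hstack([x.mean(axis=1), x[:, -1, :] - x[:, 0, :]])

features = np.hstack([encode_timeseries(cognitive), imaging])
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print("training accuracy:", clf.score(features, labels))
```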

  • Research Article
  • Cited by 2
  • 10.1007/s40012-019-00236-9
Recent advances in multimodal big data analysis for cancer diagnosis
  • May 28, 2019
  • CSI Transactions on ICT
  • Pradipta Maji

With the rapid technological advances in acquiring data from diverse platforms in cancer research, numerous large-scale omics and imaging data sets have become available, providing high-resolution views and multifaceted descriptions of biological systems. Simultaneous analysis of such multimodal data sets is an important task in integrative systems biology. The main challenge is how to integrate them to extract relevant and meaningful information for a given problem. Multimodal data contain more information, and combining modalities can potentially provide a more complete and discriminative description of a pattern's intrinsic characteristics, producing better system performance than individual modalities. In this regard, some recent advances in multimodal big data analysis for cancer diagnosis are reported in this article.

  • Research Article
  • Cited by 63
  • 10.1038/s41587-023-01934-1
Multi-omics data integration using ratio-based quantitative profiling with Quartet reference materials
  • Sep 7, 2023
  • Nature Biotechnology
  • Yuanting Zheng + 60 more

Characterization and integration of the genome, epigenome, transcriptome, proteome and metabolome across different datasets is difficult owing to a lack of ground truth. Here we develop and characterize suites of publicly available multi-omics reference materials of matched DNA, RNA, protein and metabolites derived from immortalized cell lines from a family quartet of parents and monozygotic twin daughters. These references provide built-in truth defined by relationships among the family members and the information flow from DNA to RNA to protein. We demonstrate how a ratio-based profiling approach that scales the absolute feature values of a study sample relative to those of a concurrently measured common reference sample produces reproducible and comparable data suitable for integration across batches, labs, platforms and omics types. Our study identifies reference-free 'absolute' feature quantification as the root cause of irreproducibility in multi-omics measurement and data integration, and establishes the advantages of ratio-based multi-omics profiling with common reference materials.
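
The ratio-based idea can be shown in a few lines: scaling each study sample by a concurrently measured common reference cancels batch-level scale effects. A synthetic sketch (all values and batch factors are made up):

```python
# Two batches measure the same biology with a 3.5x absolute-scale difference;
# log-ratios against the shared reference agree across batches anyway.
import numpy as np

rng = np.random.default_rng(6)
true_profile = rng.lognormal(mean=2.0, size=200)      # shared underlying biology
for batch_scale in (1.0, 3.5):                        # batch-specific scale effect
    study = true_profile * batch_scale * rng.lognormal(sigma=0.05, size=200)
    reference = true_profile * batch_scale * rng.lognormal(sigma=0.05, size=200)
    ratio = np.log2(study / reference)                # reference-scaled profiling
    print(f"batch x{batch_scale}: mean |log2 ratio| = {np.abs(ratio).mean():.3f}")
```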

  • Book Chapter
  • 10.1007/978-3-319-69900-4_6
A New Method to Address Singularity Problem in Multimodal Data Analysis
  • Jan 1, 2017
  • Ankita Mandal + 1 more

In general, the 'small sample (n), large feature (p)' problem of bioinformatics, image analysis, high-throughput molecular screening, astronomy, and other high-dimensional applications makes the features highly collinear. In this context, the paper presents a new feature extraction algorithm to address this 'large p, small n' issue associated with multimodal data sets. The proposed algorithm judiciously integrates the concepts of regularization and shrinkage with canonical correlation analysis to extract important features. To deal with the singularity problem, the proposed method increases the diagonal elements of the covariance matrices using regularization parameters, while the off-diagonal elements are decreased by shrinkage coefficients. The concept of the hypercuboid equivalence partition matrix from the rough hypercuboid approach is used to compute both the significance and the relevance of a feature. The advantage of the proposed algorithm over existing methods is established extensively on real-life multimodal omics data sets.
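
The covariance adjustment described above can be sketched directly (parameter values are illustrative, not the chapter's tuned settings): inflating the diagonal with a regularization term and deflating the off-diagonal with a shrinkage coefficient makes the matrix invertible even when features far outnumber samples.

```python
# Regularization + shrinkage make a rank-deficient covariance invertible.
import numpy as np

def adjust_covariance(x, reg=0.1, shrink=0.5):
    c = np.cov(x, rowvar=False)
    diag = np.diag(np.diag(c))
    return diag * (1 + reg) + (c - diag) * (1 - shrink)

rng = np.random.default_rng(7)
x = rng.normal(size=(20, 100))  # n=20 samples, p=100 features (p >> n)
print("raw rank:", np.linalg.matrix_rank(np.cov(x, rowvar=False)))   # at most n-1
print("adjusted rank:", np.linalg.matrix_rank(adjust_covariance(x))) # full rank
```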

  • Research Article
  • 10.3724/sp.j.1089.2022.19194
Multimodal Human Motion Synchronization Dataset
  • Nov 1, 2022
  • Journal of Computer-Aided Design & Computer Graphics
  • Jingming Cheng + 4 more

Human motion datasets are an important foundation for research on motion data denoising, motion editing, motion synthesis, and related tasks. To support more generic studies of multimodal motion data fusion, designing and collecting a public multimodal human motion dataset is a pressing need. First, the acquisition environment is designed for precise motion data collected by sensor-based motion capture devices, rough motion data collected by body-sensing devices, and local inertial data collected by inertial measurement units (IMUs). Then, temporal synchronization among the devices is applied based on the network time protocol (NTP), and spatial synchronization is applied across the modalities. A full-body motion dataset named HFUT-MMD is captured, containing 6,971,568 frames of 6 types from 12 actors/actresses. Experimental results on the HFUT-MMD dataset using existing algorithms show that the low-precision motion data can be optimized to closely approximate the accurate motion data, which corroborates the consistency between the modalities.
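
After NTP synchronization, frames from devices running at different rates still need to be paired. A small sketch of nearest-timestamp matching (rates and offsets are invented; this is not the HFUT-MMD tooling):

```python
# Pair each IMU sample with the nearest optical-mocap frame by timestamp.
import numpy as np

mocap_t = np.arange(0.0, 1.0, 1 / 120)   # 120 Hz mocap timestamps
imu_t = np.arange(0.003, 1.0, 1 / 60)    # 60 Hz IMU timestamps, small offset

idx = np.clip(np.searchsorted(mocap_t, imu_t), 1, len(mocap_t) - 1)
idx = idx - ((imu_t - mocap_t[idx - 1]) < (mocap_t[idx] - imu_t))  # nearer side
print("max pairing error (s):", np.abs(mocap_t[idx] - imu_t).max())
```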

  • Research Article
  • Cited by 7
  • 10.1109/tcsvt.2016.2642825
Multimodal Visual Data Registration for Web-Based Visualization in Media Production
  • Apr 1, 2018
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Hansung Kim + 3 more

Recent developments in video and sensing technology have led to large volumes of digital media data. Current media production relies on video from the principal camera together with a wide variety of heterogeneous supporting data (photos, light detection and ranging point clouds, witness video cameras, high dynamic range imaging, and depth imagery). Registration of visual data acquired from various 2D and 3D sensing modalities is challenging because current matching and registration methods are not appropriate for multimodal data, owing to differences in structure, format, and noise characteristics. A combined 2D/3D visualization of the registered data allows an integrated overview of the entire dataset, and a Web-based context presents several advantages for such a visualization. In this paper, we propose a unified framework for registration and visualization of this type of visual media data. A new feature description and matching method is proposed that adaptively considers local geometry, semiglobal geometry, and color information in the scene for more robust registration. The resulting registered 2D/3D multimodal visual data are too large to be downloaded and viewed directly in a Web browser while maintaining an acceptable user experience. We therefore employ hierarchical techniques for compression and restructuring to enable efficient transmission and visualization over the Web, leading to interactive visualization of registered point clouds, 2D images, and videos in the browser and improving on the current state of the art for Web-based visualization of big media data. This is the first unified 3D Web-based visualization of multimodal visual media production datasets. The proposed pipeline is tested on big multimodal datasets typical of film and broadcast production, which are made publicly available. The proposed feature description method shows two times higher feature-matching precision and more stable registration performance than existing 3D feature descriptors.
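
The adaptive descriptor can be pictured as a weighted combination of the three cues the paper names. A hedged sketch (component sizes and fixed weights are assumptions; the real method adapts the weighting to the scene):

```python
# Weighted distance over local-geometry, semiglobal-geometry and color parts.
import numpy as np

def descriptor_distance(a, b, w_local=0.4, w_semiglobal=0.4, w_color=0.2):
    return (w_local * np.linalg.norm(a["local"] - b["local"])
            + w_semiglobal * np.linalg.norm(a["semiglobal"] - b["semiglobal"])
            + w_color * np.linalg.norm(a["color"] - b["color"]))

rng = np.random.default_rng(8)
def make_descriptor():
    return {"local": rng.normal(size=33),       # e.g., an FPFH-like histogram
            "semiglobal": rng.normal(size=16),  # mid-range shape context
            "color": rng.uniform(size=3)}       # mean RGB of the neighborhood

print(descriptor_distance(make_descriptor(), make_descriptor()))
```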

  • Research Article
  • Cited by 24
  • 10.1038/s41598-022-19019-5
Network-based integration of multi-omics data for clinical outcome prediction in neuroblastoma
  • Sep 14, 2022
  • Scientific Reports
  • Conghao Wang + 4 more

Multi-omics data are increasingly being gathered for investigations of complex diseases such as cancer. However, high dimensionality, small sample size, and the heterogeneity of different omics types pose huge challenges to integrated analysis. In this paper, we evaluate two network-based approaches for the integration of multi-omics data in an application to clinical outcome prediction in neuroblastoma. As a first step, we derive Patient Similarity Networks (PSNs) for each omics type by computing distances among patients from omics features. The fusion of different omics can then proceed in two ways: network-level fusion uses the Similarity Network Fusion algorithm to fuse the PSNs derived for individual omics types, while feature-level fusion fuses the network features obtained from the individual PSNs. We demonstrate our methods on two high-risk neuroblastoma datasets from the SEQC and TARGET projects, and propose deep neural network and machine learning methods with recursive feature elimination as predictors of the survival status of neuroblastoma patients. Our results indicate that network-level fusion outperformed feature-level fusion for integrating different omics data, whereas feature-level fusion is more suitable for incorporating different feature types derived from the same omics type. We conclude that network-based methods handle the heterogeneity and high dimensionality of multi-omics integration well.
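
The first step, deriving a Patient Similarity Network per omics type and fusing at the network level, can be sketched as follows (an RBF kernel and a plain average stand in for the paper's distance choice and the full Similarity Network Fusion diffusion, respectively):

```python
# Build one PSN per omics matrix, then fuse the networks.
import numpy as np

def patient_similarity_network(x, sigma=1.0):
    """RBF similarity between patients (rows) of one omics matrix."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2 * x.shape[1]))

rng = np.random.default_rng(9)
expr = rng.normal(size=(50, 300))  # hypothetical expression: patients x genes
cnv = rng.normal(size=(50, 120))   # hypothetical copy-number features

fused = (patient_similarity_network(expr) + patient_similarity_network(cnv)) / 2
print(fused.shape)  # (50, 50) network-level fusion (SNF would diffuse iteratively)
```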

  • Research Article
  • Cited by 3
  • 10.3390/jmse12030513
An Adaptive Multimodal Data Vessel Trajectory Prediction Model Based on a Satellite Automatic Identification System and Environmental Data
  • Mar 20, 2024
  • Journal of Marine Science and Engineering
  • Ye Xiao + 4 more

Ship trajectory prediction is essential for safe route planning and for advance warning of dangers at sea. With the development of deep learning, most current research has explored advanced prediction methods based on historical spatio-temporal Automatic Identification System (AIS) data. However, environmental factors such as sea wind and visibility also affect ship navigation in real-world maritime shipping, so developing reliable models that utilize multimodal data, such as AIS and environmental data, is challenging. In this research, we design an adaptive multimodal vessel trajectory data prediction model (termed AMD) based on satellite AIS and environmental data. The AMD model mainly consists of an AIS-based extraction network, an environment-based extraction network, and a fusion block. In particular, this work considers multimodal data such as historical spatio-temporal information and environmental factors. Time stamps and distances are correlated with the AIS and environmental data, and multilayer perceptron and gated recurrent unit networks are used to design the multimodal feature extraction networks. Finally, the fusion block fuses the multimodal features into a joint output to improve the reliability of the AMD model. Several quantitative and qualitative experiments are conducted using real-world AIS and multimodal environmental datasets. The experimental results show that prediction with multimodal data achieves satisfactory accuracy and reliability and has a positive impact on improving maritime transport services.
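
The AMD data flow, two modality-specific extraction branches feeding a fusion block, can be outlined structurally (the branches below are lightweight stand-ins for the paper's GRU and multilayer-perceptron extractors; all shapes are assumptions):

```python
# Two extraction branches plus a concatenation-style fusion block.
import numpy as np

rng = np.random.default_rng(10)
ais = rng.normal(size=(32, 20, 4))  # batch x timesteps x (lat, lon, speed, course)
env = rng.normal(size=(32, 6))      # wind, visibility, ... per sample (assumed)

def ais_branch(x):
    """Stand-in for the GRU extractor: trajectory summary features."""
    return np.concatenate([x.mean(axis=1), x[:, -1, :]], axis=1)

def env_branch(x, hidden=16, seed=11):
    """Stand-in for the MLP extractor: one random linear layer + ReLU."""
    w = np.random.default_rng(seed).normal(size=(x.shape[1], hidden))
    return np.maximum(x @ w, 0)

fused = np.concatenate([ais_branch(ais), env_branch(env)], axis=1)  # fusion block
print(fused.shape)  # joint features for the trajectory-prediction head
```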

  • Research Article
  • Cited by 16
  • 10.3390/fire7040104
Fire Detection in Urban Areas Using Multimodal Data and Federated Learning
  • Mar 22, 2024
  • Fire
  • Ashutosh Sharma + 6 more

Fire chemical sensing plays an essential role in indoor fire detection because chemical volatiles can be detected before smoke particles, providing a faster and more reliable method for early fire detection. A thermal imaging camera and seven distinct fire-detecting sensors were used simultaneously to acquire the multimodal fire data that is the subject of this paper. Low-cost sensors typically have lower sensitivity and reliability, making it impossible for them to detect fire at greater distances; to go beyond the limitation of relying solely on such sensors, the multimodal dataset also includes a thermal camera that can detect temperature changes. The proposed pipeline trains convolutional neural networks (CNNs) and several of their variants on the thermal-camera image data, trains the fire-sensor data with bidirectional long short-term memory (BiLSTM-Dense) and dense long short-term memory (LSTM-Dense) networks, and merges both datasets to demonstrate the performance of the multimodal approach. Researchers and system developers can use the dataset to create and hone cutting-edge artificial intelligence models and systems. Initial evaluation on the image dataset showed DenseNet201 to be the best approach, with the highest validation parameters (Accuracy, Precision, Recall, and Loss of 0.99, 0.99, 0.99, and 0.08, respectively). On the sensor dataset, the BiLSTM-Dense approach scored highest (0.95, 0.95, 0.95, 0.14). The multimodal approach, deploying DenseNet201 on the image data and BiLSTM-Dense on the sensor data, reached (1.0, 1.0, 1.0, 0.06). This work demonstrates that, in comparison to the conventional deep learning approach, the federated learning (FL) approach performs privacy-protected fire leakage classification without significantly sacrificing accuracy or the other validation parameters.
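
The privacy-preserving training loop rests on federated averaging: clients train locally and share only weights. A minimal FedAvg sketch with a logistic-regression client model (client counts, data, and hyperparameters are illustrative, not the paper's setup):

```python
# FedAvg: average locally trained weights; raw data never leaves a client.
import numpy as np

def local_update(weights, x, y, lr=0.1, steps=50):
    """A few gradient steps of logistic regression on one client's data."""
    w = weights.copy()
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(x @ w)))
        w -= lr * x.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(12)
clients = [(rng.normal(size=(40, 8)), rng.integers(0, 2, 40).astype(float))
           for _ in range(5)]               # 5 clients with private data

global_w = np.zeros(8)
for _ in range(10):                         # communication rounds
    updates = [local_update(global_w, x, y) for x, y in clients]
    global_w = np.mean(updates, axis=0)     # server-side aggregation
print(global_w[:3])
```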

  • Research Article
  • Cited by 7
  • 10.1109/access.2019.2955958
MSPL: Multimodal Self-Paced Learning for Multi-Omics Feature Selection and Data Integration
  • Jan 1, 2019
  • IEEE Access
  • Zi-Yi Yang + 3 more

Rapid advances in high-throughput sequencing technology have led to the generation of a large number of multi-omics biological datasets. Integrating data from different omics provides an unprecedented opportunity to gain insight into disease mechanisms from different perspectives. However, integrative analysis and predictive modeling from multi-omics data face three major challenges: i) heavy noise; ii) high dimensionality compared to the small number of samples; and iii) data heterogeneity. Current multi-omics data integration approaches have some limitations and are susceptible to heavy noise. In this paper, we present MSPL, a robust supervised multi-omics data integration method that simultaneously identifies significant multi-omics signatures during the integration process and predicts cancer subtypes. The proposed method not only inherits the generalization performance of self-paced learning but also leverages the correlated information in multi-omics data to interactively recommend high-confidence samples for model training. We demonstrate the capabilities of MSPL using simulated data and five multi-omics biological datasets, integrating up to three omics types to identify potential biological signatures and evaluating performance against state-of-the-art methods in binary and multi-class classification problems. Our proposed model makes multi-omics data integration more systematic and expands its range of applications.
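
The self-paced mechanism, admitting only high-confidence (low-loss) samples and relaxing the threshold each iteration, reduces to a simple weighting rule. A sketch with hard 0/1 weights and a single omics type (MSPL itself couples the weights across omics layers):

```python
# Self-paced selection: samples with loss below the age parameter lambda
# enter training; lambda grows so harder samples join later.
import numpy as np

rng = np.random.default_rng(13)
losses = rng.exponential(scale=1.0, size=200)  # current per-sample losses

lam = 0.5
for it in range(4):
    v = (losses < lam).astype(float)           # 0/1 sample weights
    print(f"iter {it}: lambda={lam:.2f}, {int(v.sum())}/200 samples selected")
    lam *= 1.5                                 # relax the pace
```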
