MultiCapsNet: A General Framework for Data Integration and Interpretable Classification.

Lifei Wang, Rui Nie, Xuexia Miao, Jiang Zhang, Zhang Zhang, Jun Cai

doi:10.3389/fgene.2021.767602

Abstract

Recent progress in experimental biology has generated large amounts of data in different formats and lengths. Deep learning is well suited to such complex datasets, but its inherent “black box” nature limits interpretability. At the same time, traditional interpretable machine learning methods, such as linear regression or random forest, can only handle numerical features rather than the modular features often encountered in biology. Here, we present MultiCapsNet (https://github.com/wanglf19/MultiCapsNet), a new deep learning model built on CapsNet and scCapsNet that offers easy data integration and high model interpretability. To demonstrate its ability as an interpretable classifier for modular inputs, we test MultiCapsNet on three datasets with different data types and application scenarios. First, on the labeled variant call dataset, MultiCapsNet achieves classification performance similar to that of a neural network model and provides importance scores for data sources directly, without the extra importance determination step that the neural network model requires. The importance scores generated by the two models are highly correlated. Second, on a single-cell RNA sequencing (scRNA-seq) dataset, MultiCapsNet integrates information about protein-protein interactions (PPI) and protein-DNA interactions (PDI). Its classification accuracy is comparable to that of the neural network and random forest models. Meanwhile, MultiCapsNet reveals how each transcription factor (TF) or PPI cluster node contributes to cell type classification. Third, we compare MultiCapsNet with SCENIC. The results show that several cell type-relevant TFs are identified by both methods, further supporting the validity and interpretability of MultiCapsNet.

Highlights

Recent advances in experimental biology have generated huge amounts of data
The results show that the area under the curve (AUC) of the MultiCapsNet model is 0.94, 0.99, and 0.97 for the classification categories “ambiguous”, “fail”, and “somatic”, respectively (Figure 4A)
We demonstrated that the proposed MultiCapsNet model performed well in variant call classification
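
As an illustration of how per-category AUC values like those above can be obtained for a three-class classifier, the following is a minimal sketch that assumes a one-vs-rest scheme; the excerpt does not state which scheme the authors used. The function names, the toy labels, and the toy scores are hypothetical and are not taken from the paper or from the MultiCapsNet repository.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(true_classes, class_scores, classes):
    """Per-class AUC: each class in turn is treated as positive and all
    other classes as negative (assumed scheme, see lead-in)."""
    result = {}
    for k, name in enumerate(classes):
        labels = [1 if t == name else 0 for t in true_classes]
        scores = [row[k] for row in class_scores]
        result[name] = binary_auc(labels, scores)
    return result


# Hypothetical toy data, NOT values from the paper.
CLASSES = ["ambiguous", "fail", "somatic"]
y_true = ["ambiguous", "fail", "somatic", "fail", "somatic", "ambiguous"]
y_score = [  # one row per variant call, one column per class
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.1, 0.6],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.4, 0.1, 0.5],
]
per_class = one_vs_rest_auc(y_true, y_score, CLASSES)
for name in CLASSES:
    print(name, per_class[name])
```

The pairwise count used here is quadratic in the number of examples, which keeps the sketch short; for larger datasets a rank-based or library implementation would be the more practical choice.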

Summary

Introduction

Recent advances in experimental biology have generated huge amounts of data. More detectable biological targets and various new measuring methods produce data at an unprecedented speed. There is an urgent need for a new generation of methods to deal with large, heterogeneous, and complex datasets (Camacho et al., 2018). RNA sequencing data, represented as real-valued vectors, can be processed by a simple feed-forward neural network, which is a component of more complex models such as the auto-encoder (AE) (Lin et al., 2017; Chen et al., 2018), the variational auto-encoder (VAE) (Ding et al., 2018), and the generative adversarial network (GAN) (Lopez et al., 2018). New probabilistic generative models with more interpretability, such as variational inference neural networks, have been applied to scRNA-seq data for dimension reduction (Ding et al., 2018).

Journal: Frontiers in genetics	Publication Date: Jan 18, 2022
Citations: 3	License type: CC BY 4.0
