Abstract

Background: Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space.

Results: Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, e.g. the Multi-Layer Perceptron.

Conclusion: Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbourhood of each sample, thus mimicking the case of image data, transparently to the user.
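The patristic distance named in the abstract as the proximity measure is the sum of branch lengths along the path joining two leaves of the tree. The following is a minimal sketch of that definition on a toy tree, not the authors' implementation: the tree, its node names, its branch lengths, and the helper names `path_to_root`, `patristic` and `ranked_neighbours` are all made up for illustration, and the sparsified MultiDimensional Scaling step is not covered.

```python
# Toy rooted tree: child -> (parent, branch length).
# Node names and branch lengths are hypothetical, chosen only for illustration.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("n2", 1.5),
    "n1": ("root", 0.5),
    "n2": ("root", 1.0),
}


def path_to_root(node):
    """Map each ancestor of `node` (and `node` itself) to its distance from `node`."""
    dist, out = 0.0, {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(a, b):
    """Sum of branch lengths on the path joining leaves a and b."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    # With positive branch lengths, the lowest common ancestor gives the minimum.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def ranked_neighbours(leaf, leaves, k):
    """The k leaves closest to `leaf` (itself first), ranked by patristic distance."""
    return sorted(leaves, key=lambda other: (patristic(leaf, other), other))[:k]


leaves = ["A", "B", "C"]
print(patristic("A", "B"))
print(ranked_neighbours("C", leaves, 2))
```

A ranked neighbour list of this kind is what allows a convolutional filter to be applied to features that have no natural grid layout.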

Highlights

  • Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images

  • The 10×5-fold CV Data Analysis Protocol (DAP) has been applied on instances of the synthetic datasets and on the inflammatory bowel disease (IBD) datasets, comparing the performance with standard learning algorithms such as linear Support Vector Machines (SVM) and Random Forest (RF), and with a standard Multi-Layer Perceptron (MLPNN) [54]

  • As expected [55], no classification task can be reliably tackled by Ph-CNN using the IBD dataset alone: the very small sample size causes the neural network to overfit after just a couple of epochs
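The 10×5-fold CV scheme mentioned in the highlights repeats a 5-fold cross-validation ten times with different shuffles. The sketch below shows only the resampling part, under stated assumptions: stratification by class, the function name `repeated_stratified_kfold`, the seed, and the two-class labelling are illustrative choices, with class sizes borrowed from the abstract (38 healthy, 222 IBD) purely as an example. It does not reproduce the rest of the DAP.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_repeats=10, n_folds=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for n_repeats x n_folds CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for rep in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps roughly the class ratio.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_folds].append(idx)
        for fold, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in range(len(labels)) if i not in test_set]
            yield rep, fold, train, sorted(test)


# Illustrative labels only: sizes taken from the abstract, grouping is an assumption.
labels = ["healthy"] * 38 + ["IBD"] * 222
splits = list(repeated_stratified_kfold(labels))
print(len(splits))
```

Each of the resulting train/test pairs would be used to fit and score one model, and the scores averaged over all splits.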


Summary

Introduction

Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree used as the proximity measure. Metagenomics features are endowed with a hierarchical structure provided by the phylogenetic tree defining the bacterial clades. We aim to exploit the phylogenetic structure to enable adopting the Convolutional Neural Network (CNN) deep learning architecture, otherwise not useful for omics data: we name this novel solution Ph-CNN. The convolution operation is based on the matrix structure of a digital image and, in particular, on the concept of neighbours of a given pixel. Using the same architecture for non-image data requires the availability of an analogous proximity measure between features.
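To make the role of the proximity measure concrete, the sketch below shows one way a convolution can be applied to non-image features once a distance between features is available: each feature is replaced by the values of its k nearest features, and a filter of width k slides over the result with stride k. This is an illustration under assumptions, not the Ph-CNN layer itself: the distance matrix, the sample values, the kernel, the stride choice, and the function names `neighbour_expand` and `conv1d` are all hypothetical.

```python
def neighbour_expand(sample, dist, k):
    """For each feature, list the values of its k nearest features (itself first)."""
    n = len(sample)
    expanded = []
    for i in range(n):
        # Rank all features by distance to feature i; ties are broken by index.
        order = sorted(range(n), key=lambda j: (dist[i][j], j))[:k]
        expanded.extend(sample[j] for j in order)
    return expanded


def conv1d(values, kernel, stride):
    """Plain 1D cross-correlation without padding."""
    width = len(kernel)
    return [
        sum(w * v for w, v in zip(kernel, values[start:start + width]))
        for start in range(0, len(values) - width + 1, stride)
    ]


# Hypothetical pairwise distances between three features (e.g. patristic distances).
DIST = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]
sample = [10.0, 20.0, 30.0]  # made-up values of one sample
k = 2

expanded = neighbour_expand(sample, DIST, k)
output = conv1d(expanded, kernel=[0.5, 0.5], stride=k)
print(expanded)
print(output)
```

With stride equal to k, each output value depends on exactly one feature and its neighbours, which is the analogue of a filter centred on one pixel.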
