Abstract

As a data-driven dimensionality reduction and visualization tool, t-distributed stochastic neighbor embedding (t-SNE) has been successfully applied in a variety of fields. In recent years, it has also received increasing attention for classification and regression analysis. This study presents a t-SNE-based classification approach for compositional microbiome data, which enables us to build classifiers and classify new samples in the reduced-dimensional space produced by t-SNE. The Aitchison distance was used to modify the conditional probabilities in t-SNE to account for the compositionality of microbiome data. To classify a new sample, its low-dimensional features were obtained as the weighted mean vector of its nearest neighbors in the training set. Using the low-dimensional features as input, three commonly used machine learning algorithms, logistic regression (LR), support vector machine (SVM), and decision tree (DT), were considered for the classification tasks. The proposed approach was applied to two disease-associated microbiome datasets and achieved better classification performance than classifiers built in the original high-dimensional space. The results also showed that t-SNE with Aitchison distance improved classification accuracy in both datasets. In conclusion, we have developed a t-SNE-based classification approach that is suitable for compositional microbiome data and may also serve as a baseline for more complex classification models.
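
To make the role of the Aitchison distance concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the Aitchison distance is computed as the Euclidean distance between centered log-ratio (CLR) transformed compositions, that zeros are handled with a small pseudocount, and that the conditional probabilities keep the standard t-SNE Gaussian form with a single fixed bandwidth `sigma` (t-SNE normally tunes one bandwidth per sample from the perplexity). The function names and the parameters `pseudocount` and `sigma` are illustrative.

```python
import numpy as np


def clr(x, pseudocount=1e-6):
    """Centered log-ratio transform of each row (one composition per row)."""
    # Assumption: zeros are replaced by adding a small pseudocount.
    x = np.asarray(x, dtype=float) + pseudocount
    x = x / x.sum(axis=1, keepdims=True)  # re-close to relative abundances
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)


def aitchison_distance_matrix(x, pseudocount=1e-6):
    """Pairwise Aitchison distances: Euclidean distances in CLR space."""
    z = clr(x, pseudocount)
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def conditional_probabilities(dist, sigma=1.0):
    """p_{j|i} from a distance matrix, using one fixed bandwidth for all i."""
    logits = -np.asarray(dist, dtype=float) ** 2 / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # p_{i|i} = 0
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

A distance matrix of this kind could, for instance, be passed to scikit-learn's `TSNE` with `metric="precomputed"` and `init="random"`; whether that matches the procedure used in the study is not stated here.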

Highlights

  • The human microbiome is involved in many essential human functions, such as metabolism, nutrient intake, and energy generation

  • For the idiopathic central precocious puberty (ICPP) data (Figure 2B), the map produced by t-distributed stochastic neighbor embedding (t-SNE) with Aitchison distance contains a few points that are clustered with the wrong group, probably because of the more complex composition of the gut microbiota and more pronounced differences between individuals

  • We proposed a t-SNE-based classification approach that takes into account the compositional nature of microbiome data


Summary

INTRODUCTION

The human microbiome is involved in many essential human functions, such as metabolism, nutrient intake, and energy generation. Because it is difficult to generate the same number of sequence reads for every sample in an experiment, microbiome data often have to be converted to relative abundances before further analysis, resulting in compositional microbiome data (McMurdie and Holmes, 2014; Weiss et al., 2017). The t-SNE method does not provide a built-in way to map new data points to the corresponding low-dimensional representation, so it is rarely used for classification or regression tasks (Maaten, 2009). Some studies have attempted to address this out-of-sample extension problem by using neural networks for feature extraction and then performing classification in the low-dimensional space produced by t-SNE (Maaten, 2009; Oliveira et al., 2018). Using the low-dimensional features as input, three commonly used methods, logistic regression (LR), support vector machine (SVM), and decision tree (DT), are applied for classification in this study.
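
The out-of-sample step described in the abstract, in which a new sample is placed at the weighted mean of its nearest training neighbors, can be sketched as follows. This is an illustrative sketch, not the authors' code: the weighting scheme is not specified in this summary, so inverse-distance weights are assumed, and the function name `embed_new_sample` and the parameters `k` and `eps` are hypothetical. The distances from the new sample to the training samples (for example, Aitchison distances) are assumed to be computed beforehand.

```python
import numpy as np


def embed_new_sample(dist_to_train, train_embedding, k=5, eps=1e-12):
    """Place a new sample at the weighted mean of its k nearest training points.

    dist_to_train   : (n_train,) distances from the new sample to each training sample
    train_embedding : (n_train, n_dims) low-dimensional t-SNE coordinates of the training set
    """
    dist_to_train = np.asarray(dist_to_train, dtype=float)
    train_embedding = np.asarray(train_embedding, dtype=float)

    # Indices of the k nearest training samples in the original space.
    nearest = np.argsort(dist_to_train)[:k]

    # Assumption: inverse-distance weights; eps avoids division by zero
    # when the new sample coincides with a training sample.
    weights = 1.0 / (dist_to_train[nearest] + eps)
    weights /= weights.sum()

    # Weighted mean of the neighbors' low-dimensional coordinates.
    return weights @ train_embedding[nearest]
```

The resulting coordinates can then be passed to any classifier trained on the training embedding, such as LR, SVM, or DT.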

METHODS
RESULTS
DISCUSSION
DATA AVAILABILITY STATEMENT