Abstract

Microbiome samples harvested from urban environments can be informative in predicting the geographic location of unknown samples. If different cities carry geographically distinct microbial signatures, those signatures can be used to predict the source city of a sample from its microbiome profile. We implemented this idea by first pre-processing the raw metagenomics samples provided by the CAMDA organizers with standard bioinformatics procedures. We then trained several component classifiers and a robust ensemble classifier with data generated from taxonomy-dependent and taxonomy-free approaches. We also implemented class weighting and an optimal oversampling technique to overcome the class imbalance in the primary data. In each instance, we observed that the component classifiers performed differently, whereas the ensemble classifier consistently yielded optimal performance. Finally, we predicted the source cities of mystery samples provided by the organizers. Our results highlight the unreliability of relying on a single classification algorithm to assign metagenomic samples to their source origins. By combining several component classifiers via the ensemble approach, we obtained classification results that were as good as the best-performing component classifier.
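As a rough illustration of the three ingredients named above (class weighting, oversampling, and combining component classifiers), the following minimal Python sketch may help. It is not the authors' pipeline: the inverse-frequency weighting formula, plain random oversampling, and simple majority voting are stand-in techniques chosen for brevity, and all function names and labels are hypothetical.

```python
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this common heuristic stands in for the class weighting
    mentioned in the abstract, whose exact form is not given there.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class.

    Assumption: plain random oversampling, not the optimal oversampling
    technique referred to in the abstract.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def majority_vote(predictions):
    """Combine predictions from several classifiers, sample by sample.

    `predictions` holds one list of predicted labels per classifier.
    Ties go to the label that appears first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


if __name__ == "__main__":
    # Hypothetical, imbalanced city labels for illustration only.
    labels = ["city_A"] * 6 + ["city_B"] * 2
    samples = list(range(len(labels)))
    print(class_weights(labels))
    balanced_x, balanced_y = random_oversample(samples, labels)
    print(Counter(balanced_y))
    # Three hypothetical component classifiers voting on three samples.
    print(majority_vote([
        ["city_A", "city_B", "city_B"],
        ["city_A", "city_A", "city_B"],
        ["city_B", "city_A", "city_B"],
    ]))
```

A majority vote is the simplest way to combine classifiers; an ensemble that is reported to match the best component classifier likely weights or selects its components more carefully than this sketch does.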

Highlights

  • The latest advancements in bioinformatics along with next-generation sequencing technologies have made metagenomic analysis affordable and approachable

  • We describe results based on the primary data, which consist of species abundances for metagenomic samples obtained from 23 cities

  • We demonstrate the application of the ensemble classifier on the data generated from a taxonomy-free approach


Summary

Introduction

The latest advancements in bioinformatics, along with next-generation sequencing technologies, have made metagenomic analysis affordable and approachable. Data from metagenomic sequencing technologies enable accurate estimation of the abundance of microbial communities in samples from different locations and environments. Some previous studies have successfully mined the gut microbiome to extract information related to the geolocation of microbiome samples (Suzuki and Worobey, 2014; Clarke et al, 2017; Xia et al, 2019). The abundance of gut microbes such as Firmicutes and Bacteroides is associated with the latitude at which samples were collected (Suzuki and Worobey, 2014). Microbiome samples collected from urban environments can likewise be a potential source of information for geolocation predictions. Microbiome data analyzed by several teams that participated in previous Critical Assessment of Massive Data Analysis (CAMDA) challenges corroborate this idea. Harris et al (2019) use both

