Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques.

Disha Tandon, Sharmila S Mande, Mohammed Monzoorul Haque, Niyaz Ahmed

doi:10.1371/journal.pone.0154493

Open Access

Journal: PLOS ONE
Publication Date: Apr 28, 2016
Citations: 20
License type: CC BY 4.0

Affiliation: Tata Consultancy Services (India)

Abstract

The nature of inter-microbial metabolic interactions defines the stability of microbial communities residing in any ecological niche. Deciphering these interaction patterns is crucial for understanding the mode/mechanism(s) through which an individual microbial community transitions from one state to another (e.g. from a healthy to a diseased state). Statistical correlation techniques have traditionally been employed for mining microbial interaction patterns from taxonomic abundance data corresponding to a given microbial community. In spite of their efficiency, these correlation techniques can capture only 'pair-wise interactions'. Moreover, their emphasis on statistical significance can cause them to miss several interactions that are relevant from a biological standpoint. This study explores the applicability of one of the earliest association rule mining algorithms, the 'Apriori algorithm', for deriving 'microbial association rules' from the taxonomic profile of a given microbial community. The classical Apriori approach derives association rules by analysing patterns of co-occurrence/co-exclusion between various '(subsets of) features/items' across various samples. Using real-world microbiome data, the efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated. As an example, association rules derived from publicly available gut microbiome datasets indicate an association between a group of microbes (Faecalibacterium, Dorea, and Blautia) that are known to have mutualistic metabolic associations among themselves. Application of the rule mining approach to gut microbiomes (sourced from the Human Microbiome Project) further indicated similar microbial association patterns in gut microbiomes irrespective of the gender of the subjects.
A Linux implementation of the Association Rule Mining (ARM) software (customised for deriving 'microbial association rules' from microbiome data) is freely available for download from the following link: http://metagenomics.atc.tcs.com/arm.
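
To illustrate the general idea of Apriori-style rule mining described in the abstract, the following is a minimal sketch and not the authors' ARM software. It assumes each sample has already been reduced to the set of genera considered "present" (the discretisation step is an assumption here), it handles only co-occurrence (not co-exclusion), and the toy samples, thresholds, and function names are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(samples, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(samples)
    # Level 1 candidates: every genus seen in any sample, as a 1-itemset.
    current = {frozenset([g]) for s in samples for g in s}
    frequent = {}
    k = 1
    while current:
        # Support = fraction of samples that contain the whole candidate.
        level = {}
        for cand in current:
            support = sum(cand <= s for s in samples) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        current = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(sub) in level for sub in combinations(a | b, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples for rules lhs -> rhs."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Confidence = support(lhs and rhs together) / support(lhs).
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical toy data: genera treated as present in each of five samples.
samples = [
    {"Faecalibacterium", "Dorea", "Blautia"},
    {"Faecalibacterium", "Dorea", "Blautia", "Bacteroides"},
    {"Faecalibacterium", "Blautia"},
    {"Bacteroides", "Prevotella"},
    {"Faecalibacterium", "Dorea", "Blautia"},
]

frequent = frequent_itemsets(samples, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
for lhs, rhs, support, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={support:.2f}", f"confidence={confidence:.2f}")
```

Unlike a pair-wise correlation, a rule such as {Blautia, Dorea} -> {Faecalibacterium} relates a subgroup of microbes to another microbe, which is the kind of multi-member association the abstract refers to.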

Highlights

Recent advances in high-throughput sequencing technologies have enabled life sciences researchers to investigate structural and functional relationships between various organisms constituting a microbial ecosystem.
To eliminate spurious correlation artefacts that may arise from differences in sequencing and/or sampling depth, the input abundance matrix is usually subjected to various normalization techniques prior to computing correlation coefficients.
While the first group comprised gut microbiome samples taken from subjects prior to the administration of specific prebiotic supplements, the second and third groups had samples obtained during the administration and post-administration phases respectively.

Summary

Introduction

Recent advances in high-throughput sequencing technologies have enabled life sciences researchers to investigate structural and functional relationships between various organisms constituting a microbial ecosystem (referred to as a microbiome). Deciphering such relationships and interpreting them in the context of the studied environment is a prime objective of contemporary microbiome research initiatives. To eliminate spurious correlation artefacts that may arise from differences in sequencing and/or sampling depth, the input (taxa) abundance matrix is usually subjected to various normalization techniques prior to computing correlation coefficients. Rows (in the abundance matrix) corresponding to organisms having (a) null observations in a majority of samples, or (b) low abundance values (below a specified threshold), are usually purged from the abundance matrix prior to analysis.
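
The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: total-sum scaling (conversion to relative abundance) stands in for the unspecified "normalization techniques", and the function name, the two thresholds, and the toy counts are all hypothetical.

```python
def normalize_and_filter(counts, max_zero_fraction=0.5, min_mean_abundance=0.005):
    """Convert raw counts to relative abundances, then drop sparse or rare taxa.

    counts maps each taxon to a list of raw counts, one entry per sample.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    taxa = list(counts)
    n_samples = len(next(iter(counts.values())))
    # Per-sample totals (column sums), used to correct for sequencing depth.
    totals = [sum(counts[t][j] for t in taxa) for j in range(n_samples)]

    kept = {}
    for t in taxa:
        rel = [counts[t][j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        zero_fraction = sum(v == 0 for v in rel) / n_samples
        mean_abundance = sum(rel) / n_samples
        if zero_fraction > max_zero_fraction:
            continue  # (a) null observations in a majority of samples
        if mean_abundance < min_mean_abundance:
            continue  # (b) abundance below the specified threshold
        kept[t] = rel
    return kept


# Hypothetical toy abundance matrix: rows are taxa, columns are four samples.
counts = {
    "TaxonA": [600, 500, 700, 400],
    "TaxonB": [399, 499, 299, 589],
    "TaxonC": [0, 0, 0, 10],  # absent from three of four samples
    "TaxonD": [1, 1, 1, 1],   # present everywhere, but at very low abundance
}

filtered = normalize_and_filter(counts)
print(sorted(filtered))
```

In this sketch the relative abundances are computed before filtering, so the retained rows are not rescaled to sum to one; whether to renormalize after purging rows is a design choice that the text does not specify.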

Objectives

Methods

Results

Discussion

Conclusion