Abstract

In the realm of bioinformatics, we frequently encounter discrete data, particularly microbiome taxa count data obtained through 16S rRNA sequencing. These microbiome datasets are commonly characterized by their high dimensionality and the ability to provide insights solely into relative abundance, necessitating their classification as compositional data. Analyzing such data presents challenges due to their confinement within a simplex. Additionally, microbiome taxa counts are subject to influence by various biological and environmental factors like age, gender, and diet. Thus, we have developed a novel approach involving regression-based mixtures of logistic normal multinomial models for clustering microbiome data. These models effectively categorize samples into more homogeneous subpopulations, enabling the exploration of relationships between bacterial abundance and biological or environmental covariates within each identified group. To enhance the accuracy and efficiency of parameter estimation, we employ a robust framework based on variational Gaussian approximations (VGA). Our proposed method's effectiveness is demonstrated through its application to simulated and real datasets.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.