Abstract
The heterogeneity of microbial flora structure plays an important role in host health and disease. Both unsupervised and supervised learning algorithms have been developed to study the temporal and spatial heterogeneity of flora structure. Because flora data share characteristics with text data, in this paper we investigate the temporal heterogeneity of flora structure with the latent Dirichlet allocation (LDA) probabilistic topic model, an unsupervised learning method, and compare it with hierarchical (system) clustering and K-Means clustering. Operational taxonomic unit (OTU) data sets from two sources, Beipingding monkey vaginal flora (MVB) and minimal hepatic encephalopathy (MHE) bacterial heterogeneity, are analyzed with a Monte Carlo LDA model fitted by collapsed Gibbs sampling. The LDA model divides the 27 MVB samples and the 77 MHE samples into six and four topics, respectively, which differs from the numbers of clusters obtained by hierarchical and K-Means clustering (5, 3, and 4, 3). In addition, the experimental results show that the grouping of MVB samples by diversity and pH (compared against the physiological pH data) and the grouping of MHE samples by α value are consistent with the classification produced by the LDA model. The LDA model therefore classifies the OTU data sets more accurately with respect to how the samples aggregate. More importantly, the LDA model can also identify representative OTUs for each topic. Compared with hierarchical and K-Means clustering, the LDA model not only quantifies the heterogeneity of flora structure more effectively but also identifies the OTUs responsible for that heterogeneity.
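The analogy with text can be made concrete: each sample plays the role of a document, each OTU the role of a word, and each sequencing read assigned to an OTU the role of one word token. The following is a minimal sketch of a standard collapsed Gibbs sampler for LDA under that assumption. The function name `lda_collapsed_gibbs`, the hyperparameters `alpha` and `beta`, and the iteration count are illustrative choices, not the settings used in this paper.

```python
import numpy as np


def lda_collapsed_gibbs(counts, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical sketch: collapsed Gibbs sampling for LDA on a samples x OTUs matrix.

    counts[d, w] is the number of reads of OTU w in sample d (treated like the
    count of word w in document d). Returns (theta, phi), the estimated
    sample-topic and topic-OTU distributions.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=int)
    n_docs, n_words = counts.shape

    # Expand the count matrix into one token per read: (sample index, OTU index).
    doc_ids, word_ids = np.nonzero(counts)
    reps = counts[doc_ids, word_ids]
    docs = np.repeat(doc_ids, reps)
    words = np.repeat(word_ids, reps)

    # Random initial topic assignment for every token, plus the count tables.
    z = rng.integers(n_topics, size=docs.size)
    n_dk = np.zeros((n_docs, n_topics))   # tokens of sample d assigned to topic k
    n_kw = np.zeros((n_topics, n_words))  # tokens of OTU w assigned to topic k
    n_k = np.zeros(n_topics)              # total tokens assigned to topic k
    np.add.at(n_dk, (docs, z), 1)
    np.add.at(n_kw, (z, words), 1)
    np.add.at(n_k, z, 1)

    for _ in range(n_iter):
        for i in range(docs.size):
            d, w, k = docs[i], words[i], z[i]
            # Remove token i from the counts before resampling it.
            n_dk[d, k] -= 1
            n_kw[k, w] -= 1
            n_k[k] -= 1
            # Full conditional p(z_i = k | all other assignments), with theta
            # and phi integrated out ("collapsed").
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_words * beta)
            k = rng.choice(n_topics, p=p / p.sum())
            z[i] = k
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    # Posterior-mean estimates from the final sample's counts.
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + n_words * beta)
    return theta, phi
```

In this sketch, a sample's dominant topic is `theta[d].argmax()`, and candidate representative OTUs for topic `k` are the largest entries of `phi[k]`, for example `np.argsort(phi[k])[::-1][:10]`. Because the loop visits every read, real sequencing depths would make this pure-Python version slow; rarefying the counts or using a vectorized or compiled sampler would be a practical adjustment.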