Abstract

Background: A large number of microbial species have been detected in human faecal samples, and many of these species are highly correlated with each other. Principal components analysis (PCA) is often used to find characteristic patterns associated with certain diseases by reducing the number of variables before a predictive model is built, particularly when some variables are correlated. Usually, the first two or three components from PCA are used to see whether individuals cluster into two predetermined groups: control and disease. However, a combination of other components might better distinguish diseased individuals from healthy controls. Genetic algorithms (GA) can be useful and efficient for searching for the best combination of variables with which to build a prediction model. This study aimed to develop a prediction model that combines PCA and GA for identifying sets of bacterial species associated with high body mass.

Results: GA selected subsets of the principal components (PCs) produced by PCA. The prediction models built with these PCs produced much higher area under the curve (AUC) values than the models built using the top PCs, which explained the most variance in the sample. The combinatorial effect of the identified bacterial species that contributed the most to the PCs may be associated with body mass.

Conclusions: The proposed algorithm overcomes the limitation of using PCA for prediction modelling. Applying the algorithm to an obesity study has shown the value of using GA to select PC subsets from PCA to improve prediction models. The variables included in the PCs that were selected by GA can be combined flexibly for potential clinical applications. The algorithm can be useful for many biological studies in which high-dimensional data are collected with highly correlated variables.
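The idea of letting a GA choose which PCs enter the model can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the PC scores have already been computed (synthetic scores stand in for them here), uses a signed sum of the selected PCs as a stand-in for the prediction model, and evaluates AUC on the same samples used for selection. All function names and parameter values (`make_data`, `ga_select`, population size, mutation rate) are hypothetical.

```python
import random

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney); tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fitness(mask, pcs, labels):
    """AUC of a signed sum of the selected PC columns.
    Assumption: this simple score replaces a fitted prediction model."""
    if not any(mask):
        return 0.0
    scores = [0.0] * len(labels)
    for j, keep in enumerate(mask):
        if not keep:
            continue
        col = [row[j] for row in pcs]
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        # Orient each PC so that larger values point towards the disease group.
        sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
        for i, v in enumerate(col):
            scores[i] += sign * v
    return auc(scores, labels)

def ga_select(pcs, labels, pop_size=30, generations=40, mut_rate=0.05, seed=0):
    """Evolve boolean masks over PCs; return the best mask and its AUC."""
    rng = random.Random(seed)
    n = len(pcs[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, pcs, labels)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[:pop_size // 2]              # keep the fitter half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mut_rate.
            child = [(not g) if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best)

def make_data(n=80, n_pcs=8, seed=1):
    """Synthetic PC scores: variance decreases with PC index, and the
    class signal is placed in low-variance PCs 4 and 6 (an assumption)."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    pcs = []
    for y in labels:
        row = [rng.gauss(0.0, 3.0 - 0.3 * j) for j in range(n_pcs)]
        row[4] += 1.5 * y
        row[6] += 1.5 * y
        pcs.append(row)
    return pcs, labels

pcs, labels = make_data()
mask, best_auc = ga_select(pcs, labels)
top2 = [True, True] + [False] * 6                # baseline: first two PCs only
print("GA-selected PCs:", [j for j, keep in enumerate(mask) if keep])
print("AUC (GA subset): %.3f" % best_auc)
print("AUC (top 2 PCs): %.3f" % fitness(top2, pcs, labels))
```

Because the fitness here is computed on the same samples the GA searches over, the reported AUC is optimistic. A cross-validated AUC inside `fitness` would be a more defensible choice for real data.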
