Abstract
In recent years, the microbiome has been recognized as an important factor associated with cardiovascular disease (CVD), the leading cause of human mortality worldwide. Differences in gut microbial composition between individuals with and without CVD have been reported. We therefore hypothesized that supervised machine learning (ML) models trained on such microbiome data could serve as a new strategy for evaluating cardiovascular health. To test this hypothesis, we analyzed metagenomic data from the American Gut Project. Specifically, 16S rRNA reads from 478 CVD and 473 non-CVD (control) stool samples were analyzed using five supervised ML algorithms: random forest (RF), support vector machine with radial kernel (svmRadial), decision tree (DT), elastic net (ENet) and neural networks (NN). Thirty-nine differential bacterial taxa (LEfSe: LDA > 2) were identified between the CVD and non-CVD groups. ML classification using these taxonomic features achieved an AUC (area under the receiver operating characteristic curve) of ~0.58 (RF). However, training the ML models on the 500 highest-variance operational taxonomic unit (OTU) features improved the AUC to ~0.65 (RF). Restricting the selection to only the 25 most highly contributing OTU features, to reduce the dimensionality of the feature space, increased the AUC significantly to ~0.70 (RF). In summary, this study is the first to demonstrate the successful development of an ML model using microbiome-based datasets for systematic diagnostic screening of CVD.
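The two quantitative ideas in the abstract, ranking features by variance and scoring a classifier by AUC, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `top_variance_features` and `roc_auc`, the toy abundance table, and the use of a single feature as a classifier score are all assumptions made for illustration, and only the Python standard library is used.

```python
from statistics import pvariance


def top_variance_features(rows, k):
    """Return the (sorted) column indices of the k highest-variance features."""
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample outscores
    a random negative sample, with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy table: 4 samples x 3 OTU relative abundances.
abundance = [
    [0.10, 0.50, 0.02],
    [0.12, 0.10, 0.02],
    [0.11, 0.60, 0.03],
    [0.10, 0.05, 0.02],
]
labels = [1, 0, 1, 0]  # 1 = CVD, 0 = non-CVD (made-up labels)

selected = top_variance_features(abundance, k=1)
# Use the single selected feature as a stand-in for a classifier score.
scores = [row[selected[0]] for row in abundance]
auc = roc_auc(labels, scores)
```

In a real analysis the scores would come from a trained model evaluated on held-out samples, and the feature selection would be performed inside the cross-validation loop to avoid leaking information from the test folds.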