In recent years, statistical and machine learning methods have been widely used to analyze the relationship between the human gut microbial metagenome and metabolic diseases, which is of great significance for the functional annotation and development of microbial communities. In this study, we propose a new and scalable framework for image enhancement and deep learning of the gut metagenome, which can be used to classify human metabolic diseases. Each data sample in three representative human gut metagenome datasets was transformed into an image, enhanced, and fed into four machine learning models, namely logistic regression (LR), support vector machine (SVM), Bayesian network (BN) and random forest (RF), and two deep learning models, multilayer perceptron (MLP) and convolutional neural network (CNN). The overall disease prediction performance of each model was evaluated by accuracy (A), precision (P), recall (R), F1 score (F1) and area under the ROC curve (AUC), using 10-fold cross-validation. The results showed that the overall performance of the MLP model was better than that of CNN, LR, SVM, BN, RF and PopPhy-CNN, and that the performance of the MLP and CNN models improved further after data enhancement (random rotation and added salt-and-pepper noise). With data enhancement, the accuracy of the MLP model in disease prediction improved by 4%-11%, F1 by 1%-6% and AUC by 5%-10%. These results show that human gut metagenome image enhancement and deep learning can accurately extract microbial characteristics and effectively predict the host disease phenotype. The source code and datasets used in this study are publicly available at https://github.com/HuaXWu/GM_ML_Classification.git.
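The data enhancement step named above (random rotation and salt-and-pepper noise) can be illustrated with a minimal sketch. It is not taken from the published repository. The function names, the restriction to 90-degree rotations, the 5% noise fraction and the 8 x 8 reshaping of a synthetic abundance vector are all assumptions made for illustration, since the abstract does not specify how samples are mapped to images or which rotation angles and noise levels were used.

```python
import random


def rotate90(image, k):
    """Rotate a 2-D list clockwise by k quarter turns (hypothetical helper)."""
    for _ in range(k % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return [list(row) for row in image]


def salt_and_pepper(image, amount, rng):
    """Replace roughly `amount` of the pixels with the image minimum (pepper)
    or maximum (salt); all other pixels are left unchanged."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    noisy = []
    for row in image:
        new_row = []
        for value in row:
            u = rng.random()
            if u < amount / 2:
                new_row.append(lo)      # pepper
            elif u > 1 - amount / 2:
                new_row.append(hi)      # salt
            else:
                new_row.append(value)   # keep original pixel
        noisy.append(new_row)
    return noisy


def augment(image, rng, amount=0.05):
    """One augmented copy: random quarter-turn rotation, then noise.
    The 5% default noise level is an assumption, not a value from the study."""
    k = rng.randrange(4)
    return salt_and_pepper(rotate90(image, k), amount, rng)


# Synthetic stand-in for one sample: 64 relative abundances laid out as 8 x 8.
rng = random.Random(0)
abundance = [rng.random() for _ in range(64)]
image = [abundance[i * 8:(i + 1) * 8] for i in range(8)]
augmented = augment(image, rng)
```

Each call to `augment` yields a new training image with the same shape as the input, so several augmented copies per sample can be appended to the training folds only, leaving the held-out fold of the 10-fold cross-validation untouched.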