Simulating anaerobic digestion (AD) with machine learning (ML) is important for guiding methane production in practice. However, ML performance is affected by operational conditions and by the types and quantities of microorganisms, which leads to low accuracy in traditional models. In this study, a combined ML algorithm, SMOTER-GA-RF, was constructed by expanding the genomic dataset and introducing an oversampling technique (SMOTER) and a genetic algorithm (GA) to improve prediction accuracy; the test-set R² increased by 36.22%. The normalized root mean square error and percent bias of the validation set (D2) were 8.92% (<10%) and 4.13% (<15%), respectively, indicating accurate prediction. The predictive accuracy on the validation sets (V1–V5) also remained high in the presence of Gaussian white noise, confirming the robustness of SMOTER-GA-RF. Furthermore, similar ML models were collected and compared, and the SMOTER-GA-RF constructed in this paper performed best. Finally, the important factors affecting the AD performance of straw were analyzed using SHAP. The results suggest that the hydraulic retention time should be controlled at approximately 40 d, volatile fatty acids should be kept within 1000 mg/L, and the relative abundances of the important archaea Methanoculleus, Methanobacterium and Methanosarcina in the second stage of AD should be controlled at 5–10%, <18%, and <5%, respectively.
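The two error metrics quoted above, and the noise perturbation used in the robustness check, can be illustrated with a minimal sketch. The abstract does not state the exact formulas, so this sketch assumes that the root mean square error is normalized by the mean of the observations and that percent bias is the summed prediction error divided by the summed observations; other conventions exist (normalization by range, opposite sign). The function names, the noise standard deviation, and the sample values are hypothetical and are not data from the study.

```python
import math
import random


def nrmse_percent(observed, predicted):
    """RMSE divided by the mean observation, in percent (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


def pbias_percent(observed, predicted):
    """Summed error relative to summed observations, in percent (assumed sign convention)."""
    return 100.0 * sum(p - o for o, p in zip(observed, predicted)) / sum(observed)


def add_white_noise(values, sd, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `sd` to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sd) for v in values]


# Illustrative values only (not from the paper).
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]

nrmse = nrmse_percent(observed, predicted)
pbias = pbias_percent(observed, predicted)

# Acceptance thresholds quoted in the abstract: NRMSE < 10%, |PBIAS| < 15%.
acceptable = nrmse < 10.0 and abs(pbias) < 15.0

# Perturbed copy of the predictions, as a stand-in for a noise-robustness check.
noisy = add_white_noise(predicted, sd=5.0)
```

In a robustness check of this kind, the noise would normally be added to the model inputs before prediction and the metrics recomputed; the sketch only shows the perturbation step itself.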