Cellulose and hemicellulose are key cross-linked carbohydrates affecting bioethanol production from maize stalks. Traditional wet chemical methods for their detection are labor-intensive, highlighting the need for high-throughput techniques. This study applied Fourier transform infrared (FTIR) spectroscopy combined with machine learning (ML) algorithms to a large-scale panel of 200 maize germplasms to develop robust predictive models for stalk cellulose, hemicellulose and holocellulose content. We identified several peak-height features correlated with the three contents and used them as input data for model building. Four ML algorithms demonstrated high predictive accuracy, achieving coefficients of determination (R²) ranging from 0.83 to 0.97. Notably, the Categorical Boosting algorithm yielded the optimal models, with R² exceeding 0.91 for the training set and 0.81 for the test set. Combining FTIR spectroscopy with ML algorithms offers a precise, high-throughput tool for predicting stalk cellulose, hemicellulose and holocellulose contents, benefiting maize genetic breeding for bioenergy and biofuels.
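The reported accuracies are coefficients of determination, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared differences between measured and predicted contents and SS_tot is the sum of squared deviations of the measured contents from their mean. The following is a minimal sketch of that calculation in Python. The function name `r_squared` and the sample values are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def r_squared(observed, predicted):
    """Return the coefficient of determination, 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: measured value minus model prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: measured value minus the mean measured value.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical cellulose contents (% dry weight), not taken from the study.
observed = [30.0, 32.0, 34.0, 36.0, 38.0]
predicted = [31.0, 32.0, 34.0, 36.0, 37.0]
print(round(r_squared(observed, predicted), 3))  # 0.95
```

Computing this value separately on the training samples and on held-out test samples gives the two figures the abstract compares; a test-set R² lower than the training-set R² is expected, since the model has not seen the test samples.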