Abstract

Although thermoelectric (TE) materials have attracted significant attention in recent years, the design and discovery of new TE materials with optimal carrier concentration and band gap remain a great challenge. Herein, we report the development of machine learning (ML) methods that predict the properties of TE materials by introducing simple, physically meaningful descriptors. Specifically, we use the number of electrons, the Pauling electronegativity, and the relative atomic mass as the basic physical variables and compute 242 descriptors in 64 categories to characterize the molecular information of a TE material. Multiple stepwise regression is employed to reduce the dimensionality of the developed ML models, and five and four important features are selected for the band gap and carrier concentration, respectively. These features are used as inputs to a total of 19 ML methods in order to select the optimal ML model for each property. The least-squares support vector machine is found to be the best model for predicting the band gap, while the back-propagation artificial neural network exhibits the best performance in predicting the carrier concentration. This work provides novel theoretical guidance for the rapid prediction of the properties of TE materials. The simple descriptors we define can accurately predict the band gap and carrier concentration of quaternary TE materials.
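The dimensionality-reduction step mentioned in the abstract can be illustrated with a minimal sketch of forward stepwise regression. This is not the authors' implementation: the data are synthetic, the function names `fit_rss` and `forward_stepwise` are hypothetical, and the stopping rule (a minimum relative reduction in the residual sum of squares, `min_gain`) is an assumption chosen for illustration only.

```python
import numpy as np


def fit_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, max_features, min_gain=0.1):
    """Greedy forward selection of descriptor columns.

    At each step, add the column that lowers the RSS the most. Stop when
    max_features columns are selected or when the relative RSS reduction
    drops below min_gain (an illustrative threshold, not from the paper).
    """
    n_cols = X.shape[1]
    selected = []
    # RSS of the intercept-only model is the starting point.
    current_rss = float(np.sum((y - y.mean()) ** 2))
    while len(selected) < max_features and current_rss > 0.0:
        candidates = [j for j in range(n_cols) if j not in selected]
        if not candidates:
            break
        scores = {j: fit_rss(X[:, selected + [j]], y) for j in candidates}
        best = min(scores, key=scores.get)
        gain = (current_rss - scores[best]) / current_rss
        if gain < min_gain:
            break
        selected.append(best)
        current_rss = scores[best]
    return selected


# Synthetic stand-in for a descriptor matrix: 200 samples, 20 descriptors,
# of which only columns 2, 7 and 11 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 1.5 * X[:, 11] + 0.1 * rng.normal(size=200)

chosen = forward_stepwise(X, y, max_features=5)
print("selected descriptor columns:", sorted(chosen))
```

In practice the selected columns would then be passed to each candidate regressor, and the models compared on held-out data. A significance-based entry and removal test (for example, a partial F-test) is a common alternative to the simple threshold used here.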