<p indent="0mm">Since the launch of the Materials Genome Initiative (MGI) in 2011, high-throughput materials computation and materials prediction have become a major topic in materials research. As demand grows for environmentally friendly, low-consumption, faster, and high-throughput materials development, MGI offers strong guidance and high practical value. MGI aims to shift materials research from the traditional “Trial Experiment” approach to “Computation to Experiment”, and this shift has become the trend in the field. There are two main routes to realizing MGI: the first is the development of quantum chemical calculations and statistical mechanics, and the second is the establishment of Quantitative Structure-Property Relationships (QSPR). Of the two, the QSPR method has been widely used in recent years because of its excellent predictive performance. Combined with data-driven approaches and machine learning algorithms, the QSPR method has changed substantially, most prominently in the calculation methods it uses. These calculation methods are the main factors that determine the prediction accuracy and construction speed of a QSPR model, so choosing them well is critical for predicting new materials. This article therefore first gives an overview of the QSPR model and briefly introduces the calculation steps of the QSPR method, then focuses on the calculation methods for molecular descriptors and the model optimization algorithms used in QSPR, and analyzes the advantages and disadvantages of each method. 
Finally, this article summarizes the application trends of the various calculation methods in QSPR in recent years and concludes as follows: (1) Many software packages are available for calculating molecular descriptors, and most of them are fast. They also calculate a large number of molecular descriptors, which helps achieve better results. However, each package still has its own limitations, so researchers can choose one or several packages according to their needs. (2) The preprocessing algorithms are mainly screening algorithms, including statistical methods, informatics methods, and biological methods. Evolutionary algorithms, which were developed from biological methods, are increasingly used because of their excellent global screening capabilities. As the number of calculated molecular descriptors has increased significantly, the global influence of molecular descriptors has received growing emphasis; as a result, evolutionary algorithms have become a popular choice for molecular descriptor screening in recent years. (3) The optimization methods are mainly machine learning algorithms, which can be classified into linear methods, nonlinear methods, and hybrid methods. The first two each have their own advantages: linear methods directly show the importance of each molecular descriptor, whereas nonlinear methods yield more accurate models. To improve the efficiency of algorithm parameter calculation, more and more global optimization algorithms have been introduced into machine learning algorithms as parameter optimization methods in recent years, which forms the third category, the hybrid optimization method.