Abstract

Linear models, such as force constant (FC) and cluster expansions, play a key role in physics and materials science. While they can in principle be parametrized using regression and feature selection approaches, the convergence behavior of these techniques, in particular with respect to thermodynamic properties, is not well understood. Here, we therefore analyze the efficacy and efficiency of several state-of-the-art regression and feature selection methods, specifically in the context of FC extraction and the prediction of different thermodynamic properties. Generic feature selection algorithms, such as recursive feature elimination with ordinary least squares (OLS), automatic relevance determination regression, and the adaptive least absolute shrinkage and selection operator, can yield physically sound models for systems with a modest number of degrees of freedom. For large unit cells with low symmetry and/or high-order expansions, however, they come with a non-negligible computational cost that can be more than two orders of magnitude higher than that of OLS. In such cases, OLS with cutoff selection provides a viable route, as demonstrated here for both second-order FCs in large low-symmetry unit cells and high-order FCs in low-symmetry systems. While regression techniques are thus very powerful, they require well-tuned protocols. The present work establishes guidelines for the design of protocols that are readily usable, e.g., in high-throughput and materials discovery schemes. Since the underlying algorithms are not specific to FC construction, the general conclusions drawn here also bear on the construction of other linear models in physics and materials science.
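The abstract recommends OLS with cutoff selection for large or high-order problems. The following is a minimal sketch of that idea under stated assumptions: every model parameter is tagged with a hypothetical interaction radius, each candidate cutoff keeps only the parameters within that radius, OLS is fitted on a training split, and the cutoff is chosen by hold-out RMSE, preferring the smallest cutoff whose score is within a tolerance of the best one. The helper names (`solve_linear`, `ols_fit`, `rmse`, `select_cutoff`), the single hold-out split (rather than k-fold cross-validation), and the tie-breaking rule are illustrative assumptions, not the protocol used in the paper.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations: lower the cutoff or add data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_fit(rows, y):
    """Ordinary least squares via the normal equations (A^T A) x = A^T y."""
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(ata, aty)


def rmse(rows, y, coef):
    """Root-mean-square error of the linear prediction rows @ coef against y."""
    errs = [sum(a * c for a, c in zip(r, coef)) - yi for r, yi in zip(rows, y)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def select_cutoff(features, y, radii, cutoffs, n_train, tol=1e-8):
    """Hypothetical cutoff scan: keep parameters with radius <= cutoff, fit OLS on the
    first n_train rows, score on the remaining rows, and return the smallest cutoff
    whose validation RMSE is within tol of the best score (plus all scores)."""
    scores = {}
    for rc in cutoffs:
        cols = [k for k, r in enumerate(radii) if r <= rc]
        sub = [[row[k] for k in cols] for row in features]
        coef = ols_fit(sub[:n_train], y[:n_train])
        scores[rc] = rmse(sub[n_train:], y[n_train:], coef)
    best = min(scores.values())
    chosen = min(rc for rc in cutoffs if scores[rc] <= best + tol)
    return chosen, scores


if __name__ == "__main__":
    rng = random.Random(0)
    radii = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]       # hypothetical radius of each parameter
    true_coef = [1.5, -0.7, 0.4, 0.2, 0.0, 0.0]  # only parameters within radius 2 matter
    X = [[rng.uniform(-1.0, 1.0) for _ in radii] for _ in range(20)]
    y = [sum(a * c for a, c in zip(row, true_coef)) for row in X]  # noise-free toy data
    cutoff, scores = select_cutoff(X, y, radii, [1.0, 2.0, 3.0], n_train=15)
    print(cutoff, scores)
```

In this noise-free toy example, cutoffs 2.0 and 3.0 both reproduce the data, so the parsimony rule picks 2.0. With noisy first-principles data the scores no longer tie exactly, and the tolerance becomes a tuning choice.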

Highlights

  • Linear models such as force constant (FC) and cluster expansions are widely used in materials science, physics, and chemistry to describe the thermodynamic behavior of real materials

  • We present a comparison of linear regression methods and the direct enumeration approach for the extraction of FCs of different order, including second-order FCs for large systems of low symmetry such as defects, third-order FCs for the prediction of the thermal conductivity, as well as higher-order FCs for bulk and surface systems

  • We demonstrate the application of these FC models for studying anharmonic effects, both in the framework of Boltzmann transport theory and in molecular dynamics (MD) simulations
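Extracting FCs by regression works because the forces are linear in the FC parameters. A minimal sketch, assuming a hypothetical toy model: a periodic one-dimensional chain with nearest- and next-nearest-neighbor harmonic springs `k1` and `k2`, which play the role of second-order FC parameters. Each atom in each displaced configuration contributes one row to a fit matrix, and plain OLS recovers the parameters. The names `fit_row`, `forces`, and `fit_spring_constants`, and the synthetic reference forces standing in for first-principles data, are illustrative assumptions, not the paper's code.

```python
import random


def fit_row(u, i):
    """Row i of the fit matrix for a periodic 1D chain (hypothetical toy model):
    F_i = k1 * (u[i+1] + u[i-1] - 2 u[i]) + k2 * (u[i+2] + u[i-2] - 2 u[i])."""
    n = len(u)
    d1 = u[(i + 1) % n] + u[(i - 1) % n] - 2.0 * u[i]
    d2 = u[(i + 2) % n] + u[(i - 2) % n] - 2.0 * u[i]
    return d1, d2


def forces(u, k1, k2):
    """Harmonic forces of the toy model; they are linear in (k1, k2)."""
    return [k1 * d1 + k2 * d2 for d1, d2 in (fit_row(u, i) for i in range(len(u)))]


def fit_spring_constants(configs, force_sets):
    """Stack fit-matrix rows from all configurations and solve the 2x2 normal
    equations (A^T A) k = A^T F with Cramer's rule, i.e. plain OLS."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for u, f in zip(configs, force_sets):
        for i, fi in enumerate(f):
            d1, d2 = fit_row(u, i)
            s11 += d1 * d1
            s12 += d1 * d2
            s22 += d2 * d2
            t1 += d1 * fi
            t2 += d2 * fi
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


if __name__ == "__main__":
    rng = random.Random(1)
    # three randomly displaced 8-atom configurations (arbitrary units)
    configs = [[rng.uniform(-0.05, 0.05) for _ in range(8)] for _ in range(3)]
    # synthetic reference forces from k1 = 1.0, k2 = -0.1 stand in for first-principles data
    reference = [forces(u, 1.0, -0.1) for u in configs]
    print(fit_spring_constants(configs, reference))
```

Because the toy reference forces contain no noise, the fit reproduces `k1` and `k2` up to round-off. Real FC extraction involves many more symmetry-reduced parameters per order, which is where the choice of regression and feature selection method starts to matter.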


Summary

Introduction

Linear models such as force constant (FC) and cluster expansions are widely used in materials science, physics, and chemistry to describe the thermodynamic behavior of real materials. Their computational efficiency and mathematical simplicity are appealing for applications in high-throughput calculations and machine learning, which require methods for efficient and automated model construction. Regression techniques in combination with regularization have received a lot of attention for model building, often under the title compressive sensing (CS) [3,4]. The latter is in principle a task in sparse signal recovery that is usually approached by finding solutions to an underdetermined linear system. The problem of solving the linear system is completely independent of CS, and CS itself is not a solver.
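To make the distinction between the CS setting and a solver concrete, here is a minimal pure-Python sketch. The problem is an underdetermined system (4 equations, 6 unknowns) with a sparse target, and LASSO minimized by cyclic coordinate descent is just one possible solver for it. The names `lasso_cd` and `soft_threshold`, the 1/(2n) scaling of the loss, and the random test problem are assumptions chosen for illustration.

```python
import random


def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x towards zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=500):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    X is a list of n rows with p entries each; p may exceed n (underdetermined)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - X b, valid because b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # correlation of column j with the residual that ignores b[j]
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync with b
                b[j] = new
    return b


if __name__ == "__main__":
    rng = random.Random(2)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(4)]  # 4 equations, 6 unknowns
    b_true = [0.0, 2.0, 0.0, 0.0, -1.0, 0.0]  # hypothetical sparse "signal"
    y = [sum(a * c for a, c in zip(row, b_true)) for row in X]
    print([round(v, 3) for v in lasso_cd(X, y, lam=0.01, n_sweeps=5000)])
```

Whether the printed coefficients match `b_true` depends on `X` and `lam`; recovery is common for sufficiently sparse targets but not guaranteed. Other solvers could be applied to the very same underdetermined system.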

