Random forest machine learning models for interpretable X-ray absorption near-edge structure spectrum-property relationships

Steven B Torrisi, Junko Yano, Linda Hung, Santosh K Suram, Joseph H Montoya, Matthew R Carbone, Brian A Rohr, Yang Ha

doi:10.1038/s41524-020-00376-6

Abstract

X-ray absorption spectroscopy (XAS) produces a wealth of information about the local structure of materials, but interpretation of spectra often relies on easily accessible trends and prior assumptions about the structure. Recently, researchers have demonstrated that machine learning models can automate this process to predict the coordinating environments of absorbing atoms from their XAS spectra. However, machine learning models are often difficult to interpret, making it challenging to determine when they are valid and whether they are consistent with physical theories. In this work, we present three main advances in the data-driven analysis of XAS spectra: we demonstrate the efficacy of random forests in solving two new property determination tasks (predicting Bader charge and mean nearest neighbor distance), we address how choices in data representation affect model interpretability and accuracy, and we show that multiscale featurization can elucidate the regions and trends in spectra that encode various local properties. The multiscale featurization transforms the spectrum into a vector of polynomial-fit features, and is contrasted with the commonly used "pointwise" featurization that directly uses the entire spectrum as input. We find that across thousands of transition metal oxide spectra, the relative importance of features describing the curvature of the spectrum can be localized to individual energy ranges, and we can separate the importance of constant, linear, quadratic, and cubic trends, as well as the white line energy. This work has the potential to assist rigorous theoretical interpretations, expedite experimental data collection, and automate analysis of XAS spectra, thus accelerating the discovery of new functional materials.
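The two input representations contrasted above can be illustrated with a minimal sketch, assuming NumPy and a spectrum given as arrays of energies and absorption values. The pointwise variant resamples the spectrum onto a uniform 100-point grid. The multiscale variant fits a cubic polynomial to equal-width windows at several scales (1, 2 and 4 windows here, a hypothetical choice), so windows at different scales overlap and each coefficient is a constant, linear, quadratic, or cubic trend in one energy range. The paper's actual partition widths and overlap scheme are not given in this section, and `white_line_energy`, which takes the energy of the absorption maximum, is an assumed definition.

```python
import numpy as np


def pointwise_features(energy, mu, n_points=100):
    """'Pointwise' featurization: resample the spectrum onto a uniform grid."""
    grid = np.linspace(energy[0], energy[-1], n_points)
    return np.interp(grid, energy, mu)


def multiscale_features(energy, mu, scales=(1, 2, 4), degree=3):
    """Polynomial-fit featurization at several partition scales (hypothetical scheme).

    At each scale n the energy range is split into n equal windows and a
    polynomial of the given degree is fit to each window. Windows at
    different scales overlap, so each energy region is described at several
    resolutions. Coefficients are returned lowest order first
    (constant, linear, quadratic, cubic) for every window.
    """
    feats = []
    for n in scales:
        edges = np.linspace(energy[0], energy[-1], n + 1)
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (energy >= lo) & (energy <= hi)
            # Rescale the window to [-1, 1] so coefficients are comparable
            # across windows and the least-squares fit is well conditioned.
            x = (energy[mask] - 0.5 * (lo + hi)) / (0.5 * (hi - lo))
            coeffs = np.polyfit(x, mu[mask], degree)  # highest order first
            feats.extend(coeffs[::-1])                # store lowest order first
    return np.asarray(feats)


def white_line_energy(energy, mu):
    """Energy of the absorption maximum (assumed white line definition)."""
    return float(energy[np.argmax(mu)])


# Synthetic edge: a sigmoid step plus a Gaussian peak just above it.
energy = np.linspace(0.0, 50.0, 200)
mu = 1.0 / (1.0 + np.exp(-(energy - 10.0))) + 0.8 * np.exp(-((energy - 12.0) / 2.0) ** 2)
x_point = pointwise_features(energy, mu)
x_multi = np.append(multiscale_features(energy, mu), white_line_energy(energy, mu))
print(x_point.shape, x_multi.shape)  # (100,) (29,)
```

With three scales the multiscale vector holds (1 + 2 + 4) × 4 = 28 coefficients plus the white line energy, far fewer than the 100 pointwise values, and each entry has a direct reading as a trend in a known energy window.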

Highlights

Rapid extraction of structure-property relationships is critical to the discovery of functional materials
Visual description of our workflow. (a) The materials that we study consist of 3d transition metal oxide structures drawn from the Materials Project (MP) database[7] and the Open Quantum Materials Database (OQMD)[8]. (b) The inputs to our machine learning (ML) models are X-ray absorption near-edge structure (XANES) spectra computed using FEFF9[40], either downloaded from the MP or computed using the same set of parameters. (c) The local properties to be predicted from spectra are the coordination number, the mean nearest-neighbor distance, and the Bader charge. (d) The models we train are random forests, where features are either the entire spectra projected onto a uniformly spaced 100-point energy grid, or the coefficients of overlapping polynomials fit to partitions of the spectra.
This work represents an advance in the scope of ML applications for XANES and the use of feature ranking for generating XANES insights
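Feature ranking with random forests can be sketched with scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` attribute. This is a hypothetical example on synthetic data, not the authors' pipeline: the feature matrix mimics 7 polynomial windows with 4 coefficients each, and the target is constructed to depend on a single linear coefficient so the ranking has a known answer. Summing importances per polynomial order is one way to compare constant, linear, quadratic, and cubic trends; whether the paper aggregates importances this way is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature layout: 7 windows x 4 polynomial coefficients,
# stored window by window as (constant, linear, quadratic, cubic).
N_WINDOWS, N_COEFFS = 7, 4
ORDER_NAMES = ("constant", "linear", "quadratic", "cubic")


def fit_and_rank(X, y, n_estimators=100, seed=0):
    """Fit a random forest and return feature indices sorted by importance."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(X, y)
    imp = model.feature_importances_  # impurity-based, sums to 1
    ranking = np.argsort(imp)[::-1]   # most important feature first
    return model, imp, ranking


def importance_by_order(imp, n_coeffs=N_COEFFS):
    """Sum importances over all windows for each polynomial order."""
    per_order = imp.reshape(-1, n_coeffs).sum(axis=0)
    return dict(zip(ORDER_NAMES, per_order))


# Synthetic data: the target depends only on the linear coefficient of
# window 3, i.e. column 3 * N_COEFFS + 1 = 13.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, N_WINDOWS * N_COEFFS))
y = 2.0 * X[:, 13] + 0.1 * rng.normal(size=500)

model, imp, ranking = fit_and_rank(X, y)
print("top feature:", ranking[0])
print(importance_by_order(imp))
```

A categorical target such as coordination number would use `RandomForestClassifier`, which exposes the same `feature_importances_` attribute. Impurity-based importances can be biased toward high-cardinality or correlated features, so `sklearn.inspection.permutation_importance` is a common cross-check.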

Summary

INTRODUCTION

Rapid extraction of structure-property relationships is critical to the discovery of functional materials. X-ray absorption spectroscopy (XAS)[14,15] is a characterization technique that is sensitive to local electronic and atomic structure and has been important for discovering and understanding functional materials for a wide range of energy applications, such as CO2 capture by metal oxide nanoparticles[16,17], solar water splitting[18], and catalysis[19,20,21]. It is suitable as a local probe thanks to its general robustness, large signal-to-noise ratio[22], element specificity, and unique sensitivity to the chemical environments of absorbing atoms[23,24,25,26,27].

RESULTS

DISCUSSION

Findings

CODE AVAILABILITY