Feature extraction and a neural network model are applied to predict defect types and concentrations in experimental anatase TiO2 samples. A dataset of TiO2 structures with vacancies and interstitials of oxygen and titanium is built, and the structures are relaxed using energy minimization. The features of the calculated pair distribution functions (PDFs) of these defected structures are extracted using linear methods (principal component analysis and non-negative matrix factorization) and non-linear methods (autoencoder and convolutional neural network). The extracted features are used as inputs to a neural network that maps feature weights to the concentration of each defect type. The performance of this machine learning pipeline is validated by predicting defect concentrations based on experimentally measured TiO2 PDFs and comparing the results to brute-force predictions. A physics-based initialization of the autoencoder has the highest accuracy in predicting defect concentrations. This model incorporates physical interpretability and predictability of material structures, enabling a more efficient characterization process with scattering data.
Read full abstract