Abstract

Data analyses based on linear methods constitute the simplest, most robust, and transparent approaches to the automatic processing of large amounts of data for building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an underappreciated method that interpolates between principal component analysis and linear regression and can be used conveniently to reveal structure-property relations in terms of simple-to-interpret, low-dimensional maps. Here we provide a pedagogic overview of these data analysis schemes, including the use of the kernel trick to introduce an element of non-linearity while maintaining most of the convenience and the simplicity of linear approaches. We then introduce a kernelized version of PCovR and a sparsified extension, and demonstrate the performance of this approach in revealing and predicting structure-property relations in chemistry and materials science, showing a variety of examples including elemental carbon, porous silicate frameworks, organic molecules, amino acid conformers, and molecular materials.

Highlights

  • Over the past decade, there has been a tremendous increase in the use of data-driven and machine learning (ML) methods in materials science, ranging from the prediction of materials properties [1,2,3,4], to the construction of interatomic potentials [5,6,7,8] and searches for new candidate materials for a particular application [9,10,11,12]

  • In this paper we provide a comprehensive overview of linear- and kernel-based methods for supervised and unsupervised learning, showing an example of their application to elucidate and predict structure-property relations in solid-state NMR

  • We discuss principal covariates regression (PCovR) [18], a simple combination of principal component analysis and linear regression that has so far received far less attention than, in our opinion, it deserves
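
To make the interpolation between principal component analysis and linear regression concrete, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes the sample-space form of PCovR, in which a mixing parameter (here called `alpha`) blends the Gram matrix of the features with that of the least-squares prediction of the targets. The function name `pcovr_scores`, the centering and normalization choices, and the toy data are all illustrative assumptions.

```python
import numpy as np


def pcovr_scores(X, Y, alpha, n_components=2):
    """Low-dimensional PCovR projection of the samples (illustrative sketch).

    alpha = 1 gives PCA scores; alpha = 0 gives a projection spanned by the
    linear-regression prediction of Y.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Y = Y.reshape(len(Y), -1)  # accept 1-D targets

    # Center both blocks and scale each to unit Frobenius norm so that the
    # two terms of the blended matrix are comparable (an assumed convention).
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)

    # Ordinary least-squares prediction of the targets from the features.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ W

    # Blend the feature Gram matrix with the Gram matrix of the prediction.
    K = alpha * (X @ X.T) + (1.0 - alpha) * (Y_hat @ Y_hat.T)

    # The leading eigenvectors, scaled by sqrt(eigenvalue), give the projection.
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:n_components]
    evals = np.clip(evals[order], 0.0, None)  # guard against round-off
    return evecs[:, order] * np.sqrt(evals)


# Toy data (purely synthetic): 100 samples, 6 features, one property.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 6))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=100)

T_demo = pcovr_scores(X_demo, y_demo, alpha=0.5, n_components=2)
print(T_demo.shape)
```

Intermediate values of `alpha` trade off how much of the feature variance the map retains against how well the map correlates with the target property.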


Summary

22 October 2020

Keywords: machine learning, structure-property maps, kernel methods, materials science, physical chemistry

Supplementary material for this article is available online.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence.

Introduction
Background methods
Linear methods
Linear regression
Feature-space PCovR
Kernel principal component analysis
Extensions to principal covariates regression
Full kernel PCovR
Sparse kernel PCovR
Examples
Findings
Conclusions