Abstract

Selecting proper descriptors, also known as features, is one of the key problems in modeling materials properties with machine learning (ML). Redundant features reduce the accuracy of ML models, and the features chosen by purely data-driven feature selection (FS) methods are often inconsistent with materials domain knowledge. Herein, NCOR-FS, an FS method embedded with materials domain knowledge, is proposed to select higher-quality features. The method translates domain knowledge about highly correlated features into Non-Co-Occurrence Rules (NCORs). This makes it possible to quantify the degree to which a feature subset violates the NCORs and to design an optimization process for FS based on a swarm intelligence algorithm. Experiments on seven datasets show that, compared with several other FS methods commonly used in materials science, NCOR-FS selects feature subsets containing a more appropriate number of highly correlated features, which improves both the prediction accuracy and the interpretability of the ML models. NCOR-FS can be applied to any materials system, and the idea of embedding domain knowledge into data-driven algorithms is expected to facilitate the construction of a wide range of ML models embedded with materials domain knowledge.
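The abstract does not state how NCORs are encoded, how the violation degree is computed, or which swarm intelligence algorithm is used, so the following is only a minimal sketch under explicit assumptions, not the NCOR-FS implementation. It assumes each rule is a hypothetical group of highly correlated feature indices of which at most one should be selected, scores a subset's violation as the surplus features per group, adds that score to the model error with an arbitrary weight, and uses binary particle swarm optimization (BPSO) as a stand-in swarm algorithm. The names `ncor_violation`, `penalized_fitness`, `binary_pso` and `toy_error` are illustrative, and `toy_error` is a hypothetical stand-in for a cross-validated model error.

```python
import math
import random
from typing import Callable, FrozenSet, List, Sequence

# Assumption: an NCOR is a group of feature indices that domain knowledge
# marks as highly correlated; at most one of them should be selected.
Rule = FrozenSet[int]


def ncor_violation(subset: Sequence[int], rules: Sequence[Rule]) -> int:
    """Hypothetical violation degree: surplus selected features per rule."""
    chosen = set(subset)
    return sum(max(0, len(chosen & rule) - 1) for rule in rules)


def penalized_fitness(mask: Sequence[int],
                      error_fn: Callable[[List[int]], float],
                      rules: Sequence[Rule],
                      weight: float = 1.0) -> float:
    """Lower is better: model error plus weight * NCOR violation degree."""
    subset = [i for i, bit in enumerate(mask) if bit]
    if not subset:
        return float("inf")  # an empty subset cannot train a model
    return error_fn(subset) + weight * ncor_violation(subset, rules)


def binary_pso(n_features: int, fitness: Callable[[List[int]], float],
               n_particles: int = 20, n_iter: int = 50, w: float = 0.7,
               c1: float = 1.5, c2: float = 1.5, v_max: float = 4.0,
               seed: int = 0) -> List[int]:
    """Minimal BPSO with a sigmoid transfer function; minimizes `fitness`."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_score = [fitness(p) for p in pos]
    g_idx = min(range(n_particles), key=lambda k: p_score[k])
    g_best, g_score = p_best[g_idx][:], p_score[g_idx]

    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[k][d]
                     + c1 * r1 * (p_best[k][d] - pos[k][d])
                     + c2 * r2 * (g_best[d] - pos[k][d]))
                vel[k][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[k][d]))
                pos[k][d] = 1 if rng.random() < prob else 0
            score = fitness(pos[k])
            if score < p_score[k]:  # update personal best
                p_best[k], p_score[k] = pos[k][:], score
                if score < g_score:  # update global best
                    g_best, g_score = pos[k][:], score
    return g_best


# Hypothetical usage: features 0-2 form one highly correlated group.
rules = [frozenset({0, 1, 2})]


def toy_error(subset: List[int]) -> float:
    """Hypothetical stand-in for a cross-validated model error."""
    err = (1.0 - 0.3 * bool({0, 1, 2} & set(subset))
           - 0.3 * (3 in subset) - 0.3 * (4 in subset))
    return err + 0.01 * len(subset)  # small cost per selected feature


def fit(mask: Sequence[int]) -> float:
    return penalized_fitness(mask, toy_error, rules, weight=0.5)


best = binary_pso(n_features=6, fitness=fit, seed=42)
print(best, fit(best))
```

In this toy setup, selecting two or more of features 0 to 2 leaves the error term unchanged but adds at least 0.5 to the fitness, so the search is pushed toward subsets that keep a single representative of the correlated group. The penalty weight trades error against rule compliance and would need tuning on real data.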
