Abstract

Selecting proper descriptors, also known as features, is one of the key problems in modeling materials properties with machine learning (ML) models. Redundant features reduce the accuracy of ML models, and the results of purely data-driven feature selection (FS) methods are often inconsistent with materials domain knowledge. Herein, an FS method embedded with materials domain knowledge, named NCOR-FS, is proposed to select higher-quality features. The method translates materials domain knowledge about highly correlated features into Non-Co-Occurrence Rules (NCORs). This makes it possible to quantify the degree to which a feature subset violates the NCORs and to design an optimization process for the FS method based on a swarm intelligence algorithm. Experiments on seven datasets show that, compared with multiple other FS methods commonly used in materials research, NCOR-FS selects feature subsets with a more appropriate number of highly correlated features, which improves the prediction accuracy and interpretability of the ML model. NCOR-FS can be applied to any materials system, and the idea of embedding domain knowledge into a data-driven algorithm is expected to facilitate the construction of a broader range of ML models embedded with materials domain knowledge.
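As a rough illustration of how a feature subset's violation of NCORs could be quantified, the following minimal sketch counts, for each rule, how many selected features exceed the number allowed to co-occur. The abstract does not give the actual rule format or violation formula, so the `Rule` structure, the `max_together` parameter, the `violation_degree` function, and the feature names are all hypothetical and serve only as an example.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical non-co-occurrence rule (an assumption, not the paper's format).

    At most `max_together` features from `group` should appear together
    in one feature subset.
    """
    group: FrozenSet[str]
    max_together: int = 1


def violation_degree(subset: Iterable[str], rules: List[Rule]) -> int:
    """Sum, over all rules, the number of selected features beyond the allowance."""
    selected = set(subset)
    total = 0
    for rule in rules:
        overlap = len(selected & rule.group)  # features of this group that were selected
        total += max(0, overlap - rule.max_together)
    return total


# Illustrative groups of features assumed to be highly correlated.
rules = [
    Rule(frozenset({"atomic_radius", "covalent_radius", "atomic_volume"})),
    Rule(frozenset({"melting_point", "boiling_point"})),
]

violating = ["atomic_radius", "covalent_radius", "melting_point"]
compliant = ["atomic_radius", "melting_point"]

print(violation_degree(violating, rules))
print(violation_degree(compliant, rules))
```

A score of this kind could, under these assumptions, be added as a penalty term to the fitness function of a swarm-based search, so that candidate subsets containing too many highly correlated features are ranked lower.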

