Abstract

The exponential growth and success of machine learning (ML) have led to its application across scientific domains, including material science. Advances in experimental techniques have increased the volume of material science data, encouraging material scientists to investigate data-driven solutions to scientific problems. While the resources available for getting started with ML are ever increasing, there is little literature on navigating the space of decisions that must be made to implement a robust and trustworthy ML solution. Without such resources, researchers are left to wade through articles and papers to determine the best approach for their problem, and they sometimes fall prey to pitfalls in real-world scenarios. This paper aims to act as a guide for researchers who want to strategically approach an ML solution to their problem through the use of domain knowledge and systematic evaluation of the major aspects of an ML pipeline. We focus on four aspects of the ML pipeline: (1) problem formulation, (2) data curation, (3) feature representation and model selection, and (4) model generalizability and real-world performance. In each case, we discuss the space of decisions, provide examples from the scientific literature, and illustrate how different choices can affect the outcome through a case study of predicting the compressive strength of uniaxially pressed samples of the molecular solid 2,4,6-triamino-1,3,5-trinitrobenzene (TATB). By combining a similar approach of critical thinking with rigorous evaluation and diagnostics, researchers can gain confidence in the reliability of predictions from their ML models.

