Abstract

Recent advances in machine learning (ML)-based methodologies have accelerated the prediction of the physical properties of materials. These ML models, however, rely on large amounts of simulated or experimental data to make reliable predictions. This dependence on large datasets can be a roadblock to building ML models, since collecting the data is prohibitively expensive and time-consuming. In this work, we propose two sampling strategies for reliably training machine learning models with minimal amounts of data, alleviating the need to generate large datasets. We demonstrate the effectiveness of these sampling strategies by improving the performance of the Crystal Graph Convolutional Neural Network (CGCNN) on four different datasets. Using the proposed strategies, we reach the benchmark performance of CGCNN models with fewer data samples.
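The abstract does not describe the two sampling strategies themselves, so the sketch below is only an illustration of the general idea of data-efficient training: grow the labeled set by selecting the most informative candidates instead of sampling at random. Everything here is a hypothetical assumption, including the ensemble-variance uncertainty proxy and the synthetic data; it is not the paper's method.

    # Minimal sketch of uncertainty-guided sample selection (illustrative only).
    # Assumes: a pool of unlabeled candidates, a cheap surrogate model, and
    # ensemble disagreement as a stand-in for predictive uncertainty.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X_pool = rng.uniform(-3, 3, size=(2000, 5))        # unlabeled candidate pool
    y_pool = np.sin(X_pool).sum(axis=1) + 0.1 * rng.standard_normal(2000)

    # Start from a small random seed set rather than a large dataset.
    labeled = list(rng.choice(len(X_pool), size=20, replace=False))

    for _ in range(10):                                # 10 acquisition rounds
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X_pool[labeled], y_pool[labeled])

        # Variance across the ensemble's trees serves as an uncertainty proxy.
        per_tree = np.stack([t.predict(X_pool) for t in model.estimators_])
        uncertainty = per_tree.var(axis=0)
        uncertainty[labeled] = -np.inf                 # never re-pick labeled points

        # "Label" the most uncertain candidates (here: look them up in y_pool;
        # in practice this would be a simulation or an experiment).
        labeled.extend(np.argsort(uncertainty)[-20:].tolist())

    print(f"final training-set size: {len(labeled)}")

In this toy setting the training set reaches a given accuracy with far fewer labels than uniform random sampling would need, which is the kind of data efficiency the abstract claims for CGCNN.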
