Abstract
Machine learning has thrived with the emergence of data-driven materials science. However, the materials datasets produced by existing research efforts often suffer from significant imbalance. This paper investigated data imbalance in glass-forming-ability datasets for ternary alloy systems, which consist of abundant, low-fidelity high-throughput data and sparse, high-fidelity traditional experimental data. We demonstrated a new method for handling the data imbalance and trained artificial neural network (ANN) models on both the original and the balanced datasets. The ANN model trained on the balanced dataset resolved the overfitting issue suffered by the model trained on the original dataset. More importantly, the data-balanced model generalized better when predicting new alloy systems, as evidenced by leave-one-alloy-system-out validation. Our work highlights the importance of handling data imbalance in materials datasets, both to resolve overfitting in machine learning models and to improve generalizability in predicting the characteristics of new materials systems.