Abstract
Data analytics approaches are increasingly used to facilitate property-specific materials discovery. The uncertainties in these approaches can be greatly affected by the fidelity of the data sets used to train the data models. Data curation is therefore an essential step for obtaining well-constrained model predictions. This can be a challenging task, especially for data sets that are too large for human quality control. We developed MATCOR, an open-source, user-friendly, easily adaptable software package to facilitate the data curation process. MATCOR processes lists of material identifiers in either AFLOW or Materials Project format and searches for the best-matching materials entry in the other database. This is a non-trivial task because the databases label materials differently and material labels are not always used uniquely. MATCOR uses a combination of characteristics, such as space group, compound formula, crystal structure, and use of Hubbard-U, to provide the best possible comparison between databases. The capabilities of MATCOR are demonstrated for density, elastic properties, magnetic properties, and band gap correlations between AFLOW and Materials Project. We find that density shows the highest correlation among the tested properties: 93% of verified densities agree to within ±2%. Bulk and shear moduli showed deviations of less than ±10% for 80.6% and 65.1% of the materials, respectively. The classifications of materials as non-magnetic/paramagnetic and as metallic/gapped are consistent between the two databases for 91% and 69% of the materials, respectively. These examples show that MATCOR can be used to automate, and thereby accelerate, the data curation process prior to materials discovery through data analytical models.
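The agreement figures quoted above (for example, the share of densities that agree to within ±2%) can be illustrated with a minimal Python sketch. This is not MATCOR's code: the function names, the choice to normalize the deviation by the mean of the two values, and the sample numbers are all assumptions made for illustration, since the abstract does not state which database serves as the reference.

```python
def relative_deviation(a, b):
    """Signed deviation between two values, relative to their mean.

    Normalizing by the mean is an assumed convention; the abstract does
    not say which database is treated as the reference.
    """
    mean = (a + b) / 2.0
    if mean == 0:
        return 0.0 if a == b else float("inf")
    return (a - b) / abs(mean)


def fraction_within(pairs, tolerance):
    """Fraction of (value_a, value_b) pairs whose relative deviation
    has a magnitude no larger than `tolerance` (e.g. 0.02 for ±2%)."""
    if not pairs:
        raise ValueError("need at least one matched pair")
    hits = sum(1 for a, b in pairs if abs(relative_deviation(a, b)) <= tolerance)
    return hits / len(pairs)


# Hypothetical matched densities in g/cm^3 (illustrative values only,
# not taken from AFLOW or Materials Project).
density_pairs = [(2.33, 2.30), (7.87, 7.90), (5.32, 5.00), (3.51, 3.52)]

print(fraction_within(density_pairs, 0.02))
```

The same function would apply to the bulk and shear moduli with a tolerance of 0.10, provided the entries have first been matched between the two databases.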