In this talk, I will present our group’s effort to build a high-throughput infrastructure that automates the execution and analysis of materials science computations. This work extends open-source Python packages developed within the Materials Project [1]: (1) pymatgen [2] for structure representation and for generating and handling input/output files, (2) FireWorks [3] for managing workflows across computing resources, and (3) Custodian [4] for monitoring the errors that inevitably arise during simulations and applying on-the-fly fixes. We build on these three libraries to create well-tested workflows that speed up the prediction and screening of materials properties relevant to various applications, with a focus on battery systems. At the back end, the infrastructure interfaces with the Gaussian [5] software, which performs electronic structure calculations of chemical systems, and with LAMMPS [6], an open-source code for molecular dynamics (MD) simulations. In addition to generating input/data files and parsing output files for these codes, the infrastructure manages the collected data and stores it in MongoDB [7], a NoSQL database program that uses JSON-like documents with a flexible schema. Implemented Gaussian-based workflows include the calculation of electrostatic partial charges, NMR chemical shifts, binding energies, ionization potentials, electron affinities, and bond dissociation energies. Each derived property is saved in its own collection together with auxiliary information such as molecular metadata (SMILES representation, chemical formula, etc.), which makes it easy to query the data and mine structure-property relationships. Users can tune the calculations by overriding default workflow parameters (for example, the functional and basis set), bypassing selected steps, or packing many jobs over multiple nodes on supercomputing resources.
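To make the parameter-override and step-bypassing behavior concrete, here is a minimal, stdlib-only Python sketch of how a workflow's defaults might be merged with user overrides and turned into per-step Gaussian route lines, and how a derived property could be packaged as a JSON-like document for its own collection. Every name here (`DEFAULT_PARAMS`, `STEP_SPECS`, `merge_params`, `build_steps`, `property_document`, the step names, and the default B3LYP/6-31+G(d) level of theory) is a hypothetical illustration, not the infrastructure's actual API; in the real system these roles would presumably be filled by FireWorks workflows and pymatgen's file handling.

```python
from copy import deepcopy

# Hypothetical defaults for an illustrative ionization-potential workflow;
# these are NOT the infrastructure's actual defaults or step names.
DEFAULT_PARAMS = {
    "functional": "B3LYP",
    "basis_set": "6-31+G(d)",
    "steps": ["opt_neutral", "freq_neutral", "opt_cation", "freq_cation"],
    "skip": [],
}

# Hypothetical mapping: step name -> (Gaussian route keyword, charge offset
# relative to the neutral molecule). Spin multiplicity is omitted for brevity.
STEP_SPECS = {
    "opt_neutral": ("Opt", 0),
    "freq_neutral": ("Freq", 0),
    "opt_cation": ("Opt", 1),
    "freq_cation": ("Freq", 1),
}


def merge_params(overrides=None):
    """Return a copy of the defaults updated with user overrides."""
    params = deepcopy(DEFAULT_PARAMS)  # never mutate the shared defaults
    params.update(overrides or {})
    unknown = set(params) - set(DEFAULT_PARAMS)
    if unknown:
        # Reject misspelled keys instead of silently ignoring them.
        raise KeyError(f"unknown workflow parameters: {sorted(unknown)}")
    return params


def build_steps(params, base_charge=0):
    """Return (step name, route line, charge) for every step not skipped."""
    steps = []
    for name in params["steps"]:
        if name in params["skip"]:
            continue  # user chose to bypass this step
        keyword, charge_offset = STEP_SPECS[name]
        route = f"#P {params['functional']}/{params['basis_set']} {keyword}"
        steps.append((name, route, base_charge + charge_offset))
    return steps


def property_document(prop, value, unit, smiles, formula, params):
    """Assemble a JSON-like record for a per-property collection."""
    return {
        "property": prop,
        "value": value,
        "unit": unit,
        # Molecular metadata kept alongside the value for querying/data mining.
        "smiles": smiles,
        "formula": formula,
        # Level of theory, so records computed differently can be told apart.
        "functional": params["functional"],
        "basis_set": params["basis_set"],
    }
```

For example, `build_steps(merge_params({"functional": "M06-2X", "skip": ["freq_cation"]}))` yields three steps, the first with the route line `#P M06-2X/6-31+G(d) Opt` and the cation optimization with charge 1. Rejecting unknown keys is one simple way to catch misspelled overrides early; a record from `property_document` could then be written to a per-property collection with pymongo's `Collection.insert_one`.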
LAMMPS workflows allow MD simulations to be run in different ensembles and the dumped trajectories to be analyzed for various dynamical and structural properties. The infrastructure enables coupling first-principles calculations with classical MD simulations to address chemical and physical phenomena occurring in a given system at different length and time scales. This approach aids the development of the databases required for training machine-learning models that accelerate the design and screening of materials with optimized properties for various materials science applications.

References

[1] Jain, A., et al., Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 2013. 1(1): p. 011002.
[2] Ong, S.P., et al., Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 2013. 68: p. 314-319.
[3] Jain, A., et al., FireWorks: a dynamic workflow system designed for high-throughput applications. Concurrency and Computation: Practice and Experience, 2015. 27(17): p. 5037-5059.
[4] Custodian. <https://github.com/materialsproject/custodian>.
[5] Frisch, M.J., et al., Gaussian 16, Rev. C.01. 2016: Wallingford, CT.
[6] Plimpton, S., Fast parallel algorithms for short-range molecular dynamics. 1993, Sandia National Labs., Albuquerque, NM (United States).
[7] MongoDB Inc., MongoDB. 2014.
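The trajectory analysis described above for the LAMMPS workflows can be illustrated with one common dynamical property, the self-diffusion coefficient. The sketch below computes the mean squared displacement (MSD), averaged over atoms and time origins, and applies the three-dimensional Einstein relation MSD(t) ≈ 6Dt through a least-squares fit. It assumes unwrapped coordinates have already been parsed from a dump file (for example, LAMMPS `dump custom` output with `xu yu zu` columns) into plain Python lists; the function names are hypothetical, and this is not the infrastructure's actual analysis code.

```python
def mean_squared_displacement(frames):
    """Return MSD[lag] for lag = 0..n_frames-1, averaged over atoms and origins.

    frames: list of frames; each frame is a list of (x, y, z) tuples of
    unwrapped coordinates, with atoms in the same order in every frame.
    """
    if not frames or not frames[0]:
        raise ValueError("need at least one frame with at least one atom")
    n_frames, n_atoms = len(frames), len(frames[0])
    msd = [0.0] * n_frames
    for lag in range(1, n_frames):
        n_origins = n_frames - lag
        total = 0.0
        for t0 in range(n_origins):
            start, end = frames[t0], frames[t0 + lag]
            for a in range(n_atoms):
                # Squared displacement of atom a over this lag window.
                total += sum((e - s) ** 2 for s, e in zip(start[a], end[a]))
        msd[lag] = total / (n_origins * n_atoms)
    return msd


def diffusion_coefficient(msd, dt, fit_start=1):
    """Estimate D from MSD(t) ~ 6 D t (3D Einstein relation).

    Fits a least-squares line to MSD versus t = lag * dt for lags >= fit_start
    and returns slope / 6.
    """
    lags = range(fit_start, len(msd))
    if len(lags) < 2:
        raise ValueError("need at least two MSD points to fit a slope")
    ts = [lag * dt for lag in lags]
    ys = [msd[lag] for lag in lags]
    t_mean = sum(ts) / len(ts)
    y_mean = sum(ys) / len(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den / 6.0
```

The fit should cover only the linear (diffusive) regime, so `fit_start` lets the user skip the short-time ballistic region. The direct loop scales as frames² × atoms, which is acceptable for a sketch; FFT-based MSD algorithms are preferable for long trajectories. The units of D follow from the dump's distance and time units (e.g., Å²/fs with LAMMPS `real` units when `dt` is in fs).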