Abstract

Genarris is an open-source Python package for generating random molecular crystal structures under physical constraints, for seeding crystal structure prediction algorithms and training machine learning models. Here we present a new version of the code, containing several major improvements. An MPI-based parallelization scheme has been implemented, which facilitates the seamless sequential execution of user-defined workflows. A new method for estimating the unit cell volume from the single molecule structure has been developed using a machine-learned model trained on experimental structures. A new algorithm has been implemented for generating crystal structures with molecules occupying special Wyckoff positions. A new hierarchical structure check procedure has been developed to detect unphysical close contacts efficiently and accurately. New intermolecular distance settings have been implemented for strong hydrogen bonds. To demonstrate these new features, we study two specific cases: benzene and glycine. Genarris finds the experimental structures of the two polymorphs of benzene and the three polymorphs of glycine.

Program summary
Program Title: Genarris 2.0
Program Files doi: http://dx.doi.org/10.17632/grx6mz4pjn.1
Licensing provisions: BSD-3 Clause
Programming language: Python, C
External routines/libraries: Spglib, ASE, pymatgen, SciPy, mpi4py, scikit-learn, PyTorch, FHI-aims.
Nature of problem: Molecular crystal structure prediction.
Solution method: Genarris 2.0 generates molecular crystal structures over the 230 space groups, on general and special Wyckoff positions, using physical constraints. The generated structures may subsequently be down-sampled using molecular crystal packing descriptors and an unsupervised machine learning algorithm. Lastly, ab initio structure relaxation may be performed for the final pool. Depending on the user-defined workflow, Genarris may be used to generate diverse molecular crystal datasets to seed evolutionary algorithms, to train machine learning algorithms, or as a standalone crystal structure prediction method.
Restrictions: For crystal structure generation, the molecule of interest must be semi-rigid, with no bond rotational degrees of freedom.
Unusual features: Genarris 2.0 is a highly distributed program, making use of MPI for Python parallelization. The user can design and implement workflows by executing a user-defined list of procedures. Genarris 2.0 offers new features including a machine learning model for estimating the molecular volume in the solid state from the single molecule structure, structure generation in special Wyckoff positions of space groups, hierarchical structure checks including rigorous treatment of non-orthogonal structures, and clustering and down-selection workflows combining first principles simulations with machine learning.
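
The idea behind a hierarchical close-contact check in a non-orthogonal cell can be illustrated with a minimal sketch. This is not the Genarris implementation: the function names, the van der Waals radii table, the scaling parameter `sr`, and the restriction to the 27 nearest periodic images are all assumptions made for illustration. A cheap molecule-level screen runs first, and the atom-by-atom distance test is applied only to molecule pairs that the screen cannot rule out.

```python
import itertools
import numpy as np

# Illustrative van der Waals radii in angstrom (assumed values, not Genarris settings).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def bounding_sphere(cart):
    """Return the centroid of a molecule and the radius of a sphere enclosing its atoms."""
    centre = cart.mean(axis=0)
    return centre, np.linalg.norm(cart - centre, axis=1).max()


def has_close_contact(lattice, frac_a, species_a, frac_b, species_b, sr=0.85):
    """Check molecule A against molecule B and its periodic images.

    A contact is flagged when any interatomic distance is below
    sr * (r_i + r_j). Rows of `lattice` are the lattice vectors, so
    non-orthogonal cells are handled by working in Cartesian coordinates.
    Assumes a reasonably reduced cell, so that shifts of -1, 0, +1 along
    each lattice vector cover all relevant images.
    """
    lattice = np.asarray(lattice, dtype=float)
    cart_a = np.asarray(frac_a, dtype=float) @ lattice
    cart_b = np.asarray(frac_b, dtype=float) @ lattice
    r_a = np.array([VDW_RADII[s] for s in species_a])
    r_b = np.array([VDW_RADII[s] for s in species_b])
    cutoff = sr * (r_a[:, None] + r_b[None, :])  # pairwise distance thresholds

    centre_a, rad_a = bounding_sphere(cart_a)
    centre_b, rad_b = bounding_sphere(cart_b)

    for shift in itertools.product((-1, 0, 1), repeat=3):
        translation = np.array(shift, dtype=float) @ lattice
        # Level 1: if the bounding spheres are far apart, no atom pair can clash.
        gap = np.linalg.norm(centre_a - (centre_b + translation))
        if gap > rad_a + rad_b + cutoff.max():
            continue
        # Level 2: full atom-by-atom distance test for this image.
        diff = cart_a[:, None, :] - (cart_b[None, :, :] + translation)
        if np.any(np.linalg.norm(diff, axis=-1) < cutoff):
            return True
    return False
```

The bounding-sphere screen is conservative: it can only skip images for which every interatomic distance is guaranteed to exceed the largest threshold, so it changes the cost of the check but not its result.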
