We introduce the third installment of the COMPAS Project (a COMputational database of Polycyclic Aromatic Systems), focused on peri-condensed polybenzenoid hydrocarbons (PBHs). In this installment, we develop two datasets containing the optimized ground-state structures and a selection of molecular properties of ∼39k and ∼9k peri-condensed PBHs, computed at the GFN2-xTB and CAM-B3LYP-D3BJ/cc-pVDZ//CAM-B3LYP-D3BJ/def2-SVP levels, respectively. The manuscript details the enumeration and data-generation processes and describes the information available within the datasets. An in-depth comparison of the two levels of computation shows that the geometrical disagreement between them is largest for slightly distorted molecules. In addition, a data-driven analysis of the structure-property trends of peri-condensed PBHs highlights the effect of the size of peri-condensed islands and of linearly annulated rings on the HOMO-LUMO gap. These insights are important for the rational design of novel functional aromatic molecules for use in, e.g., organic electronics. The generated datasets provide a basis for further data-driven machine-learning and deep-learning studies in chemistry.