In this study, we examined target subsets extracted from the MDL Drug Data Report (MDDR)1 to identify specific molecular shape profiles that are representative of compounds active on those targets. Normalized Principal Moments of Inertia Ratios (NPRs)2 were used to describe the shape of small molecules in a finite triangular descriptor space. The clustering behavior of the MDDR target subsets in a cell-based triangular system differs significantly from that of randomly sampled datasets, which demonstrates that the NPR descriptor provides discriminating shape information. For some of the target subsets, certain parts of the descriptor space are unlikely to be occupied by bioactive compounds. All analyzed datasets show a generally biased distribution of molecular shapes: the majority of their compounds exhibit a rod-like character. We also assessed how the choice of 3D conformer generator influences this distribution, and whether using multiple conformations per compound increases the shape space covered.
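To make the descriptor concrete, the following is a minimal sketch of how NPR values could be computed for a single conformer. It assumes the commonly used definition NPR1 = I1/I3 and NPR2 = I2/I3, where I1 ≤ I2 ≤ I3 are the principal moments of inertia; the abstract itself does not spell this out. The function names (`inertia_tensor`, `symmetric_eigenvalues`, `npr`) and the default of unit atomic masses are illustrative assumptions, not the authors' implementation, and conformer generation is not covered.

```python
import math


def inertia_tensor(coords, masses):
    """Return the 3x3 inertia tensor about the centre of mass."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        x, y, z = (c[k] - com[k] for k in range(3))
        t[0][0] += m * (y * y + z * z)
        t[1][1] += m * (x * x + z * z)
        t[2][2] += m * (x * x + y * y)
        t[0][1] -= m * x * y
        t[0][2] -= m * x * z
        t[1][2] -= m * y * z
    # Mirror the upper triangle: the tensor is symmetric.
    t[1][0], t[2][0], t[2][1] = t[0][1], t[0][2], t[1][2]
    return t


def symmetric_eigenvalues(a):
    """Eigenvalues of a real symmetric 3x3 matrix, in ascending order.

    Uses the closed-form trigonometric solution, so no external
    linear-algebra library is needed.
    """
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        # Already diagonal: the eigenvalues are the diagonal entries.
        return sorted(a[i][i] for i in range(3))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (
        b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
        - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
        + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0])
    )
    r = max(-1.0, min(1.0, det / 2.0))  # guard against rounding outside [-1, 1]
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest  # the trace is preserved
    return [smallest, middle, largest]


def npr(coords, masses=None):
    """Return (NPR1, NPR2) for one conformer given as a list of (x, y, z).

    Assumption: unit masses are used when none are supplied.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    i1, i2, i3 = symmetric_eigenvalues(inertia_tensor(coords, masses))
    if i3 <= 0.0:
        raise ValueError("degenerate geometry: largest moment of inertia is zero")
    # Clamp tiny negative rounding errors on the smallest moment.
    return max(i1, 0.0) / i3, max(i2, 0.0) / i3


if __name__ == "__main__":
    rod = [(-1, 0, 0), (1, 0, 0)]
    disc = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
    sphere = disc + [(0, 0, 1), (0, 0, -1)]
    for name, points in (("rod", rod), ("disc", disc), ("sphere", sphere)):
        print(name, npr(points))
```

Under this definition, idealized geometries fall on the corners of the triangle: a linear arrangement maps to (0, 1), a planar symmetric one to (0.5, 0.5), and an isotropic one to (1, 1). The rod-like bias described above corresponds to points crowding toward the (0, 1) corner.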