Abstract
Analysis of the variability of molecular descriptors in large compound databases has recently been carried out using the Shannon entropy (SE) and differential Shannon entropy (DSE) concepts. SE analysis reduces descriptor value distributions to their information content, and DSE analysis detects intrinsic differences in descriptor settings between compound databases. Here it is shown that a combination of SE and DSE calculations, termed SE-DSE analysis, makes it possible to identify the molecular descriptors most sensitive to systematic differences between databases of synthetic, drug-like, and natural molecules. Descriptors with consistently high information content are detected, and database-specific differences are quantified. Different small sets of descriptors were found to be most responsive to the principal differences between synthetic, natural, and drug-like molecules. Descriptors with DSE values furthest from zero are likely to distinguish best between compounds with different characteristics. SE-DSE analysis also reveals that a number of descriptors are not sensitive to compound class-specific features, despite their complexity and consistently high information content.
More From: Journal of Chemical Information and Computer Sciences