Unsupervised Classification of Chemical Compounds

P Guttiérrez Toscano, F H C Marriott

doi:10.1111/1467-9876.00146

Open Access

Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Publication Date: Jun 1, 1999
Citations: 4

Affiliation: University of Oxford

#Low-dimensional Space #Large Data Sets

Abstract

Summary. Clustering chemical compounds of similar structure is important in the pharmaceutical industry. One way of describing the structure is the chemical ‘fingerprint’. The fingerprint is a string of binary digits, and typical data sets consist of very large numbers of fingerprints; a suitable clustering procedure must take account of the properties of this method of coding, and must be able to handle large data sets. This paper describes the analysis of a set of fingerprint data. The analysis was based on an appropriate distance measure derived from the fingerprints, followed by metric scaling into a low-dimensional space. An approximation to metric scaling, suitable for very large data sets, was investigated. Cluster analysis using two programs, mclust and AutoClass-C, was carried out on the scaled data.
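
The first two stages of the pipeline described above (a distance derived from binary fingerprints, then metric scaling into a low-dimensional space) can be illustrated with a minimal sketch. The abstract does not name the distance measure, so the Tanimoto (Jaccard) distance used here is an assumption, chosen only because it is a common choice for binary fingerprints. The toy fingerprints, the function names, and the use of power iteration with deflation in place of a full eigendecomposition are likewise illustrative and are not taken from the paper. The sketch stops before the clustering stage and does not reproduce mclust, AutoClass-C, or the large-data approximation.

```python
import math
import random


def tanimoto_distance(a, b):
    """Tanimoto (Jaccard) distance between two equal-length 0/1 sequences.
    Assumption: the paper's own distance measure is not stated in the abstract."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(fps):
    """Full symmetric matrix of pairwise distances."""
    n = len(fps)
    return [[tanimoto_distance(fps[i], fps[j]) for j in range(n)] for i in range(n)]


def double_centre(dist):
    """B = -1/2 * J D^2 J, the inner-product matrix used in classical scaling."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means (symmetric)
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def top_eigenpair(mat, iterations=200, seed=0):
    """Power iteration: largest-magnitude eigenvalue and a unit eigenvector."""
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(len(mat))]
    for _ in range(iterations):
        product = matvec(mat, vec)
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            return 0.0, vec
        vec = [p / norm for p in product]
    value = sum(v * p for v, p in zip(vec, matvec(mat, vec)))  # Rayleigh quotient
    return value, vec


def metric_scaling(dist, dims=2):
    """Classical metric scaling: coordinates from the leading positive eigenpairs."""
    b = double_centre(dist)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vec = top_eigenpair(b)
        if value <= 1e-12:  # non-positive eigenvalue: no further real axis
            break
        scale = math.sqrt(value)
        for i in range(n):
            coords[i][k] = scale * vec[i]
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical 8-bit fingerprints: two groups that share no set bits.
fingerprints = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
coords = metric_scaling(distance_matrix(fingerprints), dims=2)
for point in coords:
    print([round(c, 3) for c in point])
```

In this toy case the first scaled coordinate separates the two groups, which is the kind of low-dimensional representation a clustering program would then be run on. The full distance matrix grows quadratically with the number of compounds, which suggests why an approximation to metric scaling matters for very large data sets.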
