Abstract

An important feature of structural data, especially data from structure determination and protein-ligand docking programs, is that their distribution can be either uniform or non-uniform. Traditional clustering algorithms, which were developed specifically for non-uniformly distributed data, may not be adequate for classifying them. Here we present a geometric partitional algorithm that can be applied to both uniformly and non-uniformly distributed data. The algorithm takes a top-down approach: it recursively selects outliers as seeds to form new clusters until all the structures within each cluster satisfy certain requirements. Applications of the algorithm to a diverse set of data from NMR structure determination, protein-ligand docking, and simulation show that it is superior to previous clustering algorithms at identifying correct but minor clusters. The algorithm should be useful for identifying correct docking poses and for speeding up an iterative process widely used in NMR structure determination.
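
The top-down idea described above can be illustrated with a minimal sketch. This is not the published algorithm: the abstract does not state the geometric criterion, so the sketch assumes plain Euclidean distance to a cluster centroid and a hypothetical `max_radius` threshold as the "requirement" each cluster must satisfy. For real structural data a measure such as RMSD between structures would likely take the place of `math.dist`. The function names are illustrative only.

```python
import math


def centroid(points):
    """Arithmetic mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dim))


def split_by_outliers(points, max_radius):
    """Recursively split `points` until every cluster lies within
    `max_radius` of its own centroid (assumed stopping requirement)."""
    if max_radius < 0:
        raise ValueError("max_radius must be non-negative")

    center = centroid(points)
    # The outlier is the point farthest from the current centroid.
    outlier = max(points, key=lambda p: math.dist(p, center))
    if math.dist(outlier, center) <= max_radius:
        return [points]  # cluster already satisfies the requirement

    # Use the outlier as the seed of a new cluster: every point that is
    # closer to the seed than to the old centroid moves to the new cluster.
    near_seed, rest = [], []
    for p in points:
        if math.dist(p, outlier) < math.dist(p, center):
            near_seed.append(p)
        else:
            rest.append(p)

    # Both halves are non-empty and strictly smaller, so recursion terminates.
    return (split_by_outliers(near_seed, max_radius)
            + split_by_outliers(rest, max_radius))


if __name__ == "__main__":
    # Toy data: a major group, a smaller group, and one isolated point.
    poses = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (10, 0)]
    for cluster in split_by_outliers(poses, max_radius=0.5):
        print(len(cluster), cluster)
```

Because each split is seeded by the most distant point, a small or isolated group is separated early rather than being absorbed into a larger neighbor, which is the behavior the abstract highlights for minor clusters.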
