Abstract

Hybrid hierarchical clustering techniques which combine the characteristics of different partitional clustering techniques or partitional and hierarchical clustering techniques are interesting. In this paper, efficient bottom-up hybrid hierarchical clustering (BHHC) techniques have been proposed for the purpose of prototype selection for protein sequence classification. In the first stage, an incremental partitional clustering technique such as leader algorithm (ordered leader no update (OLNU) method) which requires only one database (db) scan is used to find a set of subcluster representatives. In the second stage, either a hierarchical agglomerative clustering (HAC) scheme or a partitional clustering algorithm—‘K-medians’ is used on these subcluster representatives to obtain a required number of clusters. Thus, this hybrid scheme is scalable and hence would be suitable for clustering large data sets and we also get a hierarchical structure consisting of clusters and subclusters and the representatives of which are used for pattern classification. Even if more number of prototypes are generated, classification time does not increase much as only a part of the hierarchical structure is searched. The experimental results (classification accuracy (CA) using the prototypes obtained and the computation time) of the proposed algorithms are compared with that of the hierarchical agglomerative schemes, K-medians and nearest neighbour classifier (NNC) methods. The proposed methods are found to be computationally efficient with reasonably good CA.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call