Cluster Predictive Model Using Affinity Propagation Algorithm to Group Mushroom 5.8s rRNA Sequences

P Sudhasini, B Ashadevi

doi:10.17485/ijst/v15i41.1341

Abstract

Background: This article focuses on biological sequence data, specifically 5.8S rRNA sequences, from distinct species of mushroom in the phylum Basidiomycota. Macrofungi from this phylum are widely used as therapeutic mushrooms in several countries. During the rainy season, hundreds of macrofungal basidiocarps were discovered in Tamil Nadu. The internal transcribed spacer (ITS) and 5.8S rRNA gene sequence markers, collected from NCBI, were used to identify at least thirty of these strains as members of the phylum Basidiomycota (orders Polyporales, Hymenochaetales, and Russulales) with the therapeutic properties associated with that phylum.

Objectives: The main objective is to group the sequences by similarity, using multiple sequence alignment followed by a clustering algorithm.

Methods: We apply affinity propagation to the 30 × 30 pairwise similarity matrix of the thirty 5.8S rRNA mushroom sequences, obtained with the Clustal Omega tool. As a continuation of earlier work, the approach is evaluated against k-means and hierarchical clustering in terms of the optimal number of clusters and of time and space complexity.

Findings: Affinity propagation does not require the number of clusters to be specified in advance. Even so, the optimal number of clusters and the groupings it produces are the same as those obtained in the earlier work with the k-means and hierarchical agglomerative clustering algorithms.

Novelty: The proposed technique applies the cluster validation metrics Silhouette score, Calinski-Harabasz index, and Davies-Bouldin index to find the optimal number of clusters. The CD-HIT clustering tool does not offer these metrics, and the Clustal Omega tool does not support this kind of extension.
This follow-up work helps bioinformatics researchers obtain useful results with existing software before moving to the wet laboratory: rather than expending large quantities of chemical reagents, they can use these results to take a targeted approach.

Keywords: Affinity propagation; Cluster metrics; K-means; Mushroom sequences; Bioinformatics; Data science; Silhouette score; Calinski-Harabasz index; Davies-Bouldin index
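The Methods summarised above can be illustrated with a minimal sketch of affinity propagation run on a precomputed similarity matrix, followed by a silhouette score. This is not the authors' implementation. The 6 × 6 percent-identity matrix `pim` is invented toy data standing in for the 30 × 30 Clustal Omega matrix, the function names `affinity_propagation` and `silhouette` are hypothetical, and the median preference, the damping of 0.5, and the conversion of similarity to distance as 100 minus percent identity are all assumptions. The Calinski-Harabasz and Davies-Bouldin indices are omitted because they need per-item feature vectors rather than a pairwise matrix.

```python
from statistics import median


def affinity_propagation(S, preference=None, damping=0.5, max_iter=200):
    """Return (exemplars, labels) for a square similarity matrix S."""
    n = len(S)
    if preference is None:
        # Assumption: use the median off-diagonal similarity as preference.
        preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    # Work on a copy whose diagonal holds the preference value.
    S = [[preference if i == k else S[i][k] for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(max_iter):  # fixed iteration budget, no convergence test
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            total = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=total.__getitem__)
            first = total[best]
            second = max(v for k, v in enumerate(total) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [R[i][k] if i == k else max(R[i][k], 0.0) for i in range(n)]
            col_sum = sum(pos)
            for i in range(n):
                new = col_sum - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [-1] * n  # no exemplar emerged
    # Exemplars label themselves; other items join the most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


def silhouette(D, labels):
    """Mean silhouette from a distance matrix D; needs at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(D[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(D[i][j] for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Invented toy percent-identity matrix with two obvious groups (0-2 and 3-5).
pim = [
    [100, 97, 90, 60, 61, 59],
    [97, 100, 96, 62, 60, 61],
    [90, 96, 100, 58, 59, 60],
    [60, 62, 58, 100, 98, 91],
    [61, 60, 59, 98, 100, 96],
    [59, 61, 60, 91, 96, 100],
]
dist = [[100 - v for v in row] for row in pim]  # assumed distance conversion

exemplars, labels = affinity_propagation(pim)
score = silhouette(dist, labels)
print(exemplars, labels, round(score, 3))
```

The number of clusters is never passed in: it emerges from the preference value placed on the diagonal, which is the property the Findings highlight. The sketch uses only the standard library to stay self-contained; in practice a library such as scikit-learn, whose `AffinityPropagation` accepts `affinity="precomputed"`, would be the more usual choice.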

Full Text