Abstract

In this paper, cluster analysis, a form of unsupervised learning, is applied to human protein class prediction. Human protein data are obtained from the Human Protein Reference Database (HPRD), from which amino acid sequences belonging to ten molecular classes are retrieved, five sequences per class. Sequence-derived features (SDFs) are then extracted for each sequence using various web-based tools, and priorities are assigned to the SDFs by analyzing the variation in their values. Because every sequence has a value for every SDF, the resulting data form a complete weighted bipartite graph with two independent node sets: one containing all the sequences and the other containing all the SDFs. This bipartite graph is stored in memory as an adjacency weight matrix. Based on the SDF values of the input sequence and the priority of each SDF, clusters are generated from the data in the adjacency matrix. These clusters are then backtracked to predict the class of the input sequence.

General Terms: Bioinformatics, Machine Learning, Human Protein Class Prediction.
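The abstract does not say how SDF priorities are computed or how clusters are formed. The Python sketch below shows one possible reading of the pipeline, and every specific in it is an assumption. The matrix values and the labels `class_A`/`class_B` are made-up toy data. Priority is taken to be each SDF column's coefficient of variation. Clustering is modeled as progressively narrowing the set of sequences, one SDF at a time in priority order, to those close to the input's value. "Backtracking" is read as mapping the final cluster members back to their class labels and returning the majority label. The helper names `sdf_priorities` and `predict_class` and the parameter `rel_tol` are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy adjacency weight matrix: rows are sequences, columns are SDFs.
# weights[i][j] is the weight of the edge between sequence i and SDF j in the
# complete bipartite graph. Values are made up for illustration only.
weights = [
    [50.0, 5.0, 30.0],
    [52.0, 5.2, 31.0],
    [51.0, 5.1, 45.0],
    [90.0, 8.0, 40.0],
    [88.0, 8.2, 42.0],
    [91.0, 7.9, 39.0],
]
# Hypothetical class label for each row of the matrix.
labels = ["class_A", "class_A", "class_A", "class_B", "class_B", "class_B"]


def sdf_priorities(weights):
    """Return SDF column indices ordered from highest to lowest priority.

    ASSUMPTION: priority is the coefficient of variation (stdev / |mean|)
    of each SDF column; ties are broken by the lower column index.
    """
    scores = []
    for j in range(len(weights[0])):
        column = [row[j] for row in weights]
        m = mean(column)
        cv = pstdev(column) / abs(m) if m else 0.0
        scores.append((cv, j))
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


def predict_class(weights, labels, query, rel_tol=0.1):
    """Hypothetical priority-driven clustering followed by a majority vote.

    Starting from all sequences, each SDF (in priority order) keeps only the
    sequences whose value lies within rel_tol * (column range) of the query's
    value. If an SDF would empty the cluster, refinement stops and the last
    non-empty cluster is kept. Cluster members are then mapped back to their
    class labels and the most common label is returned.
    """
    cluster = list(range(len(weights)))
    for j in sdf_priorities(weights):
        column = [row[j] for row in weights]
        tol = rel_tol * (max(column) - min(column))
        narrowed = [i for i in cluster if abs(weights[i][j] - query[j]) <= tol]
        if not narrowed:
            break  # stop refining; keep the last non-empty cluster
        cluster = narrowed
    votes = Counter(labels[i] for i in cluster)
    return votes.most_common(1)[0][0]


print(predict_class(weights, labels, [89.0, 8.1, 41.0]))  # prints class_B
```

Scaling the tolerance by each column's range keeps SDFs measured in different units comparable. The paper's actual clustering criterion and priority scheme may differ from this sketch.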
