Abstract

Bioinformatics applies information technology to the management and analysis of biological data. Within bioinformatics, data mining helps researchers mine large amounts of biomolecular data. Major research efforts in the field involve sequence analysis, protein structure prediction, and gene finding. Proteins are important molecules in all living organisms and are involved in virtually all cell functions. Protein sequence motifs are short fragments of conserved amino acids that recur across protein sequences. Identifying such motifs is one of the challenging tasks in bioinformatics, and data mining is one technique for exploring them. The motifs are identified from segments of protein sequences. Not all generated sequence segments are significant for finding sequence motifs, and the segments have no classes or labels. Hence, the Singular Value Decomposition (SVD) entropy technique is adopted as a preprocessing method to select sequence segments. The Adaptive Fuzzy C-Means clustering method is then applied to the selected segments to obtain granules. Bisecting K-Means is applied to each granule to obtain the specified number of clusters, and the resulting cluster centroids are given as initial seeds to the K-Means algorithm, which clusters each granule separately. The results obtained with this new initialization technique are compared with those of randomly initialized K-Means clustering. The comparison shows that the new seed selection technique performs better than random initialization. The proposed method identifies significant motif patterns.

Keywords: Sequence Motifs, HSSP, SVD, Bisecting K-Means, Adaptive Fuzzy C-Means
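
The seed selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes squared Euclidean distance, picks the cluster with the largest within-cluster sum of squared errors (SSE) for each bisection, and uses toy 2-D points in place of the sequence segments of a single granule. The function names `kmeans` and `bisecting_seeds` and the parameter `n_trials` are hypothetical, and the SVD entropy selection and Adaptive Fuzzy C-Means stages are omitted.

```python
import numpy as np


def kmeans(X, centroids, n_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    sse = d[np.arange(len(X)), labels].sum()
    return centroids, labels, sse


def bisecting_seeds(X, k, rng, n_trials=5):
    """Bisect the highest-SSE cluster until k clusters exist; return centroids."""
    clusters = [X]
    while len(clusters) < k:
        sses = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        target = clusters.pop(int(np.argmax(sses)))
        best = None
        for _ in range(n_trials):
            # Each trial is a 2-means run from two distinct random points.
            start = target[rng.choice(len(target), size=2, replace=False)]
            _, labels, sse = kmeans(target, start)
            if len(np.unique(labels)) < 2:
                continue  # degenerate split, try again
            if best is None or sse < best[0]:
                best = (sse, labels)
        if best is None:
            raise ValueError("could not split the cluster into two parts")
        clusters += [target[best[1] == 0], target[best[1] == 1]]
    return np.array([c.mean(axis=0) for c in clusters])


# Toy data (an assumption): three well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])

# K-Means seeded with bisecting K-Means centroids.
seeds = bisecting_seeds(X, k=3, rng=rng)
_, seeded_labels, seeded_sse = kmeans(X, seeds)

# K-Means seeded with three randomly chosen data points, for comparison.
random_start = X[rng.choice(len(X), size=3, replace=False)]
_, _, random_sse = kmeans(X, random_start)

print(f"SSE with bisecting seeds: {seeded_sse:.2f}")
print(f"SSE with random seeds:    {random_sse:.2f}")
```

On data this cleanly separated, the two initializations may well reach the same final SSE. Any advantage of the bisecting seeds would be expected to show mainly on harder data and when averaged over many random starts, which is the kind of comparison the abstract reports.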
