Abstract

Sequence clustering has become an important topic of active research in data mining. However, clustering quality depends strongly on both the selection of the initial centers and the mean sequences. In this study, a sequence clustering algorithm based on weighted vector identification (SCAWVI) is developed, building on the composite similarity of sequence elements and the weight of a sequence in its corresponding cluster. Using weighted sequence elements, every sequence in the sequence database is preprocessed into an M-dimensional weighted vector identification. Then, a Huffman-based initial clustering center optimization algorithm is used to optimize the initial clustering centers. In addition, the weighted vector identification and the weight of a sequence in its corresponding cluster are used to update the clustering centers. The experimental results and the theoretical analysis in this study show that the SCAWVI algorithm achieves higher clustering accuracy and higher execution efficiency.
