Research and Application of Clustering Algorithm for Text Big Data.

Zi Li Chen

doi:10.1155/2022/7042778

Abstract

In the era of big data, text is a vital store of information in all walks of life. From humanities research to government decision-making, from precision medicine to quantitative finance, from customer management to marketing, massive text, as one of the most important information carriers, plays an important role everywhere. Text data generated in humanities research, finance, marketing, and other fields often has distinct domain characteristics: it contains the professional vocabulary and language patterns unique to each field and is often accompanied by various kinds of “noise.” Processing such texts is a major challenge for current technology, especially for Chinese texts. Clustering algorithms provide a better solution for processing information in text big data. Clustering algorithms are the core of cluster analysis. The K-means algorithm is widely used in cluster analysis because its principle is simple and its time complexity is low, but its K value must be set in advance, and its randomly selected initial cluster centers can lead it into a locally optimal solution. Other clustering algorithms, such as mean shift clustering, are also used alongside K-means clustering in mining text big data. In view of these problems, this paper first extracts and analyzes the text big data and then runs experiments with the clustering algorithms. Experimental conclusion: in analyzing large-scale text data, limited here to large and simple data sets, the traditional K-means algorithm has low efficiency and reduced accuracy, and it is susceptible to the influence of the initial centers and of abnormal data. To address these problems, the K-means cluster analysis algorithm is analyzed and improved for data sets with large data volumes, raising its execution efficiency and accuracy on such data sets.
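The two K-means weaknesses named above (a preset K and sensitivity to randomly chosen initial centers) can be illustrated with a minimal sketch. This is not the paper's improved algorithm: it is plain Lloyd-style K-means in pure Python, and the function names (`kmeans`, `best_of_restarts`), the toy 2-D points standing in for text feature vectors, and the restart strategy are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means: k must be chosen in advance, centers start at random points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers (seed-dependent)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged, possibly only to a local optimum
        centers = new_centers
    # Within-cluster sum of squared distances; lower is better.
    inertia = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, inertia


def best_of_restarts(points, k, seeds):
    """Hypothetical mitigation: rerun with several seeds and keep the lowest inertia."""
    return min((kmeans(points, k, seed=s) for s in seeds), key=lambda r: r[1])


# Toy data (assumed): two small groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, inertia = best_of_restarts(points, k=2, seeds=range(5))
print(centers, inertia)
```

Restarting from several seeds is only one common way to reduce the dependence on initial centers; it does not remove the need to choose K beforehand.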
Mean shift clustering can be regarded as moving many random centers gradually toward the direction of maximum density: each center is repeatedly shifted to the mean of the data around it, weighted according to the probability density of the data, until multiple maximum-density centers are obtained. In this sense, mean shift clustering can also be viewed as an algorithm based on kernel density estimation.
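The density-seeking behavior described above can be sketched in a few lines of pure Python. This is an illustrative sketch, not the paper's implementation: the Gaussian kernel, the `bandwidth` parameter, starting one center at every data point (rather than at random locations), and the merge threshold of half the bandwidth are all assumptions.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_shift(points, bandwidth, max_iter=100, tol=1e-4):
    """Shift a center from every point toward a density maximum, then merge centers."""
    modes = []
    for start in points:
        x = start
        for _ in range(max_iter):
            # Gaussian kernel weights: nearby points pull the center harder.
            weights = [math.exp(-sq_dist(x, p) / (2 * bandwidth ** 2)) for p in points]
            total = sum(weights)
            # The new position is the kernel-weighted mean of all points.
            new_x = tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(x))
            )
            shift = math.sqrt(sq_dist(new_x, x))
            x = new_x
            if shift < tol:
                break  # the center has settled on a density maximum
        modes.append(x)

    # Merge centers that ended up at (nearly) the same maximum.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.sqrt(sq_dist(m, c)) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Toy data (assumed): two small groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = mean_shift(points, bandwidth=2.0)
print(len(centers), labels)
```

Unlike K-means, the number of clusters is not passed in; it falls out of how many distinct density maxima the centers reach, which in turn depends on the chosen bandwidth.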

Journal: Computational Intelligence and Neuroscience
Publication Date: Jun 8, 2022
Citations: 5
License type: CC BY 4.0
