Abstract
The article addresses one of the important tasks of artificial intelligence: machine processing of natural language. Solving this problem with cluster analysis makes it possible to identify, formalize, and integrate large amounts of linguistic expert information under conditions of information uncertainty and weakly structured source text obtained from various subject areas. Cluster analysis is a powerful tool for exploratory analysis of text data; it allows an objective classification of any objects that are characterized by a number of features and contain hidden patterns. The paper reviews and analyzes modern modified algorithms for agglomerative clustering (CURE, ROCK, CHAMELEON), non-hierarchical clustering (PAM, CLARA), and the affine transformation algorithm, which are used at various stages of text data clustering and whose effectiveness is verified by experimental studies. The paper substantiates the requirements for choosing the most efficient clustering method for improving the intelligent processing of linguistic expert information. It also considers methods for visualizing clustering results in order to interpret the cluster structure and the dependencies within a set of text data elements, together with graphical means of presenting them: dendrograms, scatterplots, VOS similarity diagrams, and intensity maps. To compare the quality of the algorithms, internal and external performance metrics were used: "V-measure", "Adjusted Rand index", and "Silhouette". The experiments showed that a hybrid approach is needed. When no hypothesis about the initial number of clusters can be put forward, the number of clusters and the placement of their centers are first selected with a hierarchical approach based on sequentially merging and averaging the characteristics of the closest data points in a limited sample.
Iterative clustering algorithms, which are highly stable with respect to noise features and outliers, are then applied. Hybridization increases the efficiency of clustering algorithms. The research results showed that, to increase computational efficiency and overcome sensitivity to the initialization of clustering algorithm parameters, metaheuristic approaches should be used to optimize the parameters of the learning model and to search for a globally optimal solution.
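The hybrid scheme summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of centroid linkage for the hierarchical stage, plain Lloyd-style k-means for the iterative stage, the function names, the `sample_size` parameter, and the synthetic 2-D points standing in for text feature vectors are all assumptions made for illustration. The Adjusted Rand index is computed from its standard contingency-table definition.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def agglomerative_centers(sample, k):
    """Hierarchical stage (assumed centroid linkage): repeatedly merge the two
    clusters with the closest centroids until k clusters remain."""
    clusters = [[p] for p in sample]
    while len(clusters) > k:
        centers = [mean(c) for c in clusters]
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(centers[ij[0]], centers[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [mean(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Iterative stage: Lloyd-style k-means started from the given centers."""
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster lost all of its points.
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def hybrid_cluster(points, k, sample_size=30, seed=0):
    """Seed k-means with centers found hierarchically on a limited random sample."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return kmeans(points, agglomerative_centers(sample, k))


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index from the contingency table of two labelings.
    Undefined (division by zero) when both labelings are a single cluster."""
    pairs = sum(math.comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / math.comb(len(truth), 2)
    max_index = 0.5 * (sum_a + sum_b)
    return (pairs - expected) / (max_index - expected)


# Synthetic demo data: three well-separated 2-D blobs (hypothetical stand-in
# for vectorized text documents).
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
truth = [i // 30 for i in range(90)]
points = [(rng.gauss(blob_centers[t][0], 0.5), rng.gauss(blob_centers[t][1], 0.5)) for t in truth]
labels, centers = hybrid_cluster(points, k=3)
print("Adjusted Rand index:", adjusted_rand_index(truth, labels))
```

In this sketch `k` is still passed in by hand. When the number of clusters is unknown, one possible variant (again an assumption, not something stated in the abstract) is to stop the merging loop once the closest pair of centroids is farther apart than a chosen distance threshold and take the remaining cluster count as `k`.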