Abstract

Bibliographic records of scientific and technical literature condense the key information of each publication. Clustering such records involves two stages: feature extraction and cluster analysis. In this paper, building on traditional sentence embedding models, we propose a method that combines the SentenceBERT model with an improved k-means algorithm to improve the efficiency of clustering bibliographic information from scientific and technical literature. Clustering experiments were conducted on bibliographic records collected from the China National Knowledge Infrastructure (CNKI) database. Both the adjusted mutual information (AMI) and the silhouette coefficient (SC) improved significantly, which verifies the effectiveness of the proposed method.
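The cluster-then-evaluate pipeline described above can be illustrated with a minimal sketch. It rests on several assumptions: the 2-D tuples stand in for SentenceBERT sentence embeddings, the clustering step is plain Lloyd's k-means rather than the improved variant proposed in the paper (whose details the abstract does not give), and only the silhouette coefficient is computed, since AMI additionally requires ground-truth labels. The names `kmeans`, `silhouette`, `points`, `labels`, and `score` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (not the paper's improved variant)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two non-empty clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy vectors standing in for sentence embeddings of bibliographic records.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = kmeans(points, k=2)
score = silhouette(points, labels)
print(labels, round(score, 3))
```

A silhouette coefficient close to 1 indicates compact, well-separated clusters. In a real pipeline the toy tuples would be replaced by the embedding vectors produced by the sentence encoder.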
