Abstract

Science and technology are highly inheritable undertakings, and any scientific and technological worker can make good progress without the experience and achievements of predecessors or others. In the face of an ever-expanding pool of literature, the ability to efficiently and accurately search for similar works is a major challenge in current research. This paper uses Latent Dirichlet Allocation (LDA) topic model to construct feature vectors for the title and abstract, and the bag-of-words model to construct feature vectors for publication type. The similarity between the feature vectors is measured by calculating the cosine values. The experiment demonstrated that the precision, recall and WSS95 scores of the algorithm proposed in the study were 90.55%, 98.74% and 52.45% under the literature title element, and 91.78%, 99.58% and 62.47% under the literature abstract element, respectively. Under the literature publication type element, the precision, recall and WSS95 scores of the proposed algorithm were 90.77%, 98.05% and 40.14%, respectively. Under the combination of literature title, abstract and publication type elements, the WSS95 score of the proposed algorithm was 79.03%. In summary, the study proposes a robust performance of the literature screening (LS) algorithm based on the LDA topic model, and a similar LS system designed on this basis can effectively improve the efficiency of LS.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.