Clustering of Research Documents - A Survey on Semantic Analysis and Keyword Extraction

Srikesh Rajesh Nair, Akshay Anto Vadakkan, Aditya G Pillai, Gokul G, Mg Thushara

doi:10.1109/i2ct51068.2021.9418197

Abstract

Natural Language Processing (NLP) has long been a major focus of companies and institutions alike, and the field has seen many significant contributions. Because new publications are added every year across different research domains, there is a need for keyword extraction and document clustering, which help organize the publications of an institution. This paper gives an overview of various keyword extraction and document clustering algorithms. The study contributes to the area of text- and document-based search engines, and it helps to identify the domain of research documents and to categorize them based on their semantics. The paper presents a comparative study of different keyword extraction and clustering algorithms, from which one can model a new prototype for document-based search engines and for categorizing documents by mainstream research domains of Computer Science.

Full Text