Abstract
Textual documents are growing rapidly on the internet. Electronic structured databases archive offline and online documents, e-mails, webpages, blogs, and social network posts. Without appropriate ranking and clustering, and when classification lacks specifics, it is difficult to retain and access these documents. K-means is one of the methods frequently used for clustering. However, the distance-based K-means method still has flaws in determining the proximity of meaning, or semantics, between data. To get around this issue, semantic similarity can be estimated by measuring the level of similarity between objects in a cluster. This research provides a method for clustering documents based on semantic similarity. The approach is carried out by defining document synopses from the IMDB and Wikipedia databases using the NLTK dictionary. We provide a semantic-based K-means clustering approach that assesses not only the similarity of the data represented as a vector space model with TF-IDF, but also the semantic similarity of the data. Using precision, recall, and F-measure, we demonstrate how well the semantic-based K-means clustering technique works, based on experimental findings from the IMDB and Wikipedia top 100 movies datasets.
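To illustrate the idea of combining vector-space similarity with semantic similarity, the following is a minimal sketch and not the authors' implementation. It assumes documents are already tokenized, uses a smoothed TF-IDF weighting, and stands in for a lexical resource with a small hypothetical `CONCEPTS` table. The function names, the Jaccard-based `semantic_overlap`, and the blending weight `alpha` are all illustrative assumptions rather than details taken from the paper.

```python
import math
from collections import Counter

# Hypothetical stand-in for a lexical resource: maps a word to a concept label.
# A real system might derive this from a dictionary or thesaurus instead.
CONCEPTS = {
    "film": "movie", "movie": "movie", "picture": "movie",
    "cop": "police", "detective": "police", "police": "police",
}

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def semantic_overlap(a, b):
    """Jaccard overlap of the concept sets of two token lists (an assumption)."""
    ca = {CONCEPTS.get(t, t) for t in a}
    cb = {CONCEPTS.get(t, t) for t in b}
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0

def combined_similarity(i, j, docs, vectors, alpha=0.5):
    """Blend TF-IDF cosine similarity with concept overlap; alpha is hypothetical."""
    return (alpha * cosine(vectors[i], vectors[j])
            + (1 - alpha) * semantic_overlap(docs[i], docs[j]))

docs = [
    ["detective", "hunts", "killer", "film"],
    ["cop", "hunts", "killer", "movie"],
    ["family", "road", "trip", "comedy"],
]
vectors = tfidf_vectors(docs)
lexical = cosine(vectors[0], vectors[1])
blended = combined_similarity(0, 1, docs, vectors)
unrelated = combined_similarity(0, 2, docs, vectors)
```

In this toy data the first two synopses share only two surface terms, so the TF-IDF cosine alone understates how close they are, while the blended score is higher because "detective"/"cop" and "film"/"movie" map to the same concepts. A clustering step could then use one minus the blended score as its distance, although how the paper integrates the semantic term into K-means is not specified in the abstract.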