Abstract
Blogs are among the fastest-growing forms of user-generated content on the internet and are fast becoming a tool for information dissemination and communication. They provide a platform for information sharing, discussion, and the expression of readers' reactions to a blog post. Clustering blogs greatly simplifies searching and browsing by organizing them into groups of similar posts. Blogs are generally organized using tags. In this paper, we study the effect of considering other relevant neighborhood contexts and adding the information extracted from them to the original tag set carried by the blog. The added semantics are extracted by disambiguating all the synsets for the important terms or key phrases within the blog. This work reports a study of measuring similarity on the enhanced blog features and subsequently grouping all blog articles based on the semantics of the tags they carry. We propose adding the semantics extracted from the title, body, and comments of a blog post to its original tag set when clustering blog documents, and we evaluate the hypothesis that adding the extracted semantics from these blog constituents improves cluster quality. The k-means algorithm is used for clustering. The experimental results confirm our hypothesis that adding the semantics yields better clusters. The approach first extracts the relevant features from the target blog corpus, title, and comments. The other senses represented by the relevant keywords are discovered using a general-purpose semantics extractor. All the synsets of the relevant keywords are extracted from WordNet. The extracted keyword senses are then appended to the base tag sets. A semantic similarity measure is used to compute the semantic similarity among the documents, and clusters are obtained based on it. The two clustering outputs are compared.

General Terms
Clustering, blog mining, blog, text mining, semantic similarity
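
The pipeline summarized in the abstract (append keyword senses to the base tag sets, measure similarity, cluster with k-means) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `TOY_SENSES` is a hypothetical stand-in for WordNet synset lookups, Jaccard overlap stands in for the unspecified semantic similarity measure, k-means runs on binary tag vectors with Euclidean distance, and the four toy blogs and the seed indices are invented. It is not the authors' implementation.

```python
# Hypothetical stand-in for WordNet lookups: keyword -> related senses.
TOY_SENSES = {
    "car": {"automobile", "vehicle"},
    "auto": {"automobile", "vehicle"},
    "recipe": {"cooking", "food"},
    "baking": {"cooking", "food"},
}


def enrich(tags, keywords):
    """Return the base tag set plus every sense found for the keywords."""
    enriched = set(tags)
    for word in keywords:
        enriched |= TOY_SENSES.get(word, set())
    return enriched


def jaccard(a, b):
    """Set-overlap similarity, used here only as a simple proxy measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def kmeans(tag_sets, seeds, rounds=10):
    """Plain k-means over binary tag vectors; seeds are document indices."""
    vocab = sorted(set().union(*tag_sets))
    vectors = [[1.0 if term in tags else 0.0 for term in vocab] for tags in tag_sets]
    centroids = [vectors[i][:] for i in seeds]
    labels = []
    for _ in range(rounds):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(vec, centroids[c])))
            for vec in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [vec for vec, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented toy corpus: (base tags, keywords from title/body/comments).
blogs = [
    ({"cars"}, ["car"]),
    ({"autos"}, ["auto"]),
    ({"recipes"}, ["recipe"]),
    ({"desserts"}, ["baking"]),
]
base = [set(tags) for tags, _ in blogs]
enriched = [enrich(tags, words) for tags, words in blogs]

print(jaccard(base[0], base[1]))          # overlap on base tags only
print(jaccard(enriched[0], enriched[1]))  # overlap after adding senses
print(kmeans(enriched, seeds=(0, 2)))     # cluster labels on enriched tags
```

In this toy corpus the two car-related posts share no base tags, so any overlap between them comes only from the appended senses. Seeding k-means from documents 0 and 2 is an arbitrary choice that keeps the run deterministic.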