Abstract

Due to the advancement of Web 2.0 technologies, a large volume of Web opinions is available on social media sites such as Web forums and Weblogs. These technologies provide a platform for Internet users around the world to communicate with each other and express their opinions. Analysis of developing Web opinions is potentially valuable for discovering ongoing topics of interests of the public like terrorist and crime detection, understanding how topics evolve together with the underlying social interaction between participants, and identifying important participants who have great influence in various topics of discussions. Nonetheless, the work of analyzing and clustering Web opinions is extremely challenging. Unlike regular documents, Web opinions are short and sparse text messages with noisy content. Typical document clustering techniques with the goal of clustering all documents applied to Web opinions produce unsatisfactory performance. In this paper, we investigated the density-based clustering algorithm and proposed the scalable distance-based clustering technique for Web opinion clustering. We conducted experiments and benchmarked with the density-based algorithm to show that the new algorithm obtains higher microaccuracy and macroaccuracy. This Web opinion clustering technique enables the identification of themes within discussions in Web social networks and their development, as well as the interactions of active participants. We also developed interactive visualization tools, which make use of the identified topic clusters to display social network development, the network topology similarity between topics, and the similarity values between participants.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.