Abstract
The complexity of the Internet has increased significantly as it has expanded. In particular, social networking has produced user clusters belonging to many communities at different levels. The study of online communities is a growing research area in which user preferences, including those of dangerous groups, can be categorized. This work uses pre-processing techniques including stemming, stop-word removal, and tokenization of data using unigram, bigram, and 1–3-gram representations. TF-IDF (Term Frequency-Inverse Document Frequency) and word embeddings are used for feature extraction. Perspectives generate hierarchical data structures, which are then combined using consensus matrices, followed by computation of dissimilarities between observed sets. This work employs FS-HC (fuzzy-similarity-based hierarchical clustering) to generate dendrograms for the views. Finally, consensus matrices are generated by integrating several hierarchical agglomerations through transitive consensus matrix generation; these matrices contain representative data from the generated dendrograms. Performance metrics including precision, recall, F-measure, accuracy, clustering coefficient, conductance, and contraction are used to evaluate the outcomes of benchmarked clustering algorithms. The proposed approach attains a higher accuracy, about 92%, than other existing algorithms. The data required for community discovery in social media were gathered from Twitter.
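To make the pipeline stages named above more concrete, the following is a minimal Python sketch of 1–3-gram tokenization with stop-word removal, TF-IDF weighting, and a simple consensus matrix built from several clusterings. It is an illustration under stated assumptions and not the authors' implementation: the stop-word list, the function names (`ngrams`, `tfidf`, `consensus_matrix`), and the plain co-association matrix used in place of the transitive consensus matrix are all hypothetical, and stemming, word embeddings, and FS-HC itself are omitted.

```python
import math
from collections import Counter

# Illustrative stop-word list only; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}


def ngrams(text, n_min=1, n_max=3):
    """Lowercase, drop stop words, and emit every n-gram with n_min <= n <= n_max."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    term_lists = [ngrams(d) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def consensus_matrix(labelings):
    """Co-association matrix (assumed stand-in for the paper's consensus step):
    entry [i][j] is the fraction of labelings that put items i and j
    in the same cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]
```

In this sketch each entry of `labelings` would be the flat cluster assignment obtained by cutting one view's dendrogram; a term that appears in every document receives a TF-IDF weight of zero because `log(N / df)` vanishes.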