Abstract

The complexity of the Internet has increased significantly as it has expanded. In particular, social networking has produced clusters of users belonging to many communities at different levels. The study of online communities is a growing area of research in which users can be categorized by their preferences, including members of dangerous groups. This work applies pre-processing techniques including stemming, stop-word removal, and tokenization of the data into unigram, bigram, and 1–3-gram representations. TF-IDF (Term Frequency-Inverse Document Frequency) and word embeddings are used for feature extraction. Each representation is treated as a view (perspective) of the data. The work employs FS-HC (Fuzzy Similarity based Hierarchical Clustering) to generate a dendrogram for each view, based on dissimilarities computed between the observed sets. Finally, the hierarchical agglomerations from the individual views are integrated through transitive consensus matrix generation, so that the consensus matrix contains representative data from the generated dendrograms. Performance metrics including precision, recall, F-measure, accuracy, clustering coefficient, conductance, and contraction are used to evaluate the outcomes against benchmark clustering algorithms. The proposed approach attains an accuracy of about 92 %, higher than that of the existing algorithms it was compared with. The data required for community discovery in social media were gathered from Twitter.
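The multi-view idea summarized above can be illustrated with a minimal sketch. It is not the FS-HC method or the transitive consensus step of the paper: it assumes plain whitespace tokenization, a smoothed TF-IDF weighting, cosine dissimilarity, and a simple average of the per-view dissimilarity matrices as a stand-in for the consensus matrix. All function names and the sample texts are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_min, n_max):
    """Return all word n-grams with n_min <= n <= n_max."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs):
    """Map each document (a list of terms) to a sparse TF-IDF dict.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1 (an assumption,
    not taken from the paper).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def view_matrix(texts, n_min, n_max):
    """Pairwise dissimilarity matrix for one n-gram view of the texts."""
    docs = [ngrams(text.lower().split(), n_min, n_max) for text in texts]
    vectors = tfidf(docs)
    return [[cosine_dissimilarity(a, b) for b in vectors] for a in vectors]


def average_consensus(matrices):
    """Element-wise mean of the per-view matrices (illustrative stand-in)."""
    k, n = len(matrices), len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / k for j in range(n)]
        for i in range(n)
    ]


# Hypothetical sample posts, two per topic.
texts = [
    "deep learning for image recognition",
    "image recognition with deep learning",
    "best pizza recipes in town",
    "pizza recipes for the weekend",
]

# Three views: unigrams, bigrams, and 1-3-grams.
views = [view_matrix(texts, 1, 1), view_matrix(texts, 2, 2), view_matrix(texts, 1, 3)]
consensus = average_consensus(views)
```

The resulting `consensus` matrix could then be passed to any hierarchical clustering routine that accepts precomputed dissimilarities. In this toy data, posts on the same topic end up closer to each other than to posts on the other topic.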
