Scalable resource description framework clustering: A distributed approach for analyzing knowledge graphs using minHash locality sensitive hashing

Pratik Agarwal, Bam Bahadur Sinha

doi:10.1002/cpe.6966

Abstract

The Web is becoming rich in data. Sources of these data include blogs, YouTube, Twitter, email, e-commerce, banking, sensors, and the Internet of Things. However, these data are poorly structured, and web content is becoming heterogeneous in both content and structure. Although the data are human-readable, the main goal is to draw inferences from them, which is only possible if they are made machine-accessible. Clustering is an important task for organizing these data and drawing meaningful inferences from them. In this paper, a clustering approach is proposed that can be applied to knowledge graphs, and the possibility of applying Locality Sensitive Hashing is explored. Given the size of linked data, it is observed that this approach can be effective and scalable in comparison to other clustering approaches, such as hierarchical clustering, K-Means clustering, and K-Medoid clustering, in discovering the different communities defined by the link structure of the graph. Experimental results on different types of Linked Data sources support the efficacy of the proposed model in terms of scalability and efficiency.
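
To make the idea of MinHash Locality Sensitive Hashing on a knowledge graph concrete, the following is a minimal single-machine sketch, not the distributed pipeline evaluated in the paper. It assumes each RDF subject is described by its set of predicate-object pairs, and that subjects sharing at least one LSH band are merged into one cluster. The function names (`entity_token_sets`, `minhash_signature`, `lsh_clusters`), the parameters `num_perm` and `bands`, and the `ex:` triples are all hypothetical and chosen for illustration only.

```python
import hashlib
from collections import defaultdict


def entity_token_sets(triples):
    """Describe each subject by its set of 'predicate object' tokens (an assumption)."""
    tokens = defaultdict(set)
    for subject, predicate, obj in triples:
        tokens[subject].add(f"{predicate} {obj}")
    return dict(tokens)


def _seeded_hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=64):
    """MinHash signature: the minimum hash value under each of num_perm seeded hashes."""
    return [min(_seeded_hash(seed, t) for t in tokens) for seed in range(num_perm)]


def lsh_clusters(entities, num_perm=64, bands=16):
    """Group entities whose signatures agree on at least one band of rows."""
    rows = num_perm // bands
    parent = {name: name for name in entities}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Hash each band of each signature into a bucket.
    buckets = defaultdict(list)
    for name, tokens in entities.items():
        sig = minhash_signature(tokens, num_perm)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].append(name)

    # Entities that share a bucket are merged into the same cluster.
    for members in buckets.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    clusters = defaultdict(set)
    for name in entities:
        clusters[find(name)].add(name)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy triples, for illustration only.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:worksAt", "ex:Acme"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:worksAt", "ex:Acme"),
    ("ex:paris", "rdf:type", "ex:City"),
    ("ex:paris", "ex:locatedIn", "ex:France"),
]
print(lsh_clusters(entity_token_sets(triples)))
```

In this toy input, `ex:alice` and `ex:bob` have identical token sets, so their signatures match in every band and they are grouped together, while `ex:paris` shares no tokens with them and stays separate. Because candidate pairs come from bucket collisions rather than all-pairs comparison, the bucketing step is the part that would plausibly be spread across workers in a distributed setting.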

Full Text