Abstract

Existing approaches to text clustering are agglomerative, divisive, or based on frequent itemsets. However, most of the proposed solutions do not take the semantic associations between words into account and treat documents as bags of unrelated words. Traditional text clustering methods usually rely on the frequency of terms in documents to build homogeneous clusters without considering the associated semantics, which leads to inaccurate clustering results. Accordingly, this research aims to take the meaning of text phrases into account during clustering in order to make fuller use of documents. The semantic web framework provides many useful techniques for making substantial use of data. The goal is to exploit these techniques, in particular the Resource Description Framework (RDF), to represent textual data as triples. To arrive at a more effective clustering method, we provide a semantic representation of the data in texts, on which the clustering process is based. The study also applies other techniques within the clustering process, such as ontology representation, to manipulate and extract meaningful information using RDF, RDF Schema (RDFS), and the Web Ontology Language (OWL). Since text clustering is indispensable for better exploitation of documents, taking semantics into account during clustering allows the most closely related groups in a document collection to be identified more efficiently. To this end, the proposed framework combines machine learning tools with semantic web principles. It supports RDF representation of documents, clustering, topic modeling, cluster summarization, and information retrieval based on RDF querying and reasoning tools. It also highlights the advantages of using semantic web techniques in clustering, topic modeling, and knowledge extraction based on querying, reasoning, and inference.
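As a rough illustration of the representation the abstract describes, the sketch below stores statements as subject-predicate-object triples and answers a simple pattern query over them. It uses plain Python tuples rather than an RDF library or SPARQL engine. The example triples, the `ex:` names, and the `match` and `documents_mentioning_type` helpers are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples that an extraction step might produce from two documents,
# with entities linked to DBpedia-style identifiers.
GRAPH: List[Triple] = [
    ("ex:doc1", "ex:mentions", "dbr:Paris"),
    ("ex:doc2", "ex:mentions", "dbr:Berlin"),
    ("dbr:Paris", "rdf:type", "dbo:City"),
    ("dbr:Berlin", "rdf:type", "dbo:City"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
]


def match(
    graph: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def documents_mentioning_type(graph: List[Triple], rdf_class: str) -> List[str]:
    """Two-step join: find entities of the class, then documents mentioning them."""
    entities = {subj for subj, _, _ in match(graph, p="rdf:type", o=rdf_class)}
    return sorted(
        subj for subj, _, obj in match(graph, p="ex:mentions") if obj in entities
    )


if __name__ == "__main__":
    print(documents_mentioning_type(GRAPH, "dbo:City"))
```

The join in `documents_mentioning_type` is the kind of question that a bag-of-words representation cannot answer directly, because it depends on the relation between a document, an entity, and the entity's class.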

Highlights

  • Existing approaches to text clustering are agglomerative, divisive, or based on frequent itemsets

  • Several methods treat text as a dissociated bag of words, ignoring its semantics. We address this by handling unstructured textual data within a Resource Description Framework (RDF) model in order to preserve the semantic relationships in the text, linking it to a major knowledge base (DBpedia in our case), and including a semantic similarity measure to compute the distance between RDF triples

  • This paper shows how semantic web techniques can be used for clustering and exploring textual documents
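The idea of clustering on a distance between RDF triples can be sketched as follows. The paper's actual similarity measure is not given in this text, so the code swaps in a deliberately simple stand-in: two triples are compared by exact agreement on subject, predicate, and object, and two documents by the average best match between their triple sets. A genuinely semantic measure would presumably also use the ontology (for example, class hierarchies). All function names, the sample data, and the threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triple_similarity(a: Triple, b: Triple) -> float:
    """Fraction of positions (subject, predicate, object) on which two triples agree."""
    return sum(x == y for x, y in zip(a, b)) / 3.0


def document_distance(d1: Set[Triple], d2: Set[Triple]) -> float:
    """1 minus the symmetric average best-match similarity of two triple sets."""
    if not d1 or not d2:
        return 1.0

    def directed(src: Set[Triple], dst: Set[Triple]) -> float:
        return sum(max(triple_similarity(t, u) for u in dst) for t in src) / len(src)

    return 1.0 - 0.5 * (directed(d1, d2) + directed(d2, d1))


def cluster(docs: Dict[str, Set[Triple]], threshold: float) -> List[List[str]]:
    """Single-link grouping: a document joins every cluster it is close to."""
    clusters: List[List[str]] = []
    for name in sorted(docs):
        linked = [
            c
            for c in clusters
            if any(document_distance(docs[name], docs[m]) < threshold for m in c)
        ]
        merged = sorted([name] + [m for c in linked for m in c])
        clusters = [c for c in clusters if c not in linked] + [merged]
    return sorted(clusters)


# Hypothetical documents, each already reduced to a set of triples.
DOCS: Dict[str, Set[Triple]] = {
    "doc1": {("dbr:Paris", "rdf:type", "dbo:City"),
             ("dbr:Paris", "dbo:country", "dbr:France")},
    "doc2": {("dbr:Lyon", "rdf:type", "dbo:City"),
             ("dbr:Lyon", "dbo:country", "dbr:France")},
    "doc3": {("dbr:Python_(programming_language)", "rdf:type",
              "dbo:ProgrammingLanguage")},
}

if __name__ == "__main__":
    print(cluster(DOCS, threshold=0.5))
```

With this stand-in measure, the two documents about French cities share predicates and objects and fall into one group, while the third document stays on its own. The threshold of 0.5 is arbitrary and would need tuning on real data.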

Summary

Introduction

Existing approaches to text clustering are agglomerative, divisive, or based on frequent itemsets. Since text clustering is indispensable for better exploitation of documents, taking semantics into account during clustering allows the most closely related groups in a document collection to be identified more efficiently. To this end, the proposed framework combines multiple techniques. Manipulation based on a semantic web model needs to be explored and strongly highlighted. The framework covers representation, clustering, topic modeling, cluster summarization, and information retrieval based on RDF querying and reasoning tools. It highlights the advantages of using semantic web techniques in clustering, topic modeling, and knowledge extraction based on querying, reasoning, and inference. This work aims to use the semantic web approach for semantic text clustering, using the graph-based RDF representation model in accordance with linked data principles. We propose a system, an integrated set of techniques, in which textual documents are transformed into an RDF graph representation and divided into homogeneous
