Abstract

Work that uses graphs to represent documents has emphasized the expressive richness of these representations. Graph theory is also of great interest for evaluating the similarity between such documents, both in document classification and in information retrieval. In the structural classification of documents, the subject of this work, the similarity measure is a crucial step. In many applications, this step reduces to a subgraph isomorphism problem, which is known in graph theory for its combinatorial explosion. To get around this problem, we propose to treat a graph as the set of paths that compose it. Matching paths reduces the combinatorial cost. We propose a structural measure based on subgraph isomorphism and discuss the quality of our classifier, especially the separation of classes. We aim to show that our measure is structural rather than a “surface measure”, and we evaluate our approach on a corpus of multimedia documents extracted at random from the INEX 2007 corpus.
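
The idea of treating a graph as the set of its paths can be illustrated with a minimal sketch. It is not the measure proposed in the paper: it assumes documents are rooted trees of labelled elements, it enumerates root-to-leaf label paths, and it compares two documents with a Jaccard ratio over those path sets. The function names, the dictionary encoding, and the two toy documents are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]


def root_to_leaf_paths(children: Dict[str, List[str]],
                       labels: Dict[str, str],
                       root: str) -> Set[Path]:
    """Return the set of label sequences from the root to each leaf.

    Assumption: the document structure is a rooted tree, encoded as a
    mapping from node id to child ids plus a mapping from node id to label.
    """
    paths: Set[Path] = set()
    stack = [(root, (labels[root],))]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            paths.add(path)  # reached a leaf: record the full label path
        for kid in kids:
            stack.append((kid, path + (labels[kid],)))
    return paths


def path_similarity(a: Set[Path], b: Set[Path]) -> float:
    """Jaccard ratio over path sets (an illustrative choice, not the paper's)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two hypothetical documents that differ in a single leaf element.
labels_1 = {"n0": "article", "n1": "title", "n2": "section",
            "n3": "paragraph", "n4": "image"}
labels_2 = {"n0": "article", "n1": "title", "n2": "section",
            "n3": "paragraph", "n4": "video"}
structure = {"n0": ["n1", "n2"], "n2": ["n3", "n4"]}

paths_1 = root_to_leaf_paths(structure, labels_1, "n0")
paths_2 = root_to_leaf_paths(structure, labels_2, "n0")
print(path_similarity(paths_1, paths_2))
```

Each document is walked once and the comparison is a set operation, which avoids searching over node-to-node mappings as a direct subgraph isomorphism test would. The two toy documents share two of their four distinct paths.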
