Abstract

To improve browsing in a document database, we propose a conceptual approach for the multi-level restructuring of categorized documents in a corpus. Starting from a manually built, static corpus organization based on the domain ontology, we derive new, dynamically generated structures that are embedded in the static one. We use a recursive conceptual indexing method that selects the minimal number of concepts covering either a single document or a subset of documents forming a sub-corpus. The system thus gives the user an additional browsing feature: a dynamically generated conceptual structure of document clusters. As an illustration, the figure shows an application to Arabic financial news for a particular ontology, in which generated sub-categories appear under the existing categories. Alongside the classical browsing system, the indexing words provided at each level give the user more detail about the content of a file, or of a category, before further exploration. Our approach improves human-computer interaction by decreasing browsing time. An assessment of the proposed method shows that combining manual document categorization with automatic feature generation gives users a flexible and effective structured browsing interface. Finally, the low-level features help place new documents incrementally in the right category by means of suitable supervised classification methods.
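
The idea of covering a document set with few concepts, then repeating the process inside each resulting cluster, can be illustrated with a minimal sketch. It is not the method evaluated in this work: it swaps in the standard greedy set-cover heuristic, which only approximates a minimal cover, and it assumes each concept is already given as the set of documents it applies to. The function names, the `depth` parameter, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_concept_cover(docs: Set[str], concepts: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick concepts until every coverable document is covered.

    Assumption: this is the classic greedy set-cover heuristic, an
    approximation of a minimal cover, not the paper's own algorithm.
    """
    uncovered = set(docs)
    chosen: List[str] = []
    while uncovered and concepts:
        # Concept covering the most still-uncovered documents;
        # ties go to the alphabetically first name (max keeps the first maximum).
        best = max(sorted(concepts), key=lambda name: len(concepts[name] & uncovered))
        gain = concepts[best] & uncovered
        if not gain:
            break  # the remaining documents are covered by no concept
        chosen.append(best)
        uncovered -= gain
    return chosen


def build_levels(docs: Set[str], concepts: Dict[str, Set[str]], depth: int = 2) -> dict:
    """Recursively re-index each cluster to obtain a multi-level structure."""
    if depth == 0 or len(docs) <= 1:
        return {}
    tree = {}
    for name in greedy_concept_cover(docs, concepts):
        cluster = concepts[name] & docs
        # Do not reuse the parent concept when indexing its own cluster.
        rest = {k: v for k, v in concepts.items() if k != name}
        tree[name] = {
            "docs": sorted(cluster),
            "children": build_levels(cluster, rest, depth - 1),
        }
    return tree


# Hypothetical sample data: document ids and the concepts attached to them.
docs = {"d1", "d2", "d3", "d4", "d5"}
concepts = {
    "banking": {"d1", "d2", "d3"},
    "markets": {"d3", "d4"},
    "energy": {"d4", "d5"},
    "currency": {"d1"},
}

cover = greedy_concept_cover(docs, concepts)  # ['banking', 'energy']
tree = build_levels(docs, concepts)
```

In this sample, two concepts suffice to cover all five documents, and the second level labels parts of each cluster with the remaining concepts. Documents that no remaining concept covers simply stay at the parent level.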
