Content-Based Document Image Retrieval Based on Document Modeling

Chwan-Yi Shiah

doi:10.1007/s10844-020-00600-1

Abstract

Recently, language models have gained importance in the field of information retrieval. In this paper, we propose a generic language model to improve a content-based document retrieval system. In this approach, character images are extracted, clustered, and analyzed to form high-level semantic terms using a statistical document model. This model simulates the long-term relationships between characters. Documents are then indexed according to these terms, and a query document is used to retrieve the relevant documents. The query document can be a single keyword, or it can be synthesized from a text string. The aim is to generate a semantic representation from low-level image pixels through pattern matching and document modeling. The conventional approach to generating semantic terms in document retrieval includes every possible symbol sequence in the feature representation. By comparison, our approach can considerably reduce the dimensionality of the feature space while producing retrieval results comparable to those of the conventional and state-of-the-art approaches.
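
The pipeline outlined in the abstract (cluster character images into symbols, form terms, index, query) can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify the statistical document model, so plain symbol bigrams stand in for the semantic terms, leader clustering with a distance threshold stands in for the character clustering, and cosine similarity stands in for the ranking. The function names, the threshold, and the two-dimensional toy feature vectors are all assumptions made for illustration.

```python
import math
from collections import Counter

def to_symbols(char_vectors, leaders, threshold=1.0):
    """Map each character feature vector to a cluster id (hypothetical stand-in
    for character-image clustering). A vector joins the first cluster whose
    leader is within `threshold`; otherwise it becomes a new leader."""
    symbols = []
    for vec in char_vectors:
        for idx, leader in enumerate(leaders):
            if math.dist(vec, leader) <= threshold:
                symbols.append(idx)
                break
        else:
            leaders.append(vec)
            symbols.append(len(leaders) - 1)
    return symbols

def terms(symbols, n=2):
    """Count symbol n-grams; an assumed placeholder for model-derived terms."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query_terms, index):
    """Return document ids ordered from most to least similar to the query."""
    return sorted(index, key=lambda d: cosine(query_terms, index[d]), reverse=True)

# Toy data: each document is a sequence of made-up character feature vectors.
leaders = []  # cluster leaders shared by all documents and the query
docs = {
    "doc_a": [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.0, 5.1)],
    "doc_b": [(9.0, 0.0), (9.0, 0.1), (9.1, 0.0)],
}
index = {doc_id: terms(to_symbols(vecs, leaders)) for doc_id, vecs in docs.items()}

# The query passes through the same clustering, so it shares the symbol ids.
query = terms(to_symbols([(0.0, 0.1), (5.1, 5.0)], leaders))
print(rank(query, index))  # ['doc_a', 'doc_b']
```

Because the sketch keeps every bigram it sees, it corresponds more closely to the conventional approach that the abstract contrasts against; a term-selection step driven by a document model would shrink the feature space further.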

Journal: Journal of Intelligent Information Systems
Publication Date: Jun 6, 2020
Citations: 4
