Abstract

Owing to the rapid growth in the volume and variety of digitised documents, document image retrieval has received significant attention over the past two decades. Finding discriminative and effective features is fundamental to building a faster and more accurate retrieval system. Texture features are generally fast to compute and are therefore suitable for large volumes of data. This study investigates, on document images, the effectiveness of texture features widely used in the content-based image retrieval literature. Twenty-six texture feature extraction methods, drawn from the four main categories of texture features (statistical, transform-based, model-based, and structural approaches), are compared on the problem of document image retrieval. Three document image datasets with varied content and page layouts, MTDB, ITESOFT, and CLEF_IP, are used for the evaluation. Retrieval results are reported in terms of precision, recall, and F-score, together with a comparative analysis. The feature dimensions and time complexity of the methods are also compared. Finally, conclusions are drawn and directions for future research are suggested.
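The abstract names precision, recall, and F-score as the evaluation measures but does not say how they are computed per query. The following is a minimal sketch assuming the standard set-based definitions for a single query; the function name `precision_recall_f1` and the document identifiers are hypothetical and are not taken from the study.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based retrieval metrics for one query (illustrative sketch).

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids that are truly relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0

    # F-score taken here as the harmonic mean of precision and recall (F1).
    total = precision + recall
    f_score = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_score


# Hypothetical query: 4 documents retrieved, 3 relevant, 2 in common.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={p:.3f} recall={r:.3f} f_score={f:.3f}")
```

In a multi-query evaluation these values would typically be averaged over all queries; whether the study averages per query or per dataset is not stated in the abstract.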

