Abstract

This paper develops and validates a new neurocomputing model for classifying document images under particularly demanding conditions, such as image distortions, variance in image size and scale, and a very large number of classes. Document classification is a machine vision task in which each document image is assigned to its most likely category. It is an important topic for the digital office and has several uses. Various methods for solving this problem have been presented in the literature, but their performance is not yet good enough. The task is very challenging, so a novel, more accurate and precise model is needed. Although related works reach acceptable accuracy under less demanding conditions, they generally fail under the hard, real-world conditions mentioned above, which include distortions such as noise, blur, low contrast, and shadows. In this paper, a novel deep CNN model is developed, validated, and benchmarked against a selection of the most relevant recent document classification models. Additionally, the model’s sensitivity was significantly improved by injecting different artifacts during the training process. In the benchmark, it clearly outperforms all other models by at least 4%, reaching more than 96% accuracy.
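The abstract names four distortions (noise, blur, low contrast, shadows) that are injected during training. The following is a minimal sketch of what such artifact injection could look like, not the authors' implementation. It assumes grayscale images stored as nested lists with pixel values in [0.0, 1.0]. All function names and parameter values (`sigma=0.05`, `factor=0.5`, `strength=0.6`) are hypothetical choices made for illustration.

```python
import random
from typing import List

Image = List[List[float]]  # assumed format: grayscale, pixel values in [0.0, 1.0]


def clip(value: float) -> float:
    """Keep a pixel value inside the valid [0.0, 1.0] range."""
    return max(0.0, min(1.0, value))


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel."""
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def box_blur(img: Image) -> Image:
    """Blur with a 3x3 mean filter; border windows are truncated."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def reduce_contrast(img: Image, factor: float) -> Image:
    """Pull pixels toward mid-gray; factor=1 keeps the image unchanged."""
    return [[clip(0.5 + factor * (p - 0.5)) for p in row] for row in img]


def add_shadow(img: Image, strength: float) -> Image:
    """Darken linearly from the left edge (unchanged) to the right edge."""
    w = len(img[0])
    return [
        [clip(p * (1.0 - strength * x / max(1, w - 1))) for x, p in enumerate(row)]
        for row in img
    ]


def inject_artifact(img: Image, rng: random.Random) -> Image:
    """Apply one randomly chosen distortion to a training image."""
    distortions = [
        lambda im: add_noise(im, 0.05, rng),
        box_blur,
        lambda im: reduce_contrast(im, 0.5),
        lambda im: add_shadow(im, 0.6),
    ]
    return rng.choice(distortions)(img)
```

In a training loop, `inject_artifact` would be called on each image before it is fed to the network, so the model sees a differently distorted variant of the data in each epoch.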

Highlights

  • Classifying images in general, including document images, is one of the most popular tasks in computer vision [1]

  • Today, Convolutional Neural Networks (CNNs) are used to solve various types of tasks related to data processing and classification, such as disease detection [18,19], image classification [20,21], street view image classification [22], remote sensing data classification [23], lidar image classification [24], data compression [25], and many other areas

  • That system performed reasonably well even with limited training data; for example, we can mention Chen et al. [41], who propose a method based on scale-invariant feature transform (SIFT) descriptors to classify documents


Summary

Introduction

Classifying images in general, including document images, is one of the most popular tasks in computer vision [1]. In recent years, several studies have tried to provide robust models for classifying document images or other types of images in the presence of various artifacts. Those studies can be grouped into two different approaches. Today, CNNs are used to solve various types of tasks related to data processing and classification, such as disease detection [18,19], image classification [20,21], street view image classification [22], remote sensing data classification [23], lidar image classification [24], data compression [25], and many other areas. The architecture of this network is based, as its name suggests, on convolution.
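To make the convolution operation concrete, here is a minimal sketch of the sliding-window computation at the heart of a CNN layer. It is an illustration only and is not taken from the paper's model. It assumes a single-channel image and kernel stored as nested lists, uses no padding and a stride of 1, and, as in most deep learning frameworks, does not flip the kernel (strictly speaking, this is cross-correlation). The function name `conv2d` is a hypothetical choice.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image and sum the element-wise products.

    No padding, stride 1, kernel not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[y + j][x + i] * kernel[j][i]
                for j in range(kh)
                for i in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# A 3x4 image with a vertical edge between the second and third columns.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
# A horizontal-difference kernel that responds to such edges.
kernel = [[-1.0, 1.0]]

feature_map = conv2d(image, kernel)
print(feature_map)  # each row is [0.0, 1.0, 0.0]: the response peaks at the edge
```

In a trained CNN the kernel values are learned rather than hand-picked, and many such kernels are applied in parallel to produce a stack of feature maps.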

