Abstract

In this article we present a web service framework that makes automatic document processing methods publicly available. We also briefly describe an assessment environment and sample applications that use this framework. Research on Document Image Analysis (DIA) focuses mainly on developing and refining automatic processing steps, e.g. text line extraction, binarization, and layout analysis. While many state-of-the-art methods perform satisfactorily, the algorithms used to obtain the results are not easily accessible to other researchers. Making the source code available is often not sufficient, as using it typically requires a cumbersome installation of libraries and reading long usage manuals. We present a new approach for making methods available to researchers in the digital humanities who lack detailed knowledge of the algorithms. For our approach we propose a RESTful web service architecture, the current state of the art in web communication. For a developer, this reduces the steps needed to access a method to sending and receiving HTTP requests with JavaScript Object Notation (JSON) data, removing all installation steps. We will build on standards such as the Text Encoding Initiative (TEI) and the International Image Interoperability Framework (IIIF). Thus, methods hosted on DivaServices can be integrated easily into document processing workflows by any software engineer, in computer science as well as in the digital humanities, without specific knowledge of the mathematical and algorithmic details of DIA.
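To illustrate what "sending and receiving HTTP requests with JSON data" could look like from a client's point of view, the following minimal Python sketch builds such a request and decodes a reply. The endpoint URL, the JSON field names (`image`, `parameters`, `status`, `output`), and the sample reply are all hypothetical and chosen only for illustration; they are not taken from the actual DivaServices interface.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only (not the real service URL).
SERVICE_URL = "http://example.org/binarization/otsu"


def build_request(image_url: str, parameters: dict) -> urllib.request.Request:
    """Build a POST request whose JSON body describes the processing job."""
    # The field names "image" and "parameters" are assumptions.
    body = json.dumps({"image": image_url, "parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        SERVICE_URL,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> dict:
    """Decode the JSON body returned by the service."""
    return json.loads(raw.decode("utf-8"))


request = build_request("http://example.org/page.png", {"threshold": 0.5})

# urllib.request.urlopen(request) would send the request; it is omitted here
# so that the sketch runs without network access. The reply below is made up.
sample = parse_response(b'{"status": "done", "output": "http://example.org/result.png"}')
print(request.get_method(), sample["status"])
```

The client needs only a standard HTTP library and a JSON parser; nothing related to the DIA method itself has to be installed locally.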
