Abstract

Binarization of historical documents has become increasingly important now that digital archiving is the preferred solution for storing and retrieving valuable archives. The process is challenging, however, because historical documents are often degraded. This paper describes a learning-based method for binarizing historical documents, using a support vector machine (SVM) as the classifier. A model was trained on a set of images together with their ground truth images, and was then applied to test images to classify each pixel as text or non-text. Two pixel descriptors were compared: grey level values and RGB values. In both cases, the intensities of the local neighbourhood of each pixel were used. The standard datasets HDIBCO2014, DIBCO2012 and DIBCO2016 were used in the training and testing phases. The experimental results showed that grey level values gave better performance than RGB values.
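The descriptor construction outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 3x3 window (`radius=1`), the edge-replication border handling, the label convention (1 = text, 0 = non-text), and the helper names `neighbourhood_descriptor` and `build_training_set` are all assumptions made for illustration, since the abstract does not specify them. The sketch shows only how a grey level descriptor and an RGB descriptor could be built from a pixel's local neighbourhood and paired with ground truth labels.

```python
def neighbourhood_descriptor(image, row, col, radius=1):
    """Flatten the intensities of the (2*radius+1)^2 window around (row, col).

    Border pixels are handled by clamping coordinates to the image
    (edge replication); this is an assumption of the sketch.
    """
    height, width = len(image), len(image[0])
    features = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r = min(max(row + dr, 0), height - 1)
            c = min(max(col + dc, 0), width - 1)
            value = image[r][c]
            if isinstance(value, (tuple, list)):
                features.extend(value)  # RGB: three values per neighbour
            else:
                features.append(value)  # grey level: one value per neighbour
    return features


def build_training_set(image, ground_truth, radius=1):
    """Pair each pixel's descriptor with its ground truth label.

    Assumed label convention: 1 = text, 0 = non-text.
    """
    samples, labels = [], []
    for row in range(len(image)):
        for col in range(len(image[0])):
            samples.append(neighbourhood_descriptor(image, row, col, radius))
            labels.append(ground_truth[row][col])
    return samples, labels


# Tiny hypothetical 3x3 image: dark pixels (about 55-60) stand in for ink.
grey = [
    [200, 198, 60],
    [201, 55, 58],
    [199, 197, 202],
]
truth = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
# Hypothetical RGB version of the same image (equal channels).
rgb = [[(v, v, v) for v in row] for row in grey]

X_grey, y = build_training_set(grey, truth)
X_rgb, _ = build_training_set(rgb, truth)
print(len(X_grey[0]), len(X_rgb[0]))  # 9 27
```

With a 3x3 window, the grey level descriptor has 9 values per pixel and the RGB descriptor has 27. The resulting samples and labels could then be passed to any SVM implementation, for example the `fit` method of scikit-learn's `sklearn.svm.SVC`; the abstract does not say which library or kernel was used.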
