Automatic recognition system for document digitization in nuclear power plants

Elisa Ou, Minhee Kim, Po-Ling Loh, Todd Allen, Robert Agasie, Kaibo Liu

doi:10.1016/j.nucengdes.2022.111975

Abstract

With the increasing number of data-driven models in nuclear applications, large volumes of numerical data are required to accurately model and predict the health status of a plant component. However, many historical operation logs that contain useful information are not fully utilized because there is no systematic approach to digitization. To overcome this issue, this study proposes an automatic pipeline for extracting information from handwritten tabular documents collected from nuclear power plants. In our pipeline, we first denoise scanned documents with morphological operations, and then extract the relevant parts of individual pages using both traditional computer vision and neural network methods. Handwriting recognition is then applied to obtain text and numbers. Because the most challenging step is cropping only the relevant information, the main focus of our paper is detecting tables and cells in scanned handwritten documents. We evaluate the efficiency and accuracy of our proposed method on handwritten operational reports obtained from a real-world case study. The results demonstrate the high accuracy and practicality of our proposed method.

Full Text