Abstract
Charts enrich technical documents by simplifying the representation of complex data and improving comprehension. However, automated chart content extraction (CCE) remains a significant challenge in document analysis and understanding. CCE can be decomposed into six sub-tasks: chart classification (CC), text detection and recognition (TDR), text role classification (TRC), axis analysis, legend analysis, and data extraction. Improving these sub-tasks is important for the overall effectiveness of CCE. This paper introduces the chart classification and content extraction (C3E) framework, which focuses on the first three sub-tasks: CC, TDR, and TRC. For CC, we propose ChartVision, an EfficientNet-based model coupled with a dual-branch architecture that incorporates a novel hybrid convolutional and dilated attention module. For text detection and TRC, we introduce CCE-YOLO, a novel method based on YOLOv5 designed to localize and classify textual components of varying sizes. For text recognition, we employ a convolutional recurrent neural network with connectionist temporal classification loss. We evaluated each sub-task on benchmark datasets, specifically the UB-PMC 2020 and UB-PMC 2022 datasets from the ICPR2020 and ICPR2022 CHART-Infographics competitions. The C3E framework achieved F1-scores of 94.26%, 92.44%, and 80.64% for CC, TDR, and TRC, respectively, on the UB-PMC 2020 dataset, and 94.0%, 91.98%, and 84.48% for CC, TDR, and TRC, respectively, on the UB-PMC 2022 dataset.
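To illustrate how a recognizer trained with connectionist temporal classification loss is typically turned into text, the following is a minimal sketch of greedy CTC decoding. The abstract does not state which decoding strategy the framework uses, so this is an assumption for illustration only: the function name `ctc_greedy_decode`, the blank index 0, and the toy alphabet are all hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_labels, alphabet, blank=BLANK):
    """Collapse per-frame label indices into a string.

    frame_labels: the most likely label index at each time step.
    alphabet: characters for indices 1..len(alphabet); index 0 is the blank.
    """
    # Step 1: merge consecutive repeats of the same label.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop blanks and map the remaining indices to characters.
    return "".join(alphabet[label - 1] for label in collapsed if label != blank)


if __name__ == "__main__":
    # A blank between two identical labels keeps both characters.
    print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0, 3], "abc"))
```

Repeated labels are merged before blanks are removed, which is why a blank between two identical labels preserves a doubled character in the output.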