Unstructured Document Information Extraction Method with Multi-Faceted Domain Knowledge Graph Assistance for M2M Customs Risk Prevention and Screening Application

Fengchun Tian, Yingcheng Lin, Zhenlong Wan, Ruilong Liu, Haochen Wang, Ran Liu, Di Lv

doi:10.3390/electronics13101941

Journal: Electronics
Publication Date: May 15, 2024
License type: CC BY 4.0

Affiliation: Chongqing University

Abstract

Customs is a crucial line of national security defense, yet its existing risk prevention and screening system falls short in the intelligence and diversity of its risk identification factors. Two issues in the risk identification system are therefore urgent: intelligent extraction of key information from Customs Unstructured Accompanying Documents (CUADs), and the reliability of the extraction results. In the customs scenario, optical character recognition (OCR) is employed for machine-to-machine (M2M) interactions, but current models have difficulty adapting to varying image quality and complex customs document content. To address this, we propose a hybrid mutual learning knowledge distillation (HMLKD) method that optimizes the performance of a pre-trained OCR model. In addition, current models do not effectively incorporate domain-specific knowledge, so their text recognition accuracy is insufficient for practical customs risk identification. We therefore construct a customs domain knowledge graph (CDKG) from CUAD knowledge and propose an integrated post-OCR correction method based on it (iCDKG-PostOCR). Results on real data show that accuracy improves to 97.70% for code text fields, 96.55% for character-type fields, and 96.00% for numerical-type fields, with a confidence rate exceeding 99% for each. Furthermore, the Customs Health Certificate Extraction System (CHCES), developed using the proposed method, has been implemented and verified at Tianjin Customs in China, where it has shown outstanding operational performance.

Full Text