Accurate classification of import and export goods into Harmonized System (HS) codes is essential for ensuring tax security. Applying text classification technologies to HS code classification can significantly enhance the prevention and control of customs tax risks. However, goods description texts are semi-structured and contain multi-domain Chinese professional vocabulary, which poses challenges for current classification models: they often suffer from inadequate text representation and imprecise feature extraction. To address these challenges, we propose a novel classification model, ERNIE-BiLSTM-Channel attention–Spatial attention (EBLCS). This model integrates ERNIE (Enhanced Representation through Knowledge Integration) with a Bidirectional Long Short-Term Memory network (BiLSTM) and employs multi-scale attention mechanisms. The ERNIE-BiLSTM component provides a more comprehensive and accurate representation of the goods text, effectively capturing its global features. By introducing channel attention and spatial attention mechanisms, greater weights are assigned to important words and to word embedding dimensions, significantly enhancing the model's ability to perceive key information. Experimental results on a customs dataset demonstrate that the EBLCS model consistently outperforms various baseline models across all evaluation metrics, effectively improving the performance of HS code classification.
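The pipeline sketched in the abstract (contextual embeddings, BiLSTM, channel attention over embedding dimensions, spatial attention over token positions, classifier) can be illustrated with a minimal PyTorch sketch. This is not the authors' released code: the layer sizes, reduction ratio, convolution kernel width, and the plain embedding layer standing in for a pretrained ERNIE encoder are all assumptions made for demonstration.

```python
# Illustrative sketch of an EBLCS-style model. All hyperparameters and the
# nn.Embedding stand-in for ERNIE outputs are assumptions, not the paper's setup.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Weights each embedding dimension (channel) by its global importance."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(),
            nn.Linear(dim // reduction, dim),
        )

    def forward(self, x):                      # x: (batch, seq_len, dim)
        pooled = x.mean(dim=1)                 # average over token positions
        weights = torch.sigmoid(self.mlp(pooled)).unsqueeze(1)
        return x * weights                     # rescale each channel


class SpatialAttention(nn.Module):
    """Weights each token position by its importance across channels."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        avg = x.mean(dim=2, keepdim=True)      # (batch, seq_len, 1)
        mx, _ = x.max(dim=2, keepdim=True)
        attn = torch.sigmoid(
            self.conv(torch.cat([avg, mx], dim=2).transpose(1, 2))
        ).transpose(1, 2)                      # (batch, seq_len, 1)
        return x * attn                        # rescale each token position


class EBLCS(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=768, hidden=256, n_classes=100):
        super().__init__()
        # Stand-in for ERNIE contextual embeddings; in practice a pretrained
        # ERNIE encoder (e.g. loaded via the transformers library) would be used.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.channel_attn = ChannelAttention(2 * hidden)
        self.spatial_attn = SpatialAttention()
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        h, _ = self.bilstm(self.embed(token_ids))
        h = self.spatial_attn(self.channel_attn(h))
        return self.classifier(h.mean(dim=1))  # pool over tokens, then classify


if __name__ == "__main__":
    model = EBLCS()
    logits = model(torch.randint(0, 30000, (2, 32)))
    print(logits.shape)  # torch.Size([2, 100])
```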