Abstract

At present, the Chinese text field is facing challenges from low resource issues such as data scarcity and annotation difficulties. Moreover, in the domain of cigarette tasting, cigarette tasting texts tend to be colloquial, making it difficult to obtain valuable and high-quality tasting texts. Therefore, in this paper, we construct a cigarette tasting dataset (CT2023) and propose a novel Chinese text classification method based on ERNIE and Comparative Learning for Low-Resource scenarios (ECLLR). Firstly, to address the issues of limited vocabulary diversity and sparse features in cigarette tasting text, we utilize Term Frequency-Inverse Document Frequency (TF-IDF) to extract key terms, supplementing the discriminative features of the original text. Secondly, ERNIE is employed to obtain sentence-level vector embedding of the text. Finally, contrastive learning model is used to further refine the text after fusing the keyword features, thereby enhancing the performance of the proposed text classification model. Experiments on the CT2023 dataset demonstrate an accuracy rate of 96.33% for the proposed method, surpassing the baseline model by at least 11 percentage points, and showing good text classification performance. It is thus clear that the proposed approach can effectively provide recommendations and decision support for cigarette production processes in tobacco companies.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.