Given the importance of regulatory oversight and the rising demand for information disclosure, a large volume of food safety news reports is readily accessible on the Internet. Extracting this information, classifying it precisely, and issuing appropriate safety alerts for each category has become a challenging research problem. Most news reports on food safety events are lengthy, whereas the pre-trained language models currently used for text analysis are limited in their ability to handle long documents. This paper proposes a long-text classification model based on hierarchical Transformers. We divide the information in a long document into two types: (1) multiple text chunks that meet the length constraint and (2) essential sentences within the document, such as headings and the first and last sentences of paragraphs. First, the model feeds the text chunks to a BERT model. It then concatenates the BERT output with the essential sentences and uses the result as input to a Transformer model for feature transformation. Finally, a classifier assigns the food safety news category. We ran comparative experiments against several commonly used text classification models on a dataset constructed from publicly available information on food regulatory websites. In these experiments, our proposed method outperforms the compared methods.
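The two input types described above can be illustrated with a minimal sketch of the preprocessing step only. It is not the authors' implementation: the function names are hypothetical, whitespace splitting stands in for BERT's subword tokenizer, the 510-token limit assumes a 512-token BERT input minus the two special tokens, and the regular-expression sentence splitter is deliberately naive. The BERT encoding, the Transformer layer, and the classifier are not shown.

```python
import re


def split_into_chunks(text, max_tokens=510):
    """Split text into consecutive chunks of at most max_tokens tokens.

    Assumption: whitespace tokens approximate the model's subword tokens.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def split_sentences(paragraph):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def key_sentences(title, paragraphs):
    """Collect the heading plus the first and last sentence of each paragraph."""
    selected = [title] if title else []
    for paragraph in paragraphs:
        sentences = split_sentences(paragraph)
        if not sentences:
            continue
        selected.append(sentences[0])
        if len(sentences) > 1:
            selected.append(sentences[-1])
    return selected


def prepare_inputs(title, paragraphs, max_tokens=510):
    """Build both input types for one document (hypothetical helper)."""
    body = " ".join(paragraphs)
    return {
        "chunks": split_into_chunks(body, max_tokens),
        "key_sentences": key_sentences(title, paragraphs),
    }
```

In a full pipeline, each chunk would be encoded separately by BERT, and the resulting representations would be combined with the key sentences before the Transformer layer. How that combination is performed is not specified in the abstract, so it is left out here.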