Abstract
Food safety event detection is a technique for discovering food safety events by monitoring online news. In general, a set of keywords is extracted as features to represent each news document, and the news is then clustered to generate events. The most popular method for news feature extraction is Term Frequency-Inverse Document Frequency (TF-IDF); however, it has defects such as being prone to the “dimension disaster”, low computational efficiency, and a lack of semantic information. Latent Dirichlet Allocation (LDA) is also widely used in news representation. Despite its low dimensionality, it suffers from drawbacks such as the need to predefine the number of clusters and difficulty recognizing new events. In this paper, a method based on multi-feature fusion is proposed, which combines the TF-IDF features, the named entity features, and the headline features to represent the news. Based on these representations, an incremental clustering method is used to cluster the news documents and detect food safety events. Compared with the traditional methods, the proposed method achieved higher Precision, Recall, and F1 scores. The proposed method can help regulatory authorities make decisions and improve the reputation of the government, whilst reducing social anxiety and economic losses.
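As an illustration of why incremental clustering does not need a predefined number of clusters, the following is a minimal single-pass sketch. It is not the paper's implementation: the cosine similarity measure, the summed-vector centroid, the `threshold` parameter, and the function names `cosine` and `incremental_cluster` are all assumptions made for this example, and each news document is assumed to be already represented as a sparse weight vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def incremental_cluster(vectors, threshold):
    """Single-pass clustering: returns one cluster label per input vector.

    Each cluster keeps a running sum of its members' vectors as its centroid
    (cosine similarity ignores scale, so the sum behaves like the mean).
    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    centroids = []  # one summed sparse vector per cluster
    labels = []
    for vec in vectors:
        best_label, best_sim = None, threshold
        for label, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is None:
            # Nothing is similar enough: treat the document as a new event.
            centroids.append(dict(vec))
            best_label = len(centroids) - 1
        else:
            # Fold the document into the closest existing event.
            centroid = centroids[best_label]
            for term, weight in vec.items():
                centroid[term] = centroid.get(term, 0.0) + weight
        labels.append(best_label)
    return labels
```

Because a document that matches no existing cluster opens a new one, the number of events grows with the data rather than being fixed in advance. A lower threshold merges more documents into the same event, and a higher one splits them apart.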
Highlights
Topic Detection and Tracking (TDT) is an information processing technique for the information flow on news media [1], which can detect the appearance of new topics and track their reappearance and evolution [2], whilst helping people deal with the problem of the internet information explosion [3]
We propose the concept of a news “fusion feature”, which fuses multiple features, including the Term Frequency-Inverse Document Frequency (TF-IDF) features, the named entity features, and the headline features
This paper proposes a food safety event detection method based on multi-feature fusion. The process is as follows: (1) the news data are preprocessed; (2) TF-IDF is used to calculate the weight of each word in a news document, and the M words with the largest weights in each news document are selected to form a feature word set W; (3) the named entities in the news document are recognized using the Bi-directional Long Short-Term Memory (Bi-LSTM)-Convolutional Neural Networks (CNN)-Conditional Random Field (CRF) framework [35] to form the set E
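Step (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the documents are assumed to be already tokenized, the weighting uses tf = count / document length and idf = log(N / df) (the source does not say which TF-IDF variant is used), and the function names `top_m_feature_words` and `feature_word_set` are hypothetical.

```python
import math
from collections import Counter


def top_m_feature_words(docs, m):
    """Return, for each tokenized document, its m highest-weighted TF-IDF words.

    docs: list of token lists (assumed already preprocessed).
    Assumed weighting: tf = count / doc length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    selected = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
        # Sort by descending weight; break ties alphabetically for determinism.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        selected.append(ranked[:m])
    return selected


def feature_word_set(docs, m):
    """Union of the per-document top-m words (the set W in the text)."""
    return set().union(*top_m_feature_words(docs, m)) if docs else set()
```

Words that occur in every document receive an idf of zero under this weighting, so they are ranked last and rarely enter W, which keeps the feature set focused on terms that distinguish one news document from another.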
Summary
Topic Detection and Tracking (TDT) is an information processing technique for the information flow on news media [1], which can detect the appearance of new topics and track their reappearance and evolution [2], whilst helping people deal with the problem of the internet information explosion [3]. Topic detection is a sub-task of TDT, which can help decision makers find meaningful topics or events in a timely manner [4] and has attracted a great deal of attention in many application areas, such as public opinion monitoring, emergency management, decision-making support systems, and online reputation monitoring [5,6,7,8]. Examples of food safety events include the horsemeat scandal that occurred in Europe [10], rat meat that was found in famous snacks in Korea [11], and the melamine, Sudan red egg, and gutter oil scandals that occurred in China [12,13,14]. These events caused huge economic losses, brought anxiety to the public, and seriously undermined the reputation of the relevant governments.