Fusion of spatiotemporal and thematic features of textual data for animal disease surveillance

Sarah Valentin, Renaud Lancelot, Mathieu Roche

doi:10.1016/j.inpa.2022.03.004

Abstract

Several internet-based systems have been created to monitor the web for animal health surveillance. These systems collect large volumes of news articles about animal disease outbreaks. Automatically identifying news articles that describe the same outbreak event is a key step towards quickly detecting relevant epidemiological information while reducing the manual curation of news content. This paper addresses the task of retrieving news articles that are related in epidemiological terms. We tackle this issue using text mining and feature fusion methods. The main objective of this paper is to identify a textual representation in which two articles that share the same epidemiological content are close. We compared two types of representations (i.e., features) of the documents: (i) morphosyntactic features (i.e., selection and transformation of all terms from the news, based on classical text processing steps) and (ii) lexicosemantic features (i.e., selection, transformation and fusion of epidemiological terms, including diseases, hosts, locations and dates). We compared two types of term weighting (i.e., Boolean and TF-IDF) for both representations. To combine and transform lexicosemantic features, we compared two data fusion techniques (i.e., early fusion and late fusion) and the effect of feature generalisation, while evaluating the relative importance of each type of feature. We conducted our analysis on a corpus of English-language news articles related to animal disease outbreaks. Our results showed that combining relevant lexicosemantic (epidemiological) features using fusion methods improves on the classical morphosyntactic representation in the context of disease-related news retrieval.
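The difference between early and late fusion can be illustrated with a minimal sketch. It assumes Boolean term weighting and cosine similarity; the function names, the two example articles and the per-type weights are hypothetical and are not taken from the paper. In early fusion, all feature types are merged into one vector before a single comparison. In late fusion, each feature type is compared separately and the resulting similarities are then combined.

```python
import math

# Epidemiological feature types named in the abstract.
FEATURE_TYPES = ("disease", "host", "location", "date")


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boolean_vector(terms):
    """Boolean weighting: a term gets weight 1 if present, however often."""
    return {t: 1.0 for t in set(terms)}


def early_fusion(doc_a, doc_b):
    """Merge all feature types into one vector, then compare once."""
    def merged(doc):
        return boolean_vector(
            f"{ftype}={term}"
            for ftype in FEATURE_TYPES
            for term in doc.get(ftype, [])
        )
    return cosine(merged(doc_a), merged(doc_b))


def late_fusion(doc_a, doc_b, weights=None):
    """Compare each feature type separately, then take a weighted mean."""
    weights = weights or {ftype: 1.0 for ftype in FEATURE_TYPES}
    total = sum(weights.values())
    score = 0.0
    for ftype in FEATURE_TYPES:
        sim = cosine(
            boolean_vector(doc_a.get(ftype, [])),
            boolean_vector(doc_b.get(ftype, [])),
        )
        score += weights[ftype] * sim
    return score / total


# Two hypothetical articles that differ only in location.
article_a = {"disease": ["avian influenza"], "host": ["poultry"],
             "location": ["France"], "date": ["2019-01"]}
article_b = {"disease": ["avian influenza"], "host": ["poultry"],
             "location": ["Germany"], "date": ["2019-01"]}

early_score = early_fusion(article_a, article_b)
late_score = late_fusion(article_a, article_b)
# Illustrative weights that make the spatial feature count three times as much.
spatial_weights = {"disease": 1.0, "host": 1.0, "location": 3.0, "date": 1.0}
late_spatial_score = late_fusion(article_a, article_b, spatial_weights)
```

With equal weights, both strategies give the same score on this toy pair. Late fusion additionally exposes a per-type weight, which is one simple way to probe the relative importance of each feature type.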
The lexicosemantic representation based on TF-IDF and feature generalisation (F-measure = 0.92, r-precision = 0.58) outperformed the morphosyntactic representation (F-measure = 0.89, r-precision = 0.45), while reducing the feature space. Converting the features into coarser-grained features (i.e., generalisation) helped improve the results of the lexicosemantic representation. Our results showed no difference between the early and late fusion approaches. Temporal features performed poorly on their own. Conversely, spatial features were the most discriminative, highlighting the need for robust methods for spatial entity extraction, disambiguation and representation in internet-based surveillance systems.
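For readers unfamiliar with r-precision, the following sketch uses the standard information-retrieval definition: for a query article with R relevant articles, it is the fraction of relevant articles among the top R retrieved. The function name and the article identifiers are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def r_precision(ranked_ids, relevant_ids):
    """Fraction of relevant items among the top R results, R = number of relevant items."""
    relevant = set(relevant_ids)
    r = len(relevant)
    if r == 0:
        return 0.0  # convention chosen here for queries with no relevant items
    top_r = ranked_ids[:r]
    return sum(1 for doc_id in top_r if doc_id in relevant) / r


# Hypothetical ranking returned for one query article.
ranking = ["a3", "a7", "a1", "a9", "a2"]
# Hypothetical set of articles describing the same outbreak event.
same_event = {"a3", "a1", "a2"}

score = r_precision(ranking, same_event)
```

Here R is 3, and two of the top three retrieved articles are relevant, so the score is 2/3.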

Journal: Information Processing in Agriculture
Publication Date: Mar 28, 2022
Citations: 2
License type: CC BY-NC-ND
