Integrating machine learning with linguistic features: A universal method for extraction and normalization of temporal expressions in Chinese texts

Shunli Wang, Rui Li, Huayi Wu

doi:10.1016/j.cmpb.2023.107474

Abstract

Background and Objective: With the rapid development of information dissemination technology, the amount of event information contained in massive texts now far exceeds what humans can intuitively grasp, which makes it hard to follow how events progress over time. Temporal information runs through the beginning, progress, and end of events, and plays an important role in many natural language processing applications, such as information extraction, question answering, and text summarization. Accurately extracting temporal information from Chinese texts and automatically mapping natural-language temporal expressions onto a time axis are crucial to understanding how events develop and change.

Methods: This study proposes a method integrating machine learning with linguistic features (IMLLF) for the extraction and normalization of temporal expressions in Chinese texts. Linguistic features are constructed by analyzing the rules by which temporal information is expressed, and are combined with machine learning to map the natural-language form of time onto a one-dimensional timeline. The web text dataset we built is divided into five parts for five-fold cross-validation, in order to compare the effect of different combinations of linguistic features and of different methods. On the open medical dialog dataset, the model trained on the web text dataset is applied to 200 randomly selected disease descriptions in each of three experimental rounds.

Results: Multi-feature fusion achieves an F1-score of 95.2%, outperforming both single-feature and two-feature combinations. The experiments show that IMLLF recognizes temporal information in Chinese more accurately than classical methods, with an F1-score above 95% on both the web text dataset and the medical conversation dataset. For the normalization of temporal expressions, IMLLF achieves an accuracy above 93%.

Conclusions: IMLLF outperforms the compared methods in extracting and normalizing temporal expressions on both the web text dataset and the medical conversation dataset, which verifies its universality in identifying and quantifying temporal information. IMLLF can accurately map temporal information onto the time axis, making it easier for doctors to see at a glance when and what happened to a patient and helping them make better medical decisions.
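To make the normalization target concrete, here is a minimal Python sketch, assuming a purely rule-based normalizer. It is not the IMLLF method, which combines linguistic features with a trained machine learning model; the regular expression, the lookup tables `CN_DIGITS` and `DAY_WORDS`, and the functions `parse_count` and `normalize` are hypothetical illustrations. The `reference` date stands in for the document or dialog time that relative expressions such as 昨天 (yesterday) or 三天前 (three days ago) are anchored to.

```python
import re
from datetime import date, timedelta

# Hypothetical lookup tables for illustration; not the feature set used by IMLLF.
CN_DIGITS = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4, "五": 5,
             "六": 6, "七": 7, "八": 8, "九": 9, "十": 10}
DAY_WORDS = {"前天": -2, "昨天": -1, "今天": 0, "明天": 1, "后天": 2}

# A single alternation, so finditer scans left to right without overlaps:
# absolute dates, then "N天前/N天后" (N days ago/later), then day words.
PATTERN = re.compile(
    r"(?P<y>\d{4})年(?P<m>\d{1,2})月(?P<d>\d{1,2})日"
    r"|(?P<n>\d+|[一二两三四五六七八九十])天(?P<dir>[前后])"
    r"|(?P<word>" + "|".join(DAY_WORDS) + r")"
)


def parse_count(token: str) -> int:
    """Convert Arabic digits or a single Chinese numeral character to an int."""
    return int(token) if token.isdigit() else CN_DIGITS[token]


def normalize(text: str, reference: date) -> list[tuple[str, str]]:
    """Return (surface form, ISO 8601 date) pairs, resolving relative
    expressions against the reference date."""
    results = []
    for match in PATTERN.finditer(text):
        if match.group("y"):
            value = date(int(match.group("y")), int(match.group("m")),
                         int(match.group("d")))
        elif match.group("n"):
            days = parse_count(match.group("n"))
            # 前 means "before/ago", 后 means "after/later".
            value = reference + timedelta(days=-days if match.group("dir") == "前" else days)
        else:
            value = reference + timedelta(days=DAY_WORDS[match.group("word")])
        results.append((match.group(0), value.isoformat()))
    return results


ref = date(2023, 3, 11)
print(normalize("患者三天前开始发烧，昨天去医院，2023年3月1日曾服药。", ref))
# [('三天前', '2023-03-08'), ('昨天', '2023-03-10'), ('2023年3月1日', '2023-03-01')]
```

The sketch handles only single-character numerals, so an expression such as 十五天前 (fifteen days ago) would be misread as 五天前. Broader coverage requires more patterns or combining such linguistic cues with a learned model, which is the general approach IMLLF takes.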


Journal: Computer Methods and Programs in Biomedicine
Publication Date: Mar 11, 2023
Citations: 3

Similar Papers

Temporal Expression Classification and Normalization From Chinese Narrative Clinical Texts: Pattern Learning Approach
Xiaoyi Pan ... Yongyi Gong
JMIR Medical Informatics | VOL. 8
27 Jul 2020

Domain-sensitive Temporal Tagging for Event-centric Information Retrieval
01 Jan 2015

Temponym Tagging
Erdal Kuzey ... Jannik Strötgen
01 Jan 2015

Multilingual and cross-domain temporal tagging
Jannik Strötgen ... Michael Gertz
Language Resources and Evaluation | VOL. 47
08 May 2012
