Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus

Liang-Ching Chen, Kuei-Hu Chang, Shu-Ching Yang

doi:10.4025/actascitechnol.v44i1.60486

Open Access

Abstract

In modern information and communication technology (ICT), finding highly efficient and accurate corpus-based approaches to processing natural language data (NLD) is critical. Traditional corpus-based approaches to processing a corpus (i.e., the collected NLD) have mainly focused on quantifying and ranking words to help humans extract keywords. However, these approaches cannot identify the meanings behind the words, so they cannot properly extract terminology or the information associated with it. To address this issue, this paper proposes an integrated linguistic analysis approach that combines two corpus-based approaches with a rule-based natural language processing (NLP) approach. The approach extracts and identifies terminology, then builds a text database in which the terms serve as channels for retrieving deeper, domain-oriented core information from the target corpus. The military domain is an uncommon research field, and its data are often classified as confidential, so little research has addressed it. Nevertheless, military information is vital to national security and should not be ignored. Hence, to verify that the proposed approach can extract terminology and the information associated with it, the researchers adopt US Army Field Manual (FM) 8-10-6 as the target corpus and empirical case. Compared with AntConc 3.5.8 and Tongpoon-Patanasorn’s hybrid approach, the results indicate that, in terms of terminology identification, text database creation, and domain knowledge extraction, only the proposed approach handles all three tasks.
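
The general idea described in the abstract (rank words by frequency, apply a rule to pick out candidate terms, then use each term as a channel to retrieve the sentences that mention it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stopword list, the bigram rule, the function names, and the sample text are all assumptions made up for illustration, and the sample text is an invented toy passage, not an excerpt from FM 8-10-6.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "are",
             "in", "for", "by", "on", "with", "be", "from"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphabetic (optionally hyphenated) words."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def rank_words(text):
    """Corpus-based step: count non-stopword tokens, most frequent first."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common()


def candidate_terms(text, min_freq=2):
    """Assumed rule-based step: keep recurring bigrams without stopwords."""
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return [term for term, n in counts.most_common() if n >= min_freq]


def build_text_database(text, terms):
    """Map each term to the sentences that contain it."""
    sentences = split_sentences(text)
    return {t: [s for s in sentences if t in s.lower()] for t in terms}


# Invented toy text, used only to exercise the functions.
SAMPLE = ("Ground ambulances move patients to the aid station. "
          "Air ambulances move patients over long distances. "
          "The aid station treats patients.")

if __name__ == "__main__":
    terms = candidate_terms(SAMPLE)
    database = build_text_database(SAMPLE, terms)
    print(rank_words(SAMPLE)[:5])
    for term, hits in database.items():
        print(term, "->", hits)
```

A crude bigram rule like this also picks up non-terms such as verb phrases, which is one reason a frequency ranking alone is not enough and a richer rule set or human review is still needed.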

Journal: Acta Scientiarum. Technology
Publication Date: Jul 28, 2022
Citations: 2
License type: CC BY 4.0
