Empirical Analysis of Parallel Corpora and In-Depth Analysis Using LIWC

Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Seolhwa Lee, Sugyeong Eo, Heuiseok Lim, Midan Shim

doi:10.3390/app12115545

Open Access

https://doi.org/10.3390/app12115545

Abstract

A machine translation (MT) system aims to translate text from a source language into a target language. Recent studies on MT systems mainly focus on neural machine translation (NMT). One factor that significantly affects the performance of NMT is the availability of high-quality parallel corpora. However, high-quality parallel corpora for Korean are relatively scarce compared to those available for other high-resource languages, such as German or Italian. To address this problem, AI Hub recently released seven types of parallel corpora for Korean. In this study, we conduct an in-depth verification of the quality of these parallel corpora using Linguistic Inquiry and Word Count (LIWC) and several relevant experiments. LIWC is a word-counting software program that can analyze corpora in multiple ways and extract linguistic features on a dictionary basis. To the best of our knowledge, this study is the first to use LIWC to analyze parallel corpora in the field of NMT. Through a correlation analysis between LIWC features and NMT performance, our findings suggest directions for further research toward obtaining higher-quality parallel corpora.
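To illustrate what dictionary-based word counting means in practice, the following is a minimal sketch and not the LIWC software itself. The dictionary `TOY_DICTIONARY`, its two categories, and the function name `category_percentages` are hypothetical and chosen only for illustration. The sketch uses exact word matching and reports each category as a percentage of all tokens in the text.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (category -> words). This is NOT the LIWC
# dictionary; it only illustrates the dictionary-based counting idea.
TOY_DICTIONARY = {
    "pronoun": {"i", "you", "we", "they"},
    "negation": {"no", "not", "never"},
}


def category_percentages(text, dictionary):
    """Return, per category, the percentage of tokens found in that category."""
    # Simple lowercase tokenization; an assumption made for this sketch.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    if total == 0:
        return {category: 0.0 for category in dictionary}
    return {category: 100.0 * counts[category] / total for category in dictionary}


sample = "We never said you were wrong."
result = category_percentages(sample, TOY_DICTIONARY)
print(result)
```

Applied to each side of a parallel corpus, per-category percentages of this kind give a feature vector per corpus, which could then be compared against translation quality scores.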

Journal: Applied Sciences
Publication Date: May 30, 2022
Citations: 3
License type: CC BY 4.0
