Abstract

Many artificial intelligence studies focus on designing new neural network models or optimizing hyperparameters to improve model accuracy. To develop a reliable model, appropriate data are required, and data preprocessing is an essential part of acquiring the data. Although various studies regard data preprocessing as part of the data exploration process, those studies lack awareness about the need for separate technologies and solutions for preprocessing. Therefore, this study evaluated combinations of preprocessing types in a text-processing neural network model. Better performance was observed when two preprocessing types were used than when three or more preprocessing types were used for data purification. More specifically, using lemmatization and punctuation splitting together, lemmatization and lowering together, and lowering and punctuation splitting together showed positive effects on accuracy. This study is significant because the results allow better decisions to be made about the selection of the preprocessing types in various research fields, including neural network research.
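The three preprocessing types the abstract pairs up (lemmatization, punctuation splitting, and lowering) can be sketched as composable steps. The toy lemma table below is an illustrative assumption, not the paper's implementation; a real pipeline would use a trained lemmatizer.

```python
import re

def split_punctuation(text):
    # Punctuation splitting: separate punctuation marks into their own tokens
    return re.findall(r"\w+|[^\w\s]", text)

def lower(tokens):
    # Lowering: map every token to lower case
    return [t.lower() for t in tokens]

# Toy lemma table standing in for a real lemmatizer (illustrative assumption)
LEMMAS = {"studies": "study", "observed": "observe", "types": "type"}

def lemmatize(tokens):
    # Lemmatization: reduce each token to its dictionary form when known
    return [LEMMAS.get(t, t) for t in tokens]

def preprocess(text):
    # One of the better-performing pairs reported: punctuation splitting + lowering
    return lower(split_punctuation(text))

print(preprocess("Hello, World!"))  # ['hello', ',', 'world', '!']
```

Combining a third step is a matter of composing another function (e.g. `lemmatize(preprocess(text))`), which is how three-type combinations would be formed for comparison.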

Highlights

  • Attempts have been made to increase work efficiency through studies using similarities between sentences

  • Existing data preprocessing studies have been conducted mainly in the field of data mining. There have been studies that process web data to format them into an analytical form. These studies did not explain the effect of data preprocessing on the algorithm, treating it only as a step in preparing data for analysis [17,18,19]. There is a study that analyzed the effect of data preprocessing on predictive ability, limited to numerical data in neural network models [4, 20]

  • This study analyzed the effect of preprocessing by applying text data preprocessing to sentence models


Summary

Introduction

Attempts have been made to increase work efficiency through studies using similarities between sentences. Studying the similarity between sentences requires a deep understanding of the semantic and structural information of the language. Therefore, attempts have been made to learn a language model that computes probability distributions without extracting features. A method has been proposed that combines a word-embedding method, in which information about the meaning or structure of a word is expressed as a real-valued multidimensional vector, with a deep belief network structure that uses a prelearning method [3]. To improve the prediction accuracy of a high-performance neural-network-based sentence model or a natural-language-based study, confidence in the data should be the highest priority. Data for research studies should be processed through a filtering step, in which the researchers themselves conduct the preprocessing. Therefore, it is necessary to investigate the data preprocessing features that should be selected for machine learning [4], as well as the effects of various preprocessing tasks on the performance of classification models [5,6,7].
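The sentence-similarity idea above is commonly realized by averaging word embeddings into a sentence vector and comparing vectors with cosine similarity. A minimal sketch follows; the two-dimensional toy vectors are illustrative assumptions, not the embeddings used in [3].

```python
import math

# Toy word vectors; real studies would use trained multidimensional embeddings
VECS = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}

def sentence_vector(tokens):
    # Average the word vectors of the tokens (unknown words are skipped)
    vs = [VECS[t] for t in tokens if t in VECS]
    return [sum(dim) / len(vs) for dim in zip(*vs)]

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

s1 = sentence_vector(["cat", "dog"])
s2 = sentence_vector(["car"])
print(round(cosine(s1, s2), 3))
```

Because the sentence vector is built from token vectors, the preprocessing applied to the tokens (lowering, lemmatization, punctuation splitting) directly changes which embeddings are looked up, which is why preprocessing choices affect such models.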

