Feature Selection for Analyzing Data Errors Toward Development of Household Big Data at the Sub-District Level Using Multi-Layer Perceptron Neural Network

Sumitra Nuanmeesri

doi:10.3991/ijim.v16i05.22523

Abstract

This research analyzes patterns of data errors in order to supply the data required for household big data development at the sub-district level in Thailand. Feature selection and a Multi-Layer Perceptron Neural Network were applied. Data imbalance was addressed with the SMOTE method, and the CFS feature selection method was compared with the Information Gain (IG) feature selection method. The data errors in the resulting datasets were then classified by the Multi-Layer Perceptron Neural Network. Each model’s effectiveness was measured by 10-fold cross-validation. The results revealed that the suitable data size after adjusting for data imbalance was 400%. The model built by adjusting the data size with SMOTE, selecting features with CFS, and classifying data errors with the Multi-Layer Perceptron Neural Network provided the highest effectiveness in data error classification, with an accuracy of 98.29%. Moreover, the application could effectively classify data errors and display the household big data at the highest level. The application evaluation results given by the experts and the users had a mean of 4.69 or higher and a standard deviation of 0.47 or lower, corresponding to an effectiveness level of 93.78% or higher, with interquartile range values of no more than 1 and a quartile deviation of no more than 0.5.
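To make the oversampling step concrete, the following is a minimal sketch of the SMOTE idea in plain Python. It is not the authors' implementation. It assumes that a 400% setting means four synthetic records are generated for every original minority-class record, and the function name `smote`, the parameter names, and the toy data are all illustrative.

```python
import random
from math import dist


def smote(minority, percent=400, k=5, seed=0):
    """Return synthetic minority rows (assumption: percent=400 -> 4 per row)."""
    rng = random.Random(seed)
    per_row = percent // 100
    synthetic = []
    for row in minority:
        # The k nearest minority-class neighbours of this row, excluding itself.
        others = [r for r in minority if r is not row]
        neighbours = sorted(others, key=lambda r: dist(row, r))[:k]
        for _ in range(per_row):
            nb = rng.choice(neighbours)
            gap = rng.random()  # random position on the segment row -> nb
            synthetic.append([a + gap * (b - a) for a, b in zip(row, nb)])
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
synthetic = smote(minority, percent=400, k=3)
print(len(minority), "original rows,", len(synthetic), "synthetic rows")
```

Each synthetic row is interpolated between a real minority row and one of its nearest minority neighbours, so the new rows stay inside the region already occupied by the minority class.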

Highlights

The development of big data in the fields of health, economics, environment, activities, developments, and household demographics is crucial for community development
Community demographics are considered a big data prototype linked with the national big data system, facilitating the data processing cycle and reflecting the genuine problems embedded in the data
Dealing with data errors, including missing data, incorrect input, typographical errors, inconsistent data, and violated attribute dependencies, is challenging for big data
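The error types named above can be illustrated with a small rule-based check. This is only a sketch for orientation: the paper classifies errors with a neural network, not with hand-written rules, and every field name below (`household_id`, `members`, `employed_members`, `owns_vehicle`, `vehicle_count`) is a hypothetical example, not a field from the study's dataset.

```python
def find_errors(record):
    """Return a list of error labels for one household record (hypothetical schema)."""
    errors = []

    # Missing data: a required field is absent or empty.
    for field in ("household_id", "members", "employed_members"):
        if record.get(field) in (None, ""):
            errors.append(f"missing:{field}")

    members = record.get("members")
    employed = record.get("employed_members")

    # Incorrect input: a household must have at least one member.
    if isinstance(members, int) and members < 1:
        errors.append("incorrect:members")

    # Inconsistent data: more employed members than members in total.
    if isinstance(members, int) and isinstance(employed, int) and employed > members:
        errors.append("inconsistent:employed_members")

    # Violated attribute dependency: no vehicle owned, but a positive count.
    if record.get("owns_vehicle") is False and record.get("vehicle_count", 0) > 0:
        errors.append("dependency:vehicle_count")

    return errors


clean = {"household_id": "H001", "members": 4, "employed_members": 2,
         "owns_vehicle": True, "vehicle_count": 1}
dirty = {"household_id": "", "members": 3, "employed_members": 5,
         "owns_vehicle": False, "vehicle_count": 2}

print(find_errors(clean))
print(find_errors(dirty))
```

Labels produced by checks of this kind could serve as the error classes that a classifier is later trained to predict, although the source does not say how its labels were produced.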

Summary

Introduction

The development of big data in the fields of health, economics, environment, activities, developments, and household demographics is crucial for community development. This is because comprehensive and accurate data can reveal the community’s genuine problems and demands, which governmental agencies and responsible figures such as village leaders, subdistrict administrators, local people themselves, researchers, and the business sector can use to solve those problems. One of the most common problems in collecting community data is that local people are hesitant to provide information. Even though both public and private sectors have tried to collect data from local communities, local people rarely understand the overall picture because the analyzed data has not been made accessible to them. As a result, they are reluctant to provide further information. Governmental agencies, researchers, and the business sector can make use of this information to support and develop the communities in the future.

Synthetic minority over-sampling technique

Feature selection

Multi-layer perceptron neural network

Literature review

Methodology

Data preprocessing

Handling imbalanced data by SMOTE

Feature selection by CFS and IG

Model creation by multi-layer perceptron neural network

Effectiveness evaluation of the model

Development and deployment of the application

Research results

Effectiveness evaluation results of the application

Conclusion

Findings

Author


Journal: International Journal of Interactive Mobile Technologies (iJIM)
Publication Date: Mar 8, 2022
License type: CC BY 4.0
