Abstract
This research analyzes patterns of data errors in order to complete the data required for household big data development at the sub-district level in Thailand. Feature selection and a Multi-Layer Perceptron neural network were applied. Data imbalance was handled with the SMOTE method, and the CFS and Information Gain (IG) feature selection methods were compared. The resulting datasets were then used to classify data errors with the Multi-Layer Perceptron neural network. Each model's effectiveness was measured by 10-fold cross-validation. The results showed that the suitable data size after adjusting for data imbalance was 400%. The model that combined SMOTE resizing, CFS feature selection, and Multi-Layer Perceptron classification achieved the highest effectiveness in data error classification, with an accuracy of 98.29%. Moreover, the application could effectively classify data errors and display the household big data at the highest level. In the evaluation by experts and users, the application received mean scores of 4.69 and higher with standard deviations of 0.47 and lower, corresponding to an effectiveness level of 93.78% and higher, while interquartile ranges did not exceed 1 and quartile deviations did not exceed 0.5.
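To make the SMOTE resizing step more concrete, the following is a minimal pure-Python sketch of SMOTE-style oversampling. It is not the authors' implementation. It assumes that "400%" means four synthetic samples are created for each original minority-class sample, which is one common reading of a SMOTE percentage setting but is not confirmed by the abstract. The function name `smote`, the parameter `k`, and the toy data are all hypothetical.

```python
import math
import random


def smote(minority, percent=400, k=3, seed=0):
    """Return synthetic minority samples made by SMOTE-style interpolation.

    percent=400 is read here as "create 4 synthetic points per original
    minority point"; this reading of the 400% figure is an assumption.
    """
    rng = random.Random(seed)
    per_sample = percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority-class neighbours of x (x itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical toy data: 4 minority-class records with 2 numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
synthetic = smote(minority, percent=400, k=3)
print(len(minority), "original +", len(synthetic), "synthetic samples")
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, so the new samples stay inside the region already occupied by the minority class rather than duplicating existing records.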
Highlights
The development of big data in the fields of health, economics, environment, activities, developments, and household demographics is crucial for community development
Community demographics are considered a big data prototype linked with the national big data system, facilitating the data processing cycle and reflecting the genuine problems embedded in the data
Dealing with data errors, including missing data, incorrect input, typographical errors, inconsistent data, and violated attribute dependencies, is challenging for big data
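As an illustration of some of the error categories listed above, the sketch below flags missing data, incorrect input, and a violated attribute dependency in a household record. It uses hand-written rules only to show what such errors look like; the research itself classifies errors with a Multi-Layer Perceptron, not with rules. The field names (`household_id`, `members`, `children`, `monthly_income`) and the records are hypothetical and do not come from the study's dataset.

```python
def find_errors(record):
    """Return error labels for one household record (illustrative rules only)."""
    errors = []
    # Missing data: a required field is absent or empty.
    for field in ("household_id", "members", "monthly_income"):
        if record.get(field) in (None, ""):
            errors.append(f"missing:{field}")
    # Incorrect input: a value outside its plausible range.
    members = record.get("members")
    if isinstance(members, int) and members <= 0:
        errors.append("incorrect:members")
    # Violated attribute dependency: children cannot outnumber all members.
    children = record.get("children", 0)
    if isinstance(members, int) and isinstance(children, int) and children > members:
        errors.append("dependency:children>members")
    return errors


# Hypothetical records: the first is clean, the second has two errors.
records = [
    {"household_id": "H001", "members": 4, "children": 2, "monthly_income": 15000},
    {"household_id": "H002", "members": 2, "children": 3, "monthly_income": ""},
]
for r in records:
    print(r["household_id"], find_errors(r))
```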
Summary
The development of big data in the fields of health, economics, environment, activities, developments, and household demographics is crucial for community development. This is because comprehensive and accurate data can reveal the community's genuine problems and demands, which governmental agencies and responsible figures such as village leaders, subdistrict administrators, local people themselves, researchers, and the business sector can use to solve those problems. One of the most common problems in collecting community data is that local people are hesitant to provide information. Even though both the public and private sectors have tried to collect data from local communities, local people rarely understand the overall picture because the analyzed data has not been made accessible to them. As a result, they are reluctant to provide further information. Governmental agencies, researchers, and the business sector can make use of this information to support and develop the communities in the future.