In election polls, candidates' approval ratings are not only a major public concern but also an important factor in shaping the election strategy of the parties involved. Because polls are so influential, inaccurate or distorted results can cause serious problems, and controversy over polls arises often. Actual election results frequently differ from poll results, and the political parties and research institutes involved in polling are working to improve the accuracy of election forecasts. The purpose of this study is to improve the accuracy of election prediction by replacing non-responses using machine learning algorithms, based on survey data from the 20th Presidential Election, the 2021 Busan Mayor By-election, and the 8th National Simultaneous Local Election (Busan Metropolitan City, Nam-gu, Gimhae Mayor). First, the data were split: respondents who did not select a non-response option ("I do not know"/"don't know") formed the train data, and respondents who selected a non-response option formed the test data. Second, the Random Forest, XGBoost, and LightGBM methods were applied; the importance of each question for predicting the candidate was derived from the train data, and only the three most important questions were used. Third, the model trained on the train data was used to replace each non-response in the test data with a response for one candidate, and the resulting data were combined with the train data to calculate the percentage of votes for each candidate. Comparing the actual election results with the vote percentages obtained after replacing non-responses confirmed that the Random Forest method is a suitable model for improving the accuracy of election prediction through non-response substitution.
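The three steps described above can be illustrated with a minimal sketch. This is not the study's implementation: the data are synthetic, the function name `impute_and_share`, the `NON_RESPONSE` code, and the hyperparameters are hypothetical, and only scikit-learn's `RandomForestClassifier` is shown (the XGBoost and LightGBM comparison is omitted).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

NON_RESPONSE = -1  # hypothetical code for "I do not know / don't know"


def impute_and_share(X, choice, n_top=3, seed=0):
    """Replace non-responses with predicted candidates; return vote shares (%)."""
    X = np.asarray(X)
    choice = np.asarray(choice)

    # Step 1: respondents who named a candidate form the train data;
    # non-respondents form the test data.
    answered = choice != NON_RESPONSE
    X_train, y_train = X[answered], choice[answered]
    X_test = X[~answered]

    # Step 2: rank questions by importance and keep the top n_top.
    ranker = RandomForestClassifier(n_estimators=200, random_state=seed)
    ranker.fit(X_train, y_train)
    top = np.argsort(ranker.feature_importances_)[::-1][:n_top]

    # Step 3: refit on the selected questions and predict one candidate
    # for every non-respondent.
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train[:, top], y_train)
    imputed = model.predict(X_test[:, top]) if len(X_test) else y_train[:0]

    # Combine observed and imputed choices, then compute percentage shares.
    combined = np.concatenate([y_train, imputed])
    candidates, counts = np.unique(combined, return_counts=True)
    shares = {int(c): 100.0 * n / len(combined)
              for c, n in zip(candidates, counts)}
    return shares, top


# Synthetic survey: six questions, two candidates, about 20% non-response.
rng = np.random.default_rng(0)
n = 600
X = rng.integers(0, 5, size=(n, 6))
choice = (X[:, 0] >= 3).astype(int)          # choice depends on question 0 only
choice[rng.random(n) < 0.2] = NON_RESPONSE   # mark some respondents as undecided

shares, top = impute_and_share(X, choice)
print("vote share (%):", shares)
print("questions used:", top)
```

The shares returned here would then be compared against the actual election result for each candidate; repeating the same procedure with other classifiers gives the kind of model comparison the study reports.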