This paper presents a classification model that predicts the region to which a country belongs from its sustainability scores and related features. The dataset comprises country-level data on sustainability and on progress towards the Sustainable Development Goals (SDGs). The primary objective is to understand regional trends in sustainability and to assess countries' progress in sustainable development. Data preparation consists of merging the source datasets, handling missing values, and standardizing features. Exploratory data analysis visualizes the distribution of the target variable (region) and of the numeric SDG score features, and examines relationships between these features using correlation matrices and pair plots. Six classifiers are then trained to assign countries to their regions: Random Forest, Support Vector Machine (SVM) with a linear kernel, K-Nearest Neighbors (KNN), Logistic Regression, Decision Tree, and SVM with a radial basis function (RBF) kernel. Each model is evaluated in terms of accuracy, precision, recall, and F1-score. Random Forest, KNN, Decision Tree, and SVM with an RBF kernel achieve exceptionally high accuracy, suggesting that they are well suited to regional classification based on sustainability metrics; Logistic Regression and SVM with a linear kernel also give competitive results.
In conclusion, this research contributes to the understanding of regional trends in sustainability by using machine learning models to predict countries' regions from their sustainability scores and associated features. Such models can help policymakers and organizations assess and address regional disparities in sustainable development progress.
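The four evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how per-region precision, recall, and F1 are combined into one figure, so macro-averaging (an unweighted mean over regions) is assumed here. The function name `macro_scores` and the region labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption of this sketch; the source does not
    state which averaging scheme was used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this region
        else:
            fp[pred] += 1       # predicted region was wrong
            fn[truth] += 1      # true region was missed

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for six countries, for illustration only.
y_true = ["Africa", "Africa", "Europe", "Europe", "Asia", "Asia"]
y_pred = ["Africa", "Europe", "Europe", "Europe", "Asia", "Africa"]
scores = macro_scores(y_true, y_pred)
```

Reporting all four measures rather than accuracy alone matters when regions contain very different numbers of countries, since a model can score high accuracy while performing poorly on the smaller regions.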